giftwrap
GIFTwrap: A Python package for analyzing GIFT-seq data.
The package provides both a CLI for transforming FASTQ files into counts matrices and a Python API for analysis.
Modules:
- analysis – This module provides a collection of Python APIs for analyzing processed GIFT-seq data.
- pipeline
- pl – This module provides plotting functions for visualizing gapfill and genotype data in AnnData objects.
- pp – This module provides functions to handle basic preprocessing tasks of GIFT-seq data including …
- sp – This module contains functions for spatial analysis of Visium GIFT-seq data.
- step1_count_gapfills
- step3_correct_gapfill
- step4_collect_counts
- tl – This module provides various tools for analyzing and manipulating a GIFT-seq dataset. Most true analysis tools live …
- utils
Classes:
- TechnologyFormatInfo – Generic class to hold metadata related to parsing Read1 and Read2.
Functions:
- filter_h5_file_by_barcodes – Given a counts h5 file and a list of barcodes, filter the file to include only the barcodes in the list.
- read_h5_file – Read a generated h5 file and return an AnnData object.
- sequence_saturation_curve – Compute the sequencing saturation curve.
- sequencing_saturation – Sequencing saturation is 1 - (n_deduped_reads / n_reads).
TechnologyFormatInfo
TechnologyFormatInfo(barcode_dir: Optional[str | Path] = None, read1_length: Optional[int] = None, read2_length: Optional[int] = None)
Bases: ABC
Generic class to hold metadata related to parsing Read1 and Read2.
Methods:
- barcode2coordinates – Returns the X and Y coordinates of a barcode.
- correct_barcode – Given a probable barcode string, attempt to correct the sequence.
- make_barcode_string – Format a cell barcode into a string.
- probe_barcode_index – Convert a probe barcode to an index.
Attributes:
- barcode_coordinates (dict[str, tuple[int, int]]) – The x and y spatial coordinates of each barcode.
- barcode_tree (PrefixTrie) – A prefix tree (trie) of the cell barcodes for fast mismatch searches.
- cell_barcode_start (int) – The start position of the cell barcode in the read.
- cell_barcodes (list[str]) – The list of potential barcodes.
- constant_sequence (str) – The constant sequence that is expected in the read.
- constant_sequence_start (int) – The start position of the constant sequence in the read. Note that this should be relative to the end of the read insert.
- has_constant_sequence (bool) – Whether the read has a constant sequence.
- has_probe_barcode (bool) – Whether the read has a probe barcode.
- is_spatial (bool) – Whether the technology is spatial. If true, then barcode_coordinates must be defined.
- max_cell_barcode_length (int) – The maximum length of a cell barcode.
- probe_barcode_R1 (bool) – If true, the probe sample barcode is on R1 instead of R2.
- probe_barcode_length (int) – The length of the probe barcode.
- probe_barcode_start (int) – The start position of the probe barcode in the read. Note that this should be relative to the end of the constant sequence insert.
- probe_barcodes (dict[str, str]) – The potential probe barcodes.
- read1_length (Optional[int]) – The expected length of each R1 read. If defined, the pipeline can use it to improve performance.
- read2_length (Optional[int]) – The expected length of each R2 read. If defined, the pipeline can use it to improve performance.
- umi_length (int) – The length of the UMI sequence on R1.
- umi_start (int) – The start position of the UMI sequence in R1.
Source code in src/giftwrap/utils.py
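The contract above (abstract properties that every technology must supply, plus cached helpers derived from them) can be illustrated with a small self-contained toy. This is a sketch, not giftwrap's code: ToyFormatInfo and ToySingleCell are hypothetical names, and only a handful of the real members are mirrored.

```python
from abc import ABC, abstractmethod
from functools import cached_property
from typing import Optional


class ToyFormatInfo(ABC):
    """Hypothetical, trimmed-down analogue of TechnologyFormatInfo (not the real class)."""

    def __init__(self, read1_length: Optional[int] = None, read2_length: Optional[int] = None):
        self._read1_length = read1_length
        self._read2_length = read2_length

    @property
    @abstractmethod
    def cell_barcodes(self) -> list[str]:
        """Whitelist of potential cell barcodes; each technology must supply it."""

    @property
    @abstractmethod
    def is_spatial(self) -> bool:
        """Whether barcodes map to spatial positions."""

    @property
    @abstractmethod
    def barcode_coordinates(self) -> dict[str, tuple[int, int]]:
        """Barcode -> (x, y); only meaningful when is_spatial is True."""

    @cached_property
    def max_cell_barcode_length(self) -> int:
        # Derived from the whitelist once, then cached on the instance.
        return max(len(bc) for bc in self.cell_barcodes)

    @property
    def read1_length(self) -> Optional[int]:
        return self._read1_length


class ToySingleCell(ToyFormatInfo):
    """Hypothetical non-spatial technology with a two-barcode whitelist."""

    @property
    def cell_barcodes(self) -> list[str]:
        return ["AAACCC", "GGGTTTA"]

    @property
    def is_spatial(self) -> bool:
        return False

    @property
    def barcode_coordinates(self) -> dict[str, tuple[int, int]]:
        return {}  # non-spatial: nothing to map


info = ToySingleCell(read1_length=28)
print(info.max_cell_barcode_length)  # 7
```

Instantiating ToyFormatInfo directly raises TypeError because its abstract properties are unimplemented, which is the same guard an ABC base class gives any concrete technology that forgets a required member.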
barcode_coordinates (abstractmethod, property)
The x and y spatial coordinates of each barcode.
barcode_tree (cached property)
A prefix tree (trie) of the cell barcodes for fast mismatch searches.
Returns:
- PrefixTrie – The tree.
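To illustrate why a trie speeds up mismatch searches, the sketch below walks a dict-based trie and abandons a branch as soon as it exceeds the mismatch budget, so every barcode sharing that bad prefix is skipped at once. BarcodeTrie is a hypothetical class written for this example; it does not reproduce the API of the PrefixTrie object that barcode_tree returns.

```python
class BarcodeTrie:
    """Hypothetical minimal trie for illustration; not the PrefixTrie API."""

    def __init__(self, barcodes: list[str]):
        self.root: dict = {}
        for bc in barcodes:
            node = self.root
            for base in bc:
                node = node.setdefault(base, {})
            node["$"] = bc  # terminal marker holding the full barcode

    def search(self, query: str, max_mismatches: int) -> list[tuple[str, int]]:
        """Return (barcode, mismatches) for equal-length barcodes within the budget."""
        hits: list[tuple[str, int]] = []

        def walk(node: dict, depth: int, mismatches: int) -> None:
            if mismatches > max_mismatches:
                return  # prune: no barcode below this node can qualify
            if depth == len(query):
                if "$" in node:
                    hits.append((node["$"], mismatches))
                return
            for base, child in node.items():
                if base == "$":
                    continue
                walk(child, depth + 1, mismatches + (base != query[depth]))

        walk(self.root, 0, 0)
        return sorted(hits, key=lambda hit: hit[1])


trie = BarcodeTrie(["ACGT", "ACGA", "TTTT"])
print(trie.search("ACGT", 0))  # [('ACGT', 0)]
```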
cell_barcode_start (abstractmethod, property)
The start position of the cell barcode in the read.
constant_sequence (abstractmethod, property)
The constant sequence that is expected in the read.
constant_sequence_start (abstractmethod, property)
The start position of the constant sequence in the read. Note that this should be relative to the end of the read insert. For example, in 10x Flex, 0 would be the first base after the LHS + gapfill + RHS.
has_constant_sequence (abstractmethod, property)
Whether the read has a constant sequence.
has_probe_barcode (abstractmethod, property)
Whether the read has a probe barcode.
is_spatial (abstractmethod, property)
Whether the technology is spatial. If true, then barcode_coordinates must be defined.
max_cell_barcode_length (cached property)
Returns the maximum length of a cell barcode.
probe_barcode_R1 (property)
If true, the probe sample barcode is on R1 instead of R2.
probe_barcode_length (abstractmethod, property)
The length of the probe barcode.
probe_barcode_start (abstractmethod, property)
The start position of the probe barcode in the read. Note that this should be relative to the end of the constant sequence insert. For example, in 10x Flex, 2 would be the first base after the constant sequence + NN.
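The two relative offsets compose: constant_sequence_start counts from the end of the insert, and probe_barcode_start counts from the end of the constant sequence. The sketch below assumes the insert length (LHS + gapfill + RHS) for the read is already known, and uses made-up sequences rather than real 10x Flex constants; locate_probe_barcode is a hypothetical helper, not part of giftwrap.

```python
from typing import Optional


def locate_probe_barcode(
    read2: str,
    insert_length: int,
    constant_sequence: str,
    constant_sequence_start: int,
    probe_barcode_start: int,
    probe_barcode_length: int,
) -> Optional[str]:
    """Hypothetical helper: slice the probe barcode out of R2 using relative offsets."""
    # constant_sequence_start counts from the first base after the insert (LHS + gapfill + RHS).
    const_abs = insert_length + constant_sequence_start
    const_end = const_abs + len(constant_sequence)
    if read2[const_abs:const_end] != constant_sequence:
        return None  # constant sequence not where expected
    # probe_barcode_start counts from the first base after the constant sequence.
    bc_abs = const_end + probe_barcode_start
    barcode = read2[bc_abs : bc_abs + probe_barcode_length]
    return barcode if len(barcode) == probe_barcode_length else None


# Made-up read: 11 bp insert + constant "TTTT" + 2 spacer bases + 8 bp probe barcode + tail.
read2 = "AAAAA" + "C" + "GGGGG" + "TTTT" + "GA" + "ACGTACGT" + "CCC"
print(locate_probe_barcode(read2, 11, "TTTT", 0, 2, 8))  # ACGTACGT
```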
probe_barcodes (abstractmethod, property)
The potential probe barcodes.
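read1_length (property)
The expected length of each R1 read, if defined.
read2_length (property)
The expected length of each R2 read, if defined.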
barcode2coordinates (cached)
Returns the X and Y coordinates of a barcode.
correct_barcode (cached)
correct_barcode(read: str, max_mismatches: int, start_idx: int, end_idx: int) -> tuple[Optional[str], int]
Given a probable barcode string, attempt to correct the sequence.
Parameters:
- read (str) – The barcode-containing sequence.
- max_mismatches (int) – The maximum number of mismatches to allow.
- start_idx (int) – The start index of the barcode in the read.
- end_idx (int) – The end index of the barcode in the read.
Returns:
- tuple[Optional[str], int] – The corrected barcode (or None if no match was found) and the number of corrections required.
Source code in src/giftwrap/utils.py
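A minimal, hedged stand-in for correct_barcode: a linear Hamming-distance scan over a whitelist. Assumptions not confirmed by the documentation above: the whitelist is passed explicitly (the real method presumably uses the instance's barcodes and barcode_tree), ties between equally close barcodes are rejected, and a failed match reports 0 corrections. correct_barcode_bruteforce is a hypothetical name.

```python
from typing import Optional


def correct_barcode_bruteforce(
    read: str,
    whitelist: list[str],
    max_mismatches: int,
    start_idx: int,
    end_idx: int,
) -> tuple[Optional[str], int]:
    """Hypothetical stand-in for correct_barcode using a linear Hamming-distance scan."""
    candidate = read[start_idx:end_idx]
    if candidate in whitelist:
        return candidate, 0  # exact match: no correction needed
    best: Optional[str] = None
    best_dist = max_mismatches + 1
    ambiguous = False
    for bc in whitelist:
        if len(bc) != len(candidate):
            continue  # Hamming distance is only defined for equal lengths
        dist = sum(a != b for a, b in zip(bc, candidate))
        if dist < best_dist:
            best, best_dist, ambiguous = bc, dist, False
        elif dist == best_dist:
            ambiguous = True  # assumption: an equally close second hit makes the call unsafe
    if best is None or ambiguous:
        return None, 0  # assumption: 0 corrections reported on failure
    return best, best_dist


print(correct_barcode_bruteforce("CCCG", ["AAAA", "CCCC", "AAAT"], 1, 0, 4))  # ('CCCC', 1)
```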
make_barcode_string
make_barcode_string(cell_barcode: str, plex: str = '1', x_coord: Optional[int] = None, y_coord: Optional[int] = None, is_multiplexed: bool = False) -> str
Format a cell barcode into a string.
Parameters:
- cell_barcode (str) – The barcode.
- plex (str, default: '1') – The barcode index used to represent demultiplexed cells.
- x_coord (Optional[int], default: None) – The x coordinate.
- y_coord (Optional[int], default: None) – The y coordinate.
- is_multiplexed (bool, default: False) – Whether the data is multiplexed.
Source code in src/giftwrap/utils.py
probe_barcode_index (abstractmethod)
Convert a probe barcode to an index.
filter_h5_file_by_barcodes
filter_h5_file_by_barcodes(input_file: Path, output_file: Path, barcodes_list: ArrayLike, pad_matrix: bool = True)
Given a counts h5 file and a list of barcodes, filter the file to include only the barcodes in the list.
Parameters:
- input_file (Path) – The input h5 file.
- output_file (Path) – The output h5 file.
- barcodes_list (ArrayLike) – The barcodes to keep.
- pad_matrix (bool, default: True) – Whether to pad the matrix with zeros for listed barcodes that don't exist in the file.
Source code in src/giftwrap/utils.py
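The filtering and padding semantics can be sketched in memory, without any HDF5 I/O. filter_by_barcodes is hypothetical, uses plain lists instead of a sparse counts matrix, and assumes output rows follow the order of the requested barcodes, which the documentation above does not specify.

```python
def filter_by_barcodes(
    barcodes: list[str],
    rows: list[list[int]],
    keep: list[str],
    pad_matrix: bool = True,
) -> tuple[list[str], list[list[int]]]:
    """Hypothetical in-memory analogue of filter_h5_file_by_barcodes (no HDF5 I/O)."""
    n_features = len(rows[0]) if rows else 0
    index = {bc: i for i, bc in enumerate(barcodes)}
    out_barcodes: list[str] = []
    out_rows: list[list[int]] = []
    for bc in keep:  # assumption: output order follows the requested list
        if bc in index:
            out_barcodes.append(bc)
            out_rows.append(list(rows[index[bc]]))
        elif pad_matrix:
            # Requested barcode is absent from the input: add an all-zero row.
            out_barcodes.append(bc)
            out_rows.append([0] * n_features)
    return out_barcodes, out_rows


print(filter_by_barcodes(["A", "B", "C"], [[1, 2], [3, 4], [5, 6]], ["C", "A", "Z"]))
# (['C', 'A', 'Z'], [[5, 6], [1, 2], [0, 0]])
```

With pad_matrix=False, a requested barcode that is missing from the input is simply dropped.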
read_h5_file
Read a generated h5 file and return an AnnData object.
Parameters:
Returns:
- AnnData – The AnnData object.
Source code in src/giftwrap/utils.py
sequence_saturation_curve
Compute the sequencing saturation curve.
Parameters:
- full_counts – The cell-by-feature matrix, where each count is a number of reads.
- n_points (int, default: 1000) – The number of points at which to compute the curve. Note that the points are spaced on a log scale.
Returns:
- array – The saturation curve.
Source code in src/giftwrap/utils.py
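One common way to build a saturation curve is to subsample reads at increasing depths and recompute saturation at each depth. The sketch below does that with log-spaced depths on a flat list of per-combination read counts (an assumption; the documented function takes a cell-by-feature matrix), and it is not necessarily how giftwrap computes its curve. saturation_curve_sketch is a hypothetical name.

```python
import math
import random


def saturation_curve_sketch(
    read_counts: list[int], n_points: int = 20, seed: int = 0
) -> list[tuple[int, float]]:
    """Hypothetical sketch: saturation at log-spaced depths via read subsampling.

    read_counts[i] is the number of reads for the i-th cell/UMI/feature combination.
    """
    # Expand counts into one entry per read, labelled by its combination index.
    reads = [i for i, c in enumerate(read_counts) for _ in range(c)]
    random.Random(seed).shuffle(reads)
    n_reads = len(reads)
    if n_reads == 0:
        return []
    # Log-spaced depths from 1 read up to all reads; the set drops duplicate depths.
    depths = sorted({
        max(1, round(10 ** (k * math.log10(n_reads) / max(n_points - 1, 1))))
        for k in range(n_points)
    })
    curve = []
    for depth in depths:
        subsample = reads[:depth]  # a prefix of a shuffled list is a uniform subsample
        curve.append((depth, 1 - len(set(subsample)) / depth))
    return curve


print(saturation_curve_sketch([3, 1, 0, 2], n_points=5)[-1])  # (6, 0.5)
```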
sequencing_saturation
Sequencing saturation is 1 - (n_deduped_reads / n_reads), where n_deduped_reads is the number of distinct valid cell barcode/valid UMI/gene combinations and n_reads is the total number of reads that map to a valid cell barcode and UMI.
Parameters:
- counts (array) – The counts, which should be numbers of reads rather than UMIs.
Returns:
- float – The saturation.
Source code in src/giftwrap/utils.py
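The formula reduces to a couple of lines. Assuming counts is a flat list holding the read count of each cell barcode/UMI/gene combination, each combination with a nonzero count contributes one deduplicated read. For example, 6 reads spread over 3 distinct combinations give 1 - 3/6 = 0.5. sequencing_saturation_sketch is a hypothetical name, not the library function.

```python
def sequencing_saturation_sketch(read_counts: list[int]) -> float:
    """Hypothetical re-derivation of 1 - (n_deduped_reads / n_reads).

    Assumes read_counts[i] is the read count of one cell barcode/UMI/gene combination.
    """
    n_reads = sum(read_counts)
    if n_reads == 0:
        raise ValueError("saturation is undefined with zero reads")
    # Each observed combination collapses to a single deduplicated read.
    n_deduped = sum(1 for c in read_counts if c > 0)
    return 1 - n_deduped / n_reads


print(sequencing_saturation_sketch([3, 1, 0, 2]))  # 0.5
```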
GIFTwrap API Documentation
This section provides comprehensive documentation of the giftwrap Python API, including the functions and classes available for GIFT-seq data analysis. The API is designed to integrate with scverse-based workflows built around AnnData.
See the GIFTwrap analysis tutorial for a practical guide on using the API to process GIFT-seq data.