Top-level package for stLearn.
API
Import stLearn as:
import stlearn as st
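A typical session chains the modules listed on this page. The sketch below is illustrative only: the function names (`st.Read10X`, `st.pp.filter_genes`, `st.pp.normalize_total`, `st.pp.log1p`, `st.em.run_pca`, `st.tl.clustering.kmeans`, `st.pl.cluster_plot`) follow the stLearn tutorials rather than this page, the argument values are arbitrary, and `data_dir` is a hypothetical path. Check every call against the installed version before relying on it.

```python
def run_basic_pipeline(data_dir, n_comps=15, n_clusters=10):
    """Sketch: read Visium data, preprocess, embed and cluster.

    All stLearn call names below are assumptions based on the tutorials.
    """
    # Imported lazily so that defining this function does not require stLearn.
    import stlearn as st

    adata = st.Read10X(data_dir)             # wrapper: read 10X Visium data
    st.pp.filter_genes(adata, min_cells=1)   # pp: drop genes seen in no spot
    st.pp.normalize_total(adata)             # pp: per-spot total-count scaling
    st.pp.log1p(adata)                       # pp: log(1 + x) transform
    st.em.run_pca(adata, n_comps=n_comps)    # em: PCA embedding
    st.tl.clustering.kmeans(                 # tl: k-means on the PCA space
        adata,
        n_clusters=n_clusters,
        use_data="X_pca",
        key_added="X_pca_kmeans",
    )
    st.pl.cluster_plot(adata, use_label="X_pca_kmeans")  # pl: spatial cluster plot
    return adata
```

Calling `run_basic_pipeline("path/to/visium_output")` would need a real Space Ranger output directory; the path here is a placeholder.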
Wrapper functions: wrapper
|
Read Visium data from 10X (wrap read_visium from scanpy) |
|
Read Old Spatial Transcriptomics data |
|
Read Slide-seq data |
|
Read MERFISH data |
|
Read SeqFish data |
|
|
|
Create AnnData object for stLearn |
Add: add
|
Adding image data to the AnnData object |
|
Adding spatial information into the AnnData object |
|
Parsing the old spatial transcriptomics data |
|
Add significant Ligand-Receptor pairs into AnnData object |
|
Add label transfer results into AnnData object |
|
Adding annotation for cluster |
|
Adding label transferred from Seurat |
|
Adding binary mask image to the AnnData object |
|
Parsing the old spatial transcriptomics data |
|
Adding label transferred from Seurat
Preprocessing: pp
|
Wrap function scanpy.pp.filter_genes |
|
Wrap function of scanpy.pp.log1p. |
|
Wrap function of scanpy.pp.normalize_total. Normalize counts per cell. If choosing target_sum=1e6, this is CPM normalization. If exclude_highly_expressed=True, very highly expressed genes are excluded from the computation of the normalization factor (size factor) for each cell. This is meaningful as these can strongly influence the resulting normalized values for all other genes [Weinreb17]. Similar functions are used, for example, by Seurat [Satija15], Cell Ranger [Zheng17] or SPRING [Weinreb17].
:param adata: The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
:param target_sum: If None, after normalization, each observation (cell) has a total count equal to the median of total counts for observations (cells) before normalization.
:param exclude_highly_expressed: Exclude (very) highly expressed genes for the computation of the normalization factor (size factor) for each cell. A gene is considered highly expressed if it has more than max_fraction of the total counts in at least one cell. The not-excluded genes will sum up to target_sum.
:param max_fraction: If exclude_highly_expressed=True, consider genes as highly expressed that have more counts than max_fraction of the original total counts in at least one cell.
:param key_added: Name of the field in adata.obs where the normalization factor is stored.
:param layers: List of layers to normalize. Set to 'all' to normalize all layers.
:param layer_norm: Specifies how to normalize layers:
* If None, after normalization, for each layer in layers each cell has a total count equal to the median of the counts_per_cell before normalization of the layer.
* If 'after', for each layer in layers each cell has a total count equal to target_sum.
* If 'X', for each layer in layers each cell has a total count equal to the median of total counts for observations (cells) of adata.X before normalization.
:param inplace: Whether to update adata or return dictionary with normalized copies of adata.X and adata.layers. |
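The total-count normalization described in this entry can be sketched in plain Python. This is a minimal illustration of the arithmetic, not the scanpy or stLearn implementation: the name `normalize_total_sketch` is hypothetical, it works on nested lists rather than an AnnData object, and it takes the median over all cells when `target_sum` is None.

```python
from statistics import median


def normalize_total_sketch(counts, target_sum=None,
                           exclude_highly_expressed=False, max_fraction=0.05):
    """Scale each cell (row) so its counted genes sum to target_sum.

    counts: list of rows, one row of gene counts per cell.
    Returns a new list of normalized rows; the input is not modified.
    """
    n_genes = len(counts[0])
    totals = [sum(row) for row in counts]
    keep = [True] * n_genes

    if exclude_highly_expressed:
        # A gene is dropped from the size factor if it exceeds max_fraction
        # of the total counts in at least one cell.
        for row, total in zip(counts, totals):
            for gene, value in enumerate(row):
                if total > 0 and value > max_fraction * total:
                    keep[gene] = False
        totals = [sum(v for v, k in zip(row, keep) if k) for row in counts]

    if target_sum is None:
        # Default: median of the per-cell totals before normalization.
        target_sum = median(totals)

    return [
        [v * target_sum / total if total > 0 else 0.0 for v in row]
        for row, total in zip(counts, totals)
    ]


counts = [[1, 1, 2], [2, 2, 4], [0, 3, 3]]
median_scaled = normalize_total_sketch(counts)           # every row sums to the median total
cpm = normalize_total_sketch(counts, target_sum=1e6)     # counts per million
```

With `exclude_highly_expressed=True`, only the non-excluded genes of each cell sum to `target_sum`; the excluded genes are still rescaled by the same factor.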
|
Wrap function of scanpy.pp.scale |
|
Compute a neighborhood graph of observations [McInnes18]. The neighbor search efficiency of this heavily relies on UMAP [McInnes18], which also provides a method for estimating connectivities of data points - the connectivity of the manifold (method=='umap'). If method=='gauss', connectivities are computed according to [Coifman05], in the adaptation of [Haghverdi16].
:param adata: Annotated data matrix.
:param n_neighbors: The size of local neighborhood (in terms of number of neighboring data points) used for manifold approximation. Larger values result in more global views of the manifold, while smaller values result in more local data being preserved. In general values should be in the range 2 to 100. If knn is True, number of nearest neighbors to be searched. If knn is False, a Gaussian kernel width is set to the distance of the n_neighbors neighbor.
:param n_pcs:
:param use_rep:
:param knn: If True, use a hard threshold to restrict the number of neighbors to n_neighbors, that is, consider a knn graph. Otherwise, use a Gaussian kernel to assign low weights to neighbors more distant than the n_neighbors nearest neighbor.
:param random_state: A numpy random seed.
:param method: Use 'umap' [McInnes18] or 'gauss' (Gauss kernel following [Coifman05] with adaptive width [Haghverdi16]) for computing connectivities. Use 'rapids' for the RAPIDS implementation of UMAP (experimental, GPU only).
:param metric: A known metric’s name or a callable that returns a distance.
:param metric_kwds: Options for the metric.
:param copy: Return a copy instead of writing to adata. |
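The difference between knn=True and knn=False in this entry can be illustrated with a brute-force sketch in plain Python. It is a simplification under stated assumptions, not the UMAP-based search that scanpy uses: the helper names are hypothetical, distances are Euclidean, points are assumed distinct, and the Gaussian weight exp(-d² / (2σ²)) with σ set to the distance of the n_neighbors-th neighbor is one simple reading of "kernel width", not the adaptive kernel of [Haghverdi16].

```python
import math


def knn_graph_sketch(points, n_neighbors):
    """Hard threshold: for each point, indices of its n_neighbors nearest points."""
    graph = []
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph.append([j for _, j in ranked[:n_neighbors]])
    return graph


def gaussian_weights_sketch(points, n_neighbors):
    """Soft alternative: weight every other point with a Gaussian kernel.

    The kernel width for point i is its distance to the n_neighbors-th
    nearest neighbor, so more distant points receive low weights.
    """
    weights = []
    for i, p in enumerate(points):
        dists = {j: math.dist(p, q) for j, q in enumerate(points) if j != i}
        sigma = sorted(dists.values())[n_neighbors - 1]
        weights.append(
            {j: math.exp(-(d ** 2) / (2 * sigma ** 2)) for j, d in dists.items()}
        )
    return weights


points = [(0, 0), (1, 0), (0, 2), (5, 5)]
knn = knn_graph_sketch(points, n_neighbors=2)
soft = gaussian_weights_sketch(points, n_neighbors=2)
```

The brute-force search is quadratic in the number of points, which is fine for a toy example but is exactly what approximate neighbor search is meant to avoid on real data.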
|
Tiling H&E images to small tiles based on spot spatial location |
|
Extract latent morphological features from H&E images using a pre-trained convolutional neural network base
Embedding: em
|
Wrap function scanpy.pp.pca. Principal component analysis [Pedregosa11]. Computes PCA coordinates, loadings and variance decomposition. Uses the implementation of scikit-learn [Pedregosa11].
:param data: The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
:param n_comps: Number of principal components to compute.
:param zero_center: If True, compute standard PCA from covariance matrix. If False, omit zero-centering variables (uses |
|
Wrap function scanpy.tl.umap. Embed the neighborhood graph using UMAP [McInnes18]. UMAP (Uniform Manifold Approximation and Projection) is a manifold learning technique suitable for visualizing high-dimensional data. Besides tending to be faster than tSNE, it optimizes the embedding such that it best reflects the topology of the data, which we represent throughout Scanpy using a neighborhood graph. tSNE, by contrast, optimizes the distribution of nearest-neighbor distances in the embedding such that these best match the distribution of distances in the high-dimensional space. We use the implementation of umap-learn [McInnes18]. For a few comparisons of UMAP with tSNE, see this preprint.
:param adata: Annotated data matrix.
:param n_components: The number of dimensions of the embedding.
:param random_state: If int, random_state is the seed used by the random number generator; if RandomState, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. |
|
FastICA: a fast algorithm for Independent Component Analysis. |
|
Factor Analysis (FA) A simple linear generative model with Gaussian latent variables. |
|
Diffusion Maps [Coifman05] [Haghverdi15] [Wolf18]. |
Spatial: spatial
|
Perform local clustering using DBSCAN. |
|
Perform pseudotime analysis. |
|
Perform pseudo-time-space analysis at the global level. |
|
Perform pseudo-time-space analysis at the local level. |
|
Compare transition markers between two clades |
|
Transition markers detection of a clade. |
|
Transition markers detection of a branch. |
|
Automatically set the root index. |
|
SME normalisation: Using spot location information and tissue morphological features to correct spot gene expression |
|
Using spatial location (S), tissue morphological feature (M) and gene expression (E) information to impute missing values |
|
Using spatial location (S), tissue morphological feature (M) and gene expression (E) information to impute gaps between spots and increase resolution for gene detection |
|
Using spatial location (S), tissue morphological feature (M) and gene expression (E) information to normalize data.
Tools: tl
|
Perform k-means clustering for spatial transcriptomics data |
|
Wrap function scanpy.tl.louvain. Cluster cells into subgroups [Blondel08] [Levine15] [Traag17]. Cluster cells using the Louvain algorithm [Blondel08] in the implementation of [Traag17]. The Louvain algorithm has been proposed for single-cell analysis by [Levine15]. This requires having run |
|
Loads the input LR database and concatenates it into a consistent set of pairs without duplicates. |
|
Creates a new anndata representing a gridded version of the data; can be |
|
Performs stLearn LR analysis. |
|
Performs p-value adjustment and determination of significant spots. |
|
Runs a basic GO analysis on the genes in the top ranked LR pairs. |
|
Calls significant celltype-celltype interactions based on cell-type data randomisation. |
Plot: pl
|
QC plot for spatial transcriptomics data. |
|
Allows the visualization of a single gene or multiple genes as the values of dot points or contours in the spatial transcriptomics array. |
|
|
|
Allows the visualization of clustering results as discrete values of dot points in the spatial transcriptomics array. |
|
|
|
Allows the visualization of subclustering results as discrete values of dot points in the spatial transcriptomics array. |
|
Allows the visualization of subclustering results as discrete values of dot points in the spatial transcriptomics array. |
|
A wrapper function for all the non-spatial plots from scanpy. |
|
Clustering plot for spatial transcriptomics data. |
|
Mask plot for spatial transcriptomics data. |
|
Plotting the top LRs ranked by number of significant spots. |
|
Diagnostic plot of the relationship between technical features of LRs and LR rank. |
|
Bar plot showing for each LR no. |
|
Plots the results from the LR GO analysis. |
|
Plots the per spot statistics for given LR. |
|
Creates different kinds of spatial visualisations for the LR analysis results. |
|
Checks relationship between no. |
|
Circular celltype-celltype interaction network based on LR-CCI analysis. |
|
Chord diagram of interactions between cell types. |
|
Heatmap of interaction counts. |
|
Heatmap visualising sender->receivers of cell type interactions. |
|
Plots the LR scores for significant spots interactively using Bokeh. |
|
Plots the significant CCI in the spatial context interactively using Bokeh. |
|
Global trajectory inference plot (Only DPT). |
|
Local spatial trajectory inference plot. |
|
Hierarchical tree plot representing the global spatial trajectory inference. |
|
Plot transition marker. |
|
Differential expression between transition markers. |