Top-level package for stLearn.

API

Import stLearn as:

import stlearn as st

Wrapper functions: wrapper

Read10X(path[, genome, count_file, ...])

Read Visium data from 10X (wraps read_visium from scanpy)

ReadOldST([count_matrix_file, spatial_file, ...])

Read Old Spatial Transcriptomics data

ReadSlideSeq(count_matrix_file, spatial_file)

Read Slide-seq data

ReadMERFISH(count_matrix_file, spatial_file)

Read MERFISH data

ReadSeqFish(count_matrix_file, spatial_file)

Read seqFISH data

convert_scanpy(adata[, use_quality])

create_stlearn(count, spatial, library_id[, ...])

Create AnnData object for stLearn

Add: add

add.image(adata, imgpath, library_id[, ...])

Adding image data to the AnnData object

add.positions(adata[, position_filepath, ...])

Adding spatial information to the AnnData object

add.parsing(adata, coordinates_file[, copy])

Parsing the old spatial transcriptomics data

add.lr(adata[, db_filepath, sep, source, copy])

Add significant Ligand-Receptor pairs into AnnData object

add.labels(adata[, label_filepath, ...])

Add label transfer results into AnnData object

add.annotation(adata, label_list[, ...])

Adding annotations for clusters

add.add_loupe_clusters(adata, loupe_path[, ...])

Adding labels transferred from Seurat

add.add_mask(adata, imgpath[, key, copy])

Adding binary mask image to the AnnData object

add.apply_mask(adata[, masks, select, cmap, ...])

Parsing the old spatial transcriptomics data

add.add_deconvolution(adata, annotation_path)

Adding labels transferred from Seurat

Preprocessing: pp

pp.filter_genes(adata[, min_counts, ...])

Wrap function scanpy.pp.filter_genes
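As a rough illustration of what a count-based gene filter does, the sketch below keeps only genes whose total count across all spots reaches a threshold. It is plain standard-library Python operating on nested lists, not the scanpy or stLearn implementation; the helper name filter_genes_by_counts and the toy matrix are assumptions made for this example, and only the min_counts idea comes from the signature above.

```python
def filter_genes_by_counts(counts, gene_names, min_counts=1):
    """Keep genes whose total count over all spots is at least min_counts.

    counts is a list of rows (one per spot), each row a list of per-gene counts.
    This is a hypothetical helper for illustration, not part of stLearn.
    """
    n_genes = len(gene_names)
    # Total count of each gene, summed over spots (column sums).
    totals = [sum(row[g] for row in counts) for g in range(n_genes)]
    keep = [g for g in range(n_genes) if totals[g] >= min_counts]
    filtered = [[row[g] for g in keep] for row in counts]
    return filtered, [gene_names[g] for g in keep]


# Toy data: 3 spots x 3 genes.
counts = [
    [5, 0, 1],
    [3, 0, 0],
    [2, 0, 1],
]
filtered, kept = filter_genes_by_counts(
    counts, ["GeneA", "GeneB", "GeneC"], min_counts=3
)
print(kept)      # names of the genes that passed the filter
print(filtered)  # count matrix restricted to those genes
```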

pp.log1p(adata[, copy, chunked, chunk_size, ...])

Wrap function of scanpy.pp.log1p Copyright (c) 2017 F.

pp.normalize_total(adata[, target_sum, ...])

Wrap function of scanpy.pp.normalize_total.

Normalize counts per cell. If choosing target_sum=1e6, this is CPM normalization. If exclude_highly_expressed=True, very highly expressed genes are excluded from the computation of the normalization factor (size factor) for each cell. This is meaningful as these can strongly influence the resulting normalized values for all other genes [Weinreb17]. Similar functions are used, for example, by Seurat [Satija15], Cell Ranger [Zheng17] or SPRING [Weinreb17].

:param adata: The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
:param target_sum: If None, after normalization, each observation (cell) has a total count equal to the median of total counts for observations (cells) before normalization.
:param exclude_highly_expressed: Exclude (very) highly expressed genes from the computation of the normalization factor (size factor) for each cell. A gene is considered highly expressed if it has more than max_fraction of the total counts in at least one cell. The not-excluded genes will sum up to target_sum.
:param max_fraction: If exclude_highly_expressed=True, consider a gene highly expressed if it has more counts than max_fraction of the original total counts in at least one cell.
:param key_added: Name of the field in adata.obs where the normalization factor is stored.
:param layers: List of layers to normalize. Set to 'all' to normalize all layers.
:param layer_norm: Specifies how to normalize layers:
    * If None, after normalization, for each layer in layers each cell has a total count equal to the median of the counts_per_cell before normalization of the layer.
    * If 'after', for each layer in layers each cell has a total count equal to target_sum.
    * If 'X', for each layer in layers each cell has a total count equal to the median of total counts for observations (cells) of adata.X before normalization.
:param inplace: Whether to update adata or return dictionary with normalized copies of adata.X and adata.layers.
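The arithmetic behind target_sum can be sketched in a few lines. The example below is a minimal standard-library illustration of total-count normalization on nested lists, assuming the simplest case (no exclude_highly_expressed, no layers); the helper name normalize_rows is hypothetical and this is not the scanpy code path.

```python
from statistics import median


def normalize_rows(counts, target_sum=None):
    """Scale each row (cell/spot) so that its counts sum to target_sum.

    If target_sum is None, the median of the per-row totals is used,
    mirroring the default described for the target_sum parameter.
    Hypothetical helper for illustration only.
    """
    totals = [sum(row) for row in counts]
    if target_sum is None:
        target_sum = median(totals)
    normalized = []
    for row, total in zip(counts, totals):
        if total == 0:
            # An empty row has no counts to rescale; leave it as zeros.
            normalized.append([0.0 for _ in row])
        else:
            normalized.append([value * target_sum / total for value in row])
    return normalized


# Toy data: 3 spots x 2 genes, with row totals 4, 4 and 40.
counts = [[1, 3], [2, 2], [10, 30]]

fixed_total = normalize_rows(counts, target_sum=100)  # every row sums to 100
median_total = normalize_rows(counts)                 # every row sums to the median total

print(fixed_total)
print(median_total)
```

With target_sum=1e6 the same scaling would give counts per million, which is the CPM case mentioned in the description.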

pp.scale(adata[, zero_center, max_value, copy])

Wrap function of scanpy.pp.scale

pp.neighbors(adata[, n_neighbors, n_pcs, ...])

Compute a neighborhood graph of observations [McInnes18].

The neighbor search efficiency of this heavily relies on UMAP [McInnes18], which also provides a method for estimating connectivities of data points - the connectivity of the manifold (method=='umap'). If method=='gauss', connectivities are computed according to [Coifman05], in the adaptation of [Haghverdi16].

:param adata: Annotated data matrix.
:param n_neighbors: The size of local neighborhood (in terms of number of neighboring data points) used for manifold approximation. Larger values result in more global views of the manifold, while smaller values result in more local data being preserved. In general values should be in the range 2 to 100. If knn is True, number of nearest neighbors to be searched. If knn is False, a Gaussian kernel width is set to the distance of the n_neighbors neighbor.
:param {n_pcs}:
:param {use_rep}:
:param knn: If True, use a hard threshold to restrict the number of neighbors to n_neighbors, that is, consider a knn graph. Otherwise, use a Gaussian Kernel to assign low weights to neighbors more distant than the n_neighbors nearest neighbor.
:param random_state: A numpy random seed.
:param method: Use 'umap' [McInnes18] or 'gauss' (Gauss kernel following [Coifman05] with adaptive width [Haghverdi16]) for computing connectivities. Use 'rapids' for the RAPIDS implementation of UMAP (experimental, GPU only).
:param metric: A known metric’s name or a callable that returns a distance.
:param metric_kwds: Options for the metric.
:param copy: Return a copy instead of writing to adata.
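To make the knn=True case concrete, the sketch below builds a brute-force k-nearest-neighbor graph with Euclidean distance. It only illustrates the hard-threshold idea behind n_neighbors; the real function relies on UMAP's approximate search and also estimates connectivities, neither of which is reproduced here. The helper name knn_graph and the toy points are assumptions for this example.

```python
import math


def knn_graph(points, n_neighbors):
    """Return {index: [indices of the n_neighbors closest other points]}.

    Brute-force O(n^2) search with Euclidean distance; a hypothetical
    helper for illustration, not the UMAP-based search used in practice.
    """
    graph = {}
    for i, p in enumerate(points):
        # Distance from point i to every other point, paired with its index.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        graph[i] = [j for _, j in dists[:n_neighbors]]
    return graph


# Toy data: three nearby observations and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5), (10.0, 10.0)]
graph = knn_graph(points, n_neighbors=2)

for index, neighbors in graph.items():
    print(index, neighbors)
```

Raising n_neighbors pulls more distant observations into each neighborhood, which is the "more global view" trade-off described for that parameter.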

pp.tiling(adata[, out_path, library_id, ...])

Tiling H&E images to small tiles based on spot spatial location
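One way to picture spot-based tiling is as computing a square crop box around each spot's pixel coordinate. The sketch below does only that geometry, in standard-library Python; the helper name tile_box, the crop_size argument, and the clamping at the image border are assumptions for illustration and may differ from what pp.tiling actually does. The (left, upper, right, lower) ordering follows the box convention used by common imaging libraries such as Pillow.

```python
def tile_box(x, y, crop_size, image_width, image_height):
    """Return a (left, upper, right, lower) box of side crop_size centred on (x, y).

    The box is clamped to the image bounds, so tiles near an edge are smaller.
    Hypothetical helper for illustration only.
    """
    half = crop_size // 2
    left = max(0, x - half)
    upper = max(0, y - half)
    right = min(image_width, x + half)
    lower = min(image_height, y + half)
    return (left, upper, right, lower)


# Toy data: two spot centres (in pixels) on a 300 x 260 image.
spots = [(100, 100), (10, 250)]
boxes = [
    tile_box(x, y, crop_size=40, image_width=300, image_height=260)
    for x, y in spots
]

for spot, box in zip(spots, boxes):
    print(spot, box)
```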

pp.extract_feature(adata[, cnn_base, ...])

Extract latent morphological features from H&E images using pre-trained convolutional neural network base

Embedding: em

em.run_pca(data[, n_comps, zero_center, ...])

Wrap function of scanpy.pp.pca.

Principal component analysis [Pedregosa11]. Computes PCA coordinates, loadings and variance decomposition. Uses the implementation of scikit-learn [Pedregosa11].

:param data: The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
:param n_comps: Number of principal components to compute.
:param zero_center: If True, compute standard PCA from covariance matrix. If False, omit zero-centering variables (uses TruncatedSVD), which allows sparse input to be handled efficiently. Passing None decides automatically based on sparseness of the data.
:param svd_solver: SVD solver to use:
    * 'arpack' for the ARPACK wrapper in SciPy (svds()).
    * 'randomized' for the randomized algorithm due to Halko (2009).
    * 'auto' (the default) chooses automatically depending on the size of the problem.
:param random_state: Change to use different initial states for the optimization.
:param return_info: Only relevant when not passing an AnnData: see “Returns”.
:param use_highly_variable: Whether to use highly variable genes only, stored in .var['highly_variable']. By default uses them if they have been determined beforehand.
:param dtype: Numpy data type string to which to convert the result.
:param copy: If an AnnData is passed, determines whether a copy is returned. Is ignored otherwise.
:param chunked: If True, perform an incremental PCA on segments of chunk_size. The incremental PCA automatically zero centers and ignores settings of random_state and svd_solver. If False, perform a full PCA.
:param chunk_size: Number of observations to include in each chunk. Required if chunked=True was passed.
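The zero_center=True case ("standard PCA from covariance matrix") can be illustrated without any dependencies. The sketch below centers the data, forms the covariance matrix, and finds only the first principal component by power iteration. Power iteration is a substitute technique chosen to keep the example short; it is not the scikit-learn SVD solver the wrapped function uses, and the helper name first_principal_component is hypothetical.

```python
import math


def first_principal_component(rows, n_iter=100):
    """Return (component, scores) for the leading principal component.

    Steps: zero-center each variable, build the sample covariance matrix,
    then run power iteration to approximate its dominant eigenvector.
    Assumes the data are not constant (otherwise the norm below is zero).
    Hypothetical helper for illustration only.
    """
    n_obs = len(rows)
    n_vars = len(rows[0])
    means = [sum(r[j] for r in rows) / n_obs for j in range(n_vars)]
    centered = [[r[j] - means[j] for j in range(n_vars)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n_obs - 1) for b in range(n_vars)]
        for a in range(n_vars)
    ]
    vec = [1.0] * n_vars
    for _ in range(n_iter):
        # Multiply by the covariance matrix, then renormalize to unit length.
        nxt = [sum(cov[a][b] * vec[b] for b in range(n_vars)) for a in range(n_vars)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # PCA coordinates: projection of each centered row onto the component.
    scores = [sum(c[j] * vec[j] for j in range(n_vars)) for c in centered]
    return vec, scores


# Toy data lying exactly on the line y = 2x, so one component explains everything.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
component, scores = first_principal_component(rows)

print(component)  # unit vector along the direction (1, 2)
print(scores)     # position of each row along that direction
```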

em.run_umap(adata[, min_dist, spread, ...])

Wrap function of scanpy.tl.umap.

Embed the neighborhood graph using UMAP [McInnes18]. UMAP (Uniform Manifold Approximation and Projection) is a manifold learning technique suitable for visualizing high-dimensional data. Besides tending to be faster than tSNE, it optimizes the embedding such that it best reflects the topology of the data, which we represent throughout Scanpy using a neighborhood graph. tSNE, by contrast, optimizes the distribution of nearest-neighbor distances in the embedding such that these best match the distribution of distances in the high-dimensional space. We use the implementation of umap-learn [McInnes18]. For a few comparisons of UMAP with tSNE, see this preprint.

:param adata: Annotated data matrix.
:param n_components: The number of dimensions of the embedding.
:param random_state: If int, random_state is the seed used by the random number generator; if RandomState, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

em.run_ica(adata[, n_factors, fun, tol, ...])

FastICA: a fast algorithm for Independent Component Analysis.

em.run_fa(adata[, n_factors, tol, max_iter, ...])

Factor Analysis (FA): a simple linear generative model with Gaussian latent variables.

em.run_diffmap(adata[, n_comps, copy])

Diffusion Maps [Coifman05] [Haghverdi15] [Wolf18].

Spatial: spatial

spatial.clustering.localization(adata[, ...])

Perform local clustering using DBSCAN.

spatial.trajectory.pseudotime(adata[, ...])

Perform pseudotime analysis.

spatial.trajectory.pseudotimespace_global(adata)

Perform pseudo-time-space analysis at the global level.

spatial.trajectory.pseudotimespace_local(adata)

Perform pseudo-time-space analysis at the local level.

spatial.trajectory.compare_transitions(...)

Compare transition markers between two clades

spatial.trajectory.detect_transition_markers_clades(...)

Transition markers detection of a clade.

spatial.trajectory.detect_transition_markers_branches(...)

Transition markers detection of a branch.

spatial.trajectory.set_root(adata, ...[, ...])

Automatically set the root index.

spatial.morphology.adjust(adata[, use_data, ...])

SME normalisation: Using spot location information and tissue morphological features to correct spot gene expression

spatial.SME.SME_impute0(adata[, use_data, ...])

Using spatial location (S), tissue morphological feature (M) and gene expression (E) information to impute missing values

spatial.SME.pseudo_spot(adata[, tile_path, ...])

Using spatial location (S), tissue morphological feature (M) and gene expression (E) information to impute the gaps between spots and increase resolution for gene detection

spatial.SME.SME_normalize(adata[, use_data, ...])

Using spatial location (S), tissue morphological feature (M) and gene expression (E) information to normalize data.

Tools: tl

tl.clustering.kmeans(adata[, n_clusters, ...])

Perform k-means clustering on spatial transcriptomics data

tl.clustering.louvain(adata[, resolution, ...])

Wrap function of scanpy.tl.louvain.

Cluster cells into subgroups [Blondel08] [Levine15] [Traag17]. Cluster cells using the Louvain algorithm [Blondel08] in the implementation of [Traag17]. The Louvain algorithm has been proposed for single-cell analysis by [Levine15]. This requires having run neighbors() or bbknn() first, or explicitly passing an adjacency matrix.

:param adata: The annotated data matrix.
:param resolution: For the default flavor ('vtraag'), you can provide a resolution (higher resolution means finding more and smaller clusters), which defaults to 1.0. See “Time as a resolution parameter” in [Lambiotte09].
:param random_state: Change the initialization of the optimization.
:param restrict_to: Restrict the clustering to the categories within the key for sample annotation; the tuple needs to contain (obs_key, list_of_categories).
:param key_added: Key under which to add the cluster labels. (default: 'louvain')
:param adjacency: Sparse adjacency matrix of the graph, defaults to adata.uns['neighbors']['connectivities'].
:param flavor: Choose between two packages for computing the clustering. 'vtraag' is much more powerful, and the default.
:param directed: Interpret the adjacency matrix as directed graph?
:param use_weights: Use weights from knn graph.
:param partition_type: Type of partition to use. Only a valid argument if flavor is 'vtraag'.
:param partition_kwargs: Keyword arguments to pass to partitioning, if the vtraag method is being used.
:param copy: Copy adata or modify it in place.

tl.cci.load_lrs([names, species])

Loads the input LR database and concatenates it into a consistent set of pairs without duplicates.

tl.cci.grid(adata[, n_row, n_col, ...])

Creates a new anndata representing a gridded version of the data; can be
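The general idea of gridding can be sketched as binning spot coordinates into n_row × n_col cells. The example below assumes that each grid cell aggregates its spots by summing their counts; that aggregation rule, the helper name grid_counts, and the toy data are assumptions for illustration, since the description above is truncated and does not state how tl.cci.grid combines spots.

```python
def grid_counts(coords, counts, n_row, n_col):
    """Bin spots into an n_row x n_col grid and sum counts per grid cell.

    coords is a list of (x, y) positions and counts a list of per-spot
    gene-count rows. Returns {(row, col): summed counts} for non-empty cells.
    Hypothetical helper for illustration only.
    """
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    x_min, y_min = min(xs), min(ys)
    # Width and height of one grid cell; fall back to 1.0 if all spots coincide.
    x_step = (max(xs) - x_min) / n_col or 1.0
    y_step = (max(ys) - y_min) / n_row or 1.0
    n_genes = len(counts[0])
    grid = {}
    for (x, y), row in zip(coords, counts):
        # Spots on the far edge are clamped into the last row/column.
        col = min(int((x - x_min) / x_step), n_col - 1)
        r = min(int((y - y_min) / y_step), n_row - 1)
        cell = grid.setdefault((r, col), [0] * n_genes)
        for g, value in enumerate(row):
            cell[g] += value
    return grid


# Toy data: 4 spots x 2 genes on a 10 x 10 area, binned into a 2 x 2 grid.
coords = [(0.0, 0.0), (1.0, 1.0), (9.0, 1.0), (10.0, 10.0)]
counts = [[1, 0], [2, 1], [0, 5], [4, 4]]
grid = grid_counts(coords, counts, n_row=2, n_col=2)

for cell, summed in sorted(grid.items()):
    print(cell, summed)
```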

tl.cci.run(adata, lrs[, min_spots, ...])

Performs stLearn LR analysis.

tl.cci.adj_pvals(adata[, pval_adj_cutoff, ...])

Performs p-value adjustment and determination of significant spots.
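To show what "p-value adjustment and determination of significant spots" can look like, the sketch below applies the Benjamini-Hochberg procedure and then thresholds the adjusted values. Benjamini-Hochberg is used here as one common choice and is an assumption; the description above does not say which correction tl.cci.adj_pvals applies. The helper name benjamini_hochberg and the 0.05 cutoff are likewise illustrative.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by n / rank (rank in ascending order), then a
    running minimum from the largest rank downwards enforces monotonicity.
    Hypothetical helper for illustration only.
    """
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data: raw p-values for one LR pair across four spots.
pvals = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(pvals)

# A spot is called significant if its adjusted p-value is below the cutoff.
cutoff = 0.05
significant = [p < cutoff for p in adjusted]

print(adjusted)
print(significant)
```

Note how two spots with raw p-values below 0.05 no longer pass after adjustment, which is the purpose of applying a cutoff to adjusted rather than raw values.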

tl.cci.run_lr_go(adata, r_path[, n_top, ...])

Runs a basic GO analysis on the genes in the top ranked LR pairs.

tl.cci.run_cci(adata, use_label[, ...])

Calls significant celltype-celltype interactions based on cell-type data randomisation.

Plot: pl

pl.QC_plot(adata[, library_id, name, ...])

QC plot for spatial transcriptomics data.

pl.gene_plot(adata[, gene_symbols, ...])

Allows the visualization of a single gene or multiple genes as the values of dot points or contour in the Spatial transcriptomics array.

pl.gene_plot_interactive(adata)

pl.cluster_plot(adata[, title, figsize, ...])

Allows the visualization of clustering results as discrete values of dot points in the spatial transcriptomics array.

pl.cluster_plot_interactive(adata)

pl.subcluster_plot(adata[, title, figsize, ...])

Allows the visualization of subclustering results as discrete values of dot points in the spatial transcriptomics array.

pl.non_spatial_plot(adata[, use_label])

A wrapper function to plot all the non-spatial plots from scanpy.

pl.deconvolution_plot(adata[, library_id, ...])

Clustering plot for spatial transcriptomics data.

pl.plot_mask(adata[, library_id, show_spot, ...])

Mask plot for spatial transcriptomics data.

pl.lr_summary(adata[, n_top, highlight_lrs, ...])

Plotting the top LRs ranked by number of significant spots.

pl.lr_diagnostics(adata[, highlight_lrs, ...])

Diagnostic plot looking at the relationship between technical features of LRs and LR rank.

pl.lr_n_spots(adata[, n_top, font_dict, ...])

Bar plot showing for each LR no.

pl.lr_go(adata[, n_top, highlight_go, ...])

Plots the results from the LR GO analysis.

pl.lr_result_plot(adata[, use_lr, ...])

Plots the per spot statistics for given LR.

pl.lr_plot(adata, lr[, min_expr, sig_spots, ...])

Creates different kinds of spatial visualisations for the LR analysis results.

pl.cci_check(adata, use_label[, figsize, ...])

Checks relationship between no.

pl.ccinet_plot(adata, use_label[, lr, pos, ...])

Circular celltype-celltype interaction network based on LR-CCI analysis.

pl.lr_chord_plot(adata, use_label[, lr, ...])

Chord diagram of interactions between cell types.

pl.lr_cci_map(adata, use_label[, lrs, ...])

Heatmap of interaction counts.

pl.cci_map(adata, use_label[, lr, ax, show, ...])

Heatmap visualising sender->receivers of cell type interactions.

pl.lr_plot_interactive(adata)

Plots the LR scores for significant spots interactively using Bokeh.

pl.spatialcci_plot_interactive(adata)

Plots the significant CCI in the spatial context interactively using Bokeh.

pl.trajectory.pseudotime_plot(adata[, ...])

Global trajectory inference plot (Only DPT).

pl.trajectory.local_plot(adata[, use_label, ...])

Local spatial trajectory inference plot.

pl.trajectory.tree_plot(adata[, library_id, ...])

Hierarchical tree plot representing the global spatial trajectory inference.

pl.trajectory.transition_markers_plot(adata)

Plot transition markers.

pl.trajectory.DE_transition_plot(adata[, ...])

Differential expression between transition markers.

Datasets: datasets