Integrate multiple spatial transcriptomics datasets

Import library

[1]:
import stlearn as st
import scanpy as sc
import numpy as np

st.settings.set_figure_params(dpi=150)

Read data

In this tutorial, we are using the Breast cancer datasets with 2 sections of block A.

Source: https://support.10xgenomics.com/spatial-gene-expression/datasets

[2]:
block1 = st.Read10X("SECTION_A_PATH")
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
[3]:
block2 = st.Read10X("SECTION_B_PATH")
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.

Processing data

[4]:
# concatenate 2 samples
adata_concat = block1.concatenate(block2)
[5]:
# Preprocessing
# Filter genes
sc.pp.filter_genes(adata_concat, min_cells=3)
# Normalize data
sc.pp.normalize_total(adata_concat, target_sum=1e4)
# Log transformation
sc.pp.log1p(adata_concat)
# Store raw data
adata_concat.raw = adata_concat
# Extract top highly variable genes
sc.pp.highly_variable_genes(adata_concat, min_mean=0.0125, max_mean=3, min_disp=0.5)
adata_concat = adata_concat[:, adata_concat.var.highly_variable]
# Scale data
sc.pp.scale(adata_concat, max_value=10)
/home/uqdpham4/.conda/envs/stlearn/lib/python3.8/site-packages/scanpy/preprocessing/_simple.py:810: UserWarning: Revieved a view of an AnnData. Making a copy.
  view_to_actual(adata)
[6]:
# Run dimensionality reduction
sc.pp.pca(adata_concat, n_comps=30, svd_solver='arpack')

Run integration with harmony

[7]:
# Prepare metadata and PCA
meta_data = adata_concat.obs
data_mat = adata_concat.obsm["X_pca"]
[8]:
# Import and run harmony
import harmonypy as hm
ho = hm.run_harmony(data_mat, meta_data, "batch")
2022-01-31 19:15:28,174 - harmonypy - INFO - Iteration 1 of 10
2022-01-31 19:15:30,522 - harmonypy - INFO - Iteration 2 of 10
2022-01-31 19:15:33,014 - harmonypy - INFO - Iteration 3 of 10
2022-01-31 19:15:35,330 - harmonypy - INFO - Iteration 4 of 10
2022-01-31 19:15:37,102 - harmonypy - INFO - Iteration 5 of 10
2022-01-31 19:15:38,008 - harmonypy - INFO - Iteration 6 of 10
2022-01-31 19:15:38,805 - harmonypy - INFO - Iteration 7 of 10
2022-01-31 19:15:39,602 - harmonypy - INFO - Iteration 8 of 10
2022-01-31 19:15:40,416 - harmonypy - INFO - Iteration 9 of 10
2022-01-31 19:15:41,113 - harmonypy - INFO - Iteration 10 of 10
2022-01-31 19:15:41,800 - harmonypy - INFO - Stopped before convergence
[9]:
# Mapping back the result to the adata object
adata_concat.obsm["X_pca"] = ho.Z_corr.T

Perform clustering and visualize the results by UMAP

[10]:
# Build KNN and run UMAP
sc.pp.neighbors(adata_concat, n_pcs =30)
sc.tl.umap(adata_concat)
[11]:
# Run clustering with leiden
sc.tl.leiden(adata_concat, resolution=0.4)
[12]:
# Plotting UMAP
sc.pl.umap(adata_concat, color=["batch","leiden"])
... storing 'feature_types' as categorical
... storing 'genome' as categorical
../_images/tutorials_Integration_multiple_datasets_16_1.png

Map the result back to the original samples

[13]:
st.settings.set_figure_params(dpi=150)
# Map leiden clusteirng result to block A section 1
block1.obs["leiden"] = adata_concat.obs[adata_concat.obs.batch=="0"].leiden.values
[14]:
# Plotting the clusteirng result
st.pl.cluster_plot(block1,use_label="leiden")
../_images/tutorials_Integration_multiple_datasets_19_0.png
[15]:
# Map leiden clusteirng result to block A section 2
block2.obs["leiden"] = adata_concat.obs[adata_concat.obs.batch=="1"].leiden.values
[16]:
# Plotting the clusteirng result
st.pl.cluster_plot(block2,use_label="leiden")
../_images/tutorials_Integration_multiple_datasets_21_0.png

Manually combine the images and change the coordinates

This is a way to plotting 2 samples in the same adata object.

[17]:
# Initialize the spatial
adata_concat.uns["spatial"] = block1.uns["spatial"]
[18]:
# Horizontally stack 2 images from section 1 and section 2 datasets
combined = np.hstack([block1.uns["spatial"]["V1_Breast_Cancer_Block_A_Section_1"]["images"]["hires"],
                      block2.uns["spatial"]["V1_Breast_Cancer_Block_A_Section_2"]["images"]["hires"]])
[19]:
# Map the image to the concatnated adata object
adata_concat.uns["spatial"]["V1_Breast_Cancer_Block_A_Section_1"]["images"]["hires"] = combined
[20]:
# Manually change the coordinate of spots to the right
adata_concat.obs.loc[adata_concat.obs.batch == "1","imagecol"] = adata_concat.obs.loc[adata_concat.obs.batch == "1","imagecol"].values + 2000
[21]:
# Change to the .obsm["spatial"]
factor = adata_concat.uns["spatial"]["V1_Breast_Cancer_Block_A_Section_1"]["scalefactors"]["tissue_hires_scalef"]
adata_concat.obsm["spatial"] = adata_concat.obs[["imagecol","imagerow"]].values / factor
[22]:
st.settings.set_figure_params(dpi=200)
[23]:
# Plot the gene
st.pl.gene_plot(adata_concat, gene_symbols="KRT5",crop=False, size=1.4,cell_alpha=1)
/home/uqdpham4/.conda/envs/stlearn/lib/python3.8/site-packages/stlearn/plotting/classes.py:165: MatplotlibDeprecationWarning: The 'cmap' parameter to Colorbar has no effect because it is overridden by the mappable; it is deprecated since 3.3 and will be removed two minor releases later.
  cb = plt.colorbar(plot, aspect=10, shrink=0.5, cmap=self.cmap)
../_images/tutorials_Integration_multiple_datasets_29_1.png
[24]:
# Plot the clusters
st.pl.cluster_plot(adata_concat, use_label="leiden", crop=False, size=1.4,cell_alpha=1)
../_images/tutorials_Integration_multiple_datasets_30_0.png