%matplotlib inline

Analyze MERFISH ileum data

This tutorial shows how to apply Bering to MERFISH ileum data.

MERFISH ileum [Petukhov et al., 2022] data was derived from: https://datadryad.org/stash/dataset/doi:10.5061/dryad.jm63xsjb2. The data include spatial distribution of transcripts from 241 genes, and DAPI and membrane staining images.

Import packages & data

import random
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt 

import Bering as br

mpl.rcParams['figure.figsize'] = [3.5, 3.5]

# load data
df_spots_all = br.datasets.merfish_ileum_petukhov()
df_spots_seg = df_spots_all[df_spots_all['labels'] != 'background'] # foreground nodes
df_spots_unseg = df_spots_all[df_spots_all['labels'] == 'background'] # background nodes
img = None # image-free
channels = None

df_spots_seg.head()

Downloading dataset `merfish_ileum_petukhov` from `https://figshare.com/ndownloader/files/41409102` as `None.tsv`

	x	y	z	features	segmented	labels
mol_id
3048145	1705.0	1271.0	0.0	Maoa	84	Smooth Muscle
3048147	1725.0	1922.0	0.0	Maoa	231	Myenteric Plexus
3048148	1753.0	1863.0	0.0	Maoa	231	Myenteric Plexus
3048149	1760.0	1865.0	0.0	Maoa	231	Myenteric Plexus
3048153	1904.0	794.0	0.0	Maoa	53	Smooth Muscle

df_spots_unseg.head() # visualize unsegmented data

	x	y	z	features	segmented	labels
mol_id
3048188	1781.0	1101.0	55.072763	Maoa	-1	background
3048192	1896.0	660.0	55.072763	Maoa	-1	background
3048255	1682.0	1382.0	13.768191	Txndc5	-1	background
3048374	1893.0	1563.0	0.000000	Slc12a2	-1	background
3048376	1910.0	1472.0	0.000000	Slc12a2	-1	background

br.pl.Plot_Spots(df_spots_seg = df_spots_seg, df_spots_unseg = df_spots_unseg)

../../_images/dafa004c6b914d52e90d11091a8e80fd37347e100efde8023497e28f2a83b919.png

Create Bering object

# image-dependent segmentation
bg = br.BrGraph(df_spots_seg, df_spots_unseg, img, channels)
bg

<Bering.objects.bering.Bering_Graph at 0x2b17f4d52310>

bg.segmented.head() # summary of cells

	cx	cy	cz	dx	dy	dz	d	labels
segmented
0	1710.0	1263.0	55.072763	77.0	76.0	96.377334	96.377334	Smooth Muscle
1	1742.0	1895.0	27.536381	105.0	121.0	82.609144	121.000000	Myenteric Plexus
2	1928.0	837.0	27.536381	156.0	120.0	96.377334	156.000000	Smooth Muscle
3	1899.5	1414.0	27.536381	139.0	89.0	82.609144	139.000000	Smooth Muscle
4	1812.5	1849.0	27.536381	74.0	54.0	55.072763	74.000000	Myenteric Plexus

Create training data

# Build graphs for GCN training purpose
br.graphs.BuildWindowGraphs(
    bg, 
    n_cells_perClass = 5, 
    window_width = 150.0, 
    window_height = 150.0, 
    n_neighbors = 10, 
)

br.graphs.CreateData(
    bg, 
    batch_size = 16, 
    training_ratio = 0.8, 
)

Training

br.train.Training(
    bg,
    node_epoches = 50,
    edge_epoches = 15,
    edge_rbf_start = 0,
    edge_rbf_stop = 384,
    edge_rbf_n_kernels = 128,
)

Training node classifier:  98%|█████████████████████████████████████████████████████████████████████████████████████████████████  | 49/50 [00:49<00:00,  1.32it/s]

../../_images/72245ef09b47175eae49c8be560a0714363a50bf584f19ece85e1c0b20d6cc1d.png

Training node classifier: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:53<00:00,  1.07s/it]
Training edge classifier:  93%|████████████████████████████████████████████████████████████████████████████████████████████▍      | 14/15 [00:12<00:00,  1.15it/s]

../../_images/fe1878e80950a94c09e41f368ed3815284e9c117eff0d1c760063a82166e6c83.png

Training edge classifier: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:15<00:00,  1.05s/it]

Visualizing model

# randomly select a cell
random_cell = cells = random.sample(bg.segmented.index.values.tolist(), 2)[1]

# plot node classification
df_window_raw, df_window_pred, predictions = br.pl.Plot_Classification(
    bg, 
    cell_name = random_cell,
    n_neighbors = 10, 
    zoomout_scale = 4,
)

../../_images/e4e71df7a4d1739dca31abc33868e3f6105d0e9a749ef5dc3a532d24469f198e.png

# plot cell segmentation
br.pl.Plot_Segmentation(
    bg, 
    cell_name = random_cell,
    df_window_raw = df_window_raw, 
    df_window_pred = df_window_pred,
    predictions = predictions,
    n_neighbors = 10, 
    zoomout_scale = 4,
    use_image = True,
    pos_thresh = 0.6,
    resolution = 0.5,
    num_edges_perSpot = 100,
    min_prob_nodeclf = 0.3,
    n_iters = 20,
)

../../_images/c2f7d89987b70c241fe68cf0304263083d6364b75eedcbc606309a6aa8fceb82.png

After the model is trained, we can use the trained model to predict the cell types and segment all spots on the whole slice. After node classification and cell segmentation is completed, we generate single cell matrix in the end.

Node classification

Conduct node classification on the whole slice.

br.tl.node_classification(
    bg, bg.spots_all.copy(), 
    n_neighbors = 10, 
)
bg.spots_all.to_csv('spots_all.txt', sep = '\t')

Cell segmentation

br.tl.cell_segmentation(bg)

Get single cells

df_results, adata_ensembl, adata_segmented = br.tl.cell_annotation(bg)
df_results.to_csv('results.txt', sep = '\t')
adata_ensembl.write('results_cells_ensembled.h5ad')
adata_segmented.write('results_cells_segmented.h5ad')

print(f'Ensembled anndata: {adata_ensembl.shape}')
print(f'Segmented anndata: {adata_segmented.shape}')

... storing 'predicted_labels' as categorical

Ensembled anndata: (478, 241)
Segmented anndata: (951, 241)

Single cell data analysis

import scanpy as sc
sc.settings.set_figure_params(dpi=80)

# run standard analysis on the ensembled anndata
adata_ensembl = br.tl.cell_analyze(adata_ensembl)

# first check the data quality of segmented cells
sc.pl.umap(adata_ensembl, color = ['n_counts','n_genes'])

2023-08-22 14:40:54.079668: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-08-22 14:40:54.299465: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.

../../_images/53b7966942871f5ffe8aa5d6d81862974a17102940096341f0a8dcad9cda1093.png

# plot leiden clustering and predicted labels from Bering
fig, axes = plt.subplots(1, 2, figsize = (15, 5))
axes[0] = sc.pl.umap(adata_ensembl, color = ['leiden'], ax = axes[0], show = False)
axes[1] = sc.pl.umap(adata_ensembl, color = ['predicted_labels'], ax = axes[1], show = False)

plt.subplots_adjust(wspace = 0.5)
plt.tight_layout()
plt.show()

../../_images/765efb12634b84b633b85b03c0a2d0481fffacc37a2b9d41e9f128ede0d59e02.png