%matplotlib inline

Analyze Nanostring CosMx non-small-cell lung cancer (NSCLC) data

This tutorial shows how to apply Bering to Nanostring CosMx data.

Nanostring CosMx NSCLC [He et al., 2022] data was generated on eight FFPE non-small cell non cancer tissue samples. The original data can be found in this link: https://nanostring.com/products/cosmx-spatial-molecular-imager/nsclc-ffpe-dataset/. We took one of the filed of view (FOV 10) as an example in this tutorial.

Import packages & data

import random
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt 

import Bering as br

mpl.rcParams['figure.figsize'] = [5, 5]

# load data
df_spots_all = br.datasets.cosmx_nsclc_he()
df_spots_seg = df_spots_all[df_spots_all['labels'] != 'background'] # foreground nodes
df_spots_unseg = df_spots_all[df_spots_all['labels'] == 'background'] # background nodes

img = None 
channels = None

df_spots_seg.head()

Downloading dataset `cosmx_nsclc_he` from `https://figshare.com/ndownloader/files/41409093` as `None.tsv`

	x	y	z	features	segmented	components	fov	labels
13342297	347.82858	48.571430	4	S100P	1	Cytoplasm	10	tumor
13342298	325.10000	42.125000	4	RAC1	1	Membrane	10	tumor
13342299	344.16248	43.912502	4	ITGB1	1	Cytoplasm	10	tumor
13342300	327.53000	48.960000	4	GSTP1	1	Cytoplasm	10	tumor
13342301	353.96250	19.949999	4	CLDN4	1	Membrane	10	tumor

df_spots_unseg.head() # visualize unsegmented data

	x	y	z	features	segmented	components	fov	labels
13131760	4667.9624	225.46251	4	S100A10	-1	Membrane	10	background
13131761	5063.0000	974.55005	4	FGR	-1	0	10	background
13131762	3770.3710	1280.65720	4	IL4R	-1	0	10	background
13131763	4671.2170	1350.08340	4	S100A10	-1	Membrane	10	background
13131764	2105.9167	2135.41670	4	IGHA1	-1	0	10	background

# visualize segmented and unsegmented data
br.pl.Plot_Spots(df_spots_seg = df_spots_seg, df_spots_unseg = df_spots_unseg)

../../_images/9d6fd4016017041690fc07e84e289f9d7ee480309e80c121177a353d346e4a11.png

Create Bering object

# image-free segmentation
bg = br.BrGraph(df_spots_seg, df_spots_unseg, img, channels)
bg

<Bering.objects.bering.Bering_Graph at 0x2b88d5b7ca00>

bg.segmented.head() # summary of cells

	cx	cy	cz	dx	dy	dz	d	labels
segmented
0	330.050018	36.464283	5.0	78.366302	61.139999	8	78.366302	tumor
1	630.057770	30.633572	4.0	113.074950	53.514284	9	113.074950	endothelial
2	773.127380	35.350002	5.0	61.014648	48.891116	8	61.014648	epithelial
3	1950.387451	21.155557	1.0	59.865234	25.408331	8	59.865234	myeloid
4	2082.986694	23.441666	5.0	60.033447	27.858335	7	60.033447	myeloid

Create training data

# Build graphs for GCN training purpose
br.graphs.BuildWindowGraphs(
    bg, 
    n_cells_perClass = 8, 
    window_width = 100.0, 
    window_height = 100.0, 
    n_neighbors = 10, 
)

br.graphs.CreateData(
    bg, 
    batch_size = 16, 
    training_ratio = 0.8, 
)

Training

br.train.Training(
    bg,
    edge_rbf_stop = 128,
    edge_rbf_n_kernels = 64,
    node_epoches = 180,
)

Training node classifier:  99%|████████████████████████████████████████████████████████████████████████████████████████████████▍| 179/180 [02:31<00:00,  1.42it/s]

../../_images/0c14424230a75d73339e281eeba2adf4c0755d25574e26ce8a7d1ffebdf3d76a.png

Training node classifier: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 180/180 [02:34<00:00,  1.16it/s]
Training edge classifier:  98%|█████████████████████████████████████████████████████████████████████████████████████████████████  | 49/50 [00:36<00:00,  1.44it/s]

../../_images/dca9fc0827289a9976e50ddef69a60bca253b1047219dd536a8893725e12728c.png

Training edge classifier: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:39<00:00,  1.27it/s]

# save the trained model
import pickle
with open('cosmx_nsclc_he.pkl', 'wb') as f:
    pickle.dump(bg, f)

Visualizing model

# randomly select a cell
random_cell = cells = random.sample(bg.segmented.index.values.tolist(), 5)[3]

# plot node classification
df_window_raw, df_window_pred, predictions = br.pl.Plot_Classification(
    bg, 
    cell_name = random_cell,
    n_neighbors = 10, 
    zoomout_scale = 4,
)

../../_images/c33ebb009166c37e3911785fb0cc1fbf8d4c4cb9e2a78a9311c056ac6c4e0cd4.png

# plot cell segmentation
br.pl.Plot_Segmentation(
    bg, 
    cell_name = random_cell,
    n_neighbors = 10, 
    zoomout_scale = 4,
    use_image = True,
    pos_thresh = 0.6,
    resolution = 0.15,
    num_edges_perSpot = 100,
    min_prob_nodeclf = 0.3,
    n_iters = 20,
)

../../_images/994dc8fa9c8d84aff5c6a8bed3c82c6fea056551ff7cc1d660301fad5992db79.png

After the model is trained, we can use the trained model to predict the cell types and segment all spots on the whole slice. After node classification and cell segmentation is completed, we generate single cell matrix in the end.

Node classification

Conduct node classification on the whole slice.

br.tl.node_classification(
    bg, bg.spots_all.copy(), 
    n_neighbors = 10, 
)
bg.spots_all.to_csv('spots_all.txt', sep = '\t')

Cell segmentation

br.tl.cell_segmentation(bg)

Get single cells

df_results, adata_ensembl, adata_segmented = br.tl.cell_annotation(bg)
df_results.to_csv('results.txt', sep = '\t')
adata_ensembl.write('results_cells_ensembled.h5ad')
adata_segmented.write('results_cells_segmented.h5ad')

print(f'Ensembled anndata: {adata_ensembl.shape}')
print(f'Segmented anndata: {adata_segmented.shape}')

... storing 'predicted_labels' as categorical

Ensembled anndata: (741, 980)
Segmented anndata: (1708, 980)

Single cell data analysis

import scanpy as sc
sc.settings.set_figure_params(dpi=80)

# run standard analysis on the ensembled anndata
adata_ensembl = br.tl.cell_analyze(adata_ensembl)

# first check the data quality of segmented cells
sc.pl.umap(adata_ensembl, color = ['n_counts','n_genes'])

2023-08-22 14:01:33.701161: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-08-22 14:01:33.921550: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.

../../_images/b54e94dd702dacce6296b0c5ac8e0454d1cd66ed0c1dda343b340126b9cde595.png

# plot leiden clustering and predicted labels from Bering
fig, axes = plt.subplots(1, 2, figsize = (10, 5))
axes[0] = sc.pl.umap(adata_ensembl, color = ['leiden'], ax = axes[0], show = False)
axes[1] = sc.pl.umap(adata_ensembl, color = ['predicted_labels'], ax = axes[1], show = False)

plt.subplots_adjust(wspace = 0.5)
plt.tight_layout()
plt.show()

../../_images/6ab1f531fde16ec78a09cbe2defd75be3695bbf1948902f5126c035af331d28c.png