%matplotlib inline

Understand the Bering object

This tutorial walks through the data structure of the Bering object, using the NanoString CosMx NSCLC dataset (He et al., 2022) as an example.

Import packages & data

import numpy as np
import pandas as pd
import tifffile as tiff
import matplotlib.pyplot as plt 

import Bering as br

# load data
df_spots_seg = pd.read_csv('spots_seg.txt', sep='\t', header=0, index_col=0)
df_spots_unseg = pd.read_csv('spots_unseg.txt', sep='\t', header=0, index_col=0)
img = tiff.imread('image.tif')
channels = ['Nuclei', 'PanCK', 'Membrane']

Visualize spots

The spot data come as two tables: segmented spots, which carry cell annotations, and unsegmented spots, which have no cell annotations.

df_spots_seg.head() # preview the first rows of the segmented spots

	x	y	z	features	segmented	components	fov	labels
13342297	347.82858	48.571430	4	S100P	1	Cytoplasm	10	tumor
13342298	325.10000	42.125000	4	RAC1	1	Membrane	10	tumor
13342299	344.16248	43.912502	4	ITGB1	1	Cytoplasm	10	tumor
13342300	327.53000	48.960000	4	GSTP1	1	Cytoplasm	10	tumor
13342301	353.96250	19.949999	4	CLDN4	1	Membrane	10	tumor

# x, y, z are the 3D coordinates of each transcript in the image.
print(f'min x, y, z: {df_spots_seg[["x", "y", "z"]].min().values}')
print(f'max x, y, z: {df_spots_seg[["x", "y", "z"]].max().values}')

min x, y, z: [ 9.51666641 10.775      -1.        ]
max x, y, z: [5461.38769531 3637.5612793     8.        ]
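
The bounds printed above cover only the segmented table. The sketch below shows one way to take bounds over both spot tables at once, using small made-up tables (`toy_seg`, `toy_unseg`) in place of the real data. Whether Bering computes its own bounds this way is an assumption, not something this tutorial confirms.

```python
import pandas as pd

# Toy stand-ins for the two spot tables (values are made up for illustration).
toy_seg = pd.DataFrame({"x": [10.0, 25.5, 40.0], "y": [5.0, 7.5, 9.0], "z": [4, 4, 5]})
toy_unseg = pd.DataFrame({"x": [2.0, 55.0], "y": [6.0, 12.0], "z": [4, 3]})

# Stack both tables so the bounds cover every transcript, annotated or not.
all_spots = pd.concat([toy_seg, toy_unseg], axis=0, ignore_index=True)

# One row per statistic ("min", "max"), one column per coordinate.
bounds = all_spots[["x", "y", "z"]].agg(["min", "max"])
print(bounds)
```

In this toy example the unsegmented spots extend the x range on both sides, so bounds taken from the segmented table alone would be too narrow.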

# column "features" contains the gene name of each transcript
df_spots_seg['features'].value_counts()

MZT2A       58704
DUSP5       55854
MALAT1      29428
HSPA1A      18811
OLFM4       17501
            ...  
CHGA           95
ARG1           91
ADGRE1         88
NegPrb19       87
NegPrb15       87
Name: features, Length: 980, dtype: int64

# columns "segmented" and "labels" hold the cell ID and the cell type annotation of each transcript.
# Together they are the ground truth for the cell segmentation and cell type classification tasks.
print(df_spots_seg['segmented'].value_counts())
print(df_spots_seg['labels'].value_counts())

2669    2002
975     1917
1330    1889
934     1615
559     1490
        ... 
2445      20
2779      20
3133      20
2647      20
630       20
Name: segmented, Length: 3129, dtype: int64
tumor          480533
myeloid        246703
fibroblast     102280
endothelial     92674
epithelial      75477
B               62654
T               29920
pDC             17912
NK              12644
Name: labels, dtype: int64
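
Because "segmented" and "labels" are stored per transcript, it can be worth checking that every cell ID maps to exactly one cell type before using them as ground truth. The following is a minimal sketch on a made-up table (`toy_seg`); it uses plain pandas and is not part of the Bering API.

```python
import pandas as pd

# Toy segmented table: two cells, each transcript carries its cell ID and label.
toy_seg = pd.DataFrame({
    "features": ["S100P", "RAC1", "ITGB1", "CD3E", "CD3D"],
    "segmented": [1, 1, 1, 2, 2],
    "labels": ["tumor", "tumor", "tumor", "T", "T"],
})

# Number of transcripts assigned to each cell.
spots_per_cell = toy_seg.groupby("segmented").size()

# Number of distinct labels seen within each cell; should be 1 everywhere.
labels_per_cell = toy_seg.groupby("segmented")["labels"].nunique()
consistent = bool((labels_per_cell == 1).all())

n_cells = toy_seg["segmented"].nunique()
print(f"cells: {n_cells}, one label per cell: {consistent}")
```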

df_spots_unseg.head() # preview the first rows of the unsegmented spots

	components	features	fov	x	y	z
13131760	Membrane	S100A10	10	4667.9624	225.46251	4
13131761	0	FGR	10	5063.0000	974.55005	4
13131762	0	IL4R	10	3770.3710	1280.65720	4
13131763	Membrane	S100A10	10	4671.2170	1350.08340	4
13131764	0	IGHA1	10	2105.9167	2135.41670	4
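
The two previews differ in their columns. As a small illustration, the column names shown in the two outputs above can be compared with set operations; the lists below are copied from those outputs.

```python
# Column names as they appear in the two previews above.
seg_columns = ["x", "y", "z", "features", "segmented", "components", "fov", "labels"]
unseg_columns = ["components", "features", "fov", "x", "y", "z"]

# Columns present in both tables, and columns only the segmented table has.
shared = sorted(set(seg_columns) & set(unseg_columns))
only_seg = sorted(set(seg_columns) - set(unseg_columns))

print("shared:", shared)
print("segmented only:", only_seg)
```

The unsegmented table lacks exactly the two annotation columns, "segmented" and "labels".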

Create Bering object

bg = br.BrGraph(df_spots_seg, df_spots_unseg, img, channels)
bg

<Bering.objects.bering.Bering_Graph at 0x2ba02b692460>

Cell metadata

bg.segmented.head() # preview the metadata of the segmented cells

	cx	cy	dx	dy	d	labels
segmented
0	330.050018	36.464283	78.366302	61.139999	78.366302	tumor
1	630.057770	30.633572	113.074950	53.514284	113.074950	endothelial
2	773.127380	35.350002	61.014648	48.891116	61.014648	epithelial
3	1950.387451	21.155557	59.865234	25.408331	59.865234	myeloid
4	2082.986694	23.441666	60.033447	27.858335	60.033447	myeloid
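
To give a feel for what such a per-cell table contains, here is a sketch that derives similar columns from a made-up spot table with a pandas groupby. The definitions used here are assumptions for illustration only: `cx`, `cy` as the mean transcript position, `dx`, `dy` as the coordinate extent, and `d` as the larger of the two (in the rows shown above, `d` equals `dx`, which is the larger extent in each row). Bering's actual computation may differ.

```python
import pandas as pd

# Toy segmented spots: cell 0 has three transcripts, cell 1 has two.
toy_seg = pd.DataFrame({
    "x": [0.0, 4.0, 2.0, 10.0, 16.0],
    "y": [0.0, 2.0, 1.0, 10.0, 11.0],
    "segmented": [0, 0, 0, 1, 1],
    "labels": ["tumor", "tumor", "tumor", "T", "T"],
})

grouped = toy_seg.groupby("segmented")
cells = pd.DataFrame({
    "cx": grouped["x"].mean(),                      # assumed: mean x position
    "cy": grouped["y"].mean(),                      # assumed: mean y position
    "dx": grouped["x"].max() - grouped["x"].min(),  # assumed: extent along x
    "dy": grouped["y"].max() - grouped["y"].min(),  # assumed: extent along y
    "labels": grouped["labels"].first(),            # one label per cell
})
cells["d"] = cells[["dx", "dy"]].max(axis=1)        # assumed: larger extent
cells = cells[["cx", "cy", "dx", "dy", "d", "labels"]]
print(cells)
```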

Feature (gene) metadata

bg.features.head() # preview the metadata of the features (genes)

	counts
features
AATK	340
ABL1	327
ABL2	319
ACE	301
ACE2	233
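
A table of this shape, indexed by gene name with a single "counts" column, can be built from a spot table with a groupby. The sketch below uses a made-up table (`toy_spots`); which spots Bering includes in its own counts is not stated in this tutorial, so treat this as an illustration of the layout only.

```python
import pandas as pd

# Toy spot table with one gene name per transcript.
toy_spots = pd.DataFrame({"features": ["RAC1", "ACE", "RAC1", "ABL1", "ACE", "RAC1"]})

# groupby sorts the gene names, giving an alphabetically ordered index.
features = toy_spots.groupby("features").size().to_frame("counts")
print(features)
```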

Other information

print(f'Number of transcripts: {bg.n_transcripts}')
print(f'Number of features: {bg.n_features}')
print(f'Number of cells: {bg.n_segmented}')
print(f'Number of labels: {bg.n_labels}')

print(f'minimal x, y coordinates: {bg.XMIN}, {bg.YMIN}')
print(f'maximal x, y coordinates: {bg.XMAX}, {bg.YMAX}')

print(f'Device of Bering object: {bg.device}')

Number of transcripts: 1331334
Number of features: 980
Number of cells: 3129
Number of labels: 10
minimal x, y coordinates: 9.51666641235352, 10.775000000000093
maximal x, y coordinates: 5461.54296875, 3637.561279296875
Device of Bering object: cuda
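
The per-label counts printed earlier sum to 1,120,797 segmented transcripts, fewer than the 1,331,334 transcripts reported here, which is consistent with the object counting spots from both tables. The sketch below recomputes comparable summary numbers from made-up tables with plain pandas. It is an assumption about how such numbers can be obtained, not Bering's implementation, and it does not try to explain why the object reports 10 labels while nine cell types are listed.

```python
import pandas as pd

# Toy segmented and unsegmented tables (values are made up).
toy_seg = pd.DataFrame({
    "features": ["RAC1", "ACE", "RAC1"],
    "segmented": [0, 0, 1],
    "labels": ["tumor", "tumor", "T"],
})
toy_unseg = pd.DataFrame({"features": ["IL4R", "ACE"]})

# Transcript and gene counts use both tables; cell and label counts
# can only come from the annotated (segmented) table.
all_spots = pd.concat([toy_seg, toy_unseg], axis=0, ignore_index=True)
summary = {
    "n_transcripts": len(all_spots),
    "n_features": all_spots["features"].nunique(),
    "n_segmented": toy_seg["segmented"].nunique(),
    "n_labels": toy_seg["labels"].nunique(),
}
print(summary)
```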