%matplotlib inline

Understand the Bering object

This tutorial walks through the data structure of the Bering object, using the NanoString CosMx NSCLC dataset (He et al., 2022) as an example.

Import packages & data

import numpy as np
import pandas as pd
import tifffile as tiff
import matplotlib.pyplot as plt 

import Bering as br
# load data
df_spots_seg = pd.read_csv('spots_seg.txt', sep='\t', header=0, index_col=0)
df_spots_unseg = pd.read_csv('spots_unseg.txt', sep='\t', header=0, index_col=0)
img = tiff.imread('image.tif')
channels = ['Nuclei', 'PanCK', 'Membrane']

Visualize spots

The spot data consist of two tables: segmented spots, which carry cell annotations, and unsegmented spots, which do not.

df_spots_seg.head() # preview the first rows of the segmented spots
                  x          y  z  features  segmented  components  fov  labels
13342297  347.82858  48.571430  4     S100P          1   Cytoplasm   10   tumor
13342298  325.10000  42.125000  4      RAC1          1    Membrane   10   tumor
13342299  344.16248  43.912502  4     ITGB1          1   Cytoplasm   10   tumor
13342300  327.53000  48.960000  4     GSTP1          1   Cytoplasm   10   tumor
13342301  353.96250  19.949999  4     CLDN4          1    Membrane   10   tumor
# x, y, z are the 3D coordinates of the transcripts in the image.
print(f'min x, y, z: {df_spots_seg[["x", "y", "z"]].min().values}')
print(f'max x, y, z: {df_spots_seg[["x", "y", "z"]].max().values}')
min x, y, z: [ 9.51666641 10.775      -1.        ]
max x, y, z: [5461.38769531 3637.5612793     8.        ]
# column "features" contains the gene name of each individual transcript
df_spots_seg['features'].value_counts()
MZT2A       58704
DUSP5       55854
MALAT1      29428
HSPA1A      18811
OLFM4       17501
            ...  
CHGA           95
ARG1           91
ADGRE1         88
NegPrb19       87
NegPrb15       87
Name: features, Length: 980, dtype: int64
# columns "segmented" and "labels" hold the segmented cell ids and the cell type annotations.
# They are the ground truth for the cell segmentation and cell type classification tasks.
print(df_spots_seg['segmented'].value_counts())
print(df_spots_seg['labels'].value_counts())
2669    2002
975     1917
1330    1889
934     1615
559     1490
        ... 
2445      20
2779      20
3133      20
2647      20
630       20
Name: segmented, Length: 3129, dtype: int64
tumor          480533
myeloid        246703
fibroblast     102280
endothelial     92674
epithelial      75477
B               62654
T               29920
pDC             17912
NK              12644
Name: labels, dtype: int64
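
Because `labels` is a cell-level annotation repeated on every transcript row, each cell id in `segmented` would be expected to map to exactly one label. The following is a minimal sketch of that sanity check. It runs on a small made-up table (`toy_spots` is a hypothetical stand-in for `df_spots_seg`, and its values are illustrative only), not on the tutorial data.

```python
import pandas as pd

# Toy stand-in for df_spots_seg; the values are made up for illustration.
toy_spots = pd.DataFrame({
    "segmented": [1, 1, 1, 2, 2, 3],
    "labels": ["tumor", "tumor", "tumor", "myeloid", "myeloid", "B"],
})

# Number of distinct labels observed among the transcripts of each cell.
labels_per_cell = toy_spots.groupby("segmented")["labels"].nunique()

# Cells whose transcripts carry more than one label would be inconsistent.
inconsistent_cells = labels_per_cell[labels_per_cell > 1].index.tolist()

# Collapse to one row per cell to get a cell-level label table.
cell_labels = toy_spots.drop_duplicates("segmented").set_index("segmented")["labels"]

print(inconsistent_cells)
print(cell_labels.value_counts())
```

Counting `cell_labels` instead of the raw `labels` column gives the number of cells per type rather than the number of transcripts per type.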
df_spots_unseg.head() # preview the first rows of the unsegmented spots
          components  features  fov          x           y  z
13131760    Membrane   S100A10   10  4667.9624   225.46251  4
13131761           0       FGR   10  5063.0000   974.55005  4
13131762           0      IL4R   10  3770.3710  1280.65720  4
13131763    Membrane   S100A10   10  4671.2170  1350.08340  4
13131764           0     IGHA1   10  2105.9167  2135.41670  4
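
The two previews suggest which columns each table carries: the unsegmented table has no `segmented` or `labels` column. A small check like the one below can catch a missing column before the tables are used further. This is only a sketch: the column sets are read off the previews, whether Bering requires exactly these columns is an assumption, and `missing_columns` is a hypothetical helper, not part of the Bering API.

```python
import pandas as pd

# Column sets read off the two previews; that these are exactly the columns
# Bering requires is an assumption.
SEG_COLUMNS = {"x", "y", "z", "features", "segmented", "components", "fov", "labels"}
UNSEG_COLUMNS = {"x", "y", "z", "features", "components", "fov"}


def missing_columns(df, expected):
    """Return the expected column names that are absent from df, sorted."""
    return sorted(expected - set(df.columns))


# One-row toy tables standing in for df_spots_seg and df_spots_unseg.
toy_seg = pd.DataFrame({
    "x": [347.8], "y": [48.6], "z": [4], "features": ["S100P"],
    "segmented": [1], "components": ["Cytoplasm"], "fov": [10], "labels": ["tumor"],
})
toy_unseg = pd.DataFrame({
    "x": [5063.0], "y": [974.6], "z": [4], "features": ["FGR"],
    "components": ["0"], "fov": [10],
})

print(missing_columns(toy_seg, SEG_COLUMNS))
print(missing_columns(toy_unseg.drop(columns=["z"]), UNSEG_COLUMNS))
```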

Create Bering object

bg = br.BrGraph(df_spots_seg, df_spots_unseg, img, channels)
bg
<Bering.objects.bering.Bering_Graph at 0x2ba02b692460>

Cell metadata

bg.segmented.head() # preview the per-cell metadata of the segmented cells
                    cx         cy          dx         dy           d       labels
segmented
0           330.050018  36.464283   78.366302  61.139999   78.366302        tumor
1           630.057770  30.633572  113.074950  53.514284  113.074950  endothelial
2           773.127380  35.350002   61.014648  48.891116   61.014648   epithelial
3          1950.387451  21.155557   59.865234  25.408331   59.865234      myeloid
4          2082.986694  23.441666   60.033447  27.858335   60.033447      myeloid
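
One plausible reading of these columns is that `cx`, `cy` give the cell centroid, `dx`, `dy` the extent of the cell along each axis, and `d` the larger of the two extents (in every row of the preview, `d` equals `dx`, which is also the larger value). The sketch below derives such a table from a spot table with a `groupby`. It uses made-up toy data, and the interpretation of the columns is an assumption; it is not confirmed to be how Bering computes them.

```python
import pandas as pd

# Toy spot table with two cells; the values are made up for illustration.
toy_spots = pd.DataFrame({
    "x": [0.0, 4.0, 2.0, 10.0, 12.0],
    "y": [0.0, 2.0, 1.0, 5.0, 11.0],
    "segmented": [0, 0, 0, 1, 1],
    "labels": ["tumor", "tumor", "tumor", "myeloid", "myeloid"],
})

grouped = toy_spots.groupby("segmented")

# Assumed definitions: centroid = mean position, extent = max - min per axis.
cells = pd.DataFrame({
    "cx": grouped["x"].mean(),
    "cy": grouped["y"].mean(),
    "dx": grouped["x"].max() - grouped["x"].min(),
    "dy": grouped["y"].max() - grouped["y"].min(),
})

# Assumed definition: d is the larger of the two extents.
cells["d"] = cells[["dx", "dy"]].max(axis=1)

# Take the label of the first transcript of each cell.
cells["labels"] = grouped["labels"].first()

print(cells)
```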

Feature (gene) metadata

bg.features.head() # preview the per-gene metadata of the features
          counts
features
AATK         340
ABL1         327
ABL2         319
ACE          301
ACE2         233
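
A table of this shape, indexed by gene name in alphabetical order with a single `counts` column, can be built from the spot tables with `value_counts`. The sketch below assumes that `counts` is a per-gene transcript count and, for illustration, counts transcripts from both tables; the tutorial does not state which transcripts Bering includes. The toy tables are made up.

```python
import pandas as pd

# Toy stand-ins for the "features" column of the two spot tables.
toy_seg = pd.DataFrame({"features": ["S100P", "RAC1", "S100P", "ITGB1"]})
toy_unseg = pd.DataFrame({"features": ["RAC1", "S100P"]})

# Pool gene names from both tables (an assumption, see the text above).
all_features = pd.concat(
    [toy_seg["features"], toy_unseg["features"]], ignore_index=True
)

# Count transcripts per gene and shape the result like the preview.
feature_table = (
    all_features.value_counts()
    .sort_index()
    .rename("counts")
    .rename_axis("features")
    .to_frame()
)

print(feature_table)
```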

Other information

print(f'Number of transcripts: {bg.n_transcripts}')
print(f'Number of features: {bg.n_features}')
print(f'Number of cells: {bg.n_segmented}')
print(f'Number of labels: {bg.n_labels}')

print(f'minimal x, y coordinates: {bg.XMIN}, {bg.YMIN}')
print(f'maximal x, y coordinates: {bg.XMAX}, {bg.YMAX}')

print(f'Device of Bering object: {bg.device}')
Number of transcripts: 1331334
Number of features: 980
Number of cells: 3129
Number of labels: 10
minimal x, y coordinates: 9.51666641235352, 10.775000000000093
maximal x, y coordinates: 5461.54296875, 3637.561279296875
Device of Bering object: cuda
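
Several of these numbers can be cross-checked against the input tables. The reported maximum x coordinate (5461.54) is slightly larger than the maximum over the segmented spots alone (5461.39), and the transcript total (1331334) exceeds the sum of the per-label counts of the segmented spots. Both observations are consistent with the summary covering segmented and unsegmented spots together, although that is an inference from the printed values. A minimal sketch of such a summary on made-up toy tables, with placeholder gene names `A`, `B`, `C`:

```python
import pandas as pd

# Toy stand-ins for df_spots_seg and df_spots_unseg; the values are made up.
toy_seg = pd.DataFrame({
    "x": [1.0, 2.0, 3.0],
    "y": [5.0, 6.0, 7.0],
    "features": ["A", "B", "A"],
    "segmented": [1, 1, 2],
    "labels": ["tumor", "tumor", "B"],
})
toy_unseg = pd.DataFrame({
    "x": [0.5, 9.0],
    "y": [8.0, 4.0],
    "features": ["C", "A"],
})

# Assumption: transcript-level summaries are computed over both tables.
all_spots = pd.concat([toy_seg, toy_unseg], ignore_index=True)

summary = {
    "n_transcripts": len(all_spots),
    "n_features": all_spots["features"].nunique(),
    "n_segmented": toy_seg["segmented"].nunique(),
    "xmin": all_spots["x"].min(),
    "xmax": all_spots["x"].max(),
    "ymin": all_spots["y"].min(),
    "ymax": all_spots["y"].max(),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```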