%matplotlib inline

Understand Neighbor Gene Component and Co-localization Graphs

This tutorial takes apart the two inputs used for training: neighbor gene components (NGC) and co-localization graphs. We use NanoString CosMx NSCLC data (He et al., 2022) as an example.

Import packages & data

import sys 
import numpy as np
import pandas as pd
import tifffile as tiff
import matplotlib.pyplot as plt 

import Bering as br

# load data
df_spots_seg = pd.read_csv('spots_seg.txt', sep='\t', header=0, index_col=0)
df_spots_unseg = pd.read_csv('spots_unseg.txt', sep='\t', header=0, index_col=0)
img = tiff.imread('image.tif')
channels = ['Nuclei', 'PanCK', 'Membrane']

Create Bering object

bg = br.BrGraph(df_spots_seg, df_spots_unseg, img, channels)
bg

<Bering.objects.bering.Bering_Graph at 0x2af8439f4580>

Construct graphs

# Build windowed graphs for GCN training
br.graphs.BuildWindowGraphs(
    bg, 
    n_cells_perClass = 10, 
    window_width = 100.0, 
    window_height = 100.0, 
    n_neighbors = 10, 
)

print(f'Number of node features: {bg.n_node_features}')

Number of node features: 981
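The `window_width` and `window_height` arguments above suggest that each graph is built from the transcripts falling inside a 100 x 100 window. How Bering chooses the window centers is not shown in this tutorial, so the following is only a minimal sketch of the cropping step; the helper `spots_in_window` and the toy coordinates are hypothetical and not part of the Bering API.

```python
import numpy as np

def spots_in_window(xy, center, width, height):
    """Boolean mask of points inside a width x height window centered at `center`.

    Hypothetical helper for illustration only; not a Bering function.
    """
    half = np.array([width, height]) / 2.0
    lower = np.asarray(center, dtype=float) - half
    upper = np.asarray(center, dtype=float) + half
    # Lower bound inclusive, upper bound exclusive, on both axes.
    return np.all((xy >= lower) & (xy < upper), axis=1)

# Toy transcript coordinates (made up for this example).
xy = np.array([
    [10.0, 10.0],
    [60.0, 40.0],
    [120.0, 50.0],   # outside: x is beyond the right edge
    [50.0, 99.9],
    [50.0, 100.0],   # outside: y sits on the excluded upper edge
])
mask = spots_in_window(xy, center=(50.0, 50.0), width=100.0, height=100.0)
print(mask.sum())  # 3
```

Only the points selected by the mask would then become nodes of that window's graph.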

graphs = bg.Graphs_golden
print(f'Number of graphs: {len(graphs)}')

graph = graphs[0].cpu()
print('Type of graph:', type(graph))
graph

Number of graphs: 436
Type of graph: <class 'torch_geometric.data.data.Data'>

Data(x=[994, 981], edge_index=[2, 9940], edge_attr=[9940], y=[994, 10], pos=[994, 4])

Co-localization graphs

graph.x

tensor([[0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 8.4660],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 8.4303],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 8.7936],
        ...,
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 4.7216],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 3.7851],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 3.6468]],
       dtype=torch.float64)
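Each row of `graph.x` is the feature vector of one transcript (994 nodes, 981 features). The exact definition Bering uses for these features (normalization, whether the node itself is counted, any non-gene columns) is not shown in this tutorial. As a rough intuition for a neighbor gene component, the sketch below counts how often each gene appears among a transcript's neighbors; the function name, the toy gene ids, and the hand-written neighbor lists are all assumptions for illustration.

```python
import numpy as np

def neighbor_gene_counts(gene_ids, nbrs, n_genes):
    """Count gene occurrences among each node's neighbors.

    gene_ids: (n,) integer gene id of every transcript.
    nbrs:     (n, k) indices of the k neighbors of every transcript.
    Returns an (n, n_genes) count matrix. Illustrative only; not Bering's code.
    """
    n, k = nbrs.shape
    counts = np.zeros((n, n_genes), dtype=np.int64)
    rows = np.repeat(np.arange(n), k)        # node index, repeated once per neighbor
    cols = gene_ids[nbrs].reshape(-1)        # gene id of each neighbor
    np.add.at(counts, (rows, cols), 1)       # unbuffered add handles repeated (row, col) pairs
    return counts

# Four toy transcripts, three genes, two neighbors each (all made up).
gene_ids = np.array([0, 1, 1, 2])
nbrs = np.array([[1, 2], [0, 2], [1, 3], [2, 1]])
ngc = neighbor_gene_counts(gene_ids, nbrs, n_genes=3)
print(ngc.tolist())  # [[0, 2, 0], [1, 1, 0], [0, 1, 1], [0, 2, 0]]
```

With many genes and few neighbors per node, most entries of such a matrix are zero, which matches the mostly zero rows printed above.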

Edges of the graph

graph.edge_index

tensor([[  0,   0,   0,  ..., 993, 993, 993],
        [  2,   1,   4,  ..., 704, 593, 291]])
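The graph has 9940 edges, which equals 994 nodes times `n_neighbors = 10`, and the first row of `edge_index` lists each source node in consecutive runs. Both observations are consistent with a k-nearest-neighbor graph over transcript coordinates. The sketch below shows one way such an edge list could be built with NumPy; it is an assumption about the construction, not Bering's implementation, and `knn_edge_index` is a hypothetical name.

```python
import numpy as np

def knn_edge_index(coords, k):
    """Directed edges from every point to its k nearest neighbors.

    Returns an integer array of shape (2, n * k): row 0 holds source nodes,
    row 1 holds target nodes. Brute force, fine for a few thousand points.
    """
    n = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)          # pairwise squared distances, (n, n)
    np.fill_diagonal(dist2, np.inf)           # never pick a point as its own neighbor
    nbrs = np.argsort(dist2, axis=1)[:, :k]   # k closest per row, nearest first
    src = np.repeat(np.arange(n), k)
    dst = nbrs.reshape(-1)
    return np.stack([src, dst])

rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 100.0, size=(50, 2))  # toy points in a 100 x 100 window
edge_index = knn_edge_index(coords, k=10)
print(edge_index.shape)  # (2, 500)
```

The same arithmetic applied to the graph above gives 994 * 10 = 9940 columns.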

Labels of nodes

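Node labels encode the cell type of each individual transcript. Here `graph.y` has shape [994, 10]: one row per node, with what appears to be a one-hot column per cell type.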

graph.y

tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 1],
        [0, 0, 0,  ..., 0, 0, 1]])
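If the rows of `graph.y` are one-hot vectors, as the printed values suggest, the cell type index of every transcript can be recovered with an argmax over the class axis. The sketch below uses a small made-up label matrix in NumPy; for the real tensor the equivalent call would be `graph.y.argmax(dim=1)`.

```python
import numpy as np

# Toy one-hot labels: 4 transcripts, 4 classes (made up for this example).
y = np.array([
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
])

class_idx = y.argmax(axis=1)                            # class index per transcript
counts = np.bincount(class_idx, minlength=y.shape[1])   # transcripts per class

print(class_idx)  # [0 2 2 3]
print(counts)     # [1 0 2 1]
```

A row of all zeros would also map to index 0 under argmax, so it is worth checking `y.sum(axis=1)` first if some transcripts may be unlabeled.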

Node position matrix

pos_mtx = graph.pos.numpy()
df_pos = pd.DataFrame(pos_mtx, columns = ['molecule_id', 'x', 'y', 'cell_id'])
print(df_pos.shape)
df_pos.head()

(994, 4)

   molecule_id            x           y  cell_id
0     100012.0  1258.950100  609.375000    292.0
1     100034.0  1265.075000  605.100000    292.0
2     100038.0  1258.450000  608.375000    292.0
3     100268.0  1265.899902  603.123108    292.0
4     100293.0  1265.083374  604.583313    292.0
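Because the position matrix carries a `cell_id` for every transcript, it can be summarized per cell. The sketch below, written with plain NumPy on the five rows shown above, counts transcripts per cell and computes each cell's centroid; the variable names are illustrative.

```python
import numpy as np

# Columns: molecule_id, x, y, cell_id (the five rows printed above).
pos = np.array([
    [100012.0, 1258.950100, 609.375000, 292.0],
    [100034.0, 1265.075000, 605.100000, 292.0],
    [100038.0, 1258.450000, 608.375000, 292.0],
    [100268.0, 1265.899902, 603.123108, 292.0],
    [100293.0, 1265.083374, 604.583313, 292.0],
])

cell_ids = pos[:, 3].astype(int)
unique_cells, inverse, n_spots = np.unique(
    cell_ids, return_inverse=True, return_counts=True
)

# Sum x and y per cell, then divide by the number of transcripts in that cell.
sums = np.zeros((len(unique_cells), 2))
np.add.at(sums, inverse, pos[:, 1:3])
centroids = sums / n_spots[:, None]

print(unique_cells, n_spots)  # [292] [5]
```

On the full DataFrame the same summary can be written as `df_pos.groupby('cell_id')[['x', 'y']].mean()`.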