%matplotlib inline

Understand Neighbor Gene Component and Co-localization Graphs

This tutorial breaks down the neighbor gene components (NGC) and co-localization graphs that serve as the training input. We use the Nanostring CosMx NSCLC dataset (He et al., 2022) as an example.

Import packages & data

import sys 
import numpy as np
import pandas as pd
import tifffile as tiff
import matplotlib.pyplot as plt 

import Bering as br
# load data
df_spots_seg = pd.read_csv('spots_seg.txt', sep='\t', header=0, index_col=0)
df_spots_unseg = pd.read_csv('spots_unseg.txt', sep='\t', header=0, index_col=0)
img = tiff.imread('image.tif')
channels = ['Nuclei', 'PanCK', 'Membrane']

Create Bering object

bg = br.BrGraph(df_spots_seg, df_spots_unseg, img, channels)
bg
<Bering.objects.bering.Bering_Graph at 0x2af8439f4580>

Construct graphs

# Build windowed graphs for GCN training
br.graphs.BuildWindowGraphs(
    bg, 
    n_cells_perClass = 10, 
    window_width = 100.0, 
    window_height = 100.0, 
    n_neighbors = 10, 
)
print(f'Number of node features: {bg.n_node_features}')
Number of node features: 981
graphs = bg.Graphs_golden
print(f'Number of graphs: {len(graphs)}')

graph = graphs[0].cpu()
print('Type of graph:', type(graph))
graph
Number of graphs: 436
Type of graph: <class 'torch_geometric.data.data.Data'>
Data(x=[994, 981], edge_index=[2, 9940], edge_attr=[9940], y=[994, 10], pos=[994, 4])
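
The edge count above is consistent with each of the 994 nodes being linked to its 10 nearest neighbors (994 × 10 = 9940), which matches n_neighbors = 10. The following is a minimal sketch of that idea in plain Python, assuming simple Euclidean k-nearest-neighbor edges. The function name knn_edges and the toy coordinates are hypothetical, and Bering's own graph construction may differ.

```python
import math

def knn_edges(points, k):
    """Return directed (source, target) pairs linking each point to its k nearest neighbors."""
    sources, targets = [], []
    for i, p in enumerate(points):
        # Rank all other points by Euclidean distance to point i.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            sources.append(i)
            targets.append(j)
    # Same layout as edge_index: row 0 holds sources, row 1 holds targets.
    return [sources, targets]

# Toy transcript coordinates (hypothetical values).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0), (6.0, 5.0)]
edge_index = knn_edges(points, k=2)
print(len(edge_index[0]))  # number of directed edges = len(points) * k
```

With 5 points and k = 2 the sketch yields 10 directed edges, mirroring how 994 nodes and 10 neighbors give 9940 edges.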

Co-localization graphs

graph.x
tensor([[0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 8.4660],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 8.4303],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 8.7936],
        ...,
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 4.7216],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 3.7851],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 3.6468]],
       dtype=torch.float64)
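
The matrix above has one row per transcript and 981 feature columns, most of them zero. As an illustration of how a neighbor gene component could be computed, the sketch below counts the genes found among each transcript's k nearest transcripts. This is an assumed reading of the term, not Bering's confirmed definition: the function name neighbor_gene_counts, the gene names, and the coordinates are all hypothetical, and the library may normalize the counts or append further columns.

```python
import math
from collections import Counter

def neighbor_gene_counts(coords, genes, gene_order, k):
    """For each transcript, count the genes of its k nearest transcripts."""
    features = []
    for i, p in enumerate(coords):
        # Indices of the k transcripts closest to transcript i (excluding itself).
        nearest = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(p, coords[j]),
        )[:k]
        counts = Counter(genes[j] for j in nearest)
        # One column per gene, in a fixed order; absent genes count as 0.
        features.append([counts[g] for g in gene_order])
    return features

# Toy data (hypothetical): four transcripts with a gene label each.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
genes = ["EPCAM", "KRT8", "KRT8", "CD3E"]
gene_order = ["CD3E", "EPCAM", "KRT8"]

ngc = neighbor_gene_counts(coords, genes, gene_order, k=2)
for row in ngc:
    print(row)
```

Each row sums to k, and with a realistic gene panel most entries are zero, which is the sparsity pattern visible in graph.x.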

Edges of the graph

graph.edge_index
tensor([[  0,   0,   0,  ..., 993, 993, 993],
        [  2,   1,   4,  ..., 704, 593, 291]])

Labels of nodes

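Node labels give the cell type of each transcript. graph.y has one row per transcript and, in this example, 10 columns; the 0/1 entries suggest a one-hot encoding of the cell type index.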

graph.y
tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 1],
        [0, 0, 0,  ..., 0, 0, 1]])
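
To recover a single class index per transcript from a label matrix like the one above, take the position of the largest entry in each row. The sketch below does this in plain Python, assuming the rows are one-hot; the helper name decode_labels and the toy matrix are hypothetical. On the actual tensor, graph.y.argmax(dim=1) should give the equivalent result under the same assumption.

```python
def decode_labels(y):
    """Convert one-hot label rows into class indices (position of the largest entry)."""
    return [row.index(max(row)) for row in y]

# Toy label matrix (hypothetical): three transcripts, three classes.
y = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 1],
]
class_idx = decode_labels(y)
print(class_idx)  # [0, 2, 2]
```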

Node position matrix

pos_mtx = graph.pos.numpy()
df_pos = pd.DataFrame(pos_mtx, columns = ['molecule_id', 'x', 'y', 'cell_id'])
print(df_pos.shape)
df_pos.head()
(994, 4)
   molecule_id            x           y  cell_id
0     100012.0  1258.950100  609.375000    292.0
1     100034.0  1265.075000  605.100000    292.0
2     100038.0  1258.450000  608.375000    292.0
3     100268.0  1265.899902  603.123108    292.0
4     100293.0  1265.083374  604.583313    292.0