NYU Dataset

Mapping the Landscape of Histomorphological Cancer Phenotypes Using Self-Supervised Learning on Unannotated Pathology Slides

UID: 10701
* Corresponding Author
Description
Histomorphological Phenotype Learning (HPL) was developed as a self-supervised methodology to identify discriminatory features in microscopy images to aid in cancer diagnosis and management. The methodology partitions whole slide images (WSIs) into meaningful Histomorphological Phenotype Clusters (HPCs), which can be used to define and quantify morphological phenotypes that recur within and between cases. HPL employs the following approach: (1) WSI pre-processing, (2) self-supervised learning of tissue tiles, (3) clustering of tissue tile representations into HPCs, and (4) HPC characterization. This process has been applied to images from 276 patients at NYU Langone and a multi-cancer cohort of patients from The Cancer Genome Atlas (TCGA) to identify and study histomorphological phenotypes associated with lung adenocarcinoma (LUAD). HPL identified LUAD types and growth patterns that are comparable to the World Health Organization (WHO) classification, correlate with patient survival (mean concordance index for recurrence-free survival = 0.74), and align with transcriptomic measures of immunophenotype.
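As a rough illustration of steps (3) and (4) in the description, the sketch below groups toy tile representations into clusters and then summarizes each slide by the fraction of its tiles in each cluster. It is a minimal sketch under stated assumptions, not the HPL implementation: the clustering uses plain k-means as a stand-in (the record does not say which clustering algorithm HPL uses), and the function names, the 2-D toy vectors, and the slide IDs are all hypothetical.

```python
from collections import Counter
import math


def nearest(vec, centroids):
    """Return the index of the centroid closest to vec (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(vec, centroids[k]))


def kmeans(vectors, k, iters=20):
    """Plain k-means stand-in; initial centroids are the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        labels = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(v, centroids) for v in vectors]
    return labels, centroids


def slide_composition(slide_ids, labels, k):
    """For each slide, compute the fraction of its tiles in each cluster."""
    counts = {}
    for sid, lab in zip(slide_ids, labels):
        counts.setdefault(sid, Counter())[lab] += 1
    return {
        sid: [c[j] / sum(c.values()) for j in range(k)]
        for sid, c in counts.items()
    }


# Hypothetical toy data: six 2-D "tile representations" from two slides.
tile_vectors = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
slide_ids = ["slide_A", "slide_A", "slide_A", "slide_B", "slide_B", "slide_B"]

cluster_labels, _ = kmeans(tile_vectors, k=2)
composition = slide_composition(slide_ids, cluster_labels, k=2)
print(cluster_labels)
print(composition)
```

In this toy setup, each slide ends up described by a short vector of cluster proportions, which is the kind of per-slide summary that could then be related to outcomes such as survival.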
Subject of Study
Subject Domain
Population Age
Adult (19 years - 64 years)
Senior (65 years - 79 years)
Aged (80 years and over)
Keywords

Access

Restrictions
Free to All
Author Approval Required
Instructions
Source data for manuscript figures are included with the publication in Nature Communications. Code and data (i.e., pre-trained LUAD/LUSC model checkpoints; multi-cancer model checkpoints; tile vector representations; HPC configurations; whole slide image and patient vector representations; and a Jupyter notebook) are available through the GitHub and Zenodo repositories. The project utilized slide images for 10 cancer types (IDs: TCGA-LUAD, TCGA-LUSC, TCGA-BLCA, TCGA-BRCA, TCGA-CESC, TCGA-COAD, TCGA-PRAD, TCGA-SKCM, TCGA-STAD) from the Genomic Data Commons (GDC). Requests for data from the NYU patient cohort can be submitted to the corresponding author and will be subject to a data transfer agreement.
Access via GitHub

Code and data

Access via Zenodo

Code and data
Accession #: 10718821

Access via Genomic Data Commons

Data portal
Accession #: TCGA-LUAD, TCGA-LUSC, TCGA-BLCA, TCGA-BRCA, TCGA-CESC, TCGA-COAD, TCGA-PRAD, TCGA-SKCM, TCGA-STAD

Associated Publications
Data Type
Software Used
DeepPATH
Study Type
Observational
Grant Support
EP/R018634/1/Engineering and Physical Sciences Research Council (EPSRC)
BB/V016067/1/Engineering and Physical Sciences Research Council (EPSRC)
2019-06360/Swedish Research Council