NYU Dataset
Mapping the Landscape of Histomorphological Cancer Phenotypes Using Self-Supervised Learning on Unannotated Pathology Slides
UID: 10701
- Description
- Histomorphological Phenotype Learning (HPL) was developed as a self-supervised methodology to identify discriminatory features in microscopy images to aid in cancer diagnosis and management. The methodology partitions whole slide images (WSIs) into meaningful Histomorphological Phenotype Clusters (HPCs) that can be used to define and quantify morphological phenotypes which recur within and between cases. HPL employs the following approach: (1) WSI pre-processing, (2) self-supervised learning of tissue tiles, (3) clustering of tissue tile representations into HPCs, and (4) HPC characterization. This process has been applied to images from 276 patients at NYU Langone and a multi-cancer cohort of patients from The Cancer Genome Atlas (TCGA) to identify and study histomorphological phenotypes associated with lung adenocarcinoma (LUAD). HPL was able to identify LUAD types and growth patterns that are comparable to the World Health Organization (WHO) classification, correlate with patient survival (mean concordance index for recurrence-free survival = 0.74), and align with transcriptomic measures of immunophenotype.
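For readers unfamiliar with the concordance index reported above, the following is a minimal sketch of how such a score can be computed. It is not the authors' code: the function name `concordance_index`, the toy data, and the simplified handling of tied times (tied pairs are skipped) are all assumptions made for illustration. The score is the fraction of comparable patient pairs in which the patient with the earlier observed event also has the higher predicted risk, so 0.5 corresponds to random ordering and 1.0 to perfect ordering.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Illustrative Harrell-style C-index (hypothetical helper, not from the HPL code).

    times  : observed follow-up time per patient
    events : 1 if the event (e.g. recurrence) was observed, 0 if censored
    risks  : predicted risk score per patient (higher = earlier event expected)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Assumption: pairs with tied times are skipped. A pair is only
        # comparable if the earlier patient actually had an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # earlier event and higher risk: correct order
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): the third patient is censored.
toy_times = [5, 8, 12, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.6, 0.1]
toy_score = concordance_index(toy_times, toy_events, toy_risks)
```

In this toy example five pairs are comparable and four of them are ordered correctly, so the score is 0.8. Established survival-analysis libraries handle ties and censoring more carefully and should be preferred for real analyses.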
Homo sapiens
Subject Domain
Population Age
Keywords
Adult (19 years - 64 years)
Senior (65 years - 79 years)
Aged (80 years and over)
Access
- Restrictions
-
Free to All
Author Approval Required
- Instructions
- Source data for manuscript figures are included with the publication in Nature Communications. Code and data (i.e., pre-trained LUAD/LUSC model checkpoints; multi-cancer model checkpoints; tile vector representations; HPC configurations; whole slide image and patient vector representations; and a Jupyter notebook) are available through the GitHub and Zenodo repositories. The project utilized slide images for 10 cancer types (IDs: TCGA-LUAD, TCGA-LUSC, TCGA-BLCA, TCGA-BRCA, TCGA-CESC, TCGA-COAD, TCGA-PRAD, TCGA-SKCM, TCGA-STAD) from the Genomic Data Commons (GDC). Requests for data from the NYU patient cohort can be submitted to the corresponding author and will be subject to a data transfer agreement.
Observational
- Grant Support
-
EP/R018634/1/Engineering and Physical Sciences Research Council (EPSRC)
BB/V016067/1/Engineering and Physical Sciences Research Council (EPSRC)
2019-06360/Swedish Research Council