The dataset includes 864 genomic sequences obtained from COVID-19 patients aged 18 years and older within the NYU Langone Health system between March 12 and May 10, 2020. Patients reflected the health system's catchment area (i.e., the greater New York City metropolitan area).

rRNA-depleted total RNA was sequenced, subjected to quality control, and analyzed to produce phylogenetic trees. Further details on the materials and workflow used for sequence extraction can be found in the Methods section of the associated article.

Geographic Coverage
New York (State)
New York (State) - New York City
Subject of Study
Subject Domain
Population Age
Adult (19 years - 64 years)
Senior (65 years - 79 years)
Aged (80 years and over)


Free to All
Application Required

Raw COVID-19 sequencing data produced in this study may be accessed through the NCBI BioProject repository.

Genetic sequences may also be found in the GISAID repository by searching for "NYUMC" in the nCov project. A GISAID account is required to access them.

Access via BioProject

Genetic sequences
Accession #: PRJNA650245
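
For programmatic lookup, the accession above can be passed to NCBI's public E-utilities ESearch endpoint. The following is a minimal sketch, not part of the original record: the function name `build_sra_search_url`, the choice of the `sra` database, and the `retmax` default are illustrative assumptions. It only builds the query URL and does not perform a network request.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_search_url(accession: str = "PRJNA650245", retmax: int = 20) -> str:
    """Return an ESearch URL that searches SRA for a BioProject accession.

    Hypothetical helper: searching the `sra` database is an assumption about
    where the raw reads are indexed; adjust `db` if records live elsewhere.
    """
    params = {
        "db": "sra",          # Entrez database to search (assumed)
        "term": accession,    # free-text search on the accession string
        "retmax": retmax,     # maximum number of record IDs to return
        "retmode": "json",    # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

Calling `build_sra_search_url()` returns a URL that can be opened in a browser or fetched with any HTTP client; the returned record IDs can then be used with other E-utilities calls.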

Access via GISAID

Genetic sequences
Accession #: NYUMC

Associated Publications
Data Type
Equipment Used
Agilent 2200 TapeStation System
Applied Biosystems 7500 Fast Dx
Biometra TRobot
Cepheid GeneXpert Xpress
cobas 6800 System
IDT xGen COVID Capture Panel
Illumina NextSeq 500
Illumina NovaSeq 6000
Kapa Biosystems qPCR KK4824
KAPA RiboErase Kit (HMR)
Software Used
Augur v7.0.2
BCFtools v1.9
bcl2fastq v2.20
BWA v0.7.17
IQ-TREE v1.6.12
MAFFT v7.450
MAFFT v7.453
samblaster v0.1.24
TreeTime v0.7.4
Trimmomatic v0.39
Study Type
Grant Support
P50 CA016087/NCI
MR/R015600/1/Medical Research Council Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London
Other Resources

Visualization of COVID-19 evolution

Maurano Lab Github

Script for sequencing data processing