About the Data Catalog

The NYU Data Catalog facilitates researchers’ discovery of data by providing a searchable and browsable online collection of datasets. Rather than functioning as a data repository, the catalog is a digital way-finder for researchers looking for datasets relevant to their work. It includes datasets generated by NYU researchers as well as publicly available and licensed datasets generated at external organizations, such as the Bureau of Labor Statistics.

The NYU Data Catalog is designed to:

  • Increase the visibility of research data generated by NYU researchers
  • Facilitate collaboration across departments and institutes at NYU
  • Help NYU researchers locate and understand datasets generated at external organizations
  • Support the process of re-using research data

If you are interested in submitting a dataset to the NYU Data Catalog, have a suggestion for additional datasets to add, or are willing to serve as a local expert, please use the Contact Us form.

The code used to create the NYU Data Catalog is open source and available via GitHub. Documentation and further information are available via OSF. If you would like to create a similar catalog, please use the Contact Us form to learn more about the multi-institution Data Catalog Collaboration Project.

Meet the Team

Nicole Contaxis

Data Catalog Coordinator

Nicole Contaxis, MLIS, is the Data Catalog Coordinator at the NYU Health Sciences Library. She works alongside researchers to make research data discoverable through the NYU Data Catalog. Her areas of interest include data sharing, data ethics, and community engagement. Nicole is a former National Digital Stewardship Resident at the National Library of Medicine. She received her MLIS from UCLA and is currently working on her M.A. in Bioethics at NYU.

Ian Lamb

Senior Solutions Developer

Ian Lamb is a full-stack web developer at the NYU Health Sciences Library and is the principal developer of our data catalog. He focuses on building friendly and usable systems to advance the institution’s clinical, educational, and research goals.

Debbie Peters

Executive Assistant to the Chair

Debbie works closely with library administration on human resources, finance policies and procedures, and day-to-day operations. She serves as meeting coordinator and administrative support for the NYU Data Catalog and the Data Catalog Collaboration Project (DCCP).

Kevin Read

Lead of Data Discovery and Data Services Librarian

Kevin is the Lead of the NYU Data Catalog and the Data Catalog Collaboration Project (DCCP), an initiative of eight academic medical centers working together to improve research data discovery using the NYU Data Catalog's software. Before coming to NYU, Kevin worked as a fellow at the NIH, laying the groundwork for developing a PubMed for datasets. Since then, he has worked on the NIH bioCADDIE initiative to help develop systems for data discovery, developed a common metadata schema to describe biomedical research data, and established the DCCP. Kevin came to the U.S. from Canada in 2013; in Canada he studied at the University of British Columbia in Vancouver.

Alisa Surkis

Vice Chair for Research and Assistant Director, Research Data and Metrics

Alisa Surkis, PhD, MLS is the Vice Chair for Research and Assistant Director, Research Data and Metrics for the Health Sciences Library. She serves as the senior advisor for the NYU Data Catalog and as a member of the Data Catalog Collaboration Project.

The Data Catalog Collaboration Project

The Data Catalog Collaboration Project (DCCP) was created to facilitate the discovery of biomedical research data that are hard to find. The DCCP consists of academic health science libraries that have implemented local instances of an open source data catalog to index biomedical research data. This collaboration brings a cross-institutional perspective to addressing usability, data sharing workflows, metadata, and outreach for improving data discovery efforts in biomedical research.

The DCCP serves as the first step for researchers to make their data more discoverable. As a low-barrier entry point into data sharing, it lets researchers describe their data using our metadata schema while retaining control over how they share it.

The Goals of the DCCP:

  • Increase the visibility of institutional biomedical research data
  • Assess the reuse of biomedical research data to guide how resources are allocated for curating, storing, and sharing high-value data
  • Align DCCP efforts with emerging national data discovery initiatives from the NIH and others
  • Learn from users of the DCCP local data catalogs by investigating use cases and data sharing workflows
  • Inform institutional data sharing policies to improve data sharing workflows and reduce the burden on the biomedical research community

To learn more about our accomplishments, our publications, and how to join, please visit the DCCP website.