601 West Lombard Street
Baltimore MD 21201-1512
What is open data?
A succinct definition of open data from Open Knowledge International is as follows:
“Open data and content can be freely used, modified, and shared by anyone for any purpose”¹
The idea is that greater accessibility of data will lead to a research landscape that is more transparent, efficient, effective, and engaging. The full Open Definition is available at https://opendefinition.org/.
1. Open Knowledge International, licensed under CC-BY 4.0 and available at https://opendefinition.org/.
Strategies and Best Practices for Sharing Data
Data Repository Directories
Data Repositories by Subject
This list from the Open Access Directory allows you to browse data repositories by discipline, including medicine.
re3data.org is a registry of research data repositories. It is not a repository itself, but you can search it or browse by subject to find a data repository that matches your research needs.
Selected List of Repositories
Nature has compiled a list of recommended data repositories because it requires authors to submit data along with their publications. The list can be browsed by subject area.
Selected Data Repositories
The following list contains some selected data repositories. All repositories are free to submit to unless otherwise indicated.
Biological Magnetic Resonance Data Bank
BMRB is a repository for nuclear magnetic resonance (NMR) spectroscopy data on biological molecules.
The Cancer Imaging Archive (TCIA)
TCIA is an archive of medical images relating to cancer. TCIA provides support for de-identification of the images as part of their submission process.
Coherent X-ray Imaging Data Bank
CXIDB is a repository for coherent x-ray imaging data.
Dryad is a repository for scientific and medical data. Payment is required to submit data but not to search for or use data already in the repository.
Figshare is a general repository that allows you to upload and share any type of data in any file format. You can also search it to find existing materials. Up to 20 GB of storage is included in the free version.
Harvard Dataverse is a general data repository. You do not have to be affiliated with Harvard to create an account and submit data.
ImmPort is a repository for data relating to bioinformatics in immunology.
OpenNeuro is a repository for neuroimaging data.
PhysioBank is a collection of repositories for physiological data including recordings of waveform signals such as ECG.
UMB Data Catalog
The UMB Data Catalog is a searchable collection of records of data sets from UMB researchers. It does not store any actual data sets, but provides information about them as well as how to get access. Many of the data sets are freely available, but some require permission. You can search existing records or submit details of your own data for inclusion in the catalog.
Data.gov is a catalog of open data from the U.S. government. Like the UMB Data Catalog mentioned above, it describes data sets rather than storing them. If you find a data set you want to use, the Data.gov record will link you to where it is stored or connect you with the person or entity who can give it to you.
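Because catalog records point to the data rather than holding it, a scripted search has to read each record and collect its links. The following is a minimal Python sketch of that idea. It assumes the Data.gov catalog exposes a CKAN-style `package_search` endpoint at the URL shown and returns records with `title` and `resources` fields; check the current Data.gov developer documentation before relying on it. The function names are illustrative only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a CKAN-style search endpoint on the Data.gov catalog.
CATALOG_SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, rows=5):
    """Return the search URL for a free-text query, limited to `rows` records."""
    return CATALOG_SEARCH_URL + "?" + urlencode({"q": query, "rows": rows})


def extract_locations(response):
    """Map each record title to the URLs where its data is actually stored."""
    locations = {}
    for record in response.get("result", {}).get("results", []):
        # Each record lists zero or more resources; keep only those with a URL.
        urls = [res["url"] for res in record.get("resources", []) if res.get("url")]
        locations[record.get("title", "(untitled)")] = urls
    return locations


def search_catalog(query, rows=5):
    """Query the catalog over the network and return a title -> URLs mapping."""
    with urlopen(build_search_url(query, rows), timeout=30) as reply:
        return extract_locations(json.load(reply))
```

Calling `search_catalog("health workforce")` would return a dictionary of record titles and the links listed for each. A record with no links is one where you would need to contact the listed person or entity instead.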
data.HRSA.gov is a collection of data and other resources from the Health Resources and Services Administration. You can search and download data sets as well as create visualizations using a map builder tool.
Federal Article and Data Sharing Policies
This resource from SPARC allows you to browse the article and data sharing requirements of various federal agencies.
HHS Guidance on De-Identifying Health Data
This page is the Department of Health and Human Services' guide on de-identifying protected health information (PHI) from data in accordance with HIPAA rules.
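To give a sense of what de-identification can involve in practice, here is a minimal Python sketch loosely modeled on a few Safe Harbor-style reductions: dropping direct identifiers, keeping only the year of a date, grouping ages over 89, and truncating ZIP codes to three digits. The field names and the `restricted_zip3` parameter are hypothetical, and the sketch covers only a small part of the guidance. It does not make a data set HIPAA-compliant on its own, so consult the HHS guidance and your institution's privacy office.

```python
# Hypothetical field names; a real data set will use its own.
DIRECT_IDENTIFIERS = {"name", "street_address", "phone", "email", "ssn", "mrn"}
DATE_FIELDS = {"admission_date", "discharge_date"}  # assumed ISO "YYYY-MM-DD" strings


def deidentify(record, restricted_zip3=frozenset()):
    """Return a reduced copy of `record`; the input dictionary is not modified."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop direct identifiers entirely
        if field in DATE_FIELDS:
            # Keep only the year of the date.
            cleaned[field.replace("_date", "_year")] = value[:4]
        elif field == "age":
            # Group all ages over 89 into a single category.
            cleaned["age"] = "90+" if value > 89 else value
        elif field == "zip":
            # Keep the first three digits, or "000" for prefixes the caller
            # marks as restricted (for example, sparsely populated areas).
            prefix = value[:3]
            cleaned["zip3"] = "000" if prefix in restricted_zip3 else prefix
        else:
            cleaned[field] = value  # other fields pass through unchanged
    return cleaned
```

Fields that pass through unchanged, such as free-text notes, can still contain identifying details, so a pass like this is a starting point for review rather than a finished step.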