The HSHSL is a part of the University of Maryland, Baltimore

601 West Lombard Street
Baltimore MD 21201-1512

Reference: 410-706-7996
Circulation: 410-706-7928

Finding Data: Home

Finding and using data and datasets from various sources. Browse datasets by topic.

Introduction

Sharing data is important because it allows other researchers to find and reuse data, which in turn supports new discoveries and accelerates research.

This guide highlights resources for researchers on finding and using data and datasets. Use the associated tabs to navigate through helpful information on browsing datasets by topic, searching for data, types of health data, evaluating datasets and data sources, and citing data.

Re-Using Existing Data

Data reuse is a broad concept that incorporates many different activities, such as:

  • Returning to one’s own data for later comparisons
  • Acquiring datasets from public or private sources to compare to newly collected data
  • Surveying available datasets as background research for a new project
  • Conducting reanalyses of one or more datasets to address new research questions

These activities vary widely in their implications for scientific practice, for the design of data archives, for public policy, and for data science.

Among the most essential aspects of data reuse is the ability to trust data collected by others. This trust is dependent on the researcher's ability to:

  • Evaluate the quality of data
  • Evaluate the reputations of the archives that host relevant datasets
  • Evaluate the organizations responsible for the data curation process

See the Evaluating Datasets and Data Sources section of this guide for more information.

Reference:

"Uses and Reuses of Scientific Data: The Data Creators’ Advantage" by Irene V. Pasquetto, Christine L. Borgman, and Morgan F. Wofford is licensed under CC-BY-4.0

Data vs. Statistics

Data and statistics are often confused and used interchangeably; however, it is important to understand the difference when working in research.

Data: Data are the raw, unprocessed numbers that result from original research. These numbers often have not yet been analyzed or interpreted.

  • Use data when you want to conduct an in-depth analysis. 

Statistics: Statistics are the result of an analysis of the raw data. They are used to summarize or interpret data, often to identify trends or patterns.

  • Use statistics when you need facts that support your argument or make a point.

Remember, statistics can be selected or framed to support almost any point and can contain bias. Always do your due diligence when finding and using statistics.
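To make the distinction concrete, here is a minimal sketch using Python's standard-library statistics module. The heart-rate values are invented for illustration only: the list plays the role of raw data, and the few numbers computed from it are statistics. Notice that the summary no longer shows any individual observation, which is why the underlying data are needed for an in-depth analysis.

```python
import statistics

# Hypothetical raw data: resting heart rates (beats per minute) for
# seven individual participants. Values are invented for illustration.
raw_heart_rates = [62, 70, 68, 80, 74, 66, 70]

def summarize(values):
    """Reduce a list of raw observations to a few summary statistics."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

summary = summarize(raw_heart_rates)
print(summary)
```

Many different raw datasets can produce the same mean and median, so a published summary alone cannot tell you how the individual values were distributed.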

Data Licensing

Data licensing is an important part of sharing and reusing data. To facilitate reuse of research data, re-users need to know the terms of use for both the database and the data content.

Creative Commons Licensing

  • Most datasets shared on public repositories are accompanied by Creative Commons licenses. CC licenses allow material to be shared and reused under terms that are flexible and legally sound.
  • Creative Commons offers six copyright licenses. Because there is no single CC license, it is important to know which of the six licenses has been applied to the data that you intend to reuse.
  • Note that CC0 is not a license; it is a public domain dedication. When CC0 is applied to a work, copyright no longer applies to the work in most jurisdictions around the world.
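As a quick way to compare the six licenses, the sketch below encodes the main condition of each license element (BY = attribution, SA = share-alike, NC = noncommercial, ND = no derivatives) as a simple Python lookup table. It is a simplified illustration, not a substitute for the license text. The names CC_TERMS and can_reuse are hypothetical helpers introduced here, not part of any Creative Commons tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LicenseTerms:
    attribution: bool     # BY: you must credit the creator
    share_alike: bool     # SA: shared adaptations must use the same license
    noncommercial: bool   # NC: commercial use is not permitted
    no_derivatives: bool  # ND: adapted versions may not be shared

# Hypothetical lookup table covering the six CC licenses plus CC0.
CC_TERMS = {
    "CC BY":       LicenseTerms(True, False, False, False),
    "CC BY-SA":    LicenseTerms(True, True,  False, False),
    "CC BY-NC":    LicenseTerms(True, False, True,  False),
    "CC BY-NC-SA": LicenseTerms(True, True,  True,  False),
    "CC BY-ND":    LicenseTerms(True, False, False, True),
    "CC BY-NC-ND": LicenseTerms(True, False, True,  True),
    # CC0 is a public domain dedication, not one of the six licenses.
    "CC0":         LicenseTerms(False, False, False, False),
}

def can_reuse(license_name, commercial, share_adaptation):
    """Simplified check: does the license allow the intended reuse?"""
    terms = CC_TERMS[license_name]
    if commercial and terms.noncommercial:
        return False
    if share_adaptation and terms.no_derivatives:
        return False
    return True

# Sharing a modified, noncommercial version of CC BY-NC-SA data is allowed,
# provided you give attribution and apply the same license to your version.
print(can_reuse("CC BY-NC-SA", commercial=False, share_adaptation=True))
```

The table deliberately separates "may I do this?" (NC, ND) from "what must I do?" (BY, SA): the function only answers the first question, so attribution and share-alike obligations still need to be checked against the license itself.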

An infographic describing the differences between Creative Commons Licenses.

Data Repositories - Levels of Access

  1. Public access: Public datasets are available to all without restriction. This option is commonly used for animal studies or data without privacy concerns.
  2. Controlled access: In a controlled-access repository, researchers must verify their identity before they are allowed to download and analyze data. This can take the form of verifying a university-associated email address, signing a data-use agreement, or sending in an application before access is granted. Some repositories, such as Vivli, which specializes in clinical trial data, require that sensitive data be analyzed in a controlled cloud-computing environment.
  3. Embargoes: Most repositories allow for datasets to be embargoed. Datasets may be embargoed for a number of reasons. For example, the researchers may not wish to publish their data until the accompanying article is available, or they may be pursuing a patent based on their discoveries.


Data Services Librarian

Amy Yarnell
she/her
Contact:
ayarnell@hshsl.umaryland.edu

Acknowledgements

This guide was originally created by Ashley Zeidler with guidance from Amy Yarnell as part of the Data Services Continuing Professional Education program.