The HSHSL is a part of the University of Maryland, Baltimore.

601 West Lombard Street
Baltimore MD 21201-1512

Reference: 410-706-7996
Circulation: 410-706-7928

Finding Data: Evaluating Datasets and Data Sources

Finding and using data and datasets from various sources. Browse datasets by topic.

What is Data Quality?

Data Quality

Data quality refers to how well a dataset can be used to answer specific research questions. High-quality data are accurate, complete, timely, consistent, and accessible, and they contain the information needed to answer a specific question or set of questions.

  • Completeness: The extent to which data are of sufficient breadth, depth and scope for the task at hand.
  • Accuracy: The extent to which data are correct, reliable and certified.
  • Timeliness: The extent to which the age of the data is appropriate for the task at hand.
  • Consistency: The extent to which data are presented in the same format and compatible with previous data.
  • Accessibility: The extent to which data are available, or easily and quickly retrievable.

Reference:

R. Y. Wang and D. M. Strong. Beyond accuracy: What data quality means to data consumers. J. Manage. Inf. Syst. 1996;12(4):5-33.
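The definitions above are qualitative. As a rough illustration, two of them can be turned into simple ratios: completeness as the share of expected values that are present, and timeliness as the share of records newer than a chosen age limit. These scoring rules, the example records, and the two-year (730-day) limit are assumptions made for this sketch; they are not measures defined by Wang and Strong.

```python
from datetime import date

# Hypothetical example records; None marks a missing value.
records = [
    {"patient_id": "P001", "age": 54, "systolic_bp": 132, "visit_date": date(2023, 3, 14)},
    {"patient_id": "P002", "age": None, "systolic_bp": 118, "visit_date": date(2019, 7, 2)},
    {"patient_id": "P003", "age": 61, "systolic_bp": None, "visit_date": date(2024, 1, 9)},
]
FIELDS = ["patient_id", "age", "systolic_bp", "visit_date"]


def completeness(rows, fields):
    """Fraction of expected values (rows x fields) that are present (not None)."""
    expected = len(rows) * len(fields)
    present = sum(1 for row in rows for f in fields if row.get(f) is not None)
    return present / expected if expected else 0.0


def timeliness(rows, date_field, as_of, max_age_days):
    """Fraction of rows dated no more than max_age_days before as_of.

    Rows with a missing date are counted as not timely.
    """
    if not rows:
        return 0.0
    timely = sum(
        1 for row in rows
        if row.get(date_field) is not None
        and (as_of - row[date_field]).days <= max_age_days
    )
    return timely / len(rows)


print(f"Completeness: {completeness(records, FIELDS):.2f}")
print(f"Timeliness:   {timeliness(records, 'visit_date', date(2024, 6, 1), 730):.2f}")
```

With these records the sketch reports a completeness of 0.83 (10 of 12 values present) and a timeliness of 0.67 (2 of 3 visits fall within two years of 1 June 2024). What counts as "timely" depends on the research question, so the age limit is a parameter rather than a fixed rule.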

Evaluating Data Sources

It is important to evaluate the dataset you are using for your research so that your results are reliable and reproducible. Incomplete or inconsistent data can lead to misinformation and poor clinical decision-making. Work through the checklist below to assess the integrity of a dataset before you use it.

Data Quality Checklist

1.  Credibility

  • Who created the data?
  • Who published the data?
  • Who contributed to the data?
  • Is contact information available?

2.  Completeness

  • Are there missing elements in the data records?
  • Are there missing data records?
  • Is all of the information needed for the intended uses captured?
  • Are the data described well enough to be findable and reusable?

3.  Accuracy

  • Are there typos in the data?
  • Are the data formats correct?
  • Are there outliers that may not have been recorded accurately?
  • Do the data represent the information we intend to capture?

4.  Timeliness

  • Are the data outdated?
  • When are the data captured and updated?
  • Is version control used to track revisions of the dataset?

5.  Consistency

  • Are the data formats consistent?
  • Are the units of measurement consistent?
  • Are the data types consistent?
  • Are the data synced within and across platforms?

6.  Accessibility

  • Are the data available in a repository?
  • Are the data clearly licensed?
  • Are there any barriers to accessing the data?
  • Are the data machine-readable?

Data Quality Checklist graphic (all of its content is reflected in the checklist above).

“Data Quality Checklist” adapted from: “Data Quality Checking Guide.” Wei Zakharov. CC-BY 4.0
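Several of the checklist questions can be partly automated once a dataset is loaded. The Python sketch that follows runs a few of them on a small, made-up set of records: missing elements (Completeness), date-format and unit consistency (Consistency), and possible outliers (Accuracy). The field names, the YYYY-MM-DD date convention, and the use of Tukey's 1.5 × IQR fences to flag outliers are assumptions chosen for illustration, not part of the checklist itself. A flagged value still needs human review, since an unusual value is not necessarily a wrong one.

```python
import re
import statistics

# Hypothetical records: field names and values are made up for illustration.
records = [
    {"id": "A1", "weight": 70, "unit": "kg", "visit_date": "2023-04-02"},
    {"id": "A2", "weight": 68, "unit": "kg", "visit_date": "04/05/2023"},
    {"id": "A3", "weight": 154, "unit": "lb", "visit_date": "2023-04-07"},
    {"id": "A4", "weight": 72, "unit": "kg", "visit_date": None},
    {"id": "A5", "weight": 710, "unit": "kg", "visit_date": "2023-04-11"},
    {"id": "A6", "weight": 69, "unit": "kg", "visit_date": "2023-04-12"},
    {"id": "A7", "weight": 71, "unit": "kg", "visit_date": "2023-04-14"},
    {"id": "A8", "weight": 73, "unit": "kg", "visit_date": "2023-04-15"},
]

# Assumed convention for this sketch: dates should be written YYYY-MM-DD.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def missing_elements(rows):
    """Completeness: (id, field) pairs whose value is missing (None)."""
    return [(r["id"], k) for r in rows for k, v in r.items() if v is None]


def nonconforming_dates(rows, field):
    """Consistency: ids whose date is present but not in YYYY-MM-DD form."""
    return [r["id"] for r in rows
            if r[field] is not None and not ISO_DATE.fullmatch(r[field])]


def units_used(rows):
    """Consistency: every distinct unit of measurement in the dataset."""
    return {r["unit"] for r in rows}


def iqr_outliers(values, k=1.5):
    """Accuracy: values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


print("Missing elements:", missing_elements(records))
print("Non-ISO dates:", nonconforming_dates(records, "visit_date"))
print("Units used:", sorted(units_used(records)))
# Compare like with like: only the kilogram weights are checked for outliers.
kg_weights = [r["weight"] for r in records if r["unit"] == "kg"]
print("Possible outliers (kg):", iqr_outliers(kg_weights))
```

On these records the sketch reports one missing visit date (A4), one date not in YYYY-MM-DD form (A2), two units in use (kg and lb), and 710 as a possible outlier among the kilogram weights, a value worth checking against the source record. The pound value is left out of the outlier check because mixing units would distort the quartiles; converting everything to one unit first is an alternative.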