The HSHSL is a part of the University of Maryland, Baltimore.

Open Access Week Challenge 2022: Day 4 - Data Sharing

Welcome to Day 4!

Welcome to Day 4 of the HSHSL Open Access Week Challenge! Today we'll explore the benefits of open data and data sharing.

What is Open Data?

SPARC, a non-profit advocacy organization for Open Content, defines open data as follows:

Open Data is research data that:

  1. Is freely available on the internet;
  2. Permits any user to download, copy, analyze, re-process, pass to software or use for any other purpose; and
  3. Is without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself.

Open Data typically applies to a range of non-textual materials, including datasets, statistics, transcripts, survey results, and the metadata associated with these objects. The data is, in essence, the factual information that is necessary to replicate and verify research results. Open Data policies usually encompass the notion that machine extraction, manipulation, and meta-analysis of data should be permissible. [1]

[1] SPARC, licensed under CC-BY 4.0 and available at https://sparcopen.org/open-data/

Day 4 Challenge Option 1: Share Data

Complete this challenge if you have a dataset you want to share or have shared already! These steps have been adapted from the Open Knowledge Foundation's guide How to Open Data. [1]

  1. Choose one or more datasets that you own to complete the tasks for this challenge.
  2. Apply an open license to that dataset.
  3. Make the data available by depositing a machine-readable version in an open repository.
  4. Make the data more discoverable by submitting it to the UMB Data Catalog.

Keep reading below for more details on each step.

[1] Open Knowledge Foundation, licensed under CC-BY 4.0 and available at https://okfn.org/opendata/how-to-open-data/

Steps to Open Data

Step 1: Choose a Dataset

Have you ever submitted a dataset along with a journal publication, or do you plan to do so soon? Some journals, such as PLOS and Nature, require you to make the data underlying your study available.

  1. If you have submitted data to an open repository, skip to Step 4.
  2. If you have submitted data as supplementary material with a publication, skip to Step 3.
  3. Otherwise, choose a dataset that you own and that does not contain any personally identifiable information (PII) or protected health information (PHI). Not sure what counts as an identifier? Check out this guide from the University of Virginia. Once you know the dataset you want to work with, move on to Step 2.
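
If you want a quick first pass before reviewing a dataset by hand, a short script can flag column names that look like identifiers. The sketch below is only an illustration: the keyword list and the function name `flag_possible_identifiers` are made up for this example, the list is not exhaustive, and a clean result does not mean the data is free of PII or PHI.

```python
import re

# Hypothetical keyword list based on common direct identifiers
# (names, contact details, record numbers, dates). It is not exhaustive.
IDENTIFIER_HINTS = [
    "name", "address", "email", "phone", "ssn", "mrn",
    "birth", "dob", "zip", "record_number",
]


def flag_possible_identifiers(column_names):
    """Return the column names that contain one of the hint keywords."""
    flagged = []
    for column in column_names:
        # Normalize "Patient Name" or "patient-name" to "patient_name".
        normalized = re.sub(r"[^a-z0-9]+", "_", column.lower()).strip("_")
        if any(hint in normalized for hint in IDENTIFIER_HINTS):
            flagged.append(column)
    return flagged


# Made-up column names, for illustration only.
columns = ["participant_id", "Patient Name", "DOB", "systolic_bp", "Zip Code"]
print(flag_possible_identifiers(columns))
```

A check like this only looks at column headers, so identifiers hidden in free-text fields or coded columns still need a manual review.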

Step 2: Apply an Open License

Legal openness is one of the components of Open Data. Applying a license makes the terms of data re-use and distribution clear.

Creative Commons is a nonprofit organization that has created a set of free and easy-to-use prefabricated licenses that you can apply to your work. These licenses allow you as an author to retain all of your rights while allowing specific types of reuse by others without them having to seek your permission first. 

There are six types of CC license plus a public domain dedication available.

To make your data open, share it under a CC0 or CC BY designation.

CC0 (aka CC Zero) is a public dedication tool, which allows creators to give up their copyright and put their works into the worldwide public domain. CC0 allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, with no conditions.

CC BY: This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.

Read more about open licenses.
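
One simple way to make the license explicit is to record it in a small metadata file that travels with the dataset. The sketch below shows one possible shape for such a record. The field names, the function `license_statement`, and the example title and creator are assumptions for illustration, and the CC BY entry assumes version 4.0 of the license; your repository may ask for the license in a different form.

```python
import json

# License names and URLs as published by Creative Commons.
# The CC BY entry assumes version 4.0.
OPEN_LICENSES = {
    "CC0": {
        "name": "CC0 1.0 Universal",
        "url": "https://creativecommons.org/publicdomain/zero/1.0/",
    },
    "CC BY": {
        "name": "Creative Commons Attribution 4.0 International",
        "url": "https://creativecommons.org/licenses/by/4.0/",
    },
}


def license_statement(dataset_title, creator, license_key):
    """Build a small metadata record stating the license for a dataset."""
    license_info = OPEN_LICENSES[license_key]
    return {
        "title": dataset_title,
        "creator": creator,
        "license": license_info["name"],
        "license_url": license_info["url"],
    }


# Placeholder title and creator, for illustration only.
record = license_statement("Example survey results", "Jane Researcher", "CC BY")
print(json.dumps(record, indent=2))
```

Saving the printed record next to the data file gives reusers a machine-readable statement of the terms, in addition to whatever license field the repository provides.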


Step 3: Make the Data Available

To make your data technically open, it should be freely downloadable from the internet and available in an open (non-proprietary), machine-readable format.

What is Machine-Readable Data?

Machine-readable data is structured data that can be parsed by a computer. If you've shared your data before as a table in a PDF or Word document, keep in mind that these are human-readable, but not machine-readable formats. Instead consider formats like CSV, JSON, and XML.

Read more about machine-readable formats in the Open Data Handbook.
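
To see what machine-readable means in practice, here is a minimal sketch that writes the same small table as CSV and as JSON using Python's standard library, then reads both back. The rows are made-up values standing in for a table that might otherwise have been shared in a PDF.

```python
import csv
import io
import json

# Made-up rows standing in for a table that was previously shared in a PDF.
rows = [
    {"site": "A", "participants": 12, "mean_score": 3.4},
    {"site": "B", "participants": 9, "mean_score": 4.1},
]

# CSV: one header row, then one row per record.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["site", "participants", "mean_score"])
writer.writeheader()
writer.writerows(rows)
csv_text = csv_buffer.getvalue()

# JSON: the same records as a list of objects.
json_text = json.dumps(rows, indent=2)

# Both can be parsed back into structured records by a program.
parsed_csv = list(csv.DictReader(io.StringIO(csv_text)))
parsed_json = json.loads(json_text)

print(csv_text)
print(parsed_csv[0]["site"], parsed_json[1]["mean_score"])
```

Note that CSV does not record data types, so every value read back from the CSV is text, while JSON keeps numbers as numbers. Either format can be parsed without retyping values from a page.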

Data Description

Data that isn't understandable is not very usable. Be sure to include proper documentation, like a data dictionary and a readme file, with your dataset.

Read more about describing data.
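
A data dictionary can start as a simple list of variables that you then fill in by hand. The sketch below drafts such a skeleton from a CSV file's columns. The function names, the output fields, and the sample data are assumptions for illustration; the type guess is deliberately crude, and the descriptions and units still need to come from someone who knows the data.

```python
import csv
import io

# Made-up data standing in for a real dataset file.
csv_text = """site,participants,mean_score
A,12,3.4
B,9,4.1
"""


def guess_type(values):
    """Guess a simple type label from a column's values."""
    for label, cast in (("integer", int), ("decimal", float)):
        try:
            for value in values:
                cast(value)
            return label
        except ValueError:
            continue
    return "text"


def data_dictionary_skeleton(text):
    """Return one entry per column, with description and units left blank.

    Assumes the CSV has a header row and at least one data row.
    """
    reader = csv.DictReader(io.StringIO(text))
    records = list(reader)
    entries = []
    for column in reader.fieldnames:
        values = [record[column] for record in records]
        entries.append({
            "variable": column,
            "type": guess_type(values),
            "example": values[0],
            "description": "",  # to be written by the data owner
            "units": "",        # to be written by the data owner
        })
    return entries


for entry in data_dictionary_skeleton(csv_text):
    print(entry)
```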

Open Repositories

Share your data in an established open data repository. This is a good way to ensure long-lasting access to your data. It will also make your data more easily findable! (See more on this in Step 4).

For help deciding on the best repository for you, check out the following resources, and the list on the right-hand side of this page.


Step 4: Make the Data More Discoverable

As mentioned in Step 3, one goal of making your data open is increasing the number of people who are able to use it. Before anyone can use the data, people need to be able to find it. Now that you've reached Step 4, your data is already in an established repository where it has been assigned a persistent identifier and is indexed with excellent metadata.

Now take things a step further and submit this form to include your dataset in the UMB Data Catalog, a finding tool for data produced or collected by UMB researchers. Watch the video below to learn more about the UMB Data Catalog.

To learn more about data discoverability, check out these Ten Simple Rules for Improving Research Data Discoverability.

Why Open Data?

Potential Benefits of Open Data:

  • Improved transparency, rigor, and reproducibility
  • Increased collaboration opportunities
  • Accelerated innovation and discovery
  • Efficiency
  • Increased research impact and visibility

Read more about these benefits from PLOS and SPARC.

Open Data Mandates

Increasingly, federal funding agencies are moving toward requiring data sharing to increase equitable public access to the results of research funded by taxpayer dollars. Two recent policy shifts to be aware of:

  • NIH Data Management and Sharing Policy: going into effect January 25, 2023, requires researchers requesting NIH funding for projects that will produce scientific data to submit a Data Management and Sharing Plan with their grant applications.
  • Nelson Memo: released August 2022 by the White House Office of Science and Technology Policy, directs federal departments and agencies to update their public access policies to make publications and research funded by taxpayers publicly accessible, without an embargo or cost. Agencies are expected to implement these new policies no later than December 31, 2025.

Day 4 Challenge Option 2: Find Data

Complete this challenge if you don't have any data to share.

  1. Explore open repositories and see if you can find some data related to your own research interests!
  2. Create a citation for the data.

Keep reading below for more details.

Step 1: Explore Open Repositories

This list contains just a few of the many open data repositories out there. Try searching in a couple for a dataset of interest to you!

Government Data

Generalist Repositories

Domain-Specific Repositories

Step 2: Cite Data

Give credit where it's due! Just as it is good scholarly practice to cite publications you use in your research, so too should you get in the habit of citing any data you use. Check out the following guides and tools on citing data.
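
As a rough illustration, a data citation can be assembled from a few pieces of metadata that most repositories display: creators, year, title, repository, and a persistent identifier. The sketch below follows that common pattern. The function name and all example values are placeholders, and the citation style required by your journal or the repository's own recommended citation should take precedence over this order.

```python
def format_data_citation(creators, year, title, repository, identifier, version=None):
    """Assemble a dataset citation string from its basic metadata."""
    parts = [f"{'; '.join(creators)} ({year})", title]
    if version:
        parts.append(f"Version {version}")
    parts.extend([repository, identifier])
    return ". ".join(parts)


# All values below are placeholders, not a real dataset.
citation = format_data_citation(
    creators=["Researcher, J.", "Colleague, A."],
    year=2022,
    title="Example survey results",
    repository="Example Repository",
    identifier="https://doi.org/10.xxxx/example",
)
print(citation)
```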

Questions about Open Data?

For questions about this challenge, reach out to the HSHSL Center for Data and Bioinformation Services at data@hshsl.umaryland.edu, or for more support with research data questions, use the button below to request a consult.