Welcome to Day 4 of the HSHSL Open Access Week Challenge! Today we'll explore the benefits of open data and data sharing.
A definition of open data from SPARC, a non-profit advocacy organization for Open Content, is as follows:
Open Data is research data that:
- Is freely available on the internet;
- Permits any user to download, copy, analyze, re-process, pass to software or use for any other purpose; and
- Is without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself.
Open Data typically applies to a range of non-textual materials, including datasets, statistics, transcripts, survey results, and the metadata associated with these objects. The data is, in essence, the factual information that is necessary to replicate and verify research results. Open Data policies usually encompass the notion that machine extraction, manipulation, and meta-analysis of data should be permissible.1
1. SPARC, licensed under CC-BY 4.0 and available at https://sparcopen.org/open-data/.
Complete this challenge if you have a dataset you want to share or have shared already! These steps have been adapted from the Open Knowledge Foundation's guide How to Open Data1.
Keep reading below for more details on each step.
1. Open Knowledge Foundation, licensed under CC-BY 4.0 and available at https://okfn.org/opendata/how-to-open-data/.
Have you ever submitted a dataset along with a journal publication, or do you plan to do so soon? Some journals like PLOS and Nature require you to make the data underlying your study available.
Legal openness is one of the components of Open Data. Applying a license makes the terms of data re-use and distribution clear.
Creative Commons is a nonprofit organization that has created a set of free and easy-to-use prefabricated licenses that you can apply to your work. These licenses allow you as an author to retain all of your rights while allowing specific types of reuse by others without them having to seek your permission first.
There are six types of CC license plus a public domain dedication available.
To make your data open, share it under a CC0 or CC BY designation.
CC0 (aka CC Zero) is a public dedication tool, which allows creators to give up their copyright and put their works into the worldwide public domain. CC0 allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, with no conditions.
CC BY allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
Read more about open licenses.
To make your data technically open, it should be freely downloadable from the internet and available in an open (non-proprietary), machine-readable format.
What is Machine-Readable Data?
Machine-readable data is structured data that can be parsed by a computer. If you've shared your data before as a table in a PDF or Word document, keep in mind that these formats are human-readable but not machine-readable. Instead, consider formats like CSV, JSON, and XML.
Read more about machine-readable formats in the Open Data Handbook.
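As a rough illustration of what "machine-readable" means in practice, the sketch below writes the same small table as CSV and as JSON using only Python's standard library, then reads the CSV back in. The records and field names (participant_id, age_years, systolic_bp_mmhg) are made up for this example and do not come from any real dataset.

```python
import csv
import io
import json

# Hypothetical example records; field names and values are illustrative only.
records = [
    {"participant_id": "P001", "age_years": 34, "systolic_bp_mmhg": 118},
    {"participant_id": "P002", "age_years": 51, "systolic_bp_mmhg": 131},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize a list of dicts to indented JSON text."""
    return json.dumps(rows, indent=2)


csv_text = to_csv(records)
json_text = to_json(records)

# Because both formats are structured, a program can read them back
# without any manual copying. Note that CSV returns every value as text.
parsed_from_csv = list(csv.DictReader(io.StringIO(csv_text)))
parsed_from_json = json.loads(json_text)
```

A table locked inside a PDF or Word file offers no equivalent of the last two lines: someone would have to retype or scrape the values before they could be analyzed.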
Data Description
Data that isn't understandable is not very usable. Be sure to include proper documentation, like a data dictionary and a readme file, with your dataset.
Read more about describing data.
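One possible way to get started on a data dictionary is to generate a draft from the dataset itself and then fill in the descriptions by hand. The sketch below is only a starting point under simple assumptions: the data is a CSV with a header row, the sample columns are invented for illustration, and the type guess (integer, number, or text) is a crude heuristic rather than a standard. The function names guess_type and draft_data_dictionary are hypothetical.

```python
import csv
import io

# Hypothetical sample data; column names and values are illustrative only.
SAMPLE_CSV = """participant_id,age_years,systolic_bp_mmhg
P001,34,118
P002,51,131
P003,,125
"""


def guess_type(values):
    """Crudely guess a column type from its non-empty values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "unknown"
    try:
        for v in non_empty:
            int(v)
        return "integer"
    except ValueError:
        pass
    try:
        for v in non_empty:
            float(v)
        return "number"
    except ValueError:
        return "text"


def draft_data_dictionary(csv_text):
    """Build one draft entry per column; descriptions are left to the author."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        entries.append(
            {
                "variable": name,
                "type": guess_type(values),
                "missing_count": sum(1 for v in values if v == ""),
                "description": "TODO: describe meaning, units, and allowed values",
            }
        )
    return entries


dictionary = draft_data_dictionary(SAMPLE_CSV)
for entry in dictionary:
    print(entry)
```

The generated entries still need a human to supply what the code cannot know, such as units, collection methods, and the meaning of any coded values.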
Open Repositories
Share your data in an established open data repository. This is a good way to ensure long-lasting access to your data. It will also make your data more easily findable! (See more on this in Step 4).
For help deciding on the best repository for you, check out the following resources, and the list on the right-hand side of this page.
As mentioned in Step 3, one goal of making your data open is increasing the number of people who are able to use it. Before anyone can use the data, people need to be able to find it. Now that you've reached Step 4, your data is already in an established repository where it has been assigned a persistent identifier and is indexed with excellent metadata.
Now take things a step further and submit this form to include your dataset in the UMB Data Catalog, a finding tool for data produced or collected by UMB researchers. Watch the video below to learn more about the UMB Data Catalog.
To learn more about data discoverability, check out these Ten Simple Rules for Improving Research Data Discoverability.
Potential Benefits of Open Data:
Read more about these benefits from PLOS and SPARC.
Open Data Mandates
Increasingly, federal funding agencies are moving toward requiring data sharing to increase equitable public access to the results of research funded by taxpayer dollars. Two recent policy shifts to be aware of:
Complete this challenge if you don't have any data to share.
Keep reading below for more details.
This list contains just a few of the many open data repositories out there. Try searching in a couple for a dataset of interest to you!
Give credit where it's due! Just as it is good scholarly practice to cite publications you use in your research, so too should you get in the habit of citing any data you use. Check out the following guides and tools on citing data.
For questions about this challenge, reach out to the HSHSL Center for Data and Bioinformation Services at data@hshsl.umaryland.edu, or for more support with research data questions, use the button below to request a consult.