Creating a Data Package to Share Your Research Data
Creating a comprehensive and well-organized data package for your physical sciences research data is essential for ensuring that others can understand, use, and build upon your work. This section describes some key elements to include in your data package.
What should be included in a data package
Use standard formats
Where possible, use standard data formats that are widely accessible and compatible with commonly used software tools. For data files, these should ideally be standard file formats accepted within the physical sciences community. For documentation, use open, machine-readable formats such as plain text (.TXT) and .XML rather than PDF or Word documents.
Data collection and organization
- Raw data: Your research data is of most use to others if you include all your raw data files in their original format.
- Processed data: Provide any processed or analyzed data, along with a clear description of how it was derived from the raw data.
- Code: Include the code used to generate or process the data, for example models, simulations, and data-processing scripts.
- File naming: Use a consistent file naming convention so that users can navigate your file structure, and make sure that file names reflect the content and purpose of each file.
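As one illustration of the organization points above, the Python sketch below checks file names against a naming rule and writes a manifest listing every file in the package with its size and SHA-256 checksum, which helps users navigate the package and confirm that files arrived intact. The naming pattern, the MANIFEST.csv file name and the function names are hypothetical choices for this example, not PSDI requirements; adapt them to your own convention.

```python
import csv
import hashlib
import re
from pathlib import Path

# Hypothetical naming rule: underscore-separated parts made of letters, digits
# and hyphens, then a single extension, e.g. "xrd_sample-01_2024-05-13.csv".
# Spaces, brackets and other special characters are rejected.
NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+(?:_[A-Za-z0-9-]+)*\.[A-Za-z0-9]+")


def package_files(package_dir):
    """Return every file under package_dir, sorted for a stable order."""
    return sorted(p for p in Path(package_dir).rglob("*") if p.is_file())


def nonconforming_names(package_dir):
    """Return relative paths of files whose names do not match NAME_PATTERN."""
    root = Path(package_dir)
    return [
        p.relative_to(root).as_posix()
        for p in package_files(root)
        if not NAME_PATTERN.fullmatch(p.name)
    ]


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(package_dir, manifest_name="MANIFEST.csv"):
    """Write a CSV listing each file's relative path, size in bytes and SHA-256."""
    root = Path(package_dir)
    manifest_path = root / manifest_name
    rows = [
        {
            "path": p.relative_to(root).as_posix(),
            "bytes": p.stat().st_size,
            "sha256": sha256_of(p),
        }
        for p in package_files(root)
        if p != manifest_path  # never list the manifest inside itself
    ]
    with open(manifest_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=["path", "bytes", "sha256"])
        writer.writeheader()
        writer.writerows(rows)
    return manifest_path
```

Running nonconforming_names before write_manifest lets you rename any problem files first, so that the manifest records the final names.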
Documentation and metadata
It is important to include documentation with your data, so that users can understand your data and how they were created.
- Provenance information: Record who created the data, when, and the details of the research project.
- README file: Create a README file that provides an overview of the data package, including the purpose of the research, the data collection methods, and any relevant background information. See Creating a README File for more detailed information.
- Data dictionary: Include a data dictionary that defines all variables, units of measurement, and any codes used in the dataset. See How to Make a Data Dictionary for guidance on how to create a data dictionary for your research data.
- Metadata for datasets: Provide metadata for each dataset, including information on the data source, collection dates, and any relevant protocols or standards used.
- InChIs and InChIKeys: Use InChIs and InChIKeys to describe the chemical entities in your research. If the software you use does not generate these for you, see Using InChI and InChIKey to annotate your chemical entities for guidance.
- Data quality: Document any data quality checks or validation procedures performed to ensure the accuracy of the data, and describe any known issues or errors in the data and how they were addressed.
- Citations: Where data or methods from other sources were reused in your research, include appropriate citations.
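A data dictionary can be kept as a simple CSV file alongside the data. The sketch below, which assumes a tabular data file with a single header row, writes the dictionary and reports any data column that has no entry. The four dictionary fields (name, description, unit, dtype), the class and function names, and the example variables are illustrative assumptions; see How to Make a Data Dictionary for fuller guidance.

```python
import csv
from dataclasses import asdict, dataclass, fields


@dataclass
class Variable:
    """One data dictionary entry; these four fields are an illustrative choice."""
    name: str         # column header exactly as it appears in the data file
    description: str  # what the variable represents
    unit: str         # unit of measurement, or "" if dimensionless
    dtype: str        # value type, e.g. "float", "integer", "string"


def write_dictionary(variables, out):
    """Write the entries to an open text stream as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(Variable)])
    writer.writeheader()
    for var in variables:
        writer.writerow(asdict(var))


def undocumented_columns(data_stream, variables):
    """Return headers in a CSV data stream that have no dictionary entry."""
    header = next(csv.reader(data_stream))
    defined = {var.name for var in variables}
    return [column for column in header if column not in defined]
```

In practice you would pass open files, for example a data_dictionary.csv opened with newline="" for writing, and run undocumented_columns against each data file before publishing.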
Licensing and Permissions
It is important to share licence information with your data package, so that other researchers know for what purposes and under what conditions they can use your data:
- Licence information: Clearly state the licence under which the data is being shared, such as a Creative Commons or Open Data Commons licence.
- Usage permissions: Specify any permissions or restrictions on the use, redistribution, or modification of the data.
- Contact information: Provide contact details for the owner of the data in case users have questions or need further assistance.
For more information about licences and creating a licence for your data, see Licences for Sharing Your Research Data.
Research Objects
A Research Object is a framework used to systematically identify, group together, and share scholarly resources on the web. Its main objective is to link related materials from a scientific study so they can be easily shared using a unique identifier. This method enhances the reproducibility of scientific research by facilitating the sharing of key research artifacts such as datasets, software, and related documentation. Research Objects follow principles of identity (assigning unique identifiers), aggregation (collecting related resources), and annotation (adding metadata to describe the resources).
Research Objects can also be used throughout the research lifecycle, not just at the point of publication. See Research Object to capture the Research Lifecycle for an easy-to-read introduction to Research Objects and how they fit into the research lifecycle.
RO-Crate
An RO-Crate is a Research Object (RO) formed of a collection of data (a crate) and a special ro-crate-metadata.json file, also known as the RO-Crate Metadata Document, which describes the collection. The collection may contain any kind of research data: papers, data files, software, references to other research, and so on. It may be a folder full of files, an abstract grouping of connected references, or a combination of both. The RO-Crate Metadata Document is a plain text file in JSON-LD format that is both human- and machine-readable and contains all the metadata required to describe the collection.
Using an RO-Crate Metadata Document in your data package helps to properly document your research and make it usable for both humans and machines.
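To make the structure of an RO-Crate Metadata Document more concrete, the Python sketch below writes a minimal ro-crate-metadata.json using only the standard library. It assumes RO-Crate version 1.1 and includes the metadata descriptor, a root Dataset with a name, description, publication date and licence, and one File entity per data file. The function names, file paths, titles and licence URL are placeholders. A real crate would usually also add contextual entities (for example for authors and the licence), and a dedicated library such as ro-crate-py may be more convenient than writing the JSON by hand.

```python
import json
from pathlib import Path

# Assumes RO-Crate 1.1; check the specification for the version you target.
RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
RO_CRATE_SPEC = "https://w3id.org/ro/crate/1.1"


def build_crate_metadata(name, description, date_published, licence_url, files):
    """Return a minimal RO-Crate metadata document as a Python dict.

    `files` maps each relative file path to a (title, media type) pair.
    """
    metadata_descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": RO_CRATE_SPEC},
        "about": {"@id": "./"},  # the descriptor describes the root dataset
    }
    root_dataset = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        "license": {"@id": licence_url},
        "hasPart": [{"@id": path} for path in files],
    }
    file_entities = [
        {"@id": path, "@type": "File", "name": title, "encodingFormat": media_type}
        for path, (title, media_type) in files.items()
    ]
    return {
        "@context": RO_CRATE_CONTEXT,
        "@graph": [metadata_descriptor, root_dataset, *file_entities],
    }


def write_crate_metadata(crate_dir, metadata):
    """Write the document as ro-crate-metadata.json in the crate's root folder."""
    path = Path(crate_dir) / "ro-crate-metadata.json"
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path
```

Keeping the JSON generation in a script means the metadata can be regenerated whenever files are added to the package, rather than edited by hand.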
To view an example of research data packaged as an RO-Crate, see A multi-omics data analysis workflow packaged as a FAIR Digital Object.
Reproducible XAFS Analyses
PSDI has worked with the Reproducible XAFS Analyses Zenodo community to create RO-Crates for reproducible research in catalysis, and a series of tutorials on using RO-Crates in Galaxy. See Reproducible XAFS Analyses for more information.
ELN file format
The ELN file format is an archive format for capturing the contents of Electronic Laboratory Notebooks (ELNs). The ELN specification is based on the RO-Crate specification, and a number of ELN providers can export to it, so that the contents of the notebook associated with a research study can be shared.
For more information, see Electronic Lab Notebook on the RO-Crate website.
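Because an .eln file is an RO-Crate packed into an archive, its metadata can be inspected with standard tools. The sketch below assumes the layout we understand the ELN specification to describe, a ZIP archive with a single top-level folder containing ro-crate-metadata.json, and lists the entities typed as File. The function names are hypothetical; check the current ELN specification before relying on this layout.

```python
import json
import zipfile


def read_eln_metadata(eln_file):
    """Return the parsed ro-crate-metadata.json from an .eln archive.

    Assumes a ZIP archive whose single top-level folder holds the
    RO-Crate Metadata Document. `eln_file` may be a path or a binary file object.
    """
    with zipfile.ZipFile(eln_file) as archive:
        candidates = [
            name for name in archive.namelist()
            if name.count("/") == 1 and name.endswith("/ro-crate-metadata.json")
        ]
        if len(candidates) != 1:
            raise ValueError(
                f"expected one top-level ro-crate-metadata.json, found {len(candidates)}"
            )
        with archive.open(candidates[0]) as handle:
            return json.load(handle)


def has_type(entity, type_name):
    """True if an entity's @type is type_name or a list containing it."""
    entity_type = entity.get("@type")
    if isinstance(entity_type, list):
        return type_name in entity_type
    return entity_type == type_name


def list_files(metadata):
    """Return the @id of every entity in the crate typed as File."""
    return [e["@id"] for e in metadata.get("@graph", []) if has_type(e, "File")]
```

The @type check accepts both a single string and a list, since JSON-LD allows either form.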
What to do next
- Learn how to create a README file
- Get guidance on choosing a suitable repository for your research data
Related links:
- Prepare your chemical data for publication
- Discover how to improve the FAIRness of chemical structure information
- Read more about Research Objects with the RO Primer
- Learn how to create a Research Object Bundle with the Research Object tutorials
- See an example of how to create an RO-Crate Metadata Document
- Creator: Cerys Willoughby, Louise Saul
- Last modified date: 2025-02-06
If you would like to contribute content to the PSDI Knowledge Base or have feedback you would like to give on this guidance, please contact us.