Introduction to Data Management Planning
Good data management planning benefits both researchers and those who use the data once it has been published. For researchers, it means that the data and associated resources, such as notebooks and samples, will be better organised and easier to locate. For future users, it means the data will be more discoverable, easier to understand, and more likely to be reusable. Consider how you are going to organise, store, share, publish and preserve your data before you begin your project. Planning late is better than not planning at all, but it is much easier to create consistent directory structures and file names, and to link data together, if you do so from the beginning.
Data Management Plans
An effective way to plan how you will manage your research data is to create a Data Management Plan (DMP); see Data Management Plans. The Digital Curation Centre (DCC) provides a range of example DMPs and guidance. Your institution or funder may provide a specific template for your DMP, or you can use the DMPonline tool, which provides a wizard to help you create one. See Data Management Plans for Physical Scientists for guidance and examples specific to the physical sciences.
Keep your DMP up to date throughout the project so that it reflects changes in the project's requirements, in the nature of the data being collected, and in the way the data is being managed. Check that the commitments made in the DMP are being met, and that everyone involved in the project is following the same standards.
Data standards
You should become familiar with the data formats and standards used in your area of research. If you use open formats and standards that the community has adopted, more tools will be available to work with the data in the future, whereas proprietary formats that rely on specialist software may change or be withdrawn over time. Many resources discuss standard data formats in the physical sciences, for example:
- IUPAC digital standards for chemistry
- DATACC standard formats in chemistry and physics
- Data format standards in analytical chemistry
- Shared metadata for data-centric materials science
Storing your research
Plan which data you will keep during your project, including raw and processed data. Consider how you will store your data as you collect it, and how you will organise and label it so that you can find it later. Make sure your data and other experimental information are stored safely, for example on a server or file-sharing service that is regularly backed up, rather than relying on your computer's hard drive or a USB stick. Find out what tools your organisation provides and what others in your group use to keep their data safe.
Consider how you will link your data to the other documentation and data produced by your experiments. One approach is to create a directory for each experiment and save all the files relating to that experiment in it: for example, the experiment plan and the record of the experiment, along with images, raw data, and processed data.
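As a minimal sketch of the one-directory-per-experiment approach, the Python snippet below creates a folder for an experiment with fixed subfolders for the plan, the record, images, raw data and processed data. The subfolder names, the function name and the experiment identifier format are hypothetical choices for illustration, not a standard; adapt them to your group's conventions.

```python
import tempfile
from pathlib import Path

# Hypothetical subfolder names, chosen for illustration only.
SUBFOLDERS = ("plan", "record", "images", "raw", "processed")


def create_experiment_folder(root: Path, experiment_id: str) -> Path:
    """Create <root>/<experiment_id>/ with one subfolder per kind of file.

    Existing folders are left as they are, so calling this again for the
    same experiment does not delete or overwrite anything.
    """
    experiment_dir = root / experiment_id
    for name in SUBFOLDERS:
        (experiment_dir / name).mkdir(parents=True, exist_ok=True)
    return experiment_dir


if __name__ == "__main__":
    # Demonstrate in a temporary directory; in practice use your
    # backed-up project storage as the root.
    with tempfile.TemporaryDirectory() as tmp:
        exp = create_experiment_folder(Path(tmp), "2025-03-25_EXP001")
        print(sorted(child.name for child in exp.iterdir()))
        # prints ['images', 'plan', 'processed', 'raw', 'record']
```

Starting the (hypothetical) identifier with an ISO date keeps experiment folders in chronological order when they are sorted by name.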
Software, code or scripts written specifically for the research count as part of the research data, so it is important to store them safely, including earlier versions that document the changes made over time. Version control systems such as Git and Subversion, and hosting platforms built on Git such as GitHub and GitLab, record each version and let researchers note what was changed and why. It is also important to document which version of the software was used to generate or transform your data.
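To make it easier to document which version of your code produced a result, you could record version information programmatically alongside your outputs. The sketch below shows one way to do this in Python, assuming your scripts are kept in a Git repository: it asks Git for the current commit hash and looks up the installed versions of any packages you name. The function names are illustrative, and the package list should be whatever your own code depends on.

```python
import platform
import subprocess
from importlib import metadata
from typing import Optional


def git_commit(repo_dir: str = ".") -> Optional[str]:
    """Return the commit hash checked out in repo_dir, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, repo_dir does not exist, or it is not a Git repository.
        return None
    return result.stdout.strip()


def software_versions(packages: list[str], repo_dir: str = ".") -> dict[str, Optional[str]]:
    """Collect the Python version, operating system, package versions and code commit."""
    versions: dict[str, Optional[str]] = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "code_commit": git_commit(repo_dir),
    }
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # "numpy" is only an example dependency; list the packages your code imports.
    print(software_versions(["numpy"]))
```

Saving this dictionary with each output (for example in a metadata file next to it) gives a record of exactly which code and environment produced that result.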
Electronic Laboratory Notebooks
Consider using an Electronic Laboratory Notebook (ELN) to record your research. Because ELN content is machine-actionable, it is much easier to search and to share your research in an appropriate digital format. Many ELNs include tools for drawing chemical structures or identifying chemical materials, and so can generate machine-actionable chemical structure information such as InChI, InChIKey and SMILES. If your experiments produce relatively small amounts of data, you may be able to store the data directly in the ELN, keeping the record of the experiment and its data linked in one place. The ELN Finder tool can help you find an ELN suitable for your project.
Documenting your research data
Your research data is likely to be transformed throughout the project, through data processing and analysis or through conversion from one format to another. Make sure that new data and information do not overwrite earlier data, by saving them under different file names or in a different directory. Meaningful names and labels for files, data, notebook entries and samples make it much easier to track and link information throughout the project. Make use of a data inventory, as described in What are Research Data. It can also be helpful to keep an index of experiments as a record of the research and its associated data; ELNs can often generate such an index automatically.
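One way to avoid overwriting earlier data is to give each new output a versioned file name and to refuse to write over a file that already exists. The Python sketch below illustrates this; the `_v001` naming scheme and the function names are assumptions made for the example, so follow whatever naming convention your group or DMP specifies.

```python
import tempfile
from pathlib import Path


def next_version_path(directory: Path, stem: str, suffix: str) -> Path:
    """Return the first unused path of the form <stem>_v001<suffix>, <stem>_v002<suffix>, ..."""
    version = 1
    while True:
        candidate = directory / f"{stem}_v{version:03d}{suffix}"
        if not candidate.exists():
            return candidate
        version += 1


def save_new_version(directory: Path, stem: str, suffix: str, text: str) -> Path:
    """Write text to a new versioned file in directory and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = next_version_path(directory, stem, suffix)
    # Mode "x" raises FileExistsError rather than overwriting, in case another
    # process created the file between the check above and this write.
    with path.open("x", encoding="utf-8") as handle:
        handle.write(text)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        processed = Path(tmp) / "processed"
        first = save_new_version(processed, "EXP001_baseline_corrected", ".csv", "x,y\n1,2\n")
        second = save_new_version(processed, "EXP001_baseline_corrected", ".csv", "x,y\n1,3\n")
        print(first.name, second.name)
        # prints EXP001_baseline_corrected_v001.csv EXP001_baseline_corrected_v002.csv
```

Zero-padding the version number keeps files in the right order when they are sorted by name.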
Create metadata for your data as you go along, and keep it safe together with your data and other resources. Documenting every transformation made to the data creates a provenance trail, making it easier for you and others to understand what has happened to the data when you return to it or share it later. For research data that has been analysed or otherwise transformed using software, it is important to record details of that software, for example the version used and any parameters set. Metadata about your research can be shared publicly at the end of the project through a data catalogue, even when the data itself cannot be shared.
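As an illustration of keeping a provenance trail, the sketch below writes a small JSON "sidecar" file next to a processed output, recording checksums of the input and output files, the name and version of the software used, the parameters set, and when the file was created. The sidecar layout, field names and `.provenance.json` suffix are hypothetical choices for this example rather than a metadata standard; where your community has an agreed metadata schema, prefer that.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file so that others can confirm they have exactly the same file."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_provenance(output: Path, inputs: list[Path], software: str,
                     version: str, parameters: dict) -> Path:
    """Write <output>.provenance.json describing how output was produced from inputs."""
    record = {
        "output": {"file": output.name, "sha256": sha256_of(output)},
        "inputs": [{"file": p.name, "sha256": sha256_of(p)} for p in inputs],
        "software": {"name": software, "version": version},
        "parameters": parameters,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = output.with_name(output.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "EXP001_raw.csv"
        raw.write_text("wavelength,absorbance\n500,0.42\n", encoding="utf-8")
        processed = Path(tmp) / "EXP001_processed.csv"
        processed.write_text("wavelength,absorbance\n500,0.40\n", encoding="utf-8")
        # "baseline-tool" 1.2.0 and its parameters are placeholders for illustration.
        sidecar = write_provenance(processed, [raw], "baseline-tool", "1.2.0",
                                   {"method": "polynomial", "order": 3})
        print(sidecar.read_text(encoding="utf-8"))
```

Because a checksum depends only on a file's content, it lets you confirm that a file is unchanged even after it has been renamed or moved.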
Computational environments and code
Most research involves calculations and software for processing, analysing, and visualising data. In many disciplines computational techniques are used hand in hand with laboratory techniques, while in others the experiments are carried out entirely in silico. If you use code, scripts, workflows, or even spreadsheet formulas to generate or analyse your data, consider how you will keep and maintain them. Document what the code should do and how you have tested it, and use comments and sensible names for variables and functions so that others can understand what your code does, how to use it, and how to verify that it works as intended. Document the prerequisites someone else would need to run your code: for example, does it need a particular operating system version or specific libraries? It can be useful to verify the code by running it on data for which the results are already known, or to ask another researcher to use or adapt your tool for their own research to gauge its correctness and usability. Keep the different versions of your code, scripts, workflows, and calculations, so that you have a record of how they have changed and can validate results using data produced earlier in the project.
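As an example of checking code against data where the answer is already known, the sketch below tests a small function using Python's built-in `unittest` module. The calculation, concentration from absorbance via the Beer-Lambert law A = εlc, is a hypothetical stand-in for your own analysis code, and the test values were worked out by hand rather than taken from real measurements.

```python
import math
import unittest


def concentration_from_absorbance(absorbance: float, molar_absorptivity: float,
                                  path_length_cm: float = 1.0) -> float:
    """Concentration in mol/L from the Beer-Lambert law, A = epsilon * l * c.

    molar_absorptivity is in L mol^-1 cm^-1 and path_length_cm is in cm.
    """
    if molar_absorptivity <= 0 or path_length_cm <= 0:
        raise ValueError("molar absorptivity and path length must be positive")
    return absorbance / (molar_absorptivity * path_length_cm)


class KnownAnswerTests(unittest.TestCase):
    """Check the function against results calculated by hand."""

    def test_hand_calculated_value(self):
        # A = 0.50, epsilon = 1000 L/mol/cm, l = 1 cm  ->  c = 5.0e-4 mol/L
        self.assertTrue(math.isclose(concentration_from_absorbance(0.50, 1000.0), 5.0e-4))

    def test_path_length_scales_result(self):
        # Doubling the path length should halve the calculated concentration.
        c1 = concentration_from_absorbance(0.80, 2000.0, 1.0)
        c2 = concentration_from_absorbance(0.80, 2000.0, 2.0)
        self.assertTrue(math.isclose(c2, c1 / 2))

    def test_rejects_invalid_input(self):
        # A zero molar absorptivity would divide by zero, so it must be rejected.
        with self.assertRaises(ValueError):
            concentration_from_absorbance(0.5, 0.0)
```

Saved in a file such as `test_absorbance.py` (a hypothetical name), the tests can be run with `python -m unittest test_absorbance`. Keeping them under version control with the code lets you re-run them every time the code changes.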
Publishing, sharing, and preserving data
To prepare your data for sharing or publishing you need to select the most useful or appropriate data from everything recorded. Although funders or journals may not require it, sharing raw data along with processed data can be extremely useful to others. You should consider where the data should be shared at the end of the project when the research is published. Research data may also be shared earlier during the project lifecycle, for example for working with collaborators or as an open science project. There are various considerations for selecting an appropriate repository for sharing your research data, see Choosing a data repository.
What to do next
Related links:
- Find out how to get help with research data management
- Learn about best practices for collaborative working in your research
- Get guidance on the FAIR principles and sharing your data
- Learn how to choose the right data repository for your data
Creator: Cerys Willoughby, Louise Saul
Last modified date: 2025-03-25
If you would like to contribute content to the PSDI Knowledge Base or have feedback you would like to give on this guidance, please contact us.