Choosing a Repository for Your Research Data
During your research you are likely to need to deposit your research data in a repository or database of some kind. This may be because you need a place to store or archive the data, or because you want to share it with collaborators or the wider scientific community when you publish the research. Depositing data in repositories is important for adherence to the FAIR Data Principles because:
- Findable: Repositories ensure data is discoverable by assigning unique identifiers (e.g., DOIs) and providing rich metadata that is indexed in searchable resources.
- Accessible: They make data and metadata available through standardized protocols, often with clear access conditions, ensuring both humans and machines can retrieve the data.
- Interoperable: Repositories use standardized formats, vocabularies, and metadata to enable data integration and compatibility with other datasets and tools.
- Reusable: They provide detailed documentation, provenance information, and clear licensing to facilitate data reuse and replication in future research.
Repositories often let you restrict access to deposited research data, and many also let you embargo data, so using a repository does not automatically mean that the data is made open.
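As a rough illustration of the points above, the sketch below models a deposit record and checks two things: whether the fields that support findability and reuse are filled in, and whether the data would be publicly available on a given date. The class, its field names, and the access values (`open`, `embargoed`, `restricted`) are assumptions made for this example and do not correspond to any particular repository's schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DepositRecord:
    """Hypothetical deposit record; field names are illustrative only."""
    title: str
    creators: List[str]
    identifier: Optional[str] = None    # e.g. a DOI assigned by the repository
    license: Optional[str] = None       # e.g. "CC-BY-4.0"
    file_format: Optional[str] = None   # e.g. "CSV"
    access: str = "open"                # "open", "embargoed" or "restricted"
    embargo_until: Optional[date] = None


# Fields this sketch treats as necessary for findability and reuse (an assumption).
FAIR_FIELDS = ("title", "creators", "identifier", "license", "file_format")


def missing_fair_fields(record: DepositRecord) -> List[str]:
    """Return the names of FAIR-relevant fields that are empty or unset."""
    return [name for name in FAIR_FIELDS if not getattr(record, name)]


def is_publicly_available(record: DepositRecord, today: date) -> bool:
    """Open data is always available; embargoed data only once the embargo ends."""
    if record.access == "open":
        return True
    if record.access == "embargoed" and record.embargo_until is not None:
        return today >= record.embargo_until
    return False  # restricted data, or an embargo without an end date
```

A record with an embargo date in the future would report as not publicly available even though it has been deposited, which mirrors the distinction between depositing data and making it open.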
Types of repositories
Repositories for scientific research data can be categorized into several types based on their focus and scope:
- Discipline-specific repositories
- Generalist or generic repositories that accept data from any discipline (e.g., Zenodo, Figshare)
- Institutional repositories, managed by universities or research institutions
- Government repositories, usually for publicly funded research data (e.g., UK Data Service)
- Publisher-linked repositories for academic journals to store data linked to published articles
Considerations for choosing a repository
In certain situations, such as those dictated by your project, funder, institution, or publication, the choice of repository may be predetermined. However, when you do have the opportunity to choose, it is important to select the repository that best suits your data for several important reasons, including:
- Ensuring the long-term access and preservation of your data
- Maximising discoverability, enabling others to find, reuse, and cite your data
- Complying with funder and publisher requirements and standards
- Credibility and transparency for your research
- Improving reusability by supporting the FAIR principles
- Appropriate accessibility through access control for data that cannot be publicly shared
- Discipline-specific support for domain-specific standards or aggregation with related data
- Cost and sustainability considerations
- Technical functionality, for example support for large datasets, version control, backups, and tools for working with the data
- Licensing and legal compliance considerations
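One way to compare candidate repositories against the considerations above is a simple weighted score. The sketch below is only an illustration: the criterion names, the weights, and the 0 to 5 rating scale are assumptions chosen for this example, and you should adjust them to reflect your own project, funder, and data.

```python
from typing import Dict, List

# Hypothetical criteria and weights (higher weight = more important to the project).
WEIGHTS: Dict[str, int] = {
    "preservation": 3,
    "discoverability": 3,
    "funder_compliance": 3,
    "access_control": 2,
    "discipline_fit": 2,
    "technical": 2,
    "licensing": 2,
    "cost": 1,
}


def score_repository(ratings: Dict[str, float],
                     weights: Dict[str, int] = WEIGHTS) -> float:
    """Weighted average of 0-5 ratings; criteria without a rating count as 0."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[c] * ratings.get(c, 0) for c in weights)
    return weighted_sum / total_weight


def rank_repositories(candidates: Dict[str, Dict[str, float]]) -> List[str]:
    """Return candidate names ordered from highest to lowest score."""
    return sorted(candidates,
                  key=lambda name: score_repository(candidates[name]),
                  reverse=True)
```

A score like this is a prompt for discussion rather than a decision: a repository that fails a mandatory funder or legal requirement should be excluded regardless of how well it scores elsewhere.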
TRUST Principles
The TRUST principles for digital repositories are a framework designed to guide the selection of trustworthy digital repositories for research data:
- Transparency: Repositories should clearly communicate their policies, procedures, and governance to build trust with users. These should include information about data deposition, data preservation, discovery, terms of use and whether additional functions are provided such as capabilities for managing sensitive data.
- Responsibility: They must demonstrate accountability in managing and preserving data including adherence to appropriate standards, provision of data services, and managing and protecting the data.
- User Focus: Repositories should prioritize the needs of their user communities, ensuring accessibility and usability. To make data discoverable by others, repositories should encourage users to fully describe their data at the time of deposition and should enforce community standards.
- Sustainability: Long-term preservation and uninterrupted access to data should be supported through reliable funding, governance, and infrastructure.
- Technology: Robust and secure technological systems should be in place to maintain data integrity and accessibility, and to protect against potential threats.
Some certification programs exist for repositories to demonstrate that they meet standards of trustworthiness and reliability. Some widely recognized certifications include:
- CoreTrustSeal (CTS): An international, community-based certification that evaluates repositories based on criteria like data integrity, accessibility, and long-term preservation.
- ISO 16363: An international standard for auditing and certifying trustworthy digital repositories.
- Nestor Seal for Trustworthy Digital Archives: based on a German standard (DIN 31644).
Choosing a discipline specific repository
Most funders and publishers recommend that, where possible, you deposit your data in a discipline-specific, community-recognised repository. Check the advice given by your funder, publisher, or institutional librarian to see whether there is specific guidance for your discipline or data type. Ensure that the selected repository meets your requirements for the FAIR principles, for example by providing a DOI for your data and access to the data in a standard format.
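When recording the DOI a repository assigns to your data, a quick syntax check can catch copy-and-paste errors before the identifier ends up in a paper or data availability statement. The sketch below is a minimal heuristic: the pattern covers the common modern form (`10.`, a registrant code of 4 to 9 digits, a slash, then a suffix) and does not confirm that the DOI is registered or resolves. The function names are hypothetical, and the DOI used in the usage note is a made-up placeholder.

```python
import re

# Simplified pattern for modern DOIs; an assumption, not a full validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes people commonly paste along with the DOI itself.
_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def normalise_doi(value: str) -> str:
    """Strip surrounding whitespace and a leading resolver URL or 'doi:' prefix."""
    value = value.strip()
    for prefix in _PREFIXES:
        if value.lower().startswith(prefix):
            return value[len(prefix):]
    return value


def doi_url(value: str) -> str:
    """Return the https://doi.org/ link for a DOI, or raise ValueError if malformed."""
    doi = normalise_doi(value)
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Does not look like a DOI: {value!r}")
    return "https://doi.org/" + doi
```

For example, `doi_url("doi:10.1234/example.data")` would build the link `https://doi.org/10.1234/example.data`, while a string with no `10.` prefix raises a `ValueError`.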
Select a subject from the list below to view the available repositories for that subject or data type:
- PSDI-affiliated
- Computational simulations
- Environmental
- Images
- Life sciences
- Materials
- Omics and Sequence data
- Spectroscopy
- Software, code and models
- Structural
- Supramolecular
PSDI-affiliated
| Repository | Data Type | Description |
|---|---|---|
| Biomolecular Simulations Database (BioSimDB) | Datasets in any file format up to 100GB | The biomolecular simulation database (BioSimDB) is a free repository of trajectory files produced from molecular dynamics (MD) simulations of biomolecules. |
| Collaborative Computational Project for NMR Crystallography (CCP-NC) Magres Database | Magres format | The Magres database is a repository of first-principles computational results for solid-state NMR crystallography, stored in the Magres (.magres) format. This resource provides a central platform for researchers to share, explore, and utilise data associated with calculations of NMR parameters in solid-state structures. |
| Data to Knowledge | Data collection in any file format up to 100GB | The Data to Knowledge Community Collection is a repository for data used or generated by machine learning models in modelling materials and molecular systems. This currently includes simulation data, training data, and the models themselves. |
Computational simulations
| Repository | Data Type | Description |
|---|---|---|
| ioChem-BD | Computational chemistry files | Data stored as Chemical Markup Language (CML) (XML-CML). Data can be worked on in a private area before publication. |
| Materials Cloud | Computational materials science | Materials Cloud is built to enable the seamless sharing and dissemination of resources in computational materials science, offering educational, research, and archiving tools; simulation software and services; and curated and raw data. You can browse, explore, download, or deposit raw and curated data. |
| NOMAD | Materials simulation data including electronic structure and molecular dynamics | Upload, manage, and search raw materials science data; supports most community codes and file formats. Materials data can be searched and downloaded in raw and processed forms. |
| Protein Data Bank (PDB) | Coordinate files (PDBx/mmCIF, PDB, XML) | Repository for experimentally-determined 3D structures for large biological molecules. |
Environmental
| Repository | Data Type | Description |
|---|---|---|
| Centre for Environmental Data Analysis (CEDA) Archive | Atmospheric and earth observation research and environmental data. Any format is accepted, but data must be well formatted and accompanied by appropriate documentation | A CoreTrustSeal-certified repository of atmospheric and earth observation data from climate models, satellites, aircraft, met observations, and other sources. |
| Environmental Data Initiative (EDI) | Data from the ecological and environmental sciences, including very large datasets | A CoreTrustSeal-certified repository helping the scientific community curate and preserve all scales of environmental and ecological data. |
| EarthChem | Geochemical, geochronological, and petrological data. Any format is accepted, but data must be adequately documented | EarthChem provides open data services to the geochemical, petrological, mineralogical, and related communities. Services include data preservation, discovery, access, and visualization. EarthChem adheres to the FAIR, TRUST, and CARE principles. |
| World Data Center for Climate (WDCC) | Earth System Model data, including climate data and models. Only open source data formats are accepted. Network Common Data Format (NetCDF) is preferred, but GRIdded Binary (GRIB), CSV, ASCII, and Zarr are also accepted. | WDCC is a CoreTrustSeal-certified repository. It provides a long-term archiving service, primarily for DKRZ HPC (high performance computing) project data, but also accepts data from external sources. |