
Steps to Making Your Research Data FAIR

The FAIRification process ensures that research outputs adhere to the FAIR principles.

FAIR and the importance of metadata

Data is not useful or usable without metadata, that is, additional information that describes the data. For example, a dataset generated during an experiment is useless unless it is accompanied by other data that describes the conditions of the experiment and the materials that were used or examined as part of it. It is also important that the metadata can be understood by others who might use the data. In particular, the terms used to describe aspects of the data, such as chemical names, units, techniques, and instruments, need to be understood by those users. Ideally these terms and their values are standardised, so that there is a common vocabulary or representation. Machine-readability of the data and the metadata is also important, so that they can be indexed and made discoverable, and so that they are more likely to be compatible with software commonly used in the community. There are different kinds of metadata that are useful for different purposes, and a variety of standards that are applicable to physical sciences research data.

For more information about metadata in the physical sciences, see Metadata in the Physical Sciences.

FAIRification Goals

FAIRification has a cost, and it is much easier, and less costly in time and resources, to plan from the beginning of the project how you will ensure that your research data is FAIR. There are also circumstances where you may want to make existing research data adhere to the FAIR principles, for example if the data are valuable to the community, could be used for machine learning, or have the potential to lead to new discoveries in cross-disciplinary research. Assessing the maturity of your dataset against the requirements of the FAIR principles can help to identify the goals that you want to reach with FAIR, and the steps that need to be taken to improve your practice.

Assessing the FAIRness of your existing data

There are a variety of ways of thinking about FAIR data, but the FAIR Data Maturation model suggests breaking it down into three distinct categories:

  1. Content-related: what information is reported in the research data and the metadata. For instance, a melting point dataset for an organic compound could be deposited with a metadata record giving information about the conditions of collection.
  2. Representation and format: how the metadata and data are formatted. For example, melting point data may be held within a .CSV file whilst the metadata description of the data may be in a .JSON format.
  3. Hosting environment capabilities: how the environment that stores your data and metadata adheres to the FAIR principles. For example, if you use a repository to host your data and metadata, does it provide persistent identifiers, and does it support indexing of machine-readable metadata so that the data is easier to discover?
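
As a minimal sketch of the second category, the following Python writes a small melting point table as CSV and builds a JSON metadata record that describes it. The column names, metadata keys, file name, and all values are hypothetical and chosen purely for illustration; they do not follow any particular metadata standard.

```python
import csv
import io
import json

# Hypothetical measurements: the values are illustrative, not real data.
rows = [
    {"sample_id": "S1", "melting_point_celsius": 121.5},
    {"sample_id": "S2", "melting_point_celsius": 122.0},
]


def to_csv(records):
    """Serialise the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["sample_id", "melting_point_celsius"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def build_metadata(data_file, records):
    """Build a metadata record describing the CSV file (assumed keys)."""
    return {
        "data_file": data_file,
        "description": "Melting point measurements for an organic compound",
        "columns": {
            "sample_id": "Identifier of the measured sample",
            "melting_point_celsius": "Melting point in degrees Celsius",
        },
        "n_records": len(records),
        # Conditions of collection; keys and values are assumptions.
        "conditions": {
            "heating_rate_celsius_per_min": 1.0,
            "instrument": "hypothetical melting point apparatus",
        },
    }


csv_text = to_csv(rows)
metadata_json = json.dumps(build_metadata("melting_points.csv", rows), indent=2)

print(csv_text)
print(metadata_json)
```

Keeping the units in the column name and repeating them in the metadata is one simple way to make the data file understandable on its own; a community standard, where one exists, would normally be preferable to ad hoc keys like these.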

You can use the FAIRplus-DSM Maturity levels to assess how FAIR your research data currently is, and then to determine which level you would like to reach for a particular dataset, a research project, or potentially your organisation as a whole. For example, at the bottom end of the FAIRplus Data Maturity scale, Level 0 describes data that is missing one or more of the fundamental FAIR requirements. In the physical sciences this might be a standalone dataset from a PhD project that is stored in a non-accessible environment, such as a personal computer, and where the metadata that describes the data is available only in a non-machine-readable format such as a Word document or PDF. Such data can only be used and understood by the researcher who created it. It is neither Findable, Accessible, Interoperable, nor Reusable. At the other end of the scale, Level 5 describes the maturity of a data-driven enterprise where multiple researchers are supported and data governance and management are central. The levels in between represent different roles in the data management space, from individual researchers to organisations such as funding bodies, institutions, and research data centres. For each level the appropriate FAIR goals are described.

More in-depth information about the indicators for FAIR data maturity can be found in the FAIRplus-DSM indicators definitions.

Include FAIR in your data management planning

The best way to ensure that your data is FAIR is to include the requirements of the FAIR principles in your research data management planning. Many Data Management Plans ask questions to get you thinking about the different requirements that make your data reusable and include aspects of making your data interoperable. For more information about research data management and data management plans, see Research Data Management.

Implement FAIR when you share or publish your research data

If you are part way through a project, then some effort is involved in making your data FAIR. However, a good time to think about how you can make your data more FAIR is when you prepare it to be shared or published. When you construct your data package, consider how you can make the data more usable and FAIR, for example:

  • Ensure you share your data in a repository that supports persistent identifiers
  • Use InChI and SMILES identifiers for your chemical structures
  • Include raw data files
  • Convert your data files into standard and machine-readable formats
  • Include metadata and documentation in your data package that explains the structure of the data set, who created the data, how they were created, and the conditions of the experiment
  • Include a license that makes it clear who can access the data and what they can be used for
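
The checklist above can be sketched as a simple pre-deposit check. This is a rough illustration under stated assumptions, not a validator for any real repository: the metadata keys, the list of machine-readable file suffixes, and the `check_package` helper are all hypothetical, and the example uses ethanol only because its identifiers are short.

```python
# Metadata fields this sketch expects; the names are assumptions.
REQUIRED_METADATA = ("creator", "method", "conditions", "license")

# Illustrative suffixes only; real projects should follow community formats.
MACHINE_READABLE_SUFFIXES = (".csv", ".json", ".xml")


def check_package(files, metadata):
    """Return a list of problems found in a data package description."""
    problems = []

    # Who created the data, how, under which conditions, and the license.
    for field in REQUIRED_METADATA:
        if not metadata.get(field):
            problems.append(f"missing metadata field: {field}")

    # At least one structure identifier (InChI or SMILES) should be present.
    identifiers = metadata.get("identifiers", {})
    if not (identifiers.get("inchi") or identifiers.get("smiles")):
        problems.append("no InChI or SMILES identifier")

    # At least one file should be in a machine-readable format.
    if not any(name.lower().endswith(MACHINE_READABLE_SUFFIXES) for name in files):
        problems.append("no data file in a machine-readable format")

    return problems


example_metadata = {
    "creator": "A. Researcher",
    "method": "Capillary melting point determination",
    "conditions": {"heating_rate_celsius_per_min": 1.0},
    "license": "CC-BY-4.0",
    "identifiers": {
        "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",  # ethanol
        "smiles": "CCO",
    },
}

print(check_package(["melting_points.csv", "raw_trace.dat"], example_metadata))
print(check_package(["notes.docx"], {"creator": "A. Researcher"}))
```

A check like this cannot confirm that a repository issues persistent identifiers or that the metadata is meaningful; it only flags obvious omissions before the package is shared.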

Recipes for FAIRification

The ELIXIR project has produced a variety of resources, including recipes, that detail how to address the FAIR principles for research data.

Although these recipes have been created by researchers in the biological sciences, they are also useful to those in other scientific disciplines, including the physical sciences.

About this page

If you would like to contribute content to the PSDI Knowledge Base or have feedback you would like to give on this guidance, please contact us.