Licences for Sharing Your Research Data

What is a licence?

A licence in the context of scientific research data is essentially a legal document that specifies how the data can be used, shared, and distributed. It outlines the permissions, restrictions, and obligations for both the data provider and the data user. Here are some key aspects typically covered by such licences:

  • Access and use: Who can access and use the data, and under what conditions.
  • Redistribution: Whether the data can be shared with third parties, and if so, how.
  • Modification: Whether the data can be modified or combined with other datasets.
  • Attribution: How credit should be given to the original data creators.
  • Commercial use: Whether the data can be used for commercial purposes.
  • Educational use: Whether the data can be used for educational purposes.
  • Compliance: Requirements for compliance with legal and ethical standards.

A number of standard licences exist which allow broad use and sharing, including the Creative Commons licences and licences commonly used for code and software, such as the MIT License and the GNU General Public License (GPL). Open Data Commons also provides standard open licences designed for use with databases. Where open licences cannot be used, an alternative is to give your research data a bespoke licence with specific restrictions that limit access to, and use of, the data to particular groups and purposes.

Licences also specify what attributions are required when the data, code or other work is reused. It is essential that a clear link exists between the licence and the data to which it applies.

Why use a licence for your research data?

Sharing your research data does not automatically give others the right to use your data. A licence is required to tell others how they can get access to the data, how the data may be reused, for what purposes, and how to properly attribute or cite the work. This ensures that the data is not used for purposes that you do not intend and that you get appropriate credit for the research data and related outputs such as protocols and code that you create.

Although a repository or database may provide the capability to set a licence when you upload the data, it is a good idea to include a separate licence file in your data package, so that the licence is still available when the data has been downloaded. You can also tailor the licence to your requirements more easily if you have specific restrictions on use or requirements for the attributions. A licence is also required to enable the use of data by machines, for example for automatic retrieval or processing. Providing a licence in machine-readable form helps to ensure that more of your data is reused, and that it is only used for the purposes you intended. Including a licence in machine-readable format as well as a human-readable format is a good practice to follow.

Selecting a licence

It is important to check whether your funders, publishers, institution, or collaborators mandate or recommend a specific licence, or have prepared one that you should use.

Standard licences, such as the Creative Commons licences, have a variety of advantages for use with research data:

  • Clarity: Standard licences are well-documented and clearly outline the terms of use, making it easier for researchers to understand their rights and responsibilities.
  • Simplicity: They provide a straightforward way to share data without the need for creating complex, custom licences.
  • Recognition: These licences are widely recognized and understood across the research community, facilitating smoother collaboration and data sharing.
  • Compliance: Many funding agencies and journals require or recommend the use of standard licences, ensuring compliance with their policies.
  • Consistency: Using common licences promotes consistency in data sharing practices, which helps to avoid confusion and disputes over data usage.

Tools are available to help you choose a standard licence, such as License selector and Open License selector.

Although standard licences are easy to use and understand, it is not always possible to use them, and a custom licence is required instead. A bespoke licence is often needed where the data has significant commercial value, or where the data contains sensitive information that requires the access conditions, restrictions, and responsibilities of reuse to be carefully described. If you require a custom licence, you should ask your institution's research office or legal department for further guidance.

Structure of a human-readable licence file

The following text is an example of the information that a human-readable licence file might contain. Bespoke licence files will likely contain more information, including restrictions, permissions, responsibilities, and disclaimers.

Dataset title: [Descriptive title of your research]
Authors: [Name, Affiliation, ORCID]
Publication date: [Date]
Dataset description: [Description of what the dataset contains]
Citation: [Author, A., & Author, J. (2025). Title of the research or dataset [Data set]. Name of Institute. https://doi.org/10.xxxx/yyyy]
License: [If the work is shared under a standard licence, name the licence and provide a link. Include any conditions of use including whether attribution is required. For example:
This work is licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/). You are free to share, adapt, and use this data as long as you give appropriate credit, provide a link to the licence, and indicate if changes were made.]
Contact: [A Author at a.author@research.ac.uk]

Such a licence may be created as a standalone file which can be packaged with a dataset that is shared, deposited in a repository, or otherwise published. The licence information could also be embedded in certain kinds of data file; for example, it could be added as a header to a CSV data file.
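As a minimal sketch of both options, the Python below renders a few of the fields from the example as plain text, suitable for a standalone licence file, and prepends the same text as comment lines to CSV content. The function names, the `# ` comment prefix, and the sample values (including the date) are assumptions for illustration only. Check that the tools that will read your CSV file can skip comment lines before embedding a header this way.

```python
import csv
import io

# Hypothetical field values, mirroring the template above.
LICENCE_FIELDS = {
    "Dataset title": "Descriptive title of your research",
    "Authors": "A. Author, A University, ORCID",
    "Publication date": "2025-01-01",
    "License": "Creative Commons Attribution 4.0 International "
               "(https://creativecommons.org/licenses/by/4.0/)",
    "Contact": "a.author@research.ac.uk",
}


def render_licence_text(fields):
    """Return the licence information as 'Key: value' lines."""
    return "\n".join(f"{key}: {value}" for key, value in fields.items())


def csv_with_licence_header(fields, rows, comment_prefix="# "):
    """Return CSV text with the licence information as leading comment lines."""
    buffer = io.StringIO()
    for line in render_licence_text(fields).splitlines():
        buffer.write(f"{comment_prefix}{line}\n")
    # Write the data rows after the commented licence header.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [["sample_id", "value"], ["s1", "0.42"]]
    print(render_licence_text(LICENCE_FIELDS))
    print(csv_with_licence_header(LICENCE_FIELDS, rows))
```

Readers that support comment lines can then ignore the header; for example, `pandas.read_csv` accepts a `comment` argument for this purpose.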

Structure of a machine-readable licence

Creating a human-readable licence in a machine-readable file type such as .TXT or .CSV, rather than in a Word document or a PDF file, makes it much more likely that machines can read the content of the licence. However, machine-readable versions of licences are typically generated in structured formats such as XML (eXtensible Markup Language), JSON (JavaScript Object Notation), RDF (Resource Description Framework), and XMP (Extensible Metadata Platform). With these formats, the licence can either be included as a separate file in a package for sharing or publication, or be embedded as metadata in some file types.
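As one illustration of a separate machine-readable licence file, the sketch below serialises licence metadata to JSON using Python's standard library and reads the licence URL back, as a consuming program might. The key names and record layout are assumptions rather than a published schema; `CC-BY-4.0` is the SPDX identifier for Creative Commons Attribution 4.0 International.

```python
import json

# Hypothetical record layout; the keys are illustrative, not a standard schema.
licence_record = {
    "title": "Descriptive title of your research",
    "creators": ["A. Author", "J. Author"],
    "licence": {
        "name": "Creative Commons Attribution 4.0 International",
        "spdx_id": "CC-BY-4.0",
        "url": "https://creativecommons.org/licenses/by/4.0/",
    },
    "attribution_required": True,
}


def to_json(record):
    """Serialise the record as indented JSON text."""
    return json.dumps(record, indent=2, ensure_ascii=False)


def licence_url(json_text):
    """Read the licence URL back from the JSON text."""
    return json.loads(json_text)["licence"]["url"]


if __name__ == "__main__":
    text = to_json(licence_record)
    print(text)
    print(licence_url(text))
```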

Creative Commons provides the Creative Commons Rights Expression Language (ccREL) specification, which describes how licence information can be attached to a work and expressed using the RDF and XMP formats. Creative Commons also provides a useful License Chooser tool that generates licence content in text, HTML, and XMP format. For the licence example shown above, the tool generates the following machine-readable XMP.

<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<x:xmpmeta xmlns:x='adobe:ns:meta/'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:xapRights='http://ns.adobe.com/xap/1.0/rights/'
xmlns:cc='http://creativecommons.org/ns#' xmlns:dc='http://purl.org/dc/elements/1.1/'>
<rdf:Description rdf:about=''>
<xapRights:Marked>True</xapRights:Marked>
<xapRights:Owner>
<rdf:Bag>
<rdf:li>A. Author, A University, ORCID &amp; J. Author, B. University, ORCID</rdf:li>
</rdf:Bag>
</xapRights:Owner>
<xapRights:WebStatement rdf:resource='http://example.com/dataset.doi'/>
<xapRights:UsageTerms>
<rdf:Alt>
<rdf:li xml:lang='x-default'>This work is licensed under &lt;a href=&quot;https://creativecommons.org/licenses/by/4.0/&quot;&gt;Creative Commons Attribution 4.0 International&lt;/a&gt;</rdf:li>
<rdf:li xml:lang='en-US' >This work is licensed under &lt;a href=&quot;https://creativecommons.org/licenses/by/4.0/&quot;&gt;Creative Commons Attribution 4.0 International&lt;/a&gt;</rdf:li>
</rdf:Alt>
</xapRights:UsageTerms>
<cc:license rdf:resource='https://creativecommons.org/licenses/by/4.0/'/>
<cc:attributionName>A. Author, A University, ORCID &amp; J. Author, B. University, ORCID</cc:attributionName>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang='x-default'>Descriptive title of your research</rdf:li>
<rdf:li xml:lang='en-US'>Descriptive title of your research</rdf:li>
</rdf:Alt>
</dc:title>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end='r'?>
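To illustrate how a program might consume such a packet, the sketch below parses a shortened XMP packet with Python's standard `xml.etree.ElementTree` module and extracts the licence URL and attribution name. The function name `read_licence` and the shortened packet are assumptions for illustration. Note that a literal ampersand in element text has to be written as `&amp;` for the packet to parse as XML.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by the XMP packet.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "cc": "http://creativecommons.org/ns#",
}

# A shortened, well-formed packet for illustration only.
XMP = """<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<x:xmpmeta xmlns:x='adobe:ns:meta/'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
 xmlns:cc='http://creativecommons.org/ns#'>
<rdf:Description rdf:about=''>
<cc:license rdf:resource='https://creativecommons.org/licenses/by/4.0/'/>
<cc:attributionName>A. Author &amp; J. Author</cc:attributionName>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end='r'?>"""


def read_licence(xmp_text):
    """Extract the licence URL and attribution name from an XMP packet."""
    root = ET.fromstring(xmp_text)
    description = root.find(".//rdf:Description", NS)
    licence = description.find("cc:license", NS)
    name = description.find("cc:attributionName", NS)
    return {
        # rdf:resource is a namespaced attribute, so use the {uri}name form.
        "licence_url": licence.get(f"{{{NS['rdf']}}}resource"),
        "attribution": name.text,
    }


if __name__ == "__main__":
    print(read_licence(XMP))
```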

Creative Commons also supports the use of RDFa to make its licences machine-readable. The XML snippet below is an example of how you can represent the licence information for data in a machine-readable format using RDF (Resource Description Framework). You can customise the dc:title, dc:creator, and dc:date fields to fit your specific research data. This format can be used to embed the licence information within your data files or metadata records.

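<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:dc='http://purl.org/dc/elements/1.1/'>
<rdf:Description rdf:about=''>
<dc:title>Your Research Data Title</dc:title>
<dc:creator>Your Name/Institution</dc:creator>
<dc:date>Date of License Creation</dc:date>
</rdf:Description>
</rdf:RDF>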

To find out more about creating a Creative Commons licence with RDF, see Extend Metadata.

Licences for code and software

Sharing your code and software provides a number of benefits, both for you and for other researchers. Others can verify that your software works in the way that you intended, and they can also suggest improvements, or even contribute to improving the functionality or maintaining the code. Other researchers can validate your results, compare their data using the same processes, and use your code as a starting point for customising code for their own research.

Choosing a licence for code and software is very similar to choosing a licence for a dataset, but there are some differences to consider:

  • Are the source code and the software being shared?
  • Do you want to allow others to modify the code?
  • Do you want to allow others to share the modified code?
  • Do you want to require others who modify your code and share it to use the same licence?
  • Do you want others to share how they have modified your code?
  • If you are using code that was created by someone else, with or without modification, is your licence compatible with the original licence on that code?
  • Are there any intellectual property claims on the code (for example, patent claims on the algorithms)?
  • Do you want to allow others to use the code for commercial purposes?
  • Do you require attribution of your code?
  • Do you need to add any disclaimers or statements about liability for your code?

The University of Cambridge provides guidance on Choosing a software licence, and the Choose a license website provides a comparison of different open licences to help you select an appropriate licence for your code.

If you are publicly sharing your code online, for example using GitHub, you should ensure that you choose the appropriate licence for your repository. The licence you select should match the licence that you intend to provide with your code. See [Licensing a repository](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/licensing-a-repository) for more information about licensing in GitHub. It is also possible to link GitHub to Zenodo so that you can create a persistent identifier for your repository, making it easier to get appropriate attribution and citations for your code-based research data. See Referencing and citing content for more information. A citation file can also be added to your repository in GitHub to make it easier for others to cite your work. For more information, see What is a CITATION.cff file.
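As a rough sketch of what a citation file can contain, the Python below builds the text of a minimal CITATION.cff file using keys from the Citation File Format 1.2.0 (`cff-version`, `message`, `title`, `authors`, and `license`). The function name, the author names, the title, and the choice of the MIT licence are placeholders for illustration. The quoting is deliberately naive and assumes the values contain no double quotes or backslashes.

```python
def build_citation_cff(title, authors, licence_id,
                       message="If you use this software, please cite it as below."):
    """Return the text of a minimal CITATION.cff file.

    authors is a list of (family_names, given_names) tuples;
    licence_id is an SPDX licence identifier such as "MIT".
    """
    lines = [
        "cff-version: 1.2.0",
        f'message: "{message}"',
        f'title: "{title}"',
        "authors:",
    ]
    for family, given in authors:
        # Each author is one YAML list item with two keys.
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    lines.append(f"license: {licence_id}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_citation_cff(
        title="Descriptive title of your software",
        authors=[("Author", "A."), ("Author", "J.")],
        licence_id="MIT",
    ))
```

The resulting text would be saved as a file named CITATION.cff in the root of the repository, alongside the licence file.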

About this page

If you would like to contribute content to the PSDI Knowledge Base or have feedback you would like to give on this guidance, please contact us.