General Guidelines for PSDI Metadata
Summary
These guidelines apply to all metadata files that describe PSDI, published within https://metadata.psdi.ac.uk/. More specific guidance is available for the individual files within it.
Background reading
- JSON-LD Best Practices
- Where possible we follow WorldFAIR's Cross-Domain Interoperability Framework (CDIF), which in the first instance involves defining the following profiles:
  - Core discovery - resource catalogue
  - Core controlled vocabulary - vocabulary
Guidelines
General notes
- We have not included an explicit node type definition of ["@type":"xsd:string"] since this is the default and we are trying to keep the JSON-LD as simple as possible.
PIDs (Persistent Identifiers)
We use the following pattern for PIDs:
[IRI = [http : // ] + { metadata.psdi.ac.uk/ } + { profile/ } + [type] + [ # | / ] + { id } ]
e.g. http://metadata.psdi.ac.uk/psdi-dcat/data/07d83ee6-bad9-4c23-abea-ac70745b0803
or http://metadata.psdi.ac.uk/psdi-voc#psdiPlatform
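As an illustration, the pattern above can be sketched as a small helper. This is only a sketch: the function name `build_pid` and its parameter names are hypothetical and not part of PSDI's tooling, and it assumes the type segment is optional, as in the psdi-voc example.

```python
from typing import Optional

# Base taken from the two example PIDs above.
BASE = "http://metadata.psdi.ac.uk/"


def build_pid(profile: str, local_id: str,
              resource_type: Optional[str] = None,
              separator: str = "/") -> str:
    """Assemble a PID from profile, optional type, separator and id.

    Hypothetical helper for illustration only.
    """
    if separator not in ("#", "/"):
        raise ValueError("separator must be '#' or '/'")
    iri = BASE + profile
    if resource_type:
        # The type segment is only present in some profiles (e.g. psdi-dcat).
        iri += "/" + resource_type
    return iri + separator + local_id


# Reproduces the two examples given above.
catalogue_pid = build_pid("psdi-dcat", "07d83ee6-bad9-4c23-abea-ac70745b0803",
                          resource_type="data")
vocabulary_pid = build_pid("psdi-voc", "psdiPlatform", separator="#")
```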
The following apply to all PSDI PIDs:
- PIDs are unique
- PIDs are cool, following Cool URIs for the Semantic Web. Note that while we have followed the other guidance there, we have not yet implemented the following:
  - delivering RDF to semantic web applications while returning HTML to web browsers (currently raw .jsonld is returned for both)
  - resolvable individual terms (for the first version of PSDI, individual terms will not be resolvable)
The following vary between different PSDI profiles:
- psdi-voc.jsonld implements hash URIs (see https://www.w3.org/TR/cooluris/) because it is a small dataset with non-resolvable terms which are interrelated. In contrast, psdi-dcat.jsonld implements 303 URIs because this seems to be more common in data catalogues, and this metadata is a larger dataset
- the form of the PIDs varies within each profile, so please see the specific guidelines for identifier naming conventions
- where possible we use human-readable term ids (e.g. for vocabulary http://metadata.psdi.ac.uk/psdi-voc#psdiPlatform) rather than opaque term ids, to make the relations between the terms more understandable. However, this does not scale to more complex metadata, which will need a more extensible term-definition convention in the future, e.g. the PSDI resource catalogue is based on GUIDs. The following additional care is taken when choosing human-readable identifiers:
  - term ids are unique within their namespace
  - term ids are based on their names, shortened if possible while retaining meaning
  - term ids are camelCase e.g. communityActivity
  - care is taken when choosing these ids initially because it might be appropriate to change them if titles and labels change in the future (although [dcterms:isReplacedBy] and [dcterms:replaces] could be used in these cases)
- where human-readable term ids aren't possible, opaque term ids are in general randomly generated GUIDs (using UUIDv4).
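The two kinds of term id described above could be produced along these lines. This is a minimal sketch: `to_camel_case` and `new_opaque_id` are hypothetical helper names, and the camelCase rule shown (lower-case first word, capitalised following words) is an assumption inferred from the examples communityActivity and psdiPlatform. Uniqueness within the namespace still has to be checked separately.

```python
import re
import uuid


def to_camel_case(name: str) -> str:
    """Derive a human-readable camelCase term id from a term name."""
    words = re.findall(r"[A-Za-z0-9]+", name)
    if not words:
        raise ValueError("name contains no usable characters")
    first, rest = words[0], words[1:]
    # First word fully lower-case, later words capitalised.
    return first.lower() + "".join(w[:1].upper() + w[1:].lower() for w in rest)


def new_opaque_id() -> str:
    """Generate a random opaque id (UUID version 4)."""
    return str(uuid.uuid4())


readable_id = to_camel_case("Community Activity")
opaque_id = new_opaque_id()
```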
Person, Organization and Agent
Across PSDI metadata we need to identify people and organisations associated with its resources in a consistent way, by a permanent id [@id] which identifies them and links to more information about them. The Friend of a Friend (FOAF) specification allows us to do that since it provides a way to describe a foaf:Person or a foaf:Organization, and both can be referred to collectively as the more general class foaf:Agent. Where possible, the more specific forms [foaf:Person] or [foaf:Organization] should be used in a profile, but in some cases it is necessary to use the more generalised form [foaf:Agent]. For example, in our resource catalogue DCAT profile psdi-dcat.jsonld we use the SHACL shape DCAT-AP 3.0 (from DCAT-AP/releases/2.1.1/) for validation, which constrains the [dcterms:creator] and [dcterms:publisher] terms to the @type [foaf:Agent]. We use the following form to capture these:
["dcterms:creator": {"@id": "http://orcid.org/0000-0003-2397-1996", "@type": "foaf:Person", "foaf:name":"Aileen Day", "foaf:openid": "https://orcid.org/0000-0003-2397-1996"},]
where:
- [foaf:name] is simply a human-readable label for display purposes
- [@id] is used to match an individual throughout the PSDI metadata, and [foaf:openid] has the same value
- For people, [@id] should be taken as:
  - ORCID identifiers by preference if at all possible e.g. [http://orcid.org/0000-0003-2397-1996] but if not available...
  - LinkedIn URL e.g. [http://www.linkedin.com/in/aileen-day-60a62912/] but if not available...
  - if neither of these options is available, use the institutional website about this person at their place of work (but be aware that this might change)
  - email addresses should not be used because of privacy issues
- For organizations, [@id] should be taken as:
  - ROR by preference (e.g. [https://ror.org/0439y7842]) but if not available...
  - use the institutional website (but be aware that this might change)
- Note that one organisation which is used a lot in PSDI metadata is PSDI. ROR doesn't include PSDI currently, so this should take the form e.g.
["dcterms:publisher": {"@id": "http://www.psdi.ac.uk", "@type":"foaf:Organization", "foaf:name":"Physical Sciences Data Infrastructure", "foaf:openid": "http://www.psdi.ac.uk" },]
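The order of preference for identifiers and the shape of the resulting node can be sketched as follows. The helper names `choose_person_id` and `agent_node` are hypothetical, and the sketch assumes that [foaf:openid] is always given the same value as [@id], as stated above.

```python
from typing import Optional


def choose_person_id(orcid: Optional[str] = None,
                     linkedin: Optional[str] = None,
                     institutional_page: Optional[str] = None) -> str:
    """Pick a person's @id: ORCID first, then LinkedIn, then institutional page.

    Email addresses are deliberately not accepted, for privacy reasons.
    """
    for candidate in (orcid, linkedin, institutional_page):
        if candidate:
            return candidate
    raise ValueError("no usable identifier supplied")


def agent_node(identifier: str, name: str,
               agent_type: str = "foaf:Agent") -> dict:
    """Build the JSON-LD node used for creators and publishers."""
    return {
        "@id": identifier,
        "@type": agent_type,
        "foaf:name": name,           # display label only
        "foaf:openid": identifier,   # same value as @id
    }


creator = agent_node(
    choose_person_id(orcid="http://orcid.org/0000-0003-2397-1996"),
    "Aileen Day",
    "foaf:Person",
)
publisher = agent_node("http://www.psdi.ac.uk",
                       "Physical Sciences Data Infrastructure",
                       "foaf:Organization")
```

The same `agent_node` helper covers organizations; only the way the identifier is chosen (ROR first, then institutional website) differs.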
Publisher
The dcterms definition of publisher is "An entity responsible for making the resource available". A publisher can be referred to as a "foaf:Agent". See the "Person, Organization and Agent" guidance above for details of how PSDI should be captured. PSDI is included as a publisher for all resources and metadata developed by PSDI and its Pathfinders. However, PSDI also describes resources from other publishers, e.g. those accessed via cross-search.
Subjects and themes
We are currently using EuroSciVoc as the framework for capturing themes and subjects. We do not want to duplicate information held there, so we suggest specifying just an ID from it, accompanied at a minimum by a human-readable label ([rdfs:label]), e.g.:
["dc11:subject": [ {"@id":"http://data.europa.eu/8mn/euroscivoc/ff3c21f8-d2ca-4a8e-ad5a-a23190ab8557", "@type": "skosxl:Label", "rdfs:label":"physical sciences"}]]
Version numbers
We use semantic versioning
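For reference, a semantic version has the form MAJOR.MINOR.PATCH. The sketch below shows one way to validate and compare such version strings. The name `parse_version` is hypothetical, and the sketch deliberately ignores the pre-release and build-metadata suffixes that the full semantic versioning specification also allows.

```python
import re

# Core MAJOR.MINOR.PATCH form only, with no leading zeros.
SEMVER_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_version(version: str) -> tuple:
    """Return (major, minor, patch) as integers, or raise ValueError."""
    match = SEMVER_CORE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return tuple(int(part) for part in match.groups())


# Tuples compare numerically, so 1.10.0 sorts after 1.9.3.
is_newer = parse_version("1.10.0") > parse_version("1.9.3")
```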
Dates
Note that PSDI uses datetimes rather than dates. Datetimes are formatted according to ISO 8601, e.g. ["dcterms:created": { "@type": "xsd:dateTime", "@value": "2020-09-27T00:00:00Z"}]
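A datetime value in this form can be generated as in the sketch below. The helper name `created_node` is hypothetical, and the sketch assumes values are always normalised to UTC and written with the "Z" suffix, as in the example above.

```python
from datetime import datetime, timezone


def created_node(moment: datetime) -> dict:
    """Build a dcterms:created node holding an ISO 8601 UTC datetime."""
    if moment.tzinfo is None:
        # Refuse naive datetimes rather than guess their timezone.
        raise ValueError("expected a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    return {
        "dcterms:created": {
            "@type": "xsd:dateTime",
            "@value": utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    }


node = created_node(datetime(2020, 9, 27, tzinfo=timezone.utc))
```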
Language
Currently the language for all PSDI metadata is set as English via the context ["@language": "en"]. Where explicitly stated (e.g. DCAT requires ["dcterms:language"]) this is expressed as: ["dcterms:language":{"@id":"http://id.loc.gov/vocabulary/iso639-1/en", "rdfs:label":"en"},]
Licenses
For licenses in the PSDI metadata, we link to the SPDX License List to identify the licenses, and give a human-readable label e.g. ["dcterms:license": {"@id": "https://spdx.org/licenses/CC0-1.0.html", "@type": "dcterms:LicenseDocument", "rdfs:label": "Creative Commons Zero v1.0 Universal"},]. Please see the PSDI Knowledgebase article Licences for Sharing Your Research Data for more information about licenses.
For the PSDI metadata as a whole the license applied is Creative Commons Zero v1.0 Universal.
- Creator: Aileen Day
- Last modified date: 2025-04-09