PSDI Resource Catalogue Description
Summary
The PSDI catalogue is part of the metadata for the Physical Sciences Data Infrastructure (PSDI). It is a resource catalogue that describes the resources, data, services, tools and guidance which are made available through it.
The latest released version is available at: https://metadata.psdi.ac.uk/psdi-dcat.jsonld.
This document outlines background reading, guidance and guidelines for anyone who wants to use this metadata, contribute to it or describe a new resource which will become part of it.
Purpose of PSDI resource catalogue
- its contents are used to populate the PSDI What We Provide resource catalogue
- discoverability and interpretation of PSDI applications and components for machines and indexes
What is included?
Top-level description of PSDI resource catalogue and the resources, data, services, tools and guidance that it describes
Background reading
Nomenclature and overall structure
The output is a Data Catalog Vocabulary (DCAT) - Version 3 vocabulary in JSON-LD format.
This has been implemented as a top-level PSDI resource catalogue with id [psdiCat:resourceCatalogue] which also contains nested catalogues - each of which corresponds to a "resource theme" as defined by PSDI vocabulary (not to be confused with the DCAT definition of "resource"):
- a top-level [dcat:Catalog] with id [psdiCat:resourceCatalogue] defines the entire PSDI resource catalogue as a whole
  - PSDI-wide services, tools and guidance which do not belong to a specific resource are defined at this top-level
- a nested, separate [dcat:Catalog] representing each resource theme
  - each of these resource themes can contain data, service(s), tool(s) and/or guidance(s)
Note that:
- the definition for each of these terms in PSDI is defined in machine-readable form in PSDI vocabulary and in human-readable form in PSDI Terminology
- resource theme corresponds to the DCAT definition of [dcat:Catalog] (not [dcat:Resource] as you might expect, because this class should not be directly defined in a DCAT catalogue according to DCAT for Beginners). Note that due to the nested nature of PSDI we do have some instances of [dcat:Catalog] which don't have a [dcat:Dataset] within them
- data corresponds to the DCAT definition of [dcat:Dataset]
- service broadly corresponds to the DCAT definition of [dcat:DataService] with the slight broadening of its DCAT definition that these are not always associated with a specific dataset, and they sometimes process data, rather than just making it accessible.
- in addition to these two sub-classes of the [dcat:Resource] class, we have made two other extensions to the [dcat:Resource] class to capture tools and guidance. The DCAT description of these classes and their properties is in psdi-dcat-ext.jsonld and their definition in the context of PSDI is in tool and guidance respectively. These also inherit the properties of their parent [dcat:Resource] class.
- we refer to data, service(s), tool(s) and/or guidance(s) collectively as resources
- properties specified at the resource theme level describe all outputs from this research (data, services, tools and/or guidance)
- properties in the data, service, tools and guidance objects are relevant to these items only (values are inherited from the resource above if blank)
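The nesting described above can be sketched in Python as a small JSON-LD-like structure. This is an illustration only: the `example:` identifiers and the `make_theme` helper are hypothetical, and linking nested catalogues with `dcat:catalog` (and their contents with `dcat:dataset` and `dcat:service`) is an assumption based on standard DCAT properties rather than something confirmed for the released PSDI file.

```python
import json


def make_theme(theme_id, title, datasets=(), services=()):
    """Build a nested dcat:Catalog node for one resource theme (hypothetical helper)."""
    node = {"@id": theme_id, "@type": "dcat:Catalog", "dcterms:title": title}
    # A resource theme may legitimately contain no dcat:Dataset at all.
    if datasets:
        node["dcat:dataset"] = list(datasets)
    if services:
        node["dcat:service"] = list(services)
    return node


# Hypothetical dataset used only to populate the example.
example_dataset = {
    "@id": "example:data-1",
    "@type": "dcat:Dataset",
    "dcterms:title": "Example Data",
}

# Top-level catalogue with two nested resource themes (ids are placeholders).
top_level = {
    "@id": "psdiCat:resourceCatalogue",
    "@type": "dcat:Catalog",
    "dcat:catalog": [
        make_theme("example:theme-1", "Example Theme With Data", datasets=[example_dataset]),
        make_theme("example:theme-2", "Example Theme Without Data"),
    ],
}

print(json.dumps(top_level, indent=2))
```

The second theme shows the case noted above, where a [dcat:Catalog] has no [dcat:Dataset] within it.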
PIDs (Persistent Identifiers)
The following convention has been used for persistent identifiers of items within the PSDI resource catalogue:
- all identifiers are within the ["psdiDcat:"] namespace where ["psdiDcat": "http://metadata.psdi.ac.uk/psdi-dcat/"]
- ids defined in this vocabulary are not in general human readable. When the namespace prefix "psdiDcat" of the [@id]s within this JSON-LD is expanded, their IRIs take the form: {http://} + {metadata.psdi.ac.uk/} + {psdi-dcat} + [/] + {type/} + {id} (e.g. http://metadata.psdi.ac.uk/psdi-dcat/data/07d83ee6-bad9-4c23-abea-ac70745b0803). See PSDI Metadata General Guidelines > PIDs (Persistent Identifiers) for more general information.
- PIDs in the PSDI catalogue are generally opaque strings that are 36 characters long. They are randomly generated GUIDs (UUIDv4), created using the Python uuid class e.g. [python -c "import uuid; print(str(uuid.uuid4()))"] or an online UUIDv4 generator
- one exception to this is ids for data in the "OPTIMADE Data Providers", which have the @id format [{psdiDcat:data/optimade-} + provider_id + {-} + dataset_id] (e.g. psdiDcat:data/optimade-alexandria-alexandria-pbesol), and their distributions, which have the @id format [{psdiDcat:distribution/optimade-} + provider_id + {-} + dataset_id + {-i}] (e.g. psdiDcat:distribution/optimade-alexandria-alexandria-pbesol-i)
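As a minimal sketch of the identifier conventions above, the following Python generates a new opaque id, expands the "psdiDcat" prefix, and builds the OPTIMADE-style ids. The function names are hypothetical, and the trailing `-i` is reproduced literally from the example above (whether it stands for an index is not stated here).

```python
import uuid

# Namespace IRI taken from the convention described above.
PSDI_DCAT_NS = "http://metadata.psdi.ac.uk/psdi-dcat/"


def expand(compact_id):
    """Expand a 'psdiDcat:type/id' compact id into its full IRI."""
    prefix, _, local = compact_id.partition(":")
    if prefix != "psdiDcat" or not local:
        raise ValueError(f"not a psdiDcat identifier: {compact_id!r}")
    return PSDI_DCAT_NS + local


def new_pid(resource_type):
    """Return a new compact id such as 'psdiDcat:data/<uuid4>' (hypothetical helper)."""
    return f"psdiDcat:{resource_type}/{uuid.uuid4()}"


def optimade_ids(provider_id, dataset_id):
    """Return the (data id, distribution id) pair for an OPTIMADE dataset."""
    data_id = f"psdiDcat:data/optimade-{provider_id}-{dataset_id}"
    dist_id = f"psdiDcat:distribution/optimade-{provider_id}-{dataset_id}-i"
    return data_id, dist_id


pid = new_pid("data")
print(pid, "->", expand(pid))
print(optimade_ids("alexandria", "alexandria-pbesol"))
```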
Fields
-
dcat:resource properties
- dcterms:identifier and [@id]: see "PIDs (Persistent Identifiers)" section above
- dcterms:title: This is the full title (as opposed to "label" which is intended to be more concise). Its capitalisation should be title-case.
- rdfs:label: This is a shortened version of the title intended as a concise label in PSDI user interfaces.
- dcterms:type: Should correspond to a value in the DCMI terms vocabulary - so in general (although there may be other more appropriate choices in some cases):
  - PSDI resource themes will have [dcterms:type] http://purl.org/dc/dcmitype/Collection
  - PSDI data will have [dcterms:type] http://purl.org/dc/dcmitype/Dataset
  - PSDI services will have [dcterms:type] http://purl.org/dc/dcmitype/Service
  - PSDI tools will have [dcterms:type] http://purl.org/dc/dcmitype/Software
  - PSDI guidance will have [dcterms:type] http://purl.org/dc/dcmitype/Text
- dcterms:description: A description of what this resource is, why someone might want to use it and its key features.
- dcat:keyword: Keywords used to retrieve this resource in a simple text search. Multiple keywords can be expressed as a list of strings.
- dcat:theme: resources within the PSDI catalogue are classified using EuroSciVoc - see PSDI Metadata General Guidelines > Subjects and themes for more details
- dcterms:publisher: Should be expressed as a foaf:Agent - see PSDI Metadata General Guidelines > Publisher for more details
- dcterms:creator: There may be more than one creator, specified by a list. Each should be expressed as a foaf:Agent - see PSDI Metadata General Guidelines > Person, Organization and Agent for more information
- prov:qualifiedAttribution: Should be expressed as a foaf:Agent - see PSDI Metadata General Guidelines > Person, Organization and Agent for more information. The accompanying dcat:hadRole should also be specified with this property.
- prov:qualifiedAttribution dcat:hadRole: Only populate this property if prov:qualifiedAttribution is populated. Values should be chosen from the list specified at http://standards.iso.org/iso/19115/resources/Codelists/gml/CI_RoleCode.xml
- dcat:landingPage: Note that the landing page might not give direct access to the resource - in some cases access may only be obtained through some Web page where the user needs to follow some links, provide some information and/or check some boxes first.
- dcat:contactPoint: Should be expressed as a vcard:Kind with either vcard:hasEmail or vcard:hasURL - whichever is most appropriate
- dcterms:conformsTo: Please enter any relevant standards as a url linking to that standard description, or list of urls
- dcterms:language: See PSDI Metadata General Guidelines > Language for more details
- adms:status: Use values from enumerated list specified by http://www.w3.org/TR/vocab-adms/#adms-status
- dcterms:issued: The date when the resource was first created. See PSDI Metadata General Guidelines > Dates for more details
- dcterms:modified: The date when the resource was last modified. See PSDI Metadata General Guidelines > Dates for more details
- dcterms:license: See PSDI Metadata General Guidelines > Licenses for more details
- dcterms:accessRights: This should correspond to a value defined by Controlled Vocabularies for Repositories.
- dcat:version: see PSDI Metadata General Guidelines > Version numbers for more information.
- adms:versionNotes: Optional property in PSDI.
- psdiDcatExt:furtherInformation: URL containing further information about this Resource for PSDI users and contributors.
- dcat:qualifiedRelation describedby: Advice/information made available to PSDI users and contributors via the PSDI platform to provide guidance about this resource or associated with it. This property should be populated with the dcterms:identifier of that Guidance resource. Can be a list if there is more than one.
- psdiDcatExt:logoURL: URL to an appropriate logo image to display with this resource.
- dcterms:bibliographicCitation: Please fill in with preferred way to cite this resource theme or resource. This field is optional, but should be filled in if the resource license requires its users to provide attribution. Recommended practice is to include sufficient bibliographic detail to identify the resource as unambiguously as possible.
- psdiDcatExt:displayPriority: Integer used to determine order of resources when listed alongside others.
- dcterms:hasPart: Used to indicate resources which are part of this resource theme but for which it is not their primary resource theme.
- dcat:inCatalog: Primary parent resource theme that this resource belongs to. If this resource belongs to a resource theme (true for everything apart from core-PSDI resources) this property should be populated with the dcterms:identifier of the DCAT catalog that contains it.
- dcterms:isPartOf: Additional (non-primary) parent resource theme that this resource belongs to - can be multiple values. If this resource belongs to additional resource theme(s), as well as its primary resource theme then this property should be populated with the dcterms:identifier of the DCAT catalog that contains it.
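The default [dcterms:type] values listed above can be captured in a small lookup. The sketch below is illustrative: the `kind` keys and the `describe_resource` helper are hypothetical, and serialising the type as an `{"@id": ...}` node is an assumption about the JSON-LD shape rather than a documented requirement.

```python
DCMI_TYPE_BASE = "http://purl.org/dc/dcmitype/"

# Default dcterms:type per PSDI kind, as listed above; other values may be
# more appropriate in some cases.
DEFAULT_DCTERMS_TYPE = {
    "resource theme": DCMI_TYPE_BASE + "Collection",
    "data": DCMI_TYPE_BASE + "Dataset",
    "service": DCMI_TYPE_BASE + "Service",
    "tool": DCMI_TYPE_BASE + "Software",
    "guidance": DCMI_TYPE_BASE + "Text",
}


def describe_resource(kind, title, label):
    """Return a minimal node with title, label and default type (hypothetical helper)."""
    return {
        "dcterms:title": title,  # full title, title-case
        "rdfs:label": label,  # concise label for user interfaces
        "dcterms:type": {"@id": DEFAULT_DCTERMS_TYPE[kind]},
    }


print(describe_resource("tool", "Example Conversion Tool", "Converter"))
```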
-
additional properties specific to dcat:Dataset:
- dcterms:accrualPeriodicity: Use enumerated values from http://purl.org/cld/freq/ (note that this doesn't resolve currently, but values are in http://dublincore.org/specifications/dublin-core/collection-description/frequency/freq.rdf).
- dcat:spatialResolutionInMeters: Probably not relevant for most PSDI datasets.
- dcat:temporalResolution: Probably not relevant for most PSDI datasets.
- dcterms:spatial: Probably not relevant for most PSDI datasets.
- dcterms:temporal: Probably not relevant for most PSDI datasets.
- prov:wasGeneratedBy: Probably not relevant for most PSDI datasets.
-
additional properties specific to dcat:Distribution (for each distinct instance that a particular dataset is made available at):
- dcat:Distribution/dcterms:title: This is the full title (as opposed to "label" which is intended to be more concise). Its capitalisation should be title-case.
- dcat:Distribution/dcterms:description: A description of what this resource is, why someone might want to use it and its key features.
- dcat:Distribution/dcat:downloadURL: If the data distribution is accessible (1) as a direct link from which the whole dataset can be downloaded then use dcat:downloadURL; (2) as a service accessible via a direct URL then use dcat:accessURL; (3) as a service or download from a landing page which governs access then use dcat:landingPage; or (4) through a service that is described as a dcat:DataService (with an endpoint URL) then use dcat:accessService
- dcat:Distribution/dcat:accessURL: If the data distribution is accessible (1) as a direct link from which the whole dataset can be downloaded then use dcat:downloadURL; (2) as a service accessible via a direct URL then use dcat:accessURL; (3) as a service or download from a landing page which governs access then use dcat:landingPage; or (4) through a service that is described as a dcat:DataService (with an endpoint URL) then use dcat:accessService
- dcat:Distribution/dcat:accessService: If the data distribution is accessible (1) as a direct link from which the whole dataset can be downloaded then use dcat:downloadURL; (2) as a service accessible via a direct URL then use dcat:accessURL; (3) as a service or download from a landing page which governs access then use dcat:landingPage; or (4) through a service that is described as a dcat:DataService (with an endpoint URL) then use dcat:accessService - in this case accessService should contain its dcterms:identifier
- dcat:Distribution/dcat:byteSize: Not mandatory for PSDI.
- dcat:Distribution/dcat:compressFormat: Only fill in if whole database is available for download in compressed format from downloadURL. Use enumerated value from http://www.iana.org/assignments/media-types/media-types.xhtml e.g. http://www.iana.org/assignments/media-types/application/gzip
- dcat:Distribution/dcat:packageFormat: Only use this property if data is grouped together into package in this distribution. Use enumerated value from http://www.iana.org/assignments/media-types/media-types.xhtml e.g. http://www.iana.org/assignments/media-types/application/gzip
- dcat:Distribution/dcterms:format: Use this property if the format doesn't match a value in http://www.iana.org/assignments/media-types/media-types.xhtml and as such cannot be captured in dcat:mediaType
- dcat:Distribution/dcat:mediaType: Use enumerated value from http://www.iana.org/assignments/media-types/media-types.xhtml e.g. http://www.iana.org/assignments/media-types/application/sql
- dcat:Distribution/dcterms:license: See PSDI Metadata General Guidelines > Licenses for more details
- dcat:Distribution/odrl:hasPolicy: TBD
- dcat:Distribution/dcterms:conformsTo: Please enter any relevant standards as a url linking to that standard description, or list of urls
- dcat:Distribution/dcterms:issued: The date when the distribution was first created. See PSDI Metadata General Guidelines > Dates for more details
- dcat:Distribution/dcterms:modified: The date when the distribution was last modified. See PSDI Metadata General Guidelines > Dates for more details
- dcat:Distribution/spdx:checksum: Only use if database is downloadable as a single file. Should be specified with the algorithm that generates it e.g.
["spdx:checkSum": {"@type": "spdx:checkSum","spdx:algorithm": {"@id": "spdx:d4e4247", "rdfs:label": "checksumAlgorithm_sha256", "@type": "spdx:ChecksumAlgorithm" }, "spdx:checksumValue": { "@value": "de9d85cf2b8f5843ad8bcf03a3abf49c360c607f47c84c8b33a6ad18da5e72a1", "@type": "http://www.w3.org/2001/XMLSchema#hexBinary" } }]
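The four access cases described for dcat:downloadURL, dcat:accessURL and dcat:accessService can be sketched as a lookup from an access mode to the property to populate. The mode names and the `distribution_access` helper are hypothetical labels introduced here for illustration; only the property names come from the guidance above.

```python
# Hypothetical mode names mapped to the property recommended for each case.
ACCESS_PROPERTY = {
    "direct_download": "dcat:downloadURL",  # (1) whole dataset via a direct link
    "direct_service_url": "dcat:accessURL",  # (2) service at a direct URL
    "landing_page": "dcat:landingPage",  # (3) landing page governs access
    "described_data_service": "dcat:accessService",  # (4) a described dcat:DataService
}


def distribution_access(mode, value):
    """Return the single access property to set on a distribution.

    For "described_data_service" the value should be the dcterms:identifier
    of the dcat:DataService; for the other modes it is a URL.
    """
    if mode not in ACCESS_PROPERTY:
        raise ValueError(f"unknown access mode: {mode!r}")
    return {ACCESS_PROPERTY[mode]: value}


print(distribution_access("direct_download", "https://example.org/data.zip"))
```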
-
additional properties specific to dcat:DataService:
- dcat:endpointDescription: This should be a url to a webpage with a formal description of the service.
- dcat:endpointURL: Should only be populated for a webservice with an endpoint URL. If not then use the landingPage property.
- dcat:servesDataset: Should contain the dcterms:identifier of the dcat:Dataset that is served by this dataservice.
-
additional properties specific to psdiDcatExt:Tool:
- psdiDcatExt:downloadURL: The URL of the downloadable file(s) for the tool in a given format.
- psdiDcatExt:repositoryURL: The URL of the repository which contains the code for the tool.
- psdiDcatExt:installCommand: Command(s) to install tool if simple installation.
- psdiDcatExt:programmingLanguage: A language in which source code is written that is intended to be executed/run by a software interpreter. Programming languages are ways to write instructions that specify what to do, and sometimes, how to do it. Where possible Version should be specified too. Takes values from 'Programming Language' classifiers in http://pypi.org/classifiers where possible. e.g. "Programming Language :: Python :: 3.14". Multiple values are allowed.
- psdiDcatExt:deliveryFormat: The file format if the tool is available as a single file or package or distribution for download and installation. The format SHOULD be expressed using a media type as defined by IANA media types registry http://www.iana.org/assignments/media-types/, if available. The media type for docker container image is 'application/vnd.docker.distribution.manifest.v2+json' (this is not currently in IANA).
- psdiDcatExt:developmentStatus: Development status is an information content entity which indicates the maturity of a software entity within the context of the software life cycle. Takes values from 'Development Status' classifiers in http://pypi.org/classifiers where possible e.g. "Development Status :: 5 - Production/Stable"
- psdiDcatExt:byteSize: The size in bytes can be approximated (as a non-negative integer) when the precise size is not known.
- psdiDcatExt:exampleData: An input dataset for this tool. This should be expressed as the dcterms:identifier of an appropriate DCAT dataset.
- psdiDcatExt:operatingSystem: Optional property - if tool is specific to particular operating systems please indicate them here. Version numbers should be included where needed. Takes values from 'Operating System' classifiers in http://pypi.org/classifiers where possible (e.g. 'Windows :: Wi11, Android'). Multiple values are allowed.
- spdx:checkSum: Only use if database is downloadable as a single file. Should be specified with the algorithm that generates it e.g.
["spdx:checkSum": {"@type": "spdx:checkSum","spdx:algorithm": {"@id": "spdx:d4e4247", "rdfs:label": "checksumAlgorithm_sha256", "@type": "spdx:ChecksumAlgorithm" }, "spdx:checksumValue": { "@value": "de9d85cf2b8f5843ad8bcf03a3abf49c360c607f47c84c8b33a6ad18da5e72a1", "@type": "http://www.w3.org/2001/XMLSchema#hexBinary" } }]
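A checksum node of the shape shown in the example above could be produced with Python's hashlib, as in this minimal sketch. The key names and the algorithm node are copied from that example; the `sha256_checksum_node` helper is hypothetical, and for a large file you would likely hash it in chunks rather than reading it into memory as done here.

```python
import hashlib


def sha256_checksum_node(data: bytes) -> dict:
    """Build a checksum node mirroring the example above for the given bytes."""
    digest = hashlib.sha256(data).hexdigest()
    return {
        "@type": "spdx:checkSum",
        "spdx:algorithm": {
            "@id": "spdx:d4e4247",
            "rdfs:label": "checksumAlgorithm_sha256",
            "@type": "spdx:ChecksumAlgorithm",
        },
        "spdx:checksumValue": {
            "@value": digest,
            "@type": "http://www.w3.org/2001/XMLSchema#hexBinary",
        },
    }


# In practice the bytes would come from the single downloadable file,
# e.g. pathlib.Path("some-file").read_bytes(); a literal is used here.
node = sha256_checksum_node(b"example file contents")
print(node["spdx:checksumValue"]["@value"])
```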
Versioning
- After making changes to this file the following top-level metadata fields for the file should be reviewed or updated:
[dcat:version]
[adms:status]
[adms:versionNotes]
[dcterms:issued]
[dcterms:modified]
[spdx:checksum]
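As a rough sketch of this review step, the helper below updates some of the top-level fields on a catalogue loaded as a Python dict. The `bump_version_fields` function is hypothetical, and writing [dcterms:modified] as a plain ISO 8601 date string is an assumption; the PSDI Metadata General Guidelines on dates and version numbers take precedence. [adms:status], [dcterms:issued] and [spdx:checksum] are left for manual review.

```python
import datetime


def bump_version_fields(catalogue, version, notes, today=None):
    """Return a copy of the top-level node with version metadata updated."""
    today = today or datetime.date.today()
    updated = dict(catalogue)  # shallow copy; the input dict is not modified
    updated["dcat:version"] = version
    updated["adms:versionNotes"] = notes
    updated["dcterms:modified"] = today.isoformat()
    return updated


# Placeholder values for illustration only.
before = {"@id": "psdiCat:resourceCatalogue", "dcat:version": "1.0.0"}
after = bump_version_fields(before, "1.0.1", "Example change note")
print(after)
```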
Validation
See general JSON-LD metadata validation outlined in PSDI Metadata Checking Guidelines. Additional validation specific to this profile:
- Automated validation performing SHACL validation against SHACL shapes found via DCAT-AP 3.0.0 and downloaded from DCAT-AP/releases/2.1.1/ on 20250210:
- shacl/dcat-ap_2.1.1_shacl_shapes.ttl,
- shacl/dcat-ap_2.1.1_shacl_imports.ttl,
- shacl/dcat-ap_2.1.1_shacl_mdr_imports.ttl
- (shacl/dcat-ap_2.1.1_shacl_range.ttl is just validated for information)
- (shacl/dcat-ap_2.1.1_shacl_mdr-vocabularies.shape.ttl is just validated for information, but is not critical if it does not pass because we choose not to use some of the vocabularies that are enforced in this SHACL shape e.g. [dcterms:language] to values in http://publications.europa.eu/resource/authority/language, [dcterms:publisher] to values in http://publications.europa.eu/resource/authority/corporate-body, [dcat:themes] to values in http://publications.europa.eu/resource/authority/data-theme)
- ("shacl/dcat-ap_2.1.1_shacl_shapes_recommended.ttl" is not used for validation since it gives an encoding error)
- Additional validation:
- Creator: Aileen Day
- Last modified date: 2025-04-09