Schemas, Vocabularies, and Ontologies
Metadata schemas
Standardised metadata schemas define common fields and formats that provide a consistent framework for describing data. A schema specifies what information should be recorded about a dataset, experiment, or publication, and how that information should be formatted. This consistency makes it easier to compare, combine, and validate datasets from different sources. It also enables interoperability between systems, which makes data easier to find and exchange because the data can be understood without additional programming.
A metadata schema contains:
- Metadata elements to describe a dataset
- How the elements are named, formatted, and organised
- Any rules or constraints for each element
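As an illustration of these three parts, the following Python sketch defines a very small schema and checks a record against it. The element names are borrowed from Dublin Core, but the `Element` class, the `SCHEMA` list, the date rule, and the `validate` function are all hypothetical and are not part of any published standard.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Element:
    """One metadata element: its name, whether it is required, and an optional rule."""
    name: str
    required: bool
    rule: Optional[Callable[[str], bool]] = None  # constraint on the value


def is_iso_date(value: str) -> bool:
    """Accept only values written in the form YYYY-MM-DD."""
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is not None


# Hypothetical schema: which elements exist, which are required, and their rules.
SCHEMA = [
    Element("title", required=True),
    Element("creator", required=True),
    Element("date", required=False, rule=is_iso_date),
]


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    known = {element.name for element in SCHEMA}
    for element in SCHEMA:
        value = record.get(element.name)
        if value is None:
            if element.required:
                problems.append(f"missing required element: {element.name}")
        elif element.rule is not None and not element.rule(value):
            problems.append(f"invalid value for {element.name}: {value!r}")
    for name in record:
        if name not in known:
            problems.append(f"unknown element: {name}")
    return problems


# Example record with a missing creator and a date in the wrong format.
print(validate({"title": "NMR spectra of caffeine", "date": "21/11/2025"}))
```

A record that supplies every required element in the expected format produces an empty list. Real schemas are normally expressed in a dedicated schema language rather than in application code, so this sketch only shows the idea.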
Examples of schemas
| Field | Name | Description |
|---|---|---|
| General | Dublin Core | Dublin Core is a set of standardised metadata elements used to describe digital resources such as documents, datasets, and images. It includes 15 basic descriptors and it is frequently extended to provide more domain-specific elements. |
| General | DataCite | The DataCite metadata schema defines core metadata properties to enable resources to be accurately identified for citation and retrieval purposes. |
| General | Observations, Measurements, and Samples | This standard defines a schema for observations, sampling, and the observation process to enable the exchange of information describing observations and their results. |
| General | PREMIS Data Dictionary for Preservation Metadata | PREMIS is an international standard for metadata to support the preservation of digital objects. |
| General | PROV | PROV provides a schema for provenance information that enables assessments to be made about the quality, reliability, and trustworthiness of data. |
| Chemistry | Chemical Markup Language (CML) | An XML-based standard for representing molecules, reactions, spectra, and computational chemistry results. The standard enables the exchange of chemical data between software tools and databases. |
| Chemistry | NeXus | NeXus is an international standard for neutron, X-ray, and muon experiments. It aims to define a common data exchange format with domain-specific rules for organising the data and a dictionary of well-defined field names. |
| Materials | MatCore | MatCore is a standard under development to define metadata for computational materials science research. |
| Proteins | PDBx/mmCIF | PDBx/mmCIF describes the Macromolecular Crystallographic Information File used by the Protein Data Bank (PDB) to describe macromolecular structures. |
Vocabularies
A vocabulary is a curated set of standardised terms used within a metadata schema to promote consistency and clarity. These terms are often tailored to specific disciplines, which helps ensure that concepts are described accurately within their research context, especially when the same term carries different meanings across fields. In metadata, a vocabulary may be a simple list of defined terms, or it may be structured hierarchically, like a taxonomy, with terms organised into broader and narrower relationships that reflect conceptual groupings. For example:
- Flat lists (e.g., a list of chemical techniques: "NMR", "IR", "Mass Spectrometry")
- Hierarchical structures (e.g., a taxonomy where "Spectroscopy" includes "NMR" and "IR" as narrower terms)
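The two forms can be sketched in Python as follows. This is a minimal illustration using the example terms above. The names `TECHNIQUES`, `BROADER`, and the helper functions are hypothetical and do not come from any published vocabulary.

```python
# Flat vocabulary: a plain set of permitted terms.
TECHNIQUES = {"NMR", "IR", "Mass Spectrometry"}

# Hierarchical vocabulary: each term maps to its broader term
# (None marks a top-level term).
BROADER = {
    "Spectroscopy": None,
    "NMR": "Spectroscopy",
    "IR": "Spectroscopy",
}


def is_valid_term(term: str) -> bool:
    """Check a value against the flat list (the match is case-sensitive)."""
    return term in TECHNIQUES


def narrower_terms(term: str) -> list:
    """Return the terms that sit directly below `term` in the taxonomy."""
    return sorted(t for t, parent in BROADER.items() if parent == term)


def broader_chain(term: str) -> list:
    """Walk upwards from `term`, collecting each broader term in turn."""
    chain = []
    parent = BROADER.get(term)
    while parent is not None:
        chain.append(parent)
        parent = BROADER.get(parent)
    return chain


print(is_valid_term("NMR"))            # membership test on the flat list
print(narrower_terms("Spectroscopy"))  # terms grouped under "Spectroscopy"
print(broader_chain("NMR"))            # path from "NMR" to the top level
```

A flat list supports only membership checks, whereas the hierarchy also lets a search for "Spectroscopy" be expanded to records tagged with its narrower terms.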
Examples of vocabularies
| Field | Name | Description |
|---|---|---|
| Chemistry | Chemical Component Dictionary | The Chemical Component Dictionary (CCD) is a chemical reference data resource that describes all residue and small molecule components found in Protein Data Bank (PDB) entries. Each chemical definition includes descriptions of chemical properties. |
| Chemistry | FAIRsharing.org Chemistry Vocabulary | A controlled vocabulary used for indexing bibliographic records in the PASCAL database. |
| Chemistry | IUPAC Compendium of Chemical Terminology | The IUPAC Compendium of Chemical Terminology, also known as the "Gold Book", provides a list of standard terms and definitions. The list of terms is viewable on the Web and also downloadable in JSON and XML formats. |
Ontologies
An ontology provides a structured set of terms that describes the key concepts in a field and the relationships between them. In physical science, this means that concepts such as molecules, reactions, or material properties are defined in a consistent way, allowing data to be linked reliably across experiments, publications, and databases. Ontologies are widely used in physical science, especially within materials science.
Ontologies provide the following features:
- A list of the important concepts within the field, for example atoms, bonds, materials, and processes
- Standardised definitions for concepts
- Properties of concepts, for example substances have boiling points, and reactions have reaction conditions and yields
- Relationships between concepts, for example 'a catalyst participates in a reaction'
- Rules that enable logical inferences to be made
Ontologies are machine-readable models that support the discoverability and interoperability of data on the Web by providing a shared vocabulary and rules that give meaning to data. They also enable software to interpret the meaning of the underlying data. This is especially useful for artificial intelligence (AI) and machine learning, where an ontology allows the AI to infer information about the data. For example, if a compound is a solvent and solvents are liquids, then the compound is a liquid. Ontologies can also help AI to make inferences across domains, for example to connect molecular properties with biological pathways.
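The solvent example can be sketched as a single inference rule applied to a small set of facts. The code below is a minimal illustration in plain Python, not how an ontology reasoner is implemented. The compound name `acetone`, the predicate names `is_a` and `subclass_of`, and the `infer_types` function are assumptions made for this example.

```python
# Hypothetical facts written as (subject, predicate, object) triples.
FACTS = {
    ("acetone", "is_a", "solvent"),
    ("solvent", "subclass_of", "liquid"),
}


def infer_types(facts):
    """Apply one rule until nothing new appears:
    if X is_a C and C subclass_of D, then X is_a D.
    """
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        new_facts = set()
        for subject, predicate, cls in inferred:
            if predicate != "is_a":
                continue
            for sub, pred, sup in inferred:
                if pred == "subclass_of" and sub == cls:
                    fact = (subject, "is_a", sup)
                    if fact not in inferred:
                        new_facts.add(fact)
        if new_facts:
            inferred |= new_facts
            changed = True
    return inferred


result = infer_types(FACTS)
# The fact that acetone is a liquid was never stated, only derived.
print(("acetone", "is_a", "liquid") in result)
```

Because the rule is reapplied until no new facts are produced, the same function follows longer chains of subclass relationships without any change.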
Examples of ontologies
| Field | Name | Description |
|---|---|---|
| General | Information Artifact Ontology (IAO) | An ontology of information entities that forms the basis for some other ontologies. |
| General | Web Ontology Language (OWL) | OWL is a family of knowledge representation languages designed for authoring ontologies that describe classes, properties, and relationships of things in a domain. Many of the physical science ontologies are built upon OWL. |
| General | Quantities, Units, Dimensions and DataTypes (QUDT) | QUDT defines the base classes, properties, and restrictions used for modelling physical quantities, units of measure, and their dimensions in various measurement systems. |
| Chemistry | AFO | |
| Chemistry | Chemical Entities Mixtures and Reactions Ontological Framework (CHEMROF) | An ontology representing chemical and related entities including atoms and molecules. |
| Chemistry | Chemical Entities of Biological Interest (ChEBI) | ChEBI is an open-access database and ontology of chemical entities. The ontology provides details of relationships between chemical entities, enabling queries on chemical class and role. |
| Chemistry | Chemical Information Ontology (CHEMINF) | CHEMINF aims to establish a standard for representing chemical information. In particular, it aims to produce an ontology to represent chemical structure and to richly describe chemical properties, whether intrinsic or computed. |
| Chemistry | Chemical Functional Ontology (ChemFOnt) | ChemFOnt is an ontology describing the functions and actions of more than 341,000 biologically important chemicals. |
| Chemistry | Chemical Methods Ontology (CHMO) | CHMO describes methods used to collect data in chemical experiments, preparation of data, and the instruments used for experiments. |
| Chemistry | ClassyFire Ontology (ChemOnt) | ChemOnt is a comprehensive, computable, and manually curated chemical taxonomy of nearly 5,000 chemical classes of organic and inorganic compounds designed for use with the ClassyFire tool. |
| Chemistry | Mass Spectrometry (MS) | A structured controlled vocabulary for the annotation of experiments concerned with proteomics mass spectrometry. |
| Chemistry | Molecular Process Ontology (MOP) | MOP is an ontology of molecular processes underlying reactions, for example cyclization, methylation, and demethylation. |
| Chemistry | Name Reaction Ontology (RXNO) | RXNO, created by the Royal Society of Chemistry, is an ontology of chemical reactions named for their discoverer or developer. |
| Chemistry | Nuclear Magnetic Resonance Controlled Vocabulary (nmrCV) | A standard, machine-readable collection of precisely defined terms used to describe and annotate nuclear magnetic resonance (NMR) data and experimental details. |
| Chemistry | Ontology for Chemical Kinetic Reaction Mechanisms (OntoKin) | OntoKin is an ontology developed for representing chemical kinetic reaction mechanisms. |
| Chemistry | Ontology of Property | An Ontology of Property for Physical, Chemical and Biological Systems and theory of laboratory procedures. |
| Chemistry | OntoRXN | OntoRXN is an ontology developed for expressing chemical reaction networks (RXNets) characterized from computational calculations. |
| Chemistry | Chemical Species Ontology for Data Integration and Knowledge Discovery (OntoSpecies) | OntoSpecies is an ontology for chemical species and their properties. It covers a diverse collection of identifiers, classifications and uses of chemical species, as well as spectral data. |
| Chemistry | Open Crystallographic Defects Ontology (OCDO) | An ontology describing crystal defects and related topics such as simulation concepts. |
| Chemistry | Vibrational Spectroscopy Ontology (VIBSO) | VIBSO is an ontology focused on the domain of vibrational Raman spectroscopy. |
| Computational Chemistry | Computational Chemistry Ontology (ontocompchem) | Linked-data framework for connecting species in chemical kinetic reaction mechanisms with quantum calculations. |
| Computational Chemistry | Computer Aided Process Engineering (OntoCAPE) | OntoCAPE is a large-scale ontology for the domain of Computer Aided Process Engineering (CAPE). |
| Materials Science | NFDI-MatWerk ontology (MWO) | An ontology for Materials Science and Engineering. |
| Materials Science | Elementary Multiperspective Material Ontology (EMMO) | EMMO is a standard representational framework for knowledge capture and interoperability in applied science and engineering, especially materials science and manufacturing. |
| Chemical Biology | Protein Modification Ontology (PSI-MOD) | PSI-MOD is an ontology consisting of terms that describe protein chemical modifications. |
| Bioscience | Experimental Factor Ontology (EFO) | An ontology of experimental variables, particularly those used in molecular biology. |
| Bioscience | Gene Ontology (GO) | GO is a structured, standardised representation of biological knowledge organised by Molecular Function, Cellular Component, and Biological Process. |
| Bioscience | Phenotype And Trait Ontology (PATO) | PATO is an ontology of phenotypic qualities (properties, attributes or characteristics). |
| Biomedical | Medical Subject Headings (MeSH) | The Medical Subject Headings (MeSH) thesaurus is a controlled and hierarchically-organized vocabulary produced by the National Library of Medicine used for indexing, cataloging, and searching of biomedical and health-related information. |
| Biomedical | Ontology for Biomedical Investigations (OBI) | OBI defines more than 2500 terms for assays, devices, and objectives in life-science and clinical investigations. |
| Nanoscience | eNanoMapper ontology | The ontology includes and defines common vocabulary terms in use in nanosafety research with a classification hierarchy and other relationships. It extends and refines the Nanoparticle Ontology. |
| Processes | ISO 15926 Ontology | A data model designed to facilitate integration of life-cycle data for process plants, including oil and gas production facilities. |
| Processes | Process Chemistry Ontology (PROCO) | PROCO is a community-based ontology focused on process chemistry including considerations such as product quality, process robustness, economics, environmental sustainability, regulatory compliance and safety. |
What to do next
Look up terms and information about ontologies from the following services:
Related links: See the links below to information about controlled vocabularies and standards in the physical sciences community:
- Creator: Cerys Willoughby
- Last modified date: 2025-11-21
- License: CC-BY-4.0
- Citation: Please cite: Cerys Willoughby, Schemas, Vocabularies, and Ontologies, https://guidance.psdi.ac.uk/docusaurus-pages/docs/guidance/metadata/metadata-for-ps/schemas, PSDI (modified 2025-11-21)
If you would like to contribute content to the PSDI Knowledge Base or have feedback you would like to give on this guidance, please contact us.