CLI Usage

data_collections_api provides a few command-line tools that simplify uploading and validating data and metadata.

data_collections

data_collections
operation {validate,template,dump,upload}
validate

Validate metadata

template, dump

Dump a template file.

upload

Upload a dataset to an invenio repository.

-V, --version

Show program’s version number and exit.

data_collections is the top-level interface to the tools. Each tool is implemented as a sub-parser (subcommand) within the main module.

Running data_collections

By default, installing the data_collections_api package also installs data_collections as an executable script on your PATH. This is the main entry point.

If that is not desired, you can run data_collections through the Python module system:

python -m data_collections_api

where the data_collections_api package (folder) is on the current sys.path, whether by being installed, being on the current PYTHONPATH, or being in the current working directory. For example:

PYTHONPATH=/path/containing/data_collections_api python -m data_collections_api

Throughout the rest of this page, we assume data_collections is used as the main entry point.
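
The choice between the two ways of launching the tool can be sketched in Python. This is only an illustration and not part of data_collections_api: the helper name entry_point is hypothetical, and it assumes nothing beyond the script name and module name given above.

```python
import shutil
import sys


def entry_point(which=shutil.which):
    """Return the argv prefix used to launch data_collections.

    Hypothetical helper: prefers the installed script and falls back to
    running the package through the Python module system.
    """
    script = which("data_collections")
    if script is not None:
        # The executable script was found on PATH.
        return [script]
    # Equivalent to: python -m data_collections_api
    return [sys.executable, "-m", "data_collections_api"]


print(entry_point())
```

The returned list can be extended with a subcommand and its arguments and passed to subprocess.run.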

upload

data_collections upload
--api-url URL

URL for the API associated with the Invenio repository, e.g. https://data-collections-staging.psdi.ac.uk/api

--api-key str

Your API key/token for accessing the Invenio repository instance.

--metadata-path file

File path to the yaml file containing the metadata to upload a record to an Invenio repository, e.g. path/to/files/record.yaml

-f {json,yaml}, --metadata-format {json,yaml}

Parse metadata file as this type (default: yaml).

--files FILES [FILES ...]

List of file paths associated with the record to be uploaded, e.g. path/to/files/data.*

--community str

Name of an Invenio repository community to upload the record to, e.g. biosimdb, data-to-knowledge, etc.

data_collections_api can take your data and metadata and automatically upload them to the Invenio repository. To do so, you need to have some information at hand:

  • The URL of the repository you wish to upload the data to. In the case of PSDI data, this will often be https://data-collections.psdi.ac.uk.

  • Your API key (also called a Personal Access Token or PAT) for the repository to give permissions to write and upload data.

  • A metadata file describing the data in the files (see Schemas).

  • The files ready to upload.

With all this prepared, uploading the data is as simple as:

data_collections upload --api-url https://data-collections.psdi.ac.uk --api-key 1234567890abcdef --metadata-path /path/to/metadata_file.yaml --files FILE1 FILE2 --community my_community

Note

Since this is a common operation, it is also available as the standalone upload_record command.
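
When scripting uploads, it can help to assemble the command from its parts rather than typing one long line. The following is a minimal sketch, not part of data_collections_api: the helper build_upload_command and the environment variable DATA_COLLECTIONS_API_KEY are hypothetical names chosen for illustration, and only the flags documented above are used.

```python
import os
import shlex


def build_upload_command(api_url, metadata_path, files, community,
                         api_key=None, standalone=False):
    """Assemble the argument list for an upload (hypothetical helper)."""
    if api_key is None:
        # Assumption: the key is kept in an environment variable of our own
        # choosing; the tool itself is not known to read this variable.
        api_key = os.environ["DATA_COLLECTIONS_API_KEY"]
    # upload_record is the standalone form of `data_collections upload`.
    prefix = ["upload_record"] if standalone else ["data_collections", "upload"]
    return [
        *prefix,
        "--api-url", api_url,
        "--api-key", api_key,
        "--metadata-path", str(metadata_path),
        "--files", *map(str, files),
        "--community", community,
    ]


cmd = build_upload_command(
    "https://data-collections.psdi.ac.uk",
    "/path/to/metadata_file.yaml",
    ["FILE1", "FILE2"],
    "my_community",
    api_key="1234567890abcdef",
)
print(shlex.join(cmd))
```

Passing cmd to subprocess.run(cmd, check=True) would run the upload. Note that printing the command also prints the key, so this is best reserved for checking placeholder values.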

validate

data_collections validate
FILE

File to validate.

-f {json,yaml}, --format {json,yaml}

Parse FILE as this type (default: determine from suffix).

-S SCHEMA, --schema SCHEMA

Validate against the given schema (default: base).

Validate the metadata file for a dataset before uploading.

data_collections_api can validate your metadata file against the schema to verify that the contents of the file match what is required for a valid upload.

Note

The validator does not verify most of the data itself, so you must ensure that all entries are spelled and written correctly.

To validate a metadata file, simply run:

data_collections validate [file]

e.g.

data_collections validate examples/biosim_record.yaml

The file can be in either JSON or YAML format (see: Metadata Format). data_collections validate will attempt to determine the appropriate format from the file extension, but the format can also be specified explicitly with the -f flag:

data_collections validate -f json examples/biosim_record.yaml

Note

The above will raise an error because the file is not in JSON format.
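
The idea of picking a format from the file suffix, and the reason a forced mismatch fails, can be illustrated with a short sketch. This is not the tool's actual implementation: the names guess_format, parses_as_json and SUFFIX_FORMATS are hypothetical, the suffix table is an assumption, and the YAML text is made-up example content.

```python
import json
from pathlib import Path

# Assumption: an illustrative mapping; the tool's own rules may differ.
SUFFIX_FORMATS = {".json": "json", ".yaml": "yaml", ".yml": "yaml"}


def guess_format(path, explicit=None):
    """Return the explicit format if given, else infer it from the suffix."""
    if explicit is not None:
        return explicit
    suffix = Path(path).suffix.lower()
    try:
        return SUFFIX_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"cannot determine format from suffix {suffix!r}") from None


def parses_as_json(text):
    """Report whether text is valid JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


yaml_text = "title: Example record\n"  # hypothetical YAML content

print(guess_format("examples/biosim_record.yaml"))
print(guess_format("examples/biosim_record.yaml", "json"))
print(parses_as_json(yaml_text))
```

An explicit format always wins over the suffix, which is why forcing JSON parsing on YAML content produces a parse error rather than a silent fallback.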

dump

data_collections template
data_collections dump
FILE

File to dump.

-f {json,yaml}, --format {json,yaml}

Dump FILE as this type (default: determine from suffix).

data_collections_api provides a way to quick-start building metadata: the template command (also available as dump) dumps an example metadata file for a particular community and data type (though currently only a basic example is available). To do so, simply run:

data_collections dump my_metadata.yaml

You can then edit and modify this template to fill in the data needed.
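
A typical sequence is to dump a template, edit it, and then validate it before uploading. The sketch below only assembles those two commands as argument lists; the helper metadata_command is a hypothetical name and not part of data_collections_api, and it uses only the subcommands and the -f flag documented on this page.

```python
import shlex


def metadata_command(operation, path, fmt=None):
    """Build a dump/template or validate command (hypothetical helper)."""
    if operation not in {"dump", "template", "validate"}:
        raise ValueError(f"unsupported operation: {operation!r}")
    cmd = ["data_collections", operation]
    if fmt is not None:
        # Without -f, the format is determined from the file suffix.
        cmd += ["-f", fmt]
    cmd.append(str(path))
    return cmd


steps = [
    metadata_command("dump", "my_metadata.yaml"),      # write the template
    metadata_command("validate", "my_metadata.yaml"),  # check it after editing
]
for step in steps:
    print(shlex.join(step))
```

Each list can be passed to subprocess.run once the template has been edited by hand between the two steps.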

upload_record

upload_record
--api-url URL

URL for the API associated with the Invenio repository, e.g. https://data-collections-staging.psdi.ac.uk/api

--api-key str

Your API key/token for accessing the Invenio repository instance.

--metadata-path file

File path to the yaml file containing the metadata to upload a record to an Invenio repository, e.g. path/to/files/record.yaml

-f {json,yaml}, --metadata-format {json,yaml}

Parse metadata file as this type (default: yaml).

--files FILES [FILES ...]

List of file paths associated with the record to be uploaded, e.g. path/to/files/data.*

--community str

Name of an Invenio repository community to upload the record to, e.g. biosimdb, data-to-knowledge, etc.

One-stop tool to upload a record to the repository; see upload.