geometamaker.geometamaker module

geometamaker.geometamaker.describe(source_dataset_path, compute_stats=False)

Create a metadata resource instance with properties of the dataset.

Properties of the dataset are used to populate as many metadata properties as possible. Default/placeholder values are used for properties that require user input.

Parameters:
  • source_dataset_path (string) – path or URL to dataset to which the metadata applies

  • compute_stats (bool) – whether to compute statistics for each band in a raster.

Returns:

a metadata object

Return type:

geometamaker.models.Resource

Raises:
  • ValueError if the file type or protocol of the dataset is not supported.

  • FileNotFoundError if the path does not exist.
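A minimal sketch of calling code that handles the two documented exceptions. `try_describe` is a hypothetical helper, not part of geometamaker; it takes the describe callable as an argument so that the sketch runs without the library installed.

```python
def try_describe(describe_fn, path, compute_stats=False):
    """Call a describe-like function and report the documented failures.

    Hypothetical helper: returns (resource, None) on success, or
    (None, message) if FileNotFoundError or ValueError is raised.
    """
    try:
        return describe_fn(path, compute_stats=compute_stats), None
    except FileNotFoundError as error:
        # Documented: the path does not exist.
        return None, f'not found: {error}'
    except ValueError as error:
        # Documented: unsupported file type or protocol.
        return None, f'unsupported: {error}'
```

With the library installed, this could be called as `try_describe(geometamaker.geometamaker.describe, 'data/dem.tif')`, where the path is a placeholder.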

geometamaker.geometamaker.describe_archive(source_dataset_path, scheme, **kwargs)

Describe file properties of an archive file.

Parameters:
  • source_dataset_path (str) – path to a file.

  • scheme (str) – the protocol prefix of the filepath

  • kwargs (dict) – additional options when describing a dataset.

Returns:

dict

geometamaker.geometamaker.describe_collection(directory, depth=32767, exclude_regex=None, exclude_hidden=True, describe_files=False, backup=True, target_filename=None, **kwargs)

Create a single metadata document to describe a collection of files.

Describe all the files within a directory as members of a “collection”. The resulting metadata resource should include a list of all the files included in the collection along with a description and metadata filepath (or placeholder). Optionally create individual metadata files for each supported file in a directory.

Parameters:
  • directory (str) – path to collection

  • depth (int, optional) – maximum number of subdirectory levels to traverse when walking through directory to find files included in the collection. A value of 1 limits the walk to files in the top-level directory only. A value of 2 allows descending into immediate subdirectories, etc. All files in all subdirectories in the collection will be included by default.

  • exclude_regex (str, optional) – a regular expression to pattern-match any files you do not want included in the output metadata yml.

  • exclude_hidden (bool, default True) – whether to exclude hidden files (files that start with “.”).

  • describe_files (bool, default False) – whether to describe all files, i.e., create individual metadata files for each supported resource in the collection.

  • backup (bool) – whether to write a backup of a pre-existing metadata file before overwriting it in cases where that file is not a valid geometamaker document.

  • kwargs (dict) – optional keyword arguments accepted by describe.

Returns:

Collection metadata
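The interaction of depth, exclude_regex and exclude_hidden can be sketched with the standard library alone. This is an illustration of the documented semantics, not geometamaker's implementation: `list_collection_files` is a hypothetical name, and matching the regular expression against the relative path with `re.search` is an assumption.

```python
import os
import re


def list_collection_files(directory, depth=32767, exclude_regex=None,
                          exclude_hidden=True):
    """List relative file paths under directory (illustrative only)."""
    root = os.path.abspath(directory)
    pattern = re.compile(exclude_regex) if exclude_regex else None
    found = []
    for current, subdirs, files in os.walk(root):
        relative_dir = os.path.relpath(current, root)
        # The top-level directory counts as level 1, its children as 2, etc.
        if relative_dir == os.curdir:
            level = 1
        else:
            level = relative_dir.count(os.sep) + 2
        if level >= depth:
            subdirs[:] = []  # prune: do not descend any further
        for name in files:
            # Hidden files are those whose name starts with ".".
            if exclude_hidden and name.startswith('.'):
                continue
            relative_path = os.path.normpath(os.path.join(relative_dir, name))
            # Assumption: the pattern is searched for in the relative path.
            if pattern and pattern.search(relative_path):
                continue
            found.append(relative_path)
    return sorted(found)
```

With depth=1 only files directly inside the directory are listed; with depth=2 files in immediate subdirectories are listed as well.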

geometamaker.geometamaker.describe_file(source_dataset_path, scheme)

Describe basic properties of a file.

Parameters:
  • source_dataset_path (str) – path to a file.

  • scheme (str) – the protocol prefix of the filepath

Returns:

dict
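To illustrate what a "protocol prefix" looks like, the sketch below splits a path or URL into a scheme and a remainder. How geometamaker actually derives the scheme is not stated here; `split_scheme` and the `'file'` default for local paths are assumptions made for illustration.

```python
def split_scheme(path, default='file'):
    """Return (scheme, remainder) for a path or URL (illustrative only)."""
    if '://' in path:
        scheme, remainder = path.split('://', 1)
        return scheme.lower(), remainder
    # Assumption: paths without a "://" prefix are treated as local files.
    return default, path
```

Checking for `://` rather than a bare colon keeps Windows drive letters such as `C:\data` from being mistaken for a protocol.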

geometamaker.geometamaker.describe_raster(source_dataset_path, scheme, **kwargs)

Describe properties of a GDAL raster file.

Parameters:
  • source_dataset_path (str) – path to a GDAL raster.

  • scheme (str) – the protocol prefix of the filepath

  • kwargs (dict) – additional options when describing a dataset:

      • 'compute_stats' (bool) – whether to compute statistics for each band in the raster. Default is False.

Returns:

dict

geometamaker.geometamaker.describe_table(source_dataset_path, scheme, **kwargs)

Describe properties of a tabular dataset.

Parameters:
  • source_dataset_path (str) – path to a file representing a table.

  • scheme (str) – the protocol prefix of the filepath

  • kwargs (dict) – additional options when describing a dataset.

Returns:

dict

Raises:

ValueError if the file cannot be read as a table.

geometamaker.geometamaker.describe_vector(source_dataset_path, scheme, **kwargs)

Describe properties of a GDAL vector file.

Parameters:
  • source_dataset_path (str) – path to a GDAL vector.

  • scheme (str) – the protocol prefix of the filepath

  • kwargs (dict) – additional options when describing a dataset.

Returns:

dict

geometamaker.geometamaker.detect_file_type(filepath, scheme)

Detect the type of resource contained in the file.

Parameters:
  • filepath (str) – path to a file

  • scheme (str) – the protocol prefix of the filepath

Returns:

str

Raises:

ValueError on unsupported file formats.

geometamaker.geometamaker.validate(filepath)

Validate a YAML metadata document.

Validation includes type-checking of property values and checking for the presence of required properties.

Parameters:

filepath (string) – path to a YAML file

Returns:

pydantic.ValidationError

Raises:

ValueError if the YAML document is not a geometamaker metadata doc.

geometamaker.geometamaker.validate_dir(directory, depth=32767)

Validate all compatible yml documents in the directory.

Parameters:
  • directory (string) – path to a directory

  • depth (int) – maximum number of subdirectory levels to traverse when walking through directory.

Returns:

a list of the filepaths that were validated and an equal-length list of the validation messages.

Return type:

tuple (list, list)
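Because the two returned lists have equal length, they can be paired positionally. A minimal sketch, assuming the i-th message belongs to the i-th filepath; `pair_validation_results` is a hypothetical helper, not part of geometamaker.

```python
def pair_validation_results(filepaths, messages):
    """Map each validated filepath to its message (illustrative only)."""
    if len(filepaths) != len(messages):
        raise ValueError('expected two lists of equal length')
    # Assumption: both lists are ordered the same way.
    return dict(zip(filepaths, messages))
```

With the library installed, the inputs would come from something like `filepaths, messages = geometamaker.geometamaker.validate_dir('data')`, where `'data'` is a placeholder directory.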