Some problems with standard geospatial metadata

Simon J D Cox, Bruce Simons, Nick Car

CSIRO Land & Water

Executive Summary

We describe a number of issues with standards for geospatial metadata.

  1. The ISO/ANZLIC metadata standard was designed primarily by map and image data managers, and reflects a provider-centric viewpoint. The metadata record targets a level of aggregation corresponding to traditional map, image or ‘dataset’ series, and does not scale to the feature- or item-level at which most data is used;
  2. A single, self-contained metadata ‘record’ is asked to play multiple roles - i.e. discovery, evaluation, access, use - which are not used together in practice. These functions require different information at different stages, and are better satisfied by a web of metadata than by a single document (a sketch of such a web follows this list);
  3. The XML representation of the metadata was intended only for transfer, but is used for storage and processing. Processing text is substituted for manipulating an object-model. There is no normalization of the data, and the same items appear inconsistently in separate records. XML validation is the quality-control mechanism, though it fails to assess important factors, while also disallowing ‘best-effort’ provision of metadata. The original ISO object-model is relegated to defining a structure for validation on ingest, rather than functionality for metadata use;
  4. Focus on the metadata document distorts efforts to build infrastructures. It is easy to count the number of records in a catalogue, but this does not measure its usefulness. Creation of large numbers of metadata records has displaced effort from more user-centric elements of a spatial data infrastructure.
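To make point 2 concrete, the sketch below shows one way a ‘web of metadata’ might be expressed, using DCAT and PROV-O with the rdflib Python library. It is illustrative only: all URIs, names and property choices are assumptions, not drawn from any existing catalogue or from the ISO/ANZLIC standard.

# A minimal sketch, assuming rdflib is installed; all URIs are hypothetical.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")
PROV = Namespace("http://www.w3.org/ns/prov#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)
g.bind("prov", PROV)

dataset = URIRef("http://example.org/id/dataset/groundwater-levels")
distribution = URIRef("http://example.org/id/distribution/groundwater-levels-wfs")
activity = URIRef("http://example.org/id/activity/bore-survey-2013")

# Discovery: just enough to find the dataset.
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Groundwater levels (example)")))

# Access: a separate distribution resource, linked rather than embedded.
g.add((dataset, DCAT.distribution, distribution))
g.add((distribution, RDF.type, DCAT.Distribution))
g.add((distribution, DCAT.accessURL, URIRef("http://example.org/wfs?request=GetCapabilities")))

# Evaluation and use: provenance delegated to a separate PROV-O resource.
g.add((dataset, PROV.wasGeneratedBy, activity))

print(g.serialize(format="turtle"))

Because the distribution and provenance descriptions are separate resources, each can be maintained once and referenced from any number of dataset descriptions, rather than being copied inconsistently into every record.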

Managers of geospatial data in the public sector are relatively anomalous in their fetish for detailed, general-purpose metadata records. In the non-geospatial sector, the emphasis is either on highly specialized domain-specific schemas (e.g. health) or on very flexible general-purpose vocabularies (e.g. Dublin Core, DCAT), with mix-and-matching of RDF vocabularies (FOAF, VoID, ORG, PROV-O) emerging in the web of linked data. Meanwhile, the mass of web-hosted data is not described formally at all; its content is indexed directly by the search engines. While the latter strategy is clearly challenging for non-text data, it raises some key questions:

  • Is a pre-defined descriptive schema more effective than a statistical approach emerging from mining the data?
  • Is the separation of metadata and data clear in practice?
  • Should metadata ever be entered manually?
  • Do we know what a minimum valid record contains?

We propose a more flexible - though transitional - approach, based on the application of semantic-web technologies.
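As a further illustrative sketch (again rdflib, with hypothetical URIs), the following shows how best-effort descriptions of differing richness can sit in one graph and still be discovered with a single SPARQL query, with no schema-completeness test acting as a gate.

# A separate sketch: discovery across best-effort records via SPARQL.
# All URIs and dataset names are hypothetical.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()

# A richer record and a sparse, 'best-effort' record coexist in the same graph.
full = URIRef("http://example.org/id/dataset/groundwater-levels")
sparse = URIRef("http://example.org/id/dataset/legacy-bore-logs")
for d, title in [(full, "Groundwater levels (example)"),
                 (sparse, "Legacy bore logs (title only)")]:
    g.add((d, RDF.type, DCAT.Dataset))
    g.add((d, DCTERMS.title, Literal(title)))
g.add((full, DCTERMS.description, Literal("Observed water levels from monitoring bores.")))

# One query finds both; the sparse record is simply less richly described.
results = g.query(
    """
    SELECT ?dataset ?title WHERE {
        ?dataset a dcat:Dataset ;
                 dcterms:title ?title .
    }
    """,
    initNs={"dcat": DCAT, "dcterms": DCTERMS},
)
for dataset, title in results:
    print(dataset, title)

Richer access, evaluation or provenance statements can then be attached incrementally to the same resources as they become available, rather than being required up front.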