Some problems with standard geospatial metadata

Simon J D Cox, Bruce Simons, Nick Car

CSIRO Land & Water

Executive Summary

We describe a number of issues with standards for geospatial metadata.

  1. The ISO/ANZLIC metadata standard was designed primarily by map and image data managers, and reflects a provider-centric viewpoint. The metadata record targets a level of aggregation corresponding to traditional map, image or ‘dataset’ series, and does not scale to the feature- or item-level at which most data is used;
  2. A single, self-contained metadata ‘record’ is asked to play multiple roles - i.e. discovery, evaluation, access, use - which are not used together in practice. These functions require different information at different stages, and are better satisfied by a web of metadata than by a single document (a sketch of such a web follows this list);
  3. The XML representation of the metadata was intended only for transfer, but is used for storage and processing. Processing text is substituted for manipulating an object-model. There is no normalization of the data, and the same items appear inconsistently in separate records. XML validation is the quality-control mechanism, though it fails to assess important factors, while also disallowing ‘best-effort’ provision of metadata. The original ISO object-model is relegated to defining a structure for validation on ingest, rather than functionality for metadata use;
  4. Focus on the metadata document distorts efforts to build infrastructures. It is easy to count the number of records in a catalogue, but this does not measure its usefulness. Creation of large numbers of metadata records has displaced effort from more user-centric elements of a spatial data infrastructure.
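To make point 2 concrete, the sketch below shows one way a ‘web of metadata’ might be expressed, using DCAT and PROV-O with the rdflib Python library. It is illustrative only: all URIs, names and property choices are assumptions, not drawn from any existing catalogue or from the ISO/ANZLIC standard.

# A minimal sketch, assuming rdflib is installed; all URIs are hypothetical.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")
PROV = Namespace("http://www.w3.org/ns/prov#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)
g.bind("prov", PROV)

dataset = URIRef("http://example.org/id/dataset/groundwater-levels")
distribution = URIRef("http://example.org/id/distribution/groundwater-levels-wfs")
activity = URIRef("http://example.org/id/activity/bore-survey-2013")

# Discovery: just enough to find the dataset.
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Groundwater levels (example)")))

# Access: a separate distribution resource, linked rather than embedded.
g.add((dataset, DCAT.distribution, distribution))
g.add((distribution, RDF.type, DCAT.Distribution))
g.add((distribution, DCAT.accessURL, URIRef("http://example.org/wfs?request=GetCapabilities")))

# Evaluation and use: provenance delegated to a separate PROV-O resource.
g.add((dataset, PROV.wasGeneratedBy, activity))

print(g.serialize(format="turtle"))

Because the distribution and provenance descriptions are separate resources, each can be maintained once and referenced from any number of dataset descriptions, rather than being copied inconsistently into every record.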

Managers of geospatial data in the public sector are relatively anomalous in their fetish for detailed, general-purpose metadata records. In the non-geospatial sector, the emphasis is either on highly specialized domain-specific schemas (e.g. health) or on very flexible general-purpose vocabularies (e.g. Dublin Core, DCAT), with mix-and-matching of RDF vocabularies (FOAF, VoID, ORG, PROV-O) emerging in the web of linked data. Meanwhile, the mass of web-hosted data is not described formally at all; its content is indexed directly by the search engines. While the latter strategy is clearly challenging for non-text data, it raises some key questions:

  • Is a pre-defined descriptive schema more effective than a statistical approach emerging from mining the data?
  • Is the separation of metadata and data clear in practice?
  • Should metadata ever be entered manually?
  • Do we know what a minimum valid record contains?

We propose a more flexible - though transitional - approach, based on the application of semantic-web technologies.
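As a further illustrative sketch (again rdflib, with hypothetical URIs), the following shows how best-effort descriptions of differing richness can sit in one graph and still be discovered with a single SPARQL query, with no schema-completeness test acting as a gate.

# A separate sketch: discovery across best-effort records via SPARQL.
# All URIs and dataset names are hypothetical.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()

# A richer record and a sparse, 'best-effort' record coexist in the same graph.
full = URIRef("http://example.org/id/dataset/groundwater-levels")
sparse = URIRef("http://example.org/id/dataset/legacy-bore-logs")
for d, title in [(full, "Groundwater levels (example)"),
                 (sparse, "Legacy bore logs (title only)")]:
    g.add((d, RDF.type, DCAT.Dataset))
    g.add((d, DCTERMS.title, Literal(title)))
g.add((full, DCTERMS.description, Literal("Observed water levels from monitoring bores.")))

# One query finds both; the sparse record is simply less richly described.
results = g.query(
    """
    SELECT ?dataset ?title WHERE {
        ?dataset a dcat:Dataset ;
                 dcterms:title ?title .
    }
    """,
    initNs={"dcat": DCAT, "dcterms": DCTERMS},
)
for dataset, title in results:
    print(dataset, title)

Richer access, evaluation or provenance statements can then be attached incrementally to the same resources as they become available, rather than being required up front.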