Design of a core of metadata of the Basque Meteorological Service

M. Maruri (1), J.A. Romo (2), B. Manso (1), L. Lantarón (1)

(1) Applied Mathematics Department, Engineering School of Bilbao, University of the Basque Country, Bilbao (Spain)

(2) Department of Electronics and Telecommunications, School of Engineering, University of the Basque Country, Bilbao (Spain)

1 Introduction

Metadata is data about data, that is, information provided about a given data set, called the resource. According to the World Meteorological Organization (WMO):

In the scope of this work, metadata is, in short, the information available about the meteorological data. It is necessary in order to make use of the data: it acts as a descriptor of the data and is crucial for, among other things, quality evaluation.

This definition, however, remains theoretical rather than practical, because metadata are not identified and defined within the data acquisition flow. This article therefore presents the methodology used to design a metadata model that can be applied online to observations and meteorological data.

1.1 Motivation

The Basque Country lies between 42° and 43.5° north of the equator, which places it in the temperate zone, between the Atlantic and Mediterranean climates. In addition, the Basque Country has a complex topography, lying between the Pyrenees and the Cantabrian range. As a result, four climatic zones can be distinguished locally: the North Atlantic zone (north of the region), a sub-Atlantic zone (western valleys of Alava), a sub-Mediterranean zone and a continental zone.

Over the last decade, society has demanded more weather information; hence the Basque Government's investment in meteorological observations, aimed at improving the analysis, diagnosis and forecasting of meteorological situations.

The installation of new observation systems and the technological improvements in data management are a natural evolution, consistent with the trends experienced by other meteorological centers worldwide, and result in complex data collection networks. As elsewhere, it is necessary to adapt to these technologies in order to keep growing and to respond to new demands from society. This growth in data acquisition systems requires good information management and assurance of measurement quality, so that the information is really useful.

To this end, within the framework of the ETORTEK Strategic Research Project program, the ISD project (Instrumentation, Systems and Data), led by Dominion I+D, aims to improve the efficient management of meteorological data and to provide the scientific community with access to powerful tools.

1.2 Framework

In order to improve service and maximize efficiency of the investment in instrumentation, and properly handle the large volume of meteorological information that is collected continuously, the research activity takes place in several lines of action.

This work falls under line ISD.1 Instrumentation, Part 4: Metadata and Quality Measurement, and specifically concerns the definition of a metadata model for the weather information gathered by the DMC. It is carried out at the School of Engineering of Bilbao (ETSI), which belongs to the University of the Basque Country (UPV-EHU), in direct cooperation with Dominion I+D, the company leading the project.

1.3 Aims

The work has four main aims: first, identification of the information; second, definition of the fields; third, design of the structure; and finally, implementation of the metadata. All these steps concern the weather information collected by the DMC. The goal is a metadata model that is as simple as possible but still meets users' needs for information about the data.

The model has two requirements: it must be able to hold the metadata already implemented, so that historical data can be included in the new system; and it must be homogeneous, maintaining the same structure for the different types of data that exist or may come to exist in the system.

The model must also be based on the standards applicable to the work context, since its implementation will require tools that manage metadata.

The model is designed with the intention of being final and of fulfilling, from the outset, the expectations set out in this section. However, it is not intended to cover every possible case of metadata specifically, but to define metadata in general terms so that it can serve in all cases, even if it may be completed later. It should therefore be flexible, so that it can be extended.

It is also essential that the definition include the metadata needed to support management control and quality assurance procedures.

2 Methodology

2.1 Development

The first tasks are the study of meteorological instrumentation and of the state of the art in metadata. From these, some documents are selected as key inputs for the project. They are considered the pillars, or the starting point, on which the project is built:

a) “An Inventory of the Historical Automatic Weather Station Database of the Basque Met Service” (Maruri et al., 2006). This document sets out the main problems that arise in the definition of metadata.

b) Standard UNE 500550: “Redes de estaciones meteorológicas automáticas. Formatos de intercambio de registros meteorológicos y climatológicos. Metadatos”. Sets the standard for exchanging data and metadata from surface automatic weather stations in Spain.

c) “Metadata Standards for all Automatic Weather Station Installations” by Igor Zahumenský. The document gives a detailed definition of the metadata associated with surface automatic weather stations.

d) Structure of the current database of the DMC. Brief description of the metadata currently considered in the DMC database.

e) Standard UNE-EN ISO 19115: “Geographic Information. Metadata”. Defines the metadata associated with geographic data.

f) Standard ISO 19139: “Geographic Information - Metadata - XML Schema Implementation”. Describes how to implement, in computerized form, the geographic metadata defined according to UNE-EN ISO 19115.

Some of these documents address the problem of metadata from a purely meteorological standpoint (a, b, c, d), while others are approached from a geographical point of view (e) or from an IT perspective (f).

The use of UNE-EN ISO 19115: “Geographic Information. Metadata”, which concerns geographic metadata, as the basis for this work is justified by several circumstances. Firstly, the lack of a comparable standard specifically for meteorological data makes it necessary to seek other solutions. There is in fact a UNE standard on meteorological metadata (UNE 500550: “Redes de estaciones meteorológicas automáticas. Formatos de intercambio de registros meteorológicos y climatológicos. Metadatos”), but it is not applicable to this case: it provides a general classification of metadata information and is aimed at exchanging data and metadata for surface automatic weather stations only, whereas the present project aims to define metadata capable of supporting every type of instrument the DMC may possess now or in the future.

Secondly, the UNE-EN ISO 19115 standard has a number of features that make it versatile and adaptable to areas beyond the purely geographic. It is well suited to searching data with geographic content (such as meteorological data), and it is a widely used and proven standard with many applications built on it. Among the features that allow it to be adapted to other areas are its small core and the explicit possibility of extending the metadata.

In this standard, the metadata core is a set of mandatory, conditional and optional elements used as the basic descriptors of a data set. Moreover, the fact that not all fields are mandatory, and that the structure is extensible by definition, allows some flexibility in defining the metadata.

All this, coupled with the fact that the WMO defines its own core of metadata as a profile of ISO 19115, clearly justifies the applicability of this standard for defining metadata for meteorological data.

The use of the ISO 19139 standard is directly linked to its “sister” standard, ISO 19115. As discussed above, the former defines how to implement the latter in XML. This file format is well suited to the purpose of this study for several reasons. To begin with, it is the present and presumably future standard proposed by the World Wide Web Consortium (W3C) for exchanging data. In addition, XML files can be modified with any simple text editor and viewed with any web browser.
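As a rough illustration of what such an XML encoding looks like, the following Python sketch builds a very small metadata record with the standard library and serializes it as text. The element names (gmd:MD_Metadata, gmd:fileIdentifier, gmd:dateStamp) and the two namespace URIs follow ISO 19139 conventions, but the fragment leaves out the mandatory elements of the schema, and the function name and sample values are invented for the example. It should be read as a sketch, not as a valid ISO 19139 document or as the file produced in this work.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the ISO 19139 encoding of ISO 19115 metadata.
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def build_record(file_identifier: str, date_stamp: str) -> ET.Element:
    """Build a deliberately incomplete metadata record (hypothetical helper)."""
    root = ET.Element(f"{{{GMD}}}MD_Metadata")
    # Identifier of the metadata file itself.
    ident = ET.SubElement(root, f"{{{GMD}}}fileIdentifier")
    ET.SubElement(ident, f"{{{GCO}}}CharacterString").text = file_identifier
    # Date and time at which the metadata record was created.
    stamp = ET.SubElement(root, f"{{{GMD}}}dateStamp")
    ET.SubElement(stamp, f"{{{GCO}}}DateTime").text = date_stamp
    return root


# Sample values are placeholders, not real DMC data.
record = build_record("example-record-001", "2010-05-01T12:00:00")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because the result is plain text, it can be opened in a text editor or a browser, which is the practical advantage mentioned above.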

2.2 Selection and Organization of metadata

In the next phase of work, the following documents are used as the basis for collecting all the metadata that may be taken into account in the definition of meteorological metadata:

- Structure of the current database of the DMC.

- Standard UNE 500550: “Redes de estaciones meteorológicas automáticas. Formatos de intercambio de registros meteorológicos y climatológicos. Metadatos”.

- “Metadata Standards for all Automatic Weather Station Installations” by Igor Zahumenský.

Few metadata are common to the first two documents and the third, so the selection includes not only the common ones but also those considered indispensable or most relevant in each document. Some metadata have also been added by the author to give cohesion to the schemes presented in later sections (which also incorporate feedback from users).

To keep the metadata model simple and manageable for actual implementation, it was decided to dispense with some metadata that complicate the scheme and whose contribution is not essential. To reach the small, functional design intended, the most important metadata are selected first. Some features are sacrificed, such as the management of sensor exchange between different AWS, but the simplified scheme proposed meets most requirements. The metadata finally selected are listed below:

Metadata name / Meaning
nombre_archivo_datos / Name of the data file to which the metadata relate. Format: “codigo_instrumento_instante_adquisicion”, where “codigo_instrumento” and “instante_adquisicion” are the values of the metadata fields of those names.
variable_meteorologica / Meteorological phenomenon that is measured with the data.
instante_adquisicion / Instant at which the data acquisition instrument creates the data file, formatted “YYYY-MM-DDThh:mm:ss”. Some instruments, such as AWS or the weather radar, provide data over short periods of time; their data are deemed to have been collected at the moment the data file is created. Others, like the lightning detector, record in the file the moment at which each detected lightning strike occurs.
localizacion_geografica / Geographic data type. This field can take only two values, “remota” or “in-situ”, and refers to how the instrument acquires the data. If the instrument acquires information relating to other locations, the value is “remota”; if it acquires data relating only to the place where it is installed (the weather station), the value is “in-situ”.
nombre_calidad / Identification of each quality procedure applied to the data.
fecha_hora_calidad / Moment of conducting the process of data quality.
procedimiento_calidad / Detailed description of the quality procedure.
resultado_calidad / Whether or not the data pass the quality procedure. It is a Boolean value, so it can only be true or false.
nombre_instrumento / Name of the acquisition instrument.
Tipo_instrumento / Type of instrument (an AWS, a radar…).
codigo_instrumento / A code uniquely associated with the instrument that acquires the data. The proposed format is “XNNN”, where “X” is a capital letter designating the type of instrument and “NNN” are three digits that distinguish it from other instruments of its kind. The letters assigned to each type of instrument are presented after this list.
fabricante / Name of the instrument manufacturer.
modelo / Model of the equipment.
num_serie / Equipment serial number.
accion / Action taken by the instrument. It is always fixed to “adquisicion” (acquisition) for the scope of this paper.
superficie_cubierta / Description of the geographic area for which the instrument provides data. When “localizacion_geografica” has the value “in-situ”, “superficie_cubierta” has the value “point”. When “localizacion_geografica” has the value “remota”, “superficie_cubierta” holds a detailed description of the geographic area for which the instrument provides data.
unidad_medida / Unit(s) in which the measure is presented.
fecha_calibracion / Date of last calibration. Format: “YYYY-MM-DDThh:mm:ss”.
calibracion_recomendada / Recommended frequency of calibration.
fecha_mantenimiento / Date of last maintenance. Format: “YYYY-MM-DDThh:mm:ss”.
mantenimiento_recomendado / Recommended frequency of preventive maintenance.
nombre_estacion / Name of the weather station.*
nombre_estacion_alternativo / Alternative name of the weather station.
codigo_estacion / Code assigned to the meteorological station that identifies it uniquely. The proposed format is “XNNN”, where “X” is a capital letter designating the type of station and “NNN” are three digits that distinguish it from other stations of its kind. The letters assigned to each type of weather station are presented after this list.
Tipo_estacion / Weather station type.
coordenadas / Geographical location of the station. Format “(X, Y)”, where “X” and “Y” are signed decimal numbers giving degrees of North latitude and degrees of East longitude, respectively.
altitud / Elevation of the station (meters).
descripción_entorno / Description of the environment. May include: soil type, physical constants, terrain profile, vegetation type and the date of entry, local topography and roughness of the terrain (Davenport classification).
codigo_altura / Datum level to which the atmospheric pressure of the station refers.
propietario_estacion / Owner of the weather station, meaning the enclosure where weather instruments are installed.
fecha_inicio_estacion / Date on which the site was established as a weather station, with format “YYYY-MM-DD”.
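Two of the rules in this list lend themselves to a short sketch: the construction of “nombre_archivo_datos” from “codigo_instrumento” and “instante_adquisicion”, and the dependency between “localizacion_geografica” and “superficie_cubierta”. The Python functions below are hypothetical helpers written for illustration only. The underscore separator is inferred from the format name, and the checks on “remota” records (non-empty and different from “point”) are assumptions about what a “detailed description” must at least satisfy.

```python
from datetime import datetime

# Python equivalent of the "YYYY-MM-DDThh:mm:ss" format used in the list.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def data_file_name(codigo_instrumento: str, instante_adquisicion: str) -> str:
    """Compose nombre_archivo_datos; the "_" separator is an assumption."""
    # strptime raises ValueError if the text cannot be parsed at all.
    parsed = datetime.strptime(instante_adquisicion, TIMESTAMP_FORMAT)
    # strptime tolerates missing zero padding, so compare with the
    # canonical rendering to enforce the exact format.
    if parsed.strftime(TIMESTAMP_FORMAT) != instante_adquisicion:
        raise ValueError(f"not in YYYY-MM-DDThh:mm:ss form: {instante_adquisicion!r}")
    return f"{codigo_instrumento}_{instante_adquisicion}"


def coverage_is_consistent(localizacion_geografica: str, superficie_cubierta: str) -> bool:
    """Check superficie_cubierta against localizacion_geografica."""
    if localizacion_geografica == "in-situ":
        # In-situ instruments describe a single point.
        return superficie_cubierta == "point"
    if localizacion_geografica == "remota":
        # Assumed minimum for a "detailed description" of the area.
        return superficie_cubierta not in ("", "point")
    # Only the two values above are allowed.
    return False


print(data_file_name("R001", "2010-05-01T12:00:00"))
```

A check of this kind could be run when a record is entered, before the metadata are exported.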

The codes proposed to uniquely identify the stations consist of a letter indicating the type of station, followed by three digits that distinguish between stations of the same type. The letters proposed are {“T”: terrestrial stations, “M”: maritime stations, “A”: atmosphere stations, “E”: special stations, “O”: space stations}.

If a station includes instruments that place it in more than one type, the corresponding letters are written consecutively, in the order in which they appear in this list. For example, a station with both an AWS and a radar could have the code “TE001”.
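The composition rule for station codes can be sketched as follows. The function name and the accepted number range (0 to 999, that is, whatever fits in three digits) are assumptions made for the example; the letters and their order are taken from the list above.

```python
from typing import Iterable

# Station type letters, in the order given in the text.
STATION_TYPE_LETTERS = ["T", "M", "A", "E", "O"]


def station_code(types: Iterable[str], number: int) -> str:
    """Compose a station code such as "TE001" (hypothetical helper)."""
    wanted = set(types)
    if not wanted or not wanted <= set(STATION_TYPE_LETTERS):
        raise ValueError(f"unknown or missing station type letters: {sorted(wanted)}")
    if not 0 <= number <= 999:
        raise ValueError("the numeric part must fit in three digits")
    # Letters are emitted in list order, regardless of the input order.
    letters = "".join(letter for letter in STATION_TYPE_LETTERS if letter in wanted)
    return f"{letters}{number:03d}"


# A station with an AWS (terrestrial, "T") and a radar (special, "E").
print(station_code({"E", "T"}, 1))
```

Sorting the letters by their position in the list, rather than by the order in which they are supplied, keeps the code of a multi-type station unique.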

The proposed instrument codes likewise consist of a letter indicating the type of instrument, followed by three digits that distinguish each instrument of that type. The letters proposed are {“Y”: ocean-weather buoy, “B”: coastal platform, “D”: disdrometer, “V”: lightning detector, “R”: radar, “M”: weather satellite, “G”: Geónica AWS, “C”: Campbell AWS}.

Note that the ground stations are distinguished by the type of datalogger installed (Geónica or Campbell) and not by the type of AWS. This is done for consistency with the way they are currently distinguished in the DMC.
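Instrument codes of the form “XNNN” can be checked and decoded with a regular expression. The sketch below uses the letter assignments listed above; the function name and the choice to return the instrument type together with its number are illustrative assumptions.

```python
import re

# Instrument type letters as listed in the text.
INSTRUMENT_TYPES = {
    "Y": "ocean-weather buoy",
    "B": "coastal platform",
    "D": "disdrometer",
    "V": "lightning detector",
    "R": "radar",
    "M": "weather satellite",
    "G": "Geónica AWS",
    "C": "Campbell AWS",
}

# One capital letter followed by exactly three ASCII digits.
CODE_PATTERN = re.compile(r"([A-Z])([0-9]{3})")


def parse_instrument_code(code: str) -> tuple:
    """Return (instrument type, number) for a code such as "R001"."""
    match = CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not in XNNN form: {code!r}")
    letter, digits = match.groups()
    if letter not in INSTRUMENT_TYPES:
        raise ValueError(f"unknown instrument letter: {letter!r}")
    return INSTRUMENT_TYPES[letter], int(digits)


print(parse_instrument_code("R001"))
```

Note that the letter “M” denotes a weather satellite among instruments and a maritime station among stations, so the two code families need to be decoded with separate tables.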

*In this work, “weather station” means the place or enclosure where meteorological measurements are made.

2.3 Organization of the metadata according to ISO 19115 – ISO 19139

Once selected, the metadata are organized following the scheme of the standards UNE-EN ISO 19115: “Geographic Information. Metadata” and ISO 19139: “Geographic Information - Metadata - XML Schema Implementation”. The tool used for this purpose is CatMDEdit, an initiative of the National Geographic Institute of Spain (IGN) and the result of scientific and technical collaboration between the IGN and the Advanced Information Systems Group (IAAA) of the University of Zaragoza, with the technical support of GeoSpatiumLab (GSL).

The next section presents the design of the XML file, simplified into several schemes to aid understanding. This file is the result of entering the metadata, properly and in order, into the geographic metadata editor CatMDEdit, which follows the standard UNE-EN ISO 19115: “Geographic Information. Metadata”, and exporting the design in accordance with the standard ISO 19139: “Geographic Information - Metadata - XML Schema Implementation”.

3 Results/Design

Below is an outline of how the metadata of the model are arranged within the standard UNE-EN ISO 19115. The orange boxes correspond to entities belonging to the standard, and the green ones to the metadata contained in the proposed simplified definition. The blue (process step) and purple (source) boxes are equivalent to orange ones, but are highlighted to show clearly the points from which the branches presented separately depart. The scheme is divided into three parts because the whole would be illegible on a single folio-size page.

This scheme shows how the metadata are grouped naturally within the file into identification (Identificación), content (Contenido), quality (Calidad) and lineage (Linaje), with the “process step” (Paso de proceso) and the “source” (Fuente) grouping, respectively, the metadata relating to the acquisition instrument and to the meteorological station. “Extensión” means extent and “Informe” means report.

The metadata in the illustration above relate to the instrument that creates the data entering the database, which at this stage of the project is the acquisition instrument. They are divided among the “Mención” (Citation), which identifies the instrument; the “Extensión” (Extent), which indicates the geographical area covered by the data; and the “Pasos de fuente” (Source Steps), which are simply process steps relating to the source and are used here to record the maintenance and calibration performed on the instrument.

This illustration shows the metadata relating to the meteorological station (one or several) where the instrument is installed. The “Mención” identifies the station and the “Extensión” describes it geographically. The model assumes that the metadata of a weather station are the same for all data coming from any instrument at that station.

In favor of simplicity, this implementation ignores certain features, such as the type of data (raw data, product, etc.) or the various actions an instrument can perform (acquisition, processing, etc.). This implies that the instrument taken into account is always the one that creates the file in which data are delivered to the database, except for maintenance and calibration, which can be recorded for any instrument (AWS sensors or others that do not deliver data directly). The field “accion” (action), which belongs to the metadata relating to the instrument, is left for future development, so that it can evolve to support the various possible actions on an item, such as acquisition (the only one currently supported), processing, storage, etc.