RA-II/WG-IOS/WIS-1/Rep.5.1(4), p. 1

World Meteorological Organization
RA II WORKING GROUP ON WMO INTEGRATED OBSERVING SYSTEM AND WMO INFORMATION SYSTEM (RA-II/WG-IOS/WIS)
FIRST SESSION
SEOUL, REPUBLIC OF KOREA
30 NOVEMBER - 7 DECEMBER 2011
RA-II/WG-IOS/WIS-1/Rep.5.1(4)
Submitted by:
Date: 25.XI.2011
Original Language: English
Agenda Item: 5.1

Climate Data Rescue Technology

(Submitted by Vladislav Shaymardanov)

Summary and Purpose of the Document

This document reports on the status of climate data rescue technology.

Introduction

The solution to the problem of climate data rescue and management is illustrated here by the example of Roshydromet’s Archive System. To address this problem, Roshydromet updated its Archive System: robotic libraries were created, to which data were rewritten from various media such as magnetic tapes, cartridges and CDs.

1 Roshydromet’s Climate Data

1.1 Composition and volume of information and media types

Roshydromet’s Archive System stores a unique collection of observation data on environmental conditions. One part of this collection is stored on computer-readable media (magnetic tapes and other machine-readable media); the other is stored in hard copy (paper, photographic documents and microfilm). The composition and volume of the Archive Fund are detailed in Table 1. Volumes are given in terabytes (TB); for sections 2-4 of the table, the figures are the calculated volumes of the document images (in graphics format) after scanning.

1.2 Structured Data on Machine-Readable Media

This part of the Archive System data is heterogeneous in status, content, origin and media type. The most important part of the archive fund is the observation data on environmental conditions for 1881-2005 (Table 1, line 1). These data are typically stored on outdated media such as ½-inch magnetic tapes, most of which are now physically degrading. Most of them have additionally been written to IBM 3480 cartridges, which are also outdated, and to compact disks, which cannot be considered long-term storage media.

All data of this type are divided into several large sections by observation type (meteorology, aerology, oceanography, etc.). Each section consists of a number of sets of files that are homogeneous in both content and data format. Such sets of homogeneous files are commonly referred to as archives. Each archive is written in a particular format and occupies a varying number of ES MT volumes (from one to a thousand or more).

This data category is divided into two large groups by structure (formats used):

  • archives in Hydrometeorological Data-Description Language formats (HDDL, Roshydromet’s industry standard), referred to as HDDL-archives (over 80%);
  • archives in other formats, including international ones, for example BUFR and GRIB (less than 20%).

When data are included in the Fund (including data in machine-readable form), the principle is to store them in the form in which they were obtained. An exception is made when this principle cannot be followed, for example when the storage medium changes fundamentally. The transition from punch cards to magnetic tapes is one such case: the principle was set aside, and all data were transferred from punch cards to HDDL formats.

1.3 Data stored in hard copies

This part of the data is under threat of physical degradation, owing both to the decay of most media and to the impossibility of maintaining normal storage conditions.

Using this part of the Fund for customer service is difficult: there are no electronic copies, and because the data have not been digitized they cannot be processed by computer or transmitted over communication networks.

The specifications of the Fund documents stored in hard copy are given in the Table.

In total, the hard copies amount to about 220 million sheets, with the following characteristics:

  • Over 90% are A3 (over 60%) and A4 (over 30%) documents.
  • Unbound documents amount to 90% of the total volume, a little more than 5% of which (mainly in A3 format) can be unbound and then bound again without substantial damage to the document.
  • About 30% of the documents are worn out.
  • A little more than 5% of the documents are handwritten.
  • There are no documents on colored or grey paper; however, it should be taken into account that some of the paper has yellowed with time.
  • Over 92% of the documents are two-sided.

Table 1

Archive Fund Composition and Volume

No. / Type of Data / Type of Media / Volume (TB)
1. Structured Data on Machine-Readable Media
1.1 / Observation data on environmental conditions for 1881-2005 / Magnetic tapes / 1.0
1.1 (cont.) / Observation data on environmental conditions for 1881-2005 / IBM 3480 cartridges / 0.5
1.2 / Observation data obtained through international exchange / CD / 4.0
1.3 / Derivative datasets (station, quasi-synoptic, synoptic, statistical and others) / Different media, PC / 2.5
1.4 / Reanalysis datasets and outcomes of experiments comparing general circulation models, obtained through international exchange / Different media / 450
1.5 / Satellite data and their products (digital formats) / (not stated) / 300
2. Unstructured Data on Machine-Readable Media
2.1 / Satellite data (electronic graphics formats) / Different media / 700
3. Hard Copy Materials
3.1 / Tabular and text-based materials (monthly publications, year books, RV reports, foreign and Russian publications, etc.) / Hard copies / 1100
3.2 / Tabular materials (Tables TM-1, TM-3, THM, TAE-16, etc.), 126 thousand storage units / Photocarrier (microfilm) / 1100
4. Unstructured Hard Copy Documents
4.1 / Map materials (weather charts and bulletins), 15 thousand storage units / Hard copies / 350
4.2 / Map materials (satellite photos), 244 thousand storage units / Photocarrier / 10
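
To give a sense of scale, the volumes in Table 1 can be totalled per section. The short Python sketch below simply transcribes the table figures; the variable and function names are chosen for illustration, and the totals for sections 2-4 inherit the table's assumption that they are calculated volumes of scanned images rather than measured ones.

```python
# Volumes in TB as listed in Table 1, grouped by table section.
# The dictionary layout and names are assumptions made for this sketch.
TABLE_1_VOLUMES_TB = {
    "1. Structured data on machine-readable media": [1.0, 0.5, 4.0, 2.5, 450, 300],
    "2. Unstructured data on machine-readable media": [700],
    "3. Hard copy materials": [1100, 1100],
    "4. Unstructured hard copy documents": [350, 10],
}


def section_totals(volumes):
    """Sum the row volumes of each table section."""
    return {section: sum(rows) for section, rows in volumes.items()}


totals = section_totals(TABLE_1_VOLUMES_TB)
grand_total = sum(totals.values())

for section, volume in totals.items():
    print(f"{section}: {volume:g} TB")
print(f"All sections: {grand_total:g} TB")
```

On these figures, the hard copy and photocarrier materials (sections 3 and 4) would account for well over half of the total volume once scanned.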

In addition to permanent data storage, the updated Roshydromet Archive System makes it possible to move climate data from basic archives into products supplied to customers. It supports three levels of data (basic archives, derivative datasets or archives, and databases) and, correspondingly, three main subsystems:

  • Storage subsystem, which stores basic and derivative archives and handles data mainly at the file level;
  • Processing subsystem, which forms the layer of derivative datasets (archives) and handles data both at the file level and at the level of file content;
  • Service subsystem, which forms the datasets and databases needed for customer service, at the level typical of DBMS and various data analysis systems.

From the first to the last subsystem, the operating logic varies widely: from the execution of a set of strictly regulated process operations in the storage subsystem to the informal use of numerous data analysis, processing and visualization tools according to customer requirements in the service subsystem. The processing subsystem occupies an intermediate position.

In the Archive System, three functional technological units correspond to the three subsystems above. They are distinguished largely by the content of their data handling operations (i.e. on a semantic basis):

  • Basic archive
  • Operational archive
  • Information service and exchange

The main functions of these three units are listed below.

Basic archive

1) Long-term (permanent) data storage

2) Transfer of hard copy information to machine-readable media

The basic archive also logically includes the recovery of original hard copy documents. In this paper, these efforts belong to supporting components, see below “Paper Document Cluster”.

Operational archive

1) Data management, including data exchange between the subsystems, creation of derivative datasets, provision of data access, creation and update of catalogs, etc.;

2) Data collection, processing and accumulation;

3) Transfer of information stored on outdated and heterogeneous media to machine-readable media (an emergency measure for recovering this information).

Information service and exchange

1) Creation of information products

2) Data and information access and exchange.
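
The relationship between units and their functions can be written down as a small lookup structure. The Python sketch below is illustrative only: the function lists are abridged from the text, the pairing of each unit with a subsystem assumes that the units correspond to the subsystems in the order listed, and the dictionary layout and the helper `units_performing` are hypothetical names, not part of the Archive System software.

```python
# Function lists abridged from the text. The unit-to-subsystem pairing
# assumes the units map to the subsystems in the order they are listed.
ARCHIVE_UNITS = {
    "Basic archive": {
        "subsystem": "Storage subsystem",
        "functions": [
            "Long-term (permanent) data storage",
            "Transfer of hard copy information to machine-readable media",
        ],
    },
    "Operational archive": {
        "subsystem": "Processing subsystem",
        "functions": [
            "Data management",
            "Data collection, processing and accumulation",
            "Transfer of information from outdated media to machine-readable media",
        ],
    },
    "Information service and exchange": {
        "subsystem": "Service subsystem",
        "functions": [
            "Creation of information products",
            "Data and information access and exchange",
        ],
    },
}


def units_performing(keyword):
    """Return the names of units that have a function mentioning `keyword`."""
    keyword = keyword.lower()
    return [
        name
        for name, unit in ARCHIVE_UNITS.items()
        if any(keyword in function.lower() for function in unit["functions"])
    ]


print(units_performing("transfer"))
```

The lookup makes one feature of the design visible: media transfer is split between two units, with hard copy conversion in the basic archive and rescue from outdated media in the operational archive.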

2 Technical Implementation of Roshydromet’s Archive System

When selecting hardware for the upgrade, the storage reliability and durability of different media were analyzed, along with the media available and in use at the major World Data Centers that store large amounts of information. Based on this analysis, magnetic tape was selected as the storage medium, and a robotic tape library was adopted in view of the requirements for data integrity, data volume and automation.

Among the tape libraries available from different manufacturers (IBM, HP, SUN, Quantum), other conditions being equal, the IBM TS3500 tape library was chosen. It meets all the requirements for capacity and scalability; existing experience with its service and maintenance was also an important factor.

An IBM z9 BC server is used to manage the libraries. This choice was dictated primarily by the existing technologies for working with archive data on magnetic tapes.

The technologies and software for reading, processing and controlling archive data had been installed on an IBM 4381, which made it necessary to use the z/VM operating system so that the existing solutions could be applied. z/VM supports the drivers needed to connect tape drives and IBM 3480 cartridge subsystems, and makes it possible to transfer all technologies and software without additional alterations.

Besides magnetic tapes and IBM 3480 cartridges, information can be transferred from other outdated and heterogeneous media (Fig. 1), including:

  • streaming tapes of various types,
  • floppy and compact disks,
  • magneto-optical disks, etc.

The IBM z9 BC architecture allows several independent logical partitions (LPARs) to be created. The z/VM operating system has been installed on each LPAR, and from one to several virtual machines have been created on it, depending on the number of tasks. Most virtual machines run the zLinux operating system (Fig. 2).

Fig.1. Magnetic Tape Rewriting Technology

Fig.2. IBM z9 BC Program Environment

This approach stems from the need to isolate the different processes launched and executed on the server. IBM System z servers, to which the z9 BC belongs, offer the most successful implementation of this approach: processes are separated almost as if physically, which allows different technologies to be deployed reliably and efficiently on a single server.

A 100 TB IBM DS8300 is used as the disk storage system. This choice stems from the need to minimize maintenance and implementation costs (same manufacturer) and to avoid the interoperability difficulties that can arise when devices from different manufacturers are combined. In addition, the IBM DS8300 had the best performance parameters in the industry, e.g. data rates over fiber-optic cable of up to 4 Gbit/s.

The IBM DS8300 configuration makes it possible to divide the storage system into two storage subsystems within a single device, in a proportion of 25% to 75%. This division creates partitions within a single disk array whose data are separated from each other. One partition (25% of the total volume) is used for working with IBM z9 data and has the corresponding file structure. The other (75% of the total volume) has the file structure for working with open operating systems (Windows, Linux).
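
As a quick check of what this split means in absolute terms, the Python sketch below applies the stated percentages to the stated 100 TB capacity. It assumes the shares apply to the full 100 TB; the partition labels are illustrative.

```python
TOTAL_CAPACITY_TB = 100  # DS8300 capacity stated in the text

# Percentage shares stated in the text; the partition labels are illustrative.
PARTITION_SHARES_PERCENT = {
    "IBM z9 data": 25,
    "open systems (Windows, Linux)": 75,
}

# The two partitions together must cover the whole array.
assert sum(PARTITION_SHARES_PERCENT.values()) == 100

partition_sizes_tb = {
    name: TOTAL_CAPACITY_TB * percent / 100
    for name, percent in PARTITION_SHARES_PERCENT.items()
}

for name, size in partition_sizes_tb.items():
    print(f"{name}: {size:g} TB")
```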

Fig.3. Roshydromet’s Archive System Scheme

As the Archive System scheme (Fig.3) shows, two libraries of the same model and capacity are used for storage, in order to ensure highly reliable storage of observation data. The Archive System concept provides for two identical copies of the climate data, held in the two libraries. If one copy is lost, the other remains available, and the time needed to recover the data is very short.

Standard IBM Tivoli Storage Manager tools are used to manage data inside the Archive System.

3 Hard Copy Recovery

Besides data stored on different electronic media, Roshydromet holds a large amount of data in hard copy. Wherever possible, all hard copies should be scanned promptly to improve their accessibility.

A hard copy storage item is usually a book containing data and information in the form of tables, graphs, images, etc. Each book has a title, a table of contents, a brief annotation and other attributes that characterize the item and identify it within the hard copy fund.

The storage items of Roshydromet’s Archive System are data files, which are managed using special software. To present images of hard copy data correctly in a robotic library environment, and to maintain, search and retrieve information from the Archive System efficiently, a formalized model was developed that describes storage items in the automated system and identifies them unambiguously.

Note that the storage item is a PDF file consisting of a sequence of images of the scanned pages, in the same order as in the hard copy.
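
The formalized model itself is not reproduced in this document. Purely as an illustration, the Python sketch below shows one way such a description could be represented: a record carrying the attributes named above (title, table of contents, annotation) plus a unique identifier, and a catalog that refuses duplicate identifiers. The field names, the `item_id` scheme, the sample values and the `Catalog` class are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageItem:
    """Description of one scanned book stored as a PDF file (hypothetical fields)."""
    item_id: str                   # unique key; the real identification scheme is not given
    title: str
    annotation: str
    table_of_contents: tuple = ()
    pdf_file: str = ""             # name of the PDF file holding the page images


class Catalog:
    """Keeps item descriptions and enforces unambiguous identification."""

    def __init__(self):
        self._items = {}

    def register(self, item):
        if item.item_id in self._items:
            raise ValueError(f"duplicate item identifier: {item.item_id}")
        self._items[item.item_id] = item

    def get(self, item_id):
        return self._items[item_id]


catalog = Catalog()
catalog.register(
    StorageItem(
        item_id="example-0001",                      # made-up identifier
        title="Example meteorological monthly",      # made-up title
        annotation="Illustrative record only",
        table_of_contents=("Tables", "Graphs"),
        pdf_file="example-0001.pdf",
    )
)
print(catalog.get("example-0001").title)
```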

Three scanners are used to digitize paper documents:

  • A high-performance color sheetfed scanner for unbound A3 documents.
  • Two planetary scanners (monochrome and color) for bound A2 documents.

A scanning workstation has been set up where the data are converted to PDF format. Once a PDF file has been created, it is transferred for storage to IBM Content Manager OnDemand (CMoD).

3.1 Scheme of Digitizing Paper Documents

Figure 4 shows the process scheme of digitizing paper document data.

Fig.4. Paper Document Digitization Process

The whole process of digitizing paper documents can be divided into several stages:

1. Item scanning and description, and creation of a data package.

2. Creation of a PDF-document using digital images.

3. Transfer to long-term storage media.

After the three stages have been completed, the document becomes accessible in electronic form through a standard browser.
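
The ordering constraint in this process (an item becomes accessible only once all three stages are done, in sequence) can be sketched as a small state tracker. The Python example below is an assumption-laden illustration: the stage names paraphrase the list above, and `DigitizationJob` and the sample identifier are hypothetical, not part of the actual workflow software.

```python
from enum import Enum


class Stage(Enum):
    SCAN_AND_DESCRIBE = 1     # item scanning and description, data package creation
    CREATE_PDF = 2            # PDF document built from the digital images
    TRANSFER_TO_STORAGE = 3   # transfer to long-term storage media


class DigitizationJob:
    """Tracks which stages a storage item has passed, enforcing their order."""

    def __init__(self, item_id):
        self.item_id = item_id
        self.completed = []

    @property
    def accessible(self):
        # The document is accessible only after all three stages are complete.
        return len(self.completed) == len(Stage)

    def complete(self, stage):
        if self.accessible:
            raise ValueError("all stages are already complete")
        expected = Stage(len(self.completed) + 1)
        if stage is not expected:
            raise ValueError(f"expected {expected.name}, got {stage.name}")
        self.completed.append(stage)


job = DigitizationJob("example-item")  # hypothetical identifier
for stage in Stage:
    job.complete(stage)
    print(stage.name, "done; accessible:", job.accessible)
```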

3.2 Search Technology in the Archive System

A universal template for describing digital objects has been developed to make it easier to find the documents needed.

The search system is accessible through a standard web browser (Fig.5). Internet access is planned for the near future. The search windows display the fields on which documents can be searched. The results are displayed in the same window, after which the customer can download the document to a local disk.
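
Field-based search of this kind can be illustrated with a short Python sketch. It does not reproduce the CMoD interface or its query mechanism; the `Description` fields, the sample records and the `search` helper are hypothetical, and the matching rule (case-insensitive substring on every requested field) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    """Searchable description of one digital object (hypothetical fields)."""
    item_id: str
    title: str
    annotation: str


def search(descriptions, **criteria):
    """Return descriptions whose named fields all contain the given text.

    Matching is a case-insensitive substring test on each requested field.
    """
    return [
        d
        for d in descriptions
        if all(
            text.lower() in getattr(d, field_name).lower()
            for field_name, text in criteria.items()
        )
    ]


# Made-up sample records for demonstration only.
sample = [
    Description("a-1", "Meteorological monthly", "Station tables"),
    Description("a-2", "Weather charts", "Synoptic maps"),
    Description("a-3", "Hydrological year book", "Station tables"),
]

for hit in search(sample, annotation="station", title="monthly"):
    print(hit.item_id, hit.title)
```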

Fig. 5. Screenshots of CMoD Interfaces

Conclusion

The technological approaches and solutions described above make it possible to:

  • Provide reliable permanent storage of information by transferring data from outdated media, and recover data.
  • Improve the quality of customer service with information and information products.
  • Enhance the accessibility of data on digital media.
  • Provide access to the data held in hard copies and on photo media.

Roshydromet’s experience of data recovery in its Archive System can be used by National Weather Services.

The main conceptual solutions used in building Roshydromet’s Archive System were applied in technical upgrade projects in the National Weather Services of Central Asian States.

______