Proposal

for

Data and Quality Assurance Management

in the

URGENT Programme

URGENT Data Management - Commercial-in-Confidence - Proposal

Introduction

This proposal has been prepared to make provision for data management within the NERC URGENT Programme. It is divided into three sections. Section 1 summarises the recommendations made in the database scoping study commissioned by the URGENT Data Management and Quality Assurance (DMQA) Committee. Section 2 lists the main data management and quality assurance requirements identified by the Round 1 and Round 2 project leaders during the URGENT workshop held on 8 September 1998. Section 3 provides brief task descriptions and estimated timescales.

The first two rounds of the URGENT Programme are now well under way, and it is important to set data standards as soon as possible to avoid wasting resources later in the Programme.

Issue 2

Contents

Introduction

Contents

Section 1 - Recommendations of the Database Scoping Study

Data Management

Data Management Infrastructure

Role of Data Centres

Role of DMQA Committee and User Advisory Committee

Proposed Activities

Task 1 - Data Management Plan

Task 2 - Quality Assurance Manual

Task 3 - Data Provision to URGENT projects

Task 4 - Data Management

Outputs

Task 5 - Directory of Urban Regeneration Information and Data Sources

Task 6 - Programme CD-ROMs and other products

Section 2 - URGENT WORKSHOP - 8 September 1998

Task 7 - Practical training and guidance in data management policy, practice and QA

Documentation on data management

Data formats

Task 8 - Lexicon of agreed terms

Metadata standards to be defined and disseminated

Clarification of IPR and confidentiality position

Policy / guidelines on dissemination

Task 9 - International sampling standards and procedures

Task 10 - GIS Harmonisation and Co-ordination

Section 3 – Task Descriptions and Timescales

Main Tasks

Task 1 - Data Management Plan

Task 2 - Quality Assurance Manual

Task 3 - Data Provision to URGENT projects

Task 4 - Data Management

Task 5 - Directory of Urban Regeneration Information and Data Sources

Task 6 - Programme CD-ROMs and other products

Additional Tasks

Task 7 - Quality Assurance Advice, Training and “Quality Assurance Manager”

Task 8 - Lexicon of Agreed Terms

Task 9 - International Sampling Standards

Task 10 - GIS Harmonisation and Co-ordination

Task will be ongoing to the end of 2000

Appendix 1 Summary Programme

Section 1 - Recommendations of the Database Scoping Study

Data Management

It is essential for the URGENT Programme to have an organisational and technical infrastructure in place to ensure that the data requirements of both the researchers and the end-user communities are satisfied. Such an infrastructure should ensure that any data brought together or generated by the project consortia are managed to the agreed standards and made available for the duration of the Programme and beyond.

NERC Data Policy requires that recipients of NERC grants offer to deposit, with NERC, a copy of datasets resulting from NERC-funded research, for use by other bona fide researchers. These datasets are then accorded long-term stewardship by NERC. To ensure long-term security and archiving of data arising from the URGENT Programme, and hence to conform to the requirements of the NERC Data Policy, the DMQA Committee considers it essential that the tasks associated with data management remain the responsibility of NERC.

The data management should be carried out by organisations that are accustomed to providing, managing and disseminating data and which have a lifetime that extends beyond the duration of the Programme itself. The NERC Data Centres form a well-established network of specialist data managers and data providers serving the NERC community, including Thematic Programmes like LOIS and Environmental Diagnostics (ED), and the DMQA Committee recommends that these NERC Data Centres should have responsibility for data management within the URGENT Programme.

A network of URGENT Data Centres should be established, with each Designated Data Centre responsible for URGENT data in its own area of expertise. This will create a distributed system, unified by common standards and practices overseen by the DMQA Committee. This approach offers URGENT a workable and cost-effective solution, which has been shown to work well in previous NERC Programmes, including LOIS and ED.

The Data Centres will provide the data management infrastructure for the Programme by:

identifying data requirements across the URGENT Programme;

managing the acquisition of major datasets;

receiving data and model output generated by the Programme;

ensuring long-term security of the data;

disseminating Programme data after the completion of URGENT.

It is suggested that the responsibilities of the four URGENT Data Centres, in line with their expertise, should be:

Environmental Information Centre (EIC) responsible for biological, ecological, land-use and land-cover data;

Institute of Hydrology (IH) responsible for fresh-water-quality and hydrological data;

British Atmospheric Data Centre (BADC) responsible for air-quality, atmospheric and meteorological data;

British Geological Survey (BGS) responsible for land-quality, geological and geochemical data.
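The suggested allocation above amounts to a lookup from type of data to Designated Data Centre. The following is a minimal illustrative sketch in Python, not part of the scoping study: the category keys, the table names and the function `designated_centre` are all hypothetical, chosen only to mirror the wording of the list.

```python
# Illustrative only: the category keys and names below are assumptions
# that mirror the wording of the list above, not Programme identifiers.
RESPONSIBILITIES = {
    "EIC": ("biological", "ecological", "land-use", "land-cover"),
    "IH": ("fresh-water-quality", "hydrological"),
    "BADC": ("air-quality", "atmospheric", "meteorological"),
    "BGS": ("land-quality", "geological", "geochemical"),
}

# Invert the table so that a data category can be looked up directly.
CENTRE_FOR = {
    category: centre
    for centre, categories in RESPONSIBILITIES.items()
    for category in categories
}


def designated_centre(category: str) -> str:
    """Return the Data Centre suggested for a given data category."""
    key = category.strip().lower()
    if key not in CENTRE_FOR:
        raise KeyError(f"no Designated Data Centre listed for {category!r}")
    return CENTRE_FOR[key]
```

A category that does not appear in the list raises an error rather than defaulting to a centre, so that unallocated data types are noticed and can be referred for a decision.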

Data Management Infrastructure

Role of Data Centres

Data Centres will provide the data management infrastructure for the Programme. The role of the Data Centres is to:

identify data requirements;

acquire major data sets;

provide data-management facilities;

receive data and model output generated by the Programme;

ensure long-term security of the data.

Role of DMQA Committee and User Advisory Committee

The DMQA Committee should have a budget and the authority to spend it. It should receive and consider proposals for the funding of activities at URGENT data centres.

These funds will be used for purchasing data common to many projects, activities relating to the harmonisation of data, the development of standards, the assembly and publication of data, organising meetings and other appropriate tasks.

The role of the DMQA Committee is to:

report to the Steering Committee;

co-ordinate the activities of the Data Centres;

set standards;

manage its budget;

exploit the data.

The membership of the DMQA Committee, after the initial policy-making phase, should change to incorporate those people who will actually be implementing the data policies for URGENT. The LOIS Data Committee evolved in this way and proved highly successful. The DMQA Committee will plan for the publication of the data at the end of the Programme.

The key elements of the plan should be:

identifying the data to be published;

deciding the form in which the data will be published (e.g. flat files, spreadsheet tables or a totally integrated database);

deciding the level of documentation required;

preparing a timetable for publication;

designing and developing any software needed to store and display the data;

assembling the data sets;

designing and producing associated printed material;

printing, publicity and dissemination.

The scientists must be made aware of the intention to publish the URGENT data and the timetable at an early point. This awareness must be maintained throughout the Programme. The DMQA Committee and Data Centres in collaboration with scientists will actively seek industrial and academic partners with whom to exploit the data.

Proposed Activities

To ensure efficient data management in the URGENT Programme, within the constraints of NERC policy, several key activities should be commissioned by the URGENT Steering Group. Outline specifications for these are given below, with suggested lead organisations.

Task 1 - Data Management Plan

The NERC Data Policy requires that the URGENT Programme should make arrangements for the provision, management, dissemination and long-term stewardship of data. The DMQA Committee should prepare and publish these arrangements as a Data Management Plan. Staff of the relevant NERC Data Centres and members of the Scoping Study Team should contribute to the preparation of the Plan.

It is important that the Plan should define the obligations of project consortia by providing information and guidance on the following key aspects of data management:

metadata standards;

data management infrastructure (e.g. responsibilities and co-ordination of the NERC Data Centres);

quality assurance procedures (with reference to a Quality Assurance Manual);

data standards (e.g. documentation, file formats and exchange protocols);

plans for data dissemination (e.g. catalogues, WWW and CD-ROM);

intellectual property rights (e.g. period of exclusive rights to exploit data);

long-term stewardship of data (e.g. deposition of data with NERC).

Task 2 - Quality Assurance Manual

Since a great deal of effort has been expended in this important area for the Environmental Diagnostics (ED) Programme, it is appropriate for the Quality Assurance Manual for the ED Programme to be expanded to cover the QA requirements of the URGENT Programme. It is important that the QA Manual is accessible and includes illustrative examples of good practice.

Task 3 - Data Provision to URGENT projects

In addition to providing access to data held by the NERC Data Centres, it will be necessary to appoint Data Centre representatives to negotiate access to and/or purchase data on behalf of PIs where a corporate approach is judged useful (for example, OS map data). Experience demonstrates that significant cost savings can be made by the use of Data Centre representatives in this way. The DMQA Committee should review, on a case by case basis, requests for expensive data sets not included in project budgets. In some cases, it may be necessary for the PI to revise his/her initial request and, with assistance from Data Centres, specify an acceptable alternative.

The Data Centres will:

facilitate access to data they hold;

provide advice on data sources and alternatives;

negotiate access to and/or purchase data as appropriate.

A budget should be made available to the DMQA Committee to support community-level data acquisition advantageous to the URGENT Programme as a whole. Access to data by other means, such as data trading, will be fully investigated to ensure that major data sets are acquired as cost-effectively as possible.

Task 4 - Data Management

The purpose of data management is to ensure that any data brought together or generated by the project consortia are managed to the agreed standards and made available for the duration of the Programme and beyond.

The Data Centres will ensure:

sustainability of data (they are held and archived securely);

compatibility of data format between URGENT projects and existing NERC data holdings;

appropriate protection of IPR;

confidentiality where necessary;

integration of data with that held by NERC and arising from other programmes;

Quality Assurance of the data archive;

compliance of award holders with agreed standards.

Outputs

A key objective for the URGENT Programme is to make URGENT data widely available so as to deliver maximum value to researchers and users. Some of the project consortia will also be making data available through the development of models, GIS and decision-support systems, and there is scope for the development of data delivery systems, though these may be project or sector specific.

There are also advantages to retaining certain data dissemination activities within NERC Data Centres. These include a holistic approach to Programme data (rather than project/area specific), familiarity with the data and data ownership, and data security. It is also important to consider that a significant resource of raw data for URGENT projects is already held by NERC Data Centres.

Task 5 - Directory of Urban Regeneration Information and Data Sources

Part of the overall Data Dissemination Strategy should involve the development of a Web-enabled metadata catalogue or directory of urban regeneration information and data sources. Development of Web access will require the provision of a “virtual one-stop shop” for URGENT data – a web site distributed across the Data Centres with a common look-and-feel, offering publicity, information, a catalogue and ultimately data from URGENT. This will be more effectively achieved within NERC Data Centres.

Task 6 - Programme CD-ROMs and other products

The URGENT Programme may wish to produce specific data products such as CD-ROMs (a field in which the Institute of Hydrology has considerable experience, having recently completed CD-ROMs for the LOIS Programme) and data services (e.g. Web access to data). There are several mechanisms and technologies available which require further investigation before a Data Dissemination Strategy can be developed for the Programme.

The development of suitable information systems for data dissemination should come in the latter half of the Programme, as projects develop and the nature of data outputs is better understood. As the Programme progresses, alternative dissemination options should be assessed by the DMQA Committee on behalf of the Steering Group.

Section 2 - URGENT WORKSHOP - 8 September 1998

During the URGENT workshop held on 8 September 1998, a number of tasks, actions and requirements related to data management were identified and/or reinforced by PI groups during the seminar sessions. The principal issues raised are detailed below. The majority of these will be addressed by the work recommended by the DMQA Committee (Section 1 above). However, a few of the activities on which PIs would like guidance are not addressed above; these have been included as additional tasks in Section 3 below.

Task 7 - Practical training and guidance in data management policy, practice and QA

Many of the PIs indicated that they would welcome training and additional guidance in data management policy and QA matters, possibly including QA auditing. The provision of the Data Management Plan and Quality Assurance Manual detailed above, together with liaison with the Designated Data Centre, should adequately address the data management aspects and many of the QA-related aspects discussed at the workshop. However, there is a demand for practical training in the application of QA, and there is also a need for a point of contact for advice on QA-related matters as they arise. The training needs to be addressed on a project-by-project basis by meeting project teams to identify and discuss their requirements and to demonstrate how their existing procedures can be adapted to meet the requirements of the manual. A brief project-specific Quality Plan will be prepared and agreed. A follow-up visit will be made to monitor the application of QA to projects, but formal QA auditing will not be undertaken. The point of contact can be provided by a “Quality Assurance Manager” for the URGENT Programme, who will give ad hoc advice on QA-related matters as required.

Documentation on data management

There was a clear demand from participants at the workshop for documentation on data management guidelines. This will be addressed by the Data Management Plan (Task 1 above), which will be made available to all URGENT projects when complete.

Data formats

Similarly, there was a clear need for the distribution of guidelines and defined formats from the Designated Data Centres (DDCs). Again, this will be addressed by the Data Management Plan (Task 1 above), which will be made available to all URGENT projects when complete. However, the guidelines need to be made available quickly, as several projects are starting to accumulate data and will be building databases in the near future; they will be circulated as soon as available. PIs will also be notified of their Designated Data Centre and a “point of contact” within that centre with whom they can discuss data issues and agree data formats.

The need for a project specific “sample number prefix” was discussed at the workshop and will also be addressed by the Data Management Plan (Task 1 above).
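The intent of a project-specific prefix is that sample numbers assigned independently by different projects remain unique once the data are brought together. A minimal sketch in Python is given below for illustration only. The actual scheme is left to the Data Management Plan, so the identifier layout (prefix, hyphen, zero-padded number), the function name `make_sample_id` and the default padding width are all assumptions.

```python
def make_sample_id(project_prefix: str, sample_number: int, width: int = 6) -> str:
    """Combine a project prefix and a local sample number into one identifier.

    The layout "<PREFIX>-<zero-padded number>" is a hypothetical convention,
    not one defined by the URGENT Programme.
    """
    prefix = project_prefix.strip().upper()
    # An empty or non-alphanumeric prefix cannot identify a project reliably.
    if not prefix.isalnum():
        raise ValueError("project prefix must be non-empty and alphanumeric")
    if sample_number < 0:
        raise ValueError("sample number must not be negative")
    return f"{prefix}-{sample_number:0{width}d}"
```

Under this assumed layout, two projects that both record a local sample number 42 would produce different identifiers, because the prefixes differ.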

Task 8 - Lexicon of agreed terms

Several project representatives at the workshop highlighted problems with the definition of terms used within projects. It is essential that all members of the URGENT community have access to agreed definitions to ensure appropriate use of information. Suitable lexicons already exist in many of the scientific fields represented in URGENT. These need to be reviewed to ensure compatibility and to identify areas where confusion may arise (e.g. definition of nitrate) or that are not currently addressed.

Access to the agreed lexicon(s) should be via an appropriate WWW site (URGENT or NERC) which can be updated if required.

Metadata standards to be defined and disseminated

There was a clear demand for metadata standards to be defined. This will be addressed by the Data Management Plan (Task 1 above), which will be made available to all URGENT projects when complete.

Clarification of IPR and confidentiality position

The URGENT Programme is making extensive use of data from third-party sources, from consortia partners or “interested” parties, such as local authorities. Some of these data may be confidential to the supplier/owner or be “sensitive” for other reasons (such as blight-related issues), and the need to maintain confidentiality of such data must be addressed within consortia and within the URGENT Programme as a whole. Guidelines will be provided in the Quality Assurance Manual (Task 2), but agreement between the data owners/suppliers, PIs and the appropriate Designated Data Centre will be required on a case-by-case basis.

Policy / guidelines on dissemination

Plans for dissemination of URGENT data will be established by the Data Management Plan (Task 1 above).

Task 9 - International sampling standards and procedures

There was some demand from the workshop for information on national and international sampling and analytical procedures. A list of relevant references will be produced and made available via e-mail/WWW to the URGENT community.

Task 10 - GIS Harmonisation and Co-ordination

It is clear from the presentations and seminar discussions that many of the URGENT projects currently underway use or are intending to use GIS. It is likely that many of the third round projects will also be making extensive use of GIS. There is a need to review the approach to GIS and spatial data structures at an early stage in the Programme to ensure integrated development within URGENT so that coherence, and the desired synthesis, can be achieved where necessary on completion of the Programme. Good communication and harmonisation of GIS across the Programme now will provide improved information compatibility and, eventually, ease of analysis and dissemination.