ITR/AP+IM+SI+SY: A Prototype Knowledge Environment for the Geosciences

A.Project Summary

We propose to create a prototype Knowledge Environment for the Geosciences (KEG) that demonstrates a seamless, virtual laboratory for Earth system science research and education. This environment is a platform for fundamental IT research enabling advances in methodologies and tools for distributed, large-scale collaborative research, knowledge evolution, and distributed information environments. It will organize research products into a searchable, shared group resource and thus a knowledge-based problem-solving environment for geosciences. It is targeted at the heart of the research activity—the process—not the end result. This will be the first time a comprehensive knowledge system has been proposed to integrate frontier research on high performance simulation of the Earth system with both archival and interactive geosciences learning environments. The complexity of a geophysical knowledge framework is a new and fertile context for basic IT research. Moreover, the proposed research is structured to foster iteration among IT and the core geophysical problems, whereby basic IT research contributions spur new geosciences developments that in turn pose new IT research challenges.

The need to understand the physical and biological processes that shape our environment is a grand challenge for this century. The possible influence of human activities on the Earth system and the vulnerability of a complex global economy to severe natural events make this an urgent problem. However, the Earth/Sun system operates at disparate spatial and temporal scales, and advancing our understanding of this system requires a vast array of observational data, many scientific disciplines, and scientific models. Moreover, scientific understanding of the environment must be accessible to diverse groups, from scientists to policy makers and from educators to students. Traditional modes of scientific research are stymied when applied to the breadth of scales encountered in the geosciences and often reach only a limited audience of specialists. To meet this grand challenge, new modes of collaboration and dissemination that leverage IT will be necessary. This proposal will contribute to the digital infrastructure vital for the next generation of geosciences research.

The prototype knowledge environment for the geosciences proposed here will be assembled in three layers:

1)Interaction Portal (IP)

2)Knowledge Framework (KF)

3)Multiscale Earth System Repository (MS)

The IP is the connection between the community and the knowledge environment and consists of tools, components, and interfaces built upon the common fabric of the Knowledge Framework. Areas of emphasis for the IP include a common code development environment, visualization, and support for interactions among a geographically distributed group. The KF mediates knowledge between the multiscale Earth system model (MSESM) of the MS layer and the IP, using principles of encapsulation, polymorphism, and data abstraction to facilitate interdisciplinary research with a set of distributed methods, classes, and tools. Finally, the MSESM generalizes current Earth system models and will be a linked hierarchy of models at several scales. The prototype multiscale model will have a nonhydrostatic atmosphere with interactive chemistry and cloud microphysical processes.

A fundamental component of our system design is a shared Earth system modeling framework that will provide a “commons” for university and NCAR computer and Earth scientists to compare, test, and evaluate new tools and methods for modeling complex Earth system processes. This diverse scientific effort will be organized and archived with “middleware” that enhances opportunities for applications of the model products to research on impacts and consequences of weather and climate variability. The final modeling and analysis products will be transmitted to the Digital Library for Earth System Education [DLESE] for peer-review and posting in the collection. The integrative nature of learning about the Earth demands a core information technology infrastructure that makes distributed learning a reality—the time to act is now.

The IT research will be carried out by a multidisciplinary scientific team whose expertise spans knowledge representation and reasoning, problem solving environments, collaboration research, parallel computation, scientific visualization and data analysis, human-computer interaction, and software process and architecture. Support for the substantive geophysical model development is also broad and leverages the considerable resources of the participating universities (Courant Institute of Mathematical Sciences, Howard University, Purdue University, Stanford University, University of Alabama-Huntsville, University of California at Los Angeles, University of Chicago, University of Colorado, University of Illinois Urbana-Champaign, University of Michigan, and University of Wisconsin) as well as the National Center for Atmospheric Research.

B.Table of Contents

A.Project Summary

B.Table of Contents

C.Project Description

1The Information Technology Revolution for the Geosciences

1.1National and Global Context

1.2A Vision for Enabling Virtual Communities of Researchers and Educators

1.3NCAR’s Role

2Elements of the KEG and Related Work

2.1Problem Solving Environments

2.1.1Problems/limitations with existing systems: Need integration, scalability, etc…

2.2Collaboratories and Related Infrastructure

2.3Portals for Scientific Research & Education

2.3.1An Environment for Hypothesis Development and Testing

2.3.2Frameworks for Realizing the Portal

2.4Scientific Data: Complex, Diverse, and Very Large

2.5Mining: Data, Information, and Knowledge

2.6Discovery of Information, Data, Software, Tools, and Knowledge [Jessup?]

2.7Visualization: Multiscale and Terascale

2.8Advanced Collaborative Environments

2.9Next-generation Multiscale Earth System Models [Tribbia, Ghil]

2.10Distributed Group Development of Frameworks, Tool, Models, and Agents

2.11Executing and Managing Simulation Processes

2.12Knowledge Systems: Ontologies for the Geosciences

2.13The IT Challenges: Scalability, Overall Integration, IT Research [Middleton/Fox/Hammond]

3Research Design and Methods

3.1The Concept

3.2IT Research Challenges

3.3KEG Definitions

3.4Goals, Requirements, and Characteristics

3.5Design

3.6Architecture and Enabling Frameworks

3.6.1Detailed description of the architecture

3.6.1.1IP layer

3.6.1.2KF1 layer

3.6.1.3KF2 layer

3.6.1.4KF3 layer

3.6.1.5KF4 layer

3.6.1.6MS layer

3.7Outcomes

3.7.1Infrastructure

3.7.2Education and research

3.7.3Services

3.7.4An expandable framework

3.8Software Engineering Challenge

4A Multi-scale Earth System Model

4.1Definition:

4.2The Problems: Modeling and Software

4.3The Approach

4.4Goals and Outcome

5A KEG for Everyone!

5.1Deliverables

5.2Technology Transfer

6Education and Outreach

6.1Outreach to the Scientific Community – Summer Design Institutes

6.2Outreach to Communities

6.3The K-12 Educational Community – K-12 KEG

6.4Outreach to a Diverse Community – Collaboration with SOARS

6.5Outreach to the Public – Sharing Information about KEG:

7Usage Scenarios

7.1Hurricane Landfall (HAL) Test Bed

7.2El Nino Southern Oscillation (ENSO) Test Bed

7.3Megacity Impact on Regional And Global Environment (MIRAGE) Test Bed

8Broader Impacts

9Management Plan (up to three pages in length) [hammond]

10Prior Results

D.References Cited

E.Biographical Sketches

F.Proposal Budget [hammond]

G.Current and Pending Support

H.Facilities, Equipment, and Other Resources

I.Special Information and Supplementary Documents

J.Appendices

K.Attic

11Data Mining from UAH

11.1Data mining in a distributed Environment

11.1.1Goals

11.1.2Requirements

11.1.3Basics

11.1.4Applying Data Mining

C.Project Description

1The Information Technology Revolution for the Geosciences

1.1National and Global Context

Over the last 30 years, the global population has doubled, carbon dioxide concentration has increased from 315 to 370 ppm, and the mean global temperature has risen from 13.9 to 14.4 degrees C. A gaping ozone hole appears every spring over Antarctica and another seems to be developing over the Arctic. Air and water pollution problems are global in scale. Never before has the need to understand our planet, the complex interactions of its processes, and our own impact upon the system been as urgent and compelling as now. The unique challenge of the geosciences is to address as a whole the many interlocking processes in the atmosphere, oceans, land surfaces, ice sheets, and biota that together determine the behavior of the planet. This holistic set of processes requires an earth systems approach to global and even regional and local problems, combining many specialties in a way that is not required in other scientific pursuits. The research issues are no longer the purview of any single discipline; they span multiple communities, with stakeholders in education, environmental and societal impacts, and multiple earth system disciplines.

Detailed observations of the Earth, distributed and diverse data and information holdings, powerful simulation and analysis capabilities, knowledge holdings, and collaboration environments, to name but a few, clearly have tremendous potential to elevate our knowledge and understanding of our planet. The information technology revolution brings us unprecedented new capabilities that offer substantial promise for integrating these resources and turning them into powerful new tools and environments. As we consider our future, however, simple extensions of extant technologies and methodologies will not begin to address our requirements. A new era of scientific discovery is within reach if these new capabilities can be effectively harnessed in the service of science. [Too vague, more work – don]

1.2A Vision for Enabling Virtual Communities of Researchers and Educators

A centerpiece of NCAR’s long-term vision is to develop a Geosciences Decision Support Environment that substantially improves our understanding of the Earth system in which we live and provides accurate and timely information about it. This information and the decision support environment itself will be used to facilitate and accelerate fundamental scientific research, enrich education programs, and inform policy decisions and assessments. This vision is consistent with the PITAC report [PITAC99], which states: “Research is conducted in virtual laboratories in which scientists and engineers can routinely perform their work without regard to physical location—interacting with colleagues, accessing instrumentation, sharing data and computational resources, and accessing information in digital libraries.”

To make strides toward realizing this vision we propose to create a prototype Knowledge Environment for the Geosciences (KEG) that will produce a knowledge enabled collaborative problem-solving environment for Earth system research and education. This environment is a platform for fundamental IT research enabling advances in methodologies and tools for distributed, large-scale collaborative research, knowledge evolution, and distributed information environments. It will organize research products into a searchable, shared group resource that underlies a compelling concept: a knowledge-based problem-solving environment for geosciences research, education, and assessment. It is targeted at the heart of the research activity—the process—not just the end result. This will be the first time a comprehensive knowledge system has been proposed to integrate frontier research on high performance simulation of the Earth system with both archival and interactive geosciences learning environments. The complexity of a geophysical knowledge framework is a new and fertile context for basic IT research. The proposed research is structured to foster iteration among IT and the core geophysical problems, whereby basic IT research contributions spur new geosciences developments that in turn pose new IT research challenges.

1.3NCAR’s Role

NCAR’s primary function is to serve as an integrator of people, disciplines, methods, technologies, and activities in the pursuit of advancing the national research agenda. It also acts as a catalyst, bringing together many specialists, disciplines, approaches, technologies, and activities to propel the science forward. While these roles have traditionally been in the context of earth system research, they must now extend into the information technology realm as well if geoscience is to achieve the progress that is needed.

NCAR is well positioned to play a prominent role in motivating the evolution of information technology research in the context of the geosciences. Broad community projects and large-scale simulation efforts push the envelope of what’s possible, and serve as harbingers of future community needs. In this proposed work we team earth system researchers with their counterparts in computational science in order to develop new understanding of the problem domain and to attack the basic research problems in computational science. This synergistic partnership is crucial to advancing the research agenda for all of the disciplines involved.

NCAR has a responsibility to foster the development of important, long-term community infrastructure and to support it as a persistent resource for research. This role complements the proposed work by providing a path to sustain the prototype environments, frameworks, tools, and software produced by this effort.

2Elements of the KEG and Related Work

In considering a next generation environment for supporting distributed group research, one can identify a number of logical components that we understand fairly well today. Collaboratories present shared, virtual spaces where groups of researchers can conduct experiments, share results, collectively produce intermediate analyses, and work together to produce knowledge products such as publications. In geosciences research, terascale simulations produce terascale data holdings, and these in turn must be analyzed in the context of the observed record, which is massive data in its own right. Recent advances in Grid technologies provide a model for an underlying computational and data fabric conceived for terascale modeling and analysis. Generalized frameworks for advanced numerical models are emerging that not only facilitate plug-and-play flexibility for algorithms, but also have substantial promise for supporting domain-specific problem solving environments.

The overarching challenge is to enable all of these technologies to be combined into effective problem-solving environments. The effort proposed here is aimed at building upon a number of other research efforts and extending them such that a knowledge-enabled meta-framework is realized. It will be all things to all people, ‘nuff said. [Replace, scope is too narrow – don ->] This environment provides an interdisciplinary team with virtual proximity to all required resources and each other. In the sections that follow we describe the primary building blocks of the prototype Knowledge Environment for the Geosciences, related work, and research challenges.

2.1Problem Solving Environments

[Will work with Elias over the weekend – don]

2.1.1Problems/limitations with existing systems: Need integration, scalability, etc…

2.2Collaboratories and Related Infrastructure

[Need Umich SPARC & CHEF background]

2.3Portals for Scientific Research & Education

2.3.1An Environment for Hypothesis Development and Testing

2.3.2Frameworks for Realizing the Portal

2.4Scientific Data: Complex, Diverse, and Very Large

Observational programs such as NASA’s Earth Observing System (EOS Terra, Aqua, and Aura) [] present a proverbial fire hose of data for the Earth System community. Space Science will face similar challenges when platforms such as the Stratospheric Observatory for Infrared Astronomy (SOFIA) [] and other advanced observatories become operational. At the same time, researchers successfully harness parallel computational platforms to simulate phenomena at unprecedented resolution, while nested and multiscale models will add a further level of complexity. Furthermore, climate and weather researchers and impacts assessment stakeholders require tremendous flexibility to combine and compare multiple disparate datasets, including GIS data. Overall, the geosciences community faces massive growth in the scope, complexity, and ultimate size of crucial scientific data, with volumes escalating into the terabyte and petabyte range during this decade. The Data Problem challenges our very ability to understand the systems and underlying processes and has the potential to stand as a formidable barrier to research progress if not addressed.

A meta-framework that anticipates future data requirements must possess extraordinary qualities relative to performance, scalability, flexibility, distributed operation and, above all, the incorporation of semantic content. We propose to build upon and coalesce several community efforts, each of which contributes a unique part to the KEG concept. Recent work in HDF5 [] addresses scalability and performance in the context of parallel computation and exposes a powerful and flexible data model. The Distributed Oceanographic Data System (DODS) is a popular framework for enabling data abstraction and distributed access but has not been targeted at high-performance applications. Recent work at the University of Wisconsin on the VisAD class library [] provides an elegant abstraction of data that is highly synergistic with both DODS and HDF5. One aspect of this research will be coalescing these into the meta-framework context with a coupling to DataGrid technologies, which enable distributed operation and address performance issues. One of the outstanding opportunities presented by this research is to explore the possibilities afforded by coupling this best-of-class synthesis with geoscience-specific ontologies, which enable management, discovery, and usage based upon semantic content.
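To make the idea of data abstraction coupled with semantic content more concrete, the following is a minimal sketch in Python. It is not the API of HDF5, DODS, or VisAD. Every name in it (the `DataSource` interface, the two stand-in back ends, the toy `BROADER` ontology, and the `discover` function) is hypothetical and chosen only for illustration. It shows two things: callers read data through one interface regardless of where the data live, and datasets are located by ontology concept and not by file name.

```python
from abc import ABC, abstractmethod

# Hypothetical toy ontology: each term maps to its broader (parent) term.
BROADER = {
    "sea_surface_temperature": "temperature",
    "air_temperature": "temperature",
    "temperature": "physical_quantity",
    "ozone_concentration": "atmospheric_chemistry",
}


def is_a(term, concept):
    """Return True if `term` equals `concept` or is narrower than it."""
    while term is not None:
        if term == concept:
            return True
        term = BROADER.get(term)  # walk up; None once the root is passed
    return False


class DataSource(ABC):
    """Common interface that hides where and how the data are stored."""

    def __init__(self, name, term):
        self.name = name
        self.term = term  # semantic tag drawn from the ontology

    @abstractmethod
    def read(self, start, stop):
        """Return the values with indices in [start, stop)."""


class InMemorySource(DataSource):
    """Stand-in for a local, file-based store."""

    def __init__(self, name, term, values):
        super().__init__(name, term)
        self._values = list(values)

    def read(self, start, stop):
        return self._values[start:stop]


class ComputedSource(DataSource):
    """Stand-in for a remote service that produces values on request."""

    def __init__(self, name, term, func):
        super().__init__(name, term)
        self._func = func

    def read(self, start, stop):
        return [self._func(i) for i in range(start, stop)]


def discover(catalog, concept):
    """Find every source whose semantic tag falls under `concept`."""
    return [src for src in catalog if is_a(src.term, concept)]


# Made-up catalog entries, for illustration only.
catalog = [
    InMemorySource("buoy_sst", "sea_surface_temperature", [14.1, 14.3, 14.2]),
    ComputedSource("model_t2m", "air_temperature", lambda i: 10.0 + i),
    InMemorySource("sonde_o3", "ozone_concentration", [280, 275, 290]),
]

for src in discover(catalog, "temperature"):
    print(src.name, src.read(0, 2))
```

A query for the concept "temperature" returns both the sea surface and air temperature sources even though neither carries that exact tag, and the calling code never needs to know which back end serves each one. A real meta-framework would replace the dictionary with a proper ontology and the stand-in classes with adapters to actual data systems.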

2.5Mining: Data, Information, and Knowledge

[Lotsa good material here from Sara and Steve. Need to condense and possibly re-tier – don]

Data Mining is concerned with the technologies that provide the ability to extract meaningful information and knowledge from large, heterogeneous data sources. Currently, large numbers of observations are acquired and stored in diverse and distributed data repositories, resulting in the need for “theories” that distill the information and knowledge content. The challenge of extracting meaningful information becomes progressively more formidable for the Geoscience community with the launch of the components of NASA’s Earth Observing System (EOS Terra, Aqua and Aura) and future missions. Similar challenges will face the Space Science community when platforms such as the Stratospheric Observatory for Infrared Astronomy (SOFIA) and other advanced observatories become operational. Since the acquisition of data is a continuing process, general tools and algorithms are needed for analyzing data, as well as for creating and testing theories or hypotheses. Due to the vast amounts of data involved, automated approaches that limit the need for human intervention are desirable.

Much progress has been made in both data mining and knowledge discovery over the past few years. For example, these techniques have proven useful for the automation of the analysis process and reducing data volume. However, the domain is still fairly new and this research frontier offers many areas for substantial improvement, such as the utilization of background knowledge, provability of results, scalability and the use of distributed computing approaches.
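As a simple illustration of how automated analysis can reduce data volume, the sketch below keeps only the values in a series that deviate strongly from the mean and discards the rest. This is one elementary technique chosen for illustration, not the method proposed here. The function name `extract_events`, the threshold `k`, and the sample readings are all assumptions made up for the example.

```python
from statistics import mean, stdev


def extract_events(series, k=2.0):
    """Return (index, value) pairs more than k sample standard deviations from the mean."""
    mu = mean(series)
    sigma = stdev(series)
    if sigma == 0:
        return []  # a constant series contains no unusual values
    return [(i, x) for i, x in enumerate(series) if abs(x - mu) > k * sigma]


# Synthetic readings (made up for illustration) with one obvious outlier.
readings = [14.0, 14.1, 13.9, 14.2, 14.0, 19.5, 14.1, 13.8, 14.0, 14.1]

events = extract_events(readings)
print(events)
print(f"kept {len(events)} of {len(readings)} values")
```

Only the flagged events need to be stored or passed to later analysis, with no human inspection of the full series. Practical systems would use more robust statistics and would exploit background knowledge, which is one of the open areas noted above.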