
ESG ROADMAP

Questions we hope to address:

  • How can we work together?
  • How does the work we are focused on fit into existing systems (e.g., security, metadata, data delivery)?
  • How do we interoperate with independent systems?

Serving our Community:

  • CMIP3 IPCC AR4 and CMIP4 IPCC AR5
  • CCSM
  • CCES
  • PCM
  • POP
  • NARCCAP
  • C-LAMP
  • CFMIP

Preliminary requirements and architecture development with European partners

Characterization of Users

  • WG1 -- Scientists
  • WG2 -- Impacts
  • WG3 -- Mitigation

Characterization of CMIP3 IPCC AR4 Data

  • Gridded files, aggregations
  • Datasets
  • Accessible through the netCDF API
  • CF 1.0 compliant (see the access sketch after this list)
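
A minimal access sketch, assuming the netCDF4-python library; the file name and variable name ("tas") are placeholders rather than actual ESG holdings:

    # Open a CMIP3-style file through the netCDF API and inspect its CF metadata.
    from netCDF4 import Dataset

    with Dataset("tas_A1_run1.nc") as nc:                     # hypothetical AR4-era file
        print(getattr(nc, "Conventions", "not specified"))    # "CF-1.0" for compliant data
        tas = nc.variables["tas"]                             # gridded field (time, lat, lon)
        print(tas.dimensions, tas.shape, tas.units)           # dimensions and units read from the file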

Characteristics of the Data:

  • AR4/AR5 format and convention: netCDF and CF
  • Cannot assume all data will follow the CF convention; legacy CCSM and PCM data do not. Leverage CF where possible, but build the metadata model and services to be independent of CF (see the metadata sketch after this list)
  • Very large data (high resolution regional data)
  • Gridded data
  • Observation data
  • Station data
  • Satellite data
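
A sketch of CF-tolerant metadata harvesting along these lines, assuming the netCDF4-python library; the CF attributes (standard_name, long_name, units) are read when present, and everything else falls back gracefully:

    # Prefer CF attributes when present, but fall back gracefully for
    # legacy (non-CF) files such as older CCSM/PCM output.
    from netCDF4 import Dataset

    def harvest_variable_metadata(path):
        """Return per-variable metadata records without assuming CF compliance."""
        records = []
        with Dataset(path) as nc:
            conventions = getattr(nc, "Conventions", None)      # absent for non-CF files
            for name, var in nc.variables.items():
                records.append({
                    "name": name,
                    "standard_name": getattr(var, "standard_name", None),  # CF-specific
                    "long_name": getattr(var, "long_name", name),          # widely used, not CF-only
                    "units": getattr(var, "units", None),
                    "dimensions": list(var.dimensions),
                    "conventions": conventions,
                })
        return records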

Functional Specification and Architecture Design

  • To meet the needs of the climate community, the ESG-CET architecture must allow a large number of distributed sites, with varying capabilities, to federate, cooperate, and/or work as standalone entities. We must therefore support multiple portals and services, API-based access, and multiple data-delivery mechanisms. To accomplish this, the ESG-CET architecture is based on three tiers of data services, as illustrated in the figure below.

Tiered ESG-CET Architecture

  • Tier 1 comprises a set of Global ESG Services (partially exposed via a Global ESG Portal) that provide shared functionality across the overall ESG-CET federation. These services include 1) user registration and management, 2) common security services for access control to the federated resources, 3) metadata services for describing and searching the massive data holdings, 4) a common set of notification services and registry, and 5) global monitoring services. All ESG-CET sites share a common user database, so that a user will only have to register once and maintain a single set of credentials in order to access resources across the whole system. (Access to specific data collections and related resources, such as IPCC data, may still have to be approved by the data’s “owners”.) The definition of global metadata services for search and discovery guarantees a user’s ability to find data of interest throughout the whole federation, independently of the specific site at which a search is begun.
  • Tier 2 comprises a limited number of ESG Data Gateways that act as data-request brokers, possibly providing specific enhanced functionality and/or documentation to serve specific user communities, while supplying much needed fault-tolerance to the overall system. Services deployed on a Gateway include the user interface for searching and browsing metadata, for requesting data (including analysis and visualization) products, and for orchestrating complex workflows. We expect the software stack deployed at the Gateway level to require considerable expertise to install and maintain, and thus we envision these Gateways being operated directly by ESG-CET engineering staff. Initially, three ESG Gateways are planned: one at LLNL focused on the IPCC AR5 needs, one at ORNL to support the Computational Climate End Station project, and one at NCAR to serve the CCSM and PCM model communities (and possibly also others).
  • Tier 3 is the actual data holdings and the back-end data services used to access the data, which reside on a (potentially large) number of federated ESG Nodes. Tier 3 resources typically host the data and metadata services needed to publish data onto ESG and to execute data-product requests formulated through an ESG Gateway. Because ESG Nodes are operated by researchers and engineers at local institutions with varying levels of expertise, the software stack deployed at an ESG Node is kept to a minimum and supplied with detailed, exhaustive documentation. A single ESG Gateway brokers data requests for many associated ESG Nodes: for example, as part of the next IPCC project, more than 20 institutions are expected to set up ESG data nodes. (A minimal sketch of this topology follows the list.)
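
A minimal, purely illustrative sketch of the three-tier topology; this is not ESG-CET code, and the service fields and site labels are only assumptions drawn from the descriptions above:

    # Tier 1: shared global services; Tier 2: a few Gateways; Tier 3: many Nodes.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class GlobalServices:          # Tier 1: shared across the whole federation
        user_registry: str         # single user database and credentials
        security: str              # common access-control services
        metadata_search: str       # federation-wide search and discovery
        notification: str
        monitoring: str

    @dataclass
    class DataNode:                # Tier 3: data holdings and back-end services
        site: str
        services: List[str]        # e.g., publishing, file serving, subsetting

    @dataclass
    class Gateway:                 # Tier 2: request broker and user interface
        site: str
        nodes: List[DataNode] = field(default_factory=list)   # one Gateway, many Nodes

    federation = [
        Gateway("PCMDI/LLNL"),     # IPCC AR5 focus
        Gateway("ORNL"),           # Computational Climate End Station
        Gateway("NCAR"),           # CCSM and PCM communities
    ]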

ESG will host approximately 30-35 models from over 30 modeling centers

AR5 Requirements Timeline

  • 9/07: Primary set of benchmark experiments defined (WGCM meeting). Preliminary set of standard output defined (CMIP panel)
  • 3/08: Standard output finalized (CMIP panel); new version of CMOR released, accommodating non-rectangular grids
  • 9/08: All required experiments defined (WGCM meeting)
  • Late-2008: modeling groups begin running benchmark experiments (e.g., control, 1%/year CO2 increase)
  • 2009: modeling groups run models and produce output
  • 1/10: model output starts to become available to the community
  • 9/10: first meeting of WG1 AR5 lead authors
  • 3/11: research paper drafts completed and made available to WG1 AR5 authors
  • 9/11: First order draft of WG1 AR5 completed
  • 8/12: deadline for acceptance of papers cited by IPCC
  • 6/13: release of WG1 AR5

Location of significant instruments, facilities, or other assets

  • Platform(s) required (*nix?)
  • Online storage for the most frequently accessed datasets
  • Tertiary storage for very large, less commonly accessed datasets
  • Online cache for staging from tertiary storage (see the staging sketch after this list)
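
A sketch of the staging pattern this storage layout implies; the cache location and the stage_from_tertiary helper are hypothetical:

    # Serve a file from the online cache when present; otherwise request a
    # stage from tertiary (deep) storage into the cache first.
    import os

    CACHE_DIR = "/esg/cache"                            # assumed online cache location

    def locate_file(relative_path, stage_from_tertiary):
        """Return a readable local path, staging from deep storage on a cache miss."""
        cached = os.path.join(CACHE_DIR, relative_path)
        if os.path.exists(cached):
            return cached                               # cache hit: serve directly
        stage_from_tertiary(relative_path, cached)      # cache miss: copy from tape/HPSS
        return cached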

ESG’s testbed deployment (2008-2009 time frame) is expected to federate the following sites:

  • PCMDI, LLNL (gateway, node)
  • NCAR (gateway, node)
  • ORNL (gateway, node)
  • LANL (node)
  • GFDL, Princeton, NJ (node)
  • BADC, UK (partner)
  • DKRZ, Germany (partner)
  • University of Tokyo Center for Climate System Research, Japan (node)

The full next-generation ESG federation is likely to add one or more sites in the following countries to the testbed sites above:

  • China
  • Norway
  • Canada
  • France
  • Australia
  • Korea
  • Italy
  • Russia

Component Design for Federation

  • Phase 1 -- the minimum set of requirements for the testbed to operate:
      • Security: authorization and authentication
      • User enrollment and management
      • Metadata management
      • Publication
      • Data delivery (FTP, GridFTP, etc.) (see the delivery sketch after this list)
      • Portal: search, browse
      • Logging and metrics
  • Phase 2 -- second set of requirements, not needed for the initial testbed:
      • Monitoring
      • Data products, remote computing services
      • Client-side access
      • Notification service
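
A sketch of the simplest Phase 1 delivery path, plain HTTP pulls of published files; the host and file paths are placeholders, and GridFTP transfers would instead rely on Globus tooling:

    # Fetch a short list of published files over HTTP using only the standard library.
    import urllib.request

    files = [
        "http://datanode.example.org/thredds/fileServer/cmip3/tas_A1_run1.nc",
        "http://datanode.example.org/thredds/fileServer/cmip3/pr_A1_run1.nc",
    ]

    for url in files:
        local_name = url.rsplit("/", 1)[-1]             # save under the file's own name
        print("fetching", url)
        urllib.request.urlretrieve(url, local_name)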

Defining and implementing the components and APIs (perhaps brokers) for interoperability with European partners

  • Interaction sequences

ESG-CET ROADMAP

Deadline: October 2008 for IPCC AR5

The roadmap is based on quarterly releases of a deployable Gateway source code base. Each release includes several modules of functionality, which must be fully tested and involve a near-final user interface.

  1. December 31st, 2007: Standalone Gateway with basic functionality
      • User management
      • Authentication (standalone)
      • Gateway branding
      • Catalog browsing (datasets and files)
      • File download: files local to Gateway disk
      • Data publishing: harvesting from existing THREDDS catalogs (see the harvesting sketch after this list)
      • Semantic search: prototype based on metadata harvested from THREDDS catalogs
  2. March 31st, 2008: Standalone Gateway with extended functionality
      • File download: files on deep storage (SRM w/ WS)
      • Data cart and data-request monitoring interface
      • Data publishing: files local to Gateway (including on deep storage)
      • Detailed metadata display
      • Semantic search: use full metadata from the domain model
      • Authorization services
  3. June 30th, 2008: Data Node integration
      • Integration with LAS, TDS
      • User interface
      • Submission of data requests (viz and subsetting) from UI to LAS, TDS
      • Database-driven configuration
      • Authorization
      • Metrics: collection from Gateway and Data Node
      • Data publishing: data on remote Data Node
  4. September 30th, 2008: Gateways federation
      • Single sign-on among Gateways
      • Exchange/share metadata
      • Metrics: aggregation among Gateways
      • Monitoring
      • Data publishing: migration of legacy data from current operational systems
      • Client access: FTP, GridFTP, Wget, DML
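
A sketch of the THREDDS catalog harvesting step in milestone 1, using only Python standard-library modules; the catalog URL is a placeholder:

    # Walk a THREDDS catalog.xml and yield basic discovery metadata per dataset.
    import urllib.request
    import xml.etree.ElementTree as ET

    THREDDS_NS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"
    CATALOG_URL = "http://datanode.example.org/thredds/catalog.xml"   # placeholder

    def harvest(catalog_url):
        """Yield (name, urlPath) for every dataset element in the catalog."""
        with urllib.request.urlopen(catalog_url) as response:
            tree = ET.parse(response)
        for ds in tree.iter(THREDDS_NS + "dataset"):
            yield ds.get("name"), ds.get("urlPath")

    # for name, path in harvest(CATALOG_URL):
    #     print(name, path)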

Left for 2009:

  • OPeNDAP-g integration (see the access sketch below)
  • Server-side processing
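
A sketch of the kind of access OPeNDAP integration would enable, where the server performs the subsetting and only the requested slice crosses the network; the URL is a placeholder, and this assumes a netCDF4-python build with OPeNDAP (DAP) support:

    # Open a remote dataset by URL and read a single time step; the server
    # returns only that slice rather than the whole file.
    from netCDF4 import Dataset

    url = "http://datanode.example.org/thredds/dodsC/cmip3/tas_A1_run1.nc"   # placeholder
    with Dataset(url) as nc:
        tas = nc.variables["tas"]
        first_timestep = tas[0, :, :]        # only this subset is transferred
        print(first_timestep.shape)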