Data Management Middleware UML Use Case and Activity Model LDM-146 10/10/2013

Large Synoptic Survey Telescope (LSST)

Data Management Middleware UML Use Case and Activity Model

Kian-Tat Lim, Robyn Allsman, Jeff Kantor

LDM-146

Latest Revision: October 10, 2013

This LSST document has been approved as a Content-Controlled Document by the LSST DM Technical Control Team. If this document is changed or superseded, the new document will retain the Handle designation shown above. The control is on the most recent digital document with this Handle in the LSST digital archive and not printed versions. Additional information may be found in the LSST DM TCT minutes.

Change Record

Version / Date / Description / Owner name
1 / 1/28/2011 / Update Document to reflect Model based on Data Challenge 3 / J. Kantor
2 / 7/12/2011 / Update Document to reflect Model based on Data Challenge 3B PT1 / R. Allsman
3 / 8/18/2011 / General updates / R. Allsman
4 / 9/13/2013 / Final Design updates (MW revision 1.59) / R. Allsman
5 / 10/10/2013 / Formatting revisions (Revision 1.62); TCT Approved / R. Allsman

Table of Contents

Change Record

1Middleware Activities

1.1Actors and Agents

1.2Science Data Archive

1.2.1Catalogs, Alerts, and Metadata

1.2.2Image and File Archive

1.3Data Access Services

1.3.1Data Access Client Framework

1.3.2Data Definition Client Services

1.3.3Query Services

1.3.4Image and File Services

1.4Pipeline Execution Services

1.4.1Pipeline Construction Toolkit

1.4.2Logging Services

1.4.3Inter-Process Messaging Services

1.4.4Checkpoint/Restart Services

1.5Processing Control

1.5.1Data Management Control System

1.5.2Orchestration Manager

1.6Infrastructure Services

1.6.1Event Services

The contents of this document are subject to configuration control by the LSST DM Technical Control Team

Data Management Middleware UML Use Case and Activity Model

1Middleware Activities

1.1Actors and Agents

Actor / WBS / Description
Database Administrator / 02C.06.02 / The Database Administrator is any user that has the access necessary to invoke database administration operations (e.g. configure Database security, system parameters, partitioning, resource utilization, etc.) related to LSST DM Database management.
Production Coordinator / 02C.01.02 / The Production Coordinator is any user that has the access necessary to oversee and certify the production and release of LSST Data Products.
Science User / 02C.05 / The Science User is any user who has access to LSST Data Products, Pipelines, or both.
System Administrator / 02C.07.02 / The System Administrator is any user that has the access necessary to invoke system administration operations (e.g. configure security, equipment, system parameters, etc.) in the LSST Data Management Control System.
Checkpoint/Restart Services / 02C.06.03.04 / The Checkpoint/Restart Services agent provides the orderly halt of pipeline processing, including the collection of all data required to resume it later from the point at which it was checkpointed.
Data Access Client Framework / 02C.06.02.01 / The Data Access Client Framework agent manages repositories of datasets. Each dataset has a type belonging to a class; a mapper class used by the framework understands how to persist and retrieve datasets of each class. Provenance information is maintained by the framework, and each dataset is signed to verify its integrity and authenticity.
Data Definition Client Services / 02C.06.02.02 / The Data Definition Client Services agents are used to create schemas and tables within them for any of the databases used in the Science Data Archive or the temporary production catalogs.
DMCS / 02C.07.01.01 / The Data Management Control System has several agent components: Base DMCS, Archive DMCS, DAC DMCS, Replicator, Distributor, Catch-Up Replicator, and Catch-Up Distributor. The Base DMCS is the only DM component that receives commands from the Observatory Control System (OCS) and sends telemetry information. It mediates the interactions of the OCS with all of the DM commandable entities (Archiver, Catch-Up Archiver, Alert Production Cluster, EFD Replicator). The Archive DMCS manages all Archive compute and storage resources. The DAC DMCS does the same for each Data Access Center. The other components are part of the Alert Production execution infrastructure.
Distributor / 02C.07.01.01 / The Distributor agent receives images and metadata from the Base Center and copies them to the Archive DAC and the Alert Production worker nodes.
Download Service / 02C.06.01.01 / The Download Service agent retrieves archived tables from tape and transmits them to the Science User.
Event Broker / 02C.07.02.01 / The Event Broker agent mediates the reliable, high-capacity publish/subscribe mechanism used to communicate between DM components.
Event Monitor / 02C.07.02.01 / The Event Monitor agent, connected to the Event Broker, accumulates event statistics and recognizes patterns of events, allowing actions to be triggered (including the publishing of new events). Event Monitors will typically be installed by the Orchestration Manager or the Data Management Control System.
Event Services / 02C.07.02.01 / Event Services agents allow DM components, including the DMCS and individual tasks, to communicate with each other using a reliable, high-capacity publish/subscribe mechanism mediated by an Event Broker.
Image and File Archive / 02C.06.01.02 / The Image and File Archive agent provides a Web-based and command-line client interface to the file-based portion of the Science Data Archive.
Image and File Services / 02C.06.02.04 / The Image and File Services agent performs image processing in order to regenerate virtual image data products and to generate images of spatial regions of interest (both large mosaics and small postage stamps). In addition, Image and File Services manages caches of files and images to speed up retrieval.
Ingest Service / 02C.06.01.01 / The Ingest Service agent provides database ingest support for all Catalogs in the Science Data Archive.
Inter-Process Messaging Services / 02C.06.03.03 / The Inter-Process Messaging Services agents enable tasks to send and receive data items according to a defined communications geometry (e.g. broadcast/scatter from each task to all others, gather from all tasks to a designated one, exchange data with neighbors in a grid). A variety of communications mechanisms (e.g. MPI or the LSST DM Event Services) are supported with the same interface.
Logging Services / 02C.06.03.02 / The Logging Services agent enables tasks to produce log entries, including fatal errors, warnings, and debugging messages. Log entries may be sent to a variety of destinations including standard streams, files, and the Event Services broker. Each entry may have metadata associated with it. An external configuration specifies which entries go to which destinations with which message format.
Orchestration Manager / 02C.07.01.02 / The Orchestration Manager agent deploys tasks to compute platforms, setting up the execution environment for them and ensuring that input datasets and storage resources are available. It can schedule the tasks for execution according to data dependencies, resource availability, and other factors. It tracks the status of all tasks and can terminate tasks early if necessary.
Pipeline Construction Toolkit / 02C.06.03.01 / The Pipeline Construction Toolkit agent enables application developers to create tasks to perform operations and execute algorithms within the LSST DM stack framework. Tasks may be nested hierarchically. Each task has a configuration that controls its execution.
Query Services / 02C.06.02.03 / The Query Services agent performs scalable execution of SQL-dialect queries on vast amounts of data partitioned in chunks across a platform consisting of a cluster of machines.
Release Service / 02C.06.01.01 / After a Data Release Production is complete, the Release Service agent makes the new catalog available for access at the same time that an older catalog is removed.
Replicator / 02C.07.01.01 / The Replicator agent aggregates data and forwards the aggregate dataset to a connected Distributor.

1.2Science Data Archive

1.2.1Catalogs, Alerts, and Metadata

WBS: 02C.06.01.01

The Catalogs, Alerts, and Metadata component includes logical and physical schema designs for all catalogs in the Science Data Archive, including a database of issued alerts, a database of image metadata, and a replica of the Engineering and Facilities Database. In addition to these layouts, there are services provided by this component: an Ingest Service, a Release Service, and a Download Service.

1.2.1.1Ingest Image Metadata

WBS: 02C.06.01.01

Copy metadata about an Image Dataset into appropriate Tables in a Catalog.

INPUT: Destination Catalog, Image Dataset

Scenario / Steps Summary / Rejoins at
Basic Path /
  1. Ingest Service invokes Data Access Client Framework:Retrieve Dataset to retrieve metadata from Image Dataset.
  2. Ingest Service translates metadata into relations.
  3. Ingest Service stores relations in table(s) in catalog.
  4. Ingest Service invokes Data Access Client Framework:Retrieve Dataset Provenance to retrieve provenance of Image Dataset.
  5. Ingest Service translates Provenance into Relations.
  6. Ingest Service stores Provenance Relations in Table(s) in Catalog.
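
The basic path above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the LSST implementation: `retrieve_dataset` and `retrieve_dataset_provenance` are hypothetical stand-ins for the Data Access Client Framework operations, the table and column names are invented for the example, and an in-memory SQLite database stands in for the Catalog.

```python
import sqlite3

# Hypothetical stand-ins for the Data Access Client Framework operations.
def retrieve_dataset(dataset_id):
    """Return the metadata of an Image Dataset as a dict (assumed shape)."""
    return {"visit": 1234, "filter": "r", "exptime": 15.0}

def retrieve_dataset_provenance(dataset_id):
    """Return provenance records for the dataset (assumed shape)."""
    return [{"task": "isr", "version": "1.0"},
            {"task": "calibrate", "version": "1.2"}]

def ingest_image_metadata(catalog, dataset_id):
    """Copy metadata and provenance of one Image Dataset into catalog tables."""
    # Steps 1-3: retrieve metadata, translate it into rows, store the rows.
    metadata = retrieve_dataset(dataset_id)
    rows = [(dataset_id, key, str(value)) for key, value in metadata.items()]
    catalog.executemany(
        "INSERT INTO image_metadata (dataset_id, meta_key, meta_value) "
        "VALUES (?, ?, ?)", rows)
    # Steps 4-6: retrieve provenance, translate it into rows, store the rows.
    provenance = retrieve_dataset_provenance(dataset_id)
    prov_rows = [(dataset_id, p["task"], p["version"]) for p in provenance]
    catalog.executemany(
        "INSERT INTO image_provenance (dataset_id, task, version) "
        "VALUES (?, ?, ?)", prov_rows)
    catalog.commit()

# SQLite is only a stand-in for the Catalog database.
catalog = sqlite3.connect(":memory:")
catalog.execute(
    "CREATE TABLE image_metadata (dataset_id TEXT, meta_key TEXT, meta_value TEXT)")
catalog.execute(
    "CREATE TABLE image_provenance (dataset_id TEXT, task TEXT, version TEXT)")
ingest_image_metadata(catalog, "raw-v1234")
```

Storing metadata as key/value rows is only one possible translation into relations; a fixed-column table per Dataset Type would follow the same six steps.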

1.2.1.2Ingest FITS Table

WBS: 02C.06.01.01

Copy information from a FITS table into appropriate tables in a Catalog.

INPUT: Destination Catalog, FITS Table Dataset

Scenario / Steps Summary / Rejoins at
Basic Path /
  1. Ingest Service invokes Data Access Client Framework:Retrieve Dataset to retrieve FITS Table.
  2. Ingest Service extracts schema from FITS Table.
  3. Ingest Service translates FITS Schema into Relational Schema.
  4. Ingest Service invokes Data Definition Client Services:Create Table to create Tables in Catalog.
  5. Ingest Service stores FITS Table data in Tables in Catalog.
  6. Ingest Service invokes Data Access Client Framework:Retrieve Dataset Provenance to retrieve Provenance of FITS Table.
  7. Ingest Service translates Provenance into relations.
  8. Ingest Service stores Provenance in Table(s) in Catalog.
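
Step 3, translating a FITS schema into a relational schema, might look like the sketch below. The TFORM letter codes are those defined for FITS binary tables; the choice of SQL types, the restriction to scalar columns, and the function names are assumptions made for illustration only.

```python
import re

# Scalar FITS binary-table TFORM codes mapped to SQL column types.
# The SQL type chosen for each code is an assumption of this sketch.
TFORM_TO_SQL = {
    "L": "BOOLEAN",           # logical
    "B": "SMALLINT",          # unsigned byte, widened to fit a signed type
    "I": "SMALLINT",          # 16-bit integer
    "J": "INTEGER",           # 32-bit integer
    "K": "BIGINT",            # 64-bit integer
    "E": "REAL",              # single-precision float
    "D": "DOUBLE PRECISION",  # double-precision float
}

def sql_type(tform):
    """Translate one TFORM value (e.g. 'D', '1J', '16A') into a SQL type."""
    match = re.fullmatch(r"(\d*)([A-Z])", tform.strip())
    if match is None:
        raise ValueError(f"unsupported TFORM: {tform!r}")
    repeat = int(match.group(1) or 1)
    code = match.group(2)
    if code == "A":
        # Character columns use the repeat count as the string width.
        return f"CHAR({repeat})"
    if repeat != 1 or code not in TFORM_TO_SQL:
        # Array-valued and other column types are left out of this sketch.
        raise ValueError(f"unsupported TFORM: {tform!r}")
    return TFORM_TO_SQL[code]

def create_table_sql(table_name, fits_columns):
    """Build a CREATE TABLE statement from (column name, TFORM) pairs."""
    columns = ", ".join(
        f"{name} {sql_type(tform)}" for name, tform in fits_columns)
    return f"CREATE TABLE {table_name} ({columns})"

statement = create_table_sql(
    "Source", [("id", "K"), ("ra", "D"), ("decl", "1D"), ("label", "16A")])
```

The resulting statement would then be handed to Data Definition Client Services:Create Table in step 4.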

1.2.1.3Bulk Download Catalog

WBS: 02C.06.01.01

Portions or all of a Catalog, including older Catalogs no longer on disk, may be downloaded as bulk files (not as a Database query).

INPUT: Data Release Production number, Catalog tables: all or subset

Scenario / Steps Summary / Rejoins at
Basic Path /
  1. Download Service retrieves archived tables from tape.
  2. Download Service transmits archived tables to Science User.

1.2.1.4Ingest Multiple FITS Tables

WBS: 02C.06.01.01

Copy information from many FITS tables into appropriate Tables in a Catalog, repartitioning the data if necessary.

INPUT: Destination Catalog, FITS Table Datasets

Scenario / Steps Summary / Rejoins at
Basic Path /
  1. If Dataset type is a Distributed Table (see AltPath: Dataset Type not Distributed Table)
  2. Ingest Service invokes Query Services: Partition Data and Load Distributed Table.
  3. For each FITS Table:
  4. .....Ingest Service invokes Data Access Client Framework: Retrieve Dataset Provenance to retrieve Provenance of FITS Table.
  5. .....Ingest Service translates Provenance into relations.
  6. .....Ingest Service stores Provenance in Tables in Catalog.
  7. done:
  8. fin:

AltPath: Dataset Type not Distributed Table /
  1. For each FITS table:
  2. .....Ingest Service invokes Ingest FITS Table.
  3. done:
/ Basic Path step:8
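
The branching between the basic and alternate paths can be sketched as below. The three callables passed in are hypothetical stand-ins for Query Services: Partition Data and Load Distributed Table, the provenance steps of the basic path, and the Ingest FITS Table use case; none of these signatures come from the LSST code base.

```python
def ingest_multiple_fits_tables(fits_tables, is_distributed_table,
                                partition_and_load, ingest_provenance,
                                ingest_fits_table):
    """Follow the basic path for distributed tables, else the alternate path."""
    if is_distributed_table:
        # Basic path step 2: one partition-and-load call covers all tables.
        partition_and_load(fits_tables)
        # Basic path steps 3-7: provenance is still ingested table by table.
        for table in fits_tables:
            ingest_provenance(table)
    else:
        # Alternate path: each table goes through the single-table use case,
        # which stores its own provenance.
        for table in fits_tables:
            ingest_fits_table(table)

# Record the calls made for a distributed table to show the ordering.
calls = []
ingest_multiple_fits_tables(
    ["t1.fits", "t2.fits"],
    is_distributed_table=True,
    partition_and_load=lambda tables: calls.append(("partition_and_load", tuple(tables))),
    ingest_provenance=lambda table: calls.append(("provenance", table)),
    ingest_fits_table=lambda table: calls.append(("ingest_fits_table", table)),
)
```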

1.2.1.5Release New Catalog

WBS: 02C.06.01.01

After a Data Release Production is complete, the new catalog is made available for access at the same time that an older catalog is removed.

INPUT: Data Release catalog; Previous Data Release catalog; Penultimate Data Release catalog

Scenario / Steps Summary / Rejoins at
Basic Path /
  1. Release Service marks older Catalog as to-be-deleted in all DACs.
  2. Release Service marks old Catalog as older in all DACs.
  3. Release Service marks current Catalog as old in all DACs.
  4. Release Service marks new Catalog as current in all DACs.
  5. Release Service removes to-be-deleted Catalog(s) in all DACs.
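
The label rotation in steps 1 through 5 can be illustrated with a small sketch. The per-DAC dictionary, the DAC names, and the release names `DR1` to `DR4` are hypothetical; how the Release Service actually records these labels is not specified here.

```python
def release_new_catalog(labels, new_catalog):
    """Rotate catalog labels for one DAC and return the catalogs to remove.

    `labels` maps a label ("current", "old", "older") to a catalog name.
    """
    to_be_deleted = []
    # Step 1: the older catalog is marked to-be-deleted.
    if "older" in labels:
        to_be_deleted.append(labels.pop("older"))
    # Step 2: the old catalog becomes older.
    if "old" in labels:
        labels["older"] = labels.pop("old")
    # Step 3: the current catalog becomes old.
    if "current" in labels:
        labels["old"] = labels.pop("current")
    # Step 4: the new catalog becomes current.
    labels["current"] = new_catalog
    # Step 5: removal itself is left to the caller in this sketch.
    return to_be_deleted

# Apply the same rotation in every DAC.
dacs = {
    "dac_a": {"current": "DR3", "old": "DR2", "older": "DR1"},
    "dac_b": {"current": "DR3", "old": "DR2", "older": "DR1"},
}
removed = {name: release_new_catalog(labels, "DR4")
           for name, labels in dacs.items()}
```

The order of the steps matters: each label is vacated before the next catalog moves into it, so no catalog is overwritten. The sketch does not attempt to show how the change is made to take effect at the same time in all DACs.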

1.2.1.6Ingest Production Results

WBS: 02C.06.01.01

Copy information from the results of executing a task in a Production into a Catalog.

INPUT: Destination Catalog; Dataset Type, Repository, Selection Information

Scenario / Steps Summary / Rejoins at
Basic Path /
  1. If Dataset type is Images: (see AltPath: Dataset Type is FITS Tables)
  2. For each Image:
  3. .....Ingest Service invokes Ingest Image Metadata.
  4. done:
  5. fin:

AltPath: Dataset Type is FITS Tables /
  1. For each FITS Table:
  2. .....Ingest Service invokes Ingest Multiple FITS Tables.
  3. done:
/ Basic Path step:5

1.2.1.7Construct Data Release Catalog

WBS: 02C.06.01.01

Build an instance of the Data Release Catalog that stores the results of Data Release Production.

INPUT: Data Release Catalog schema, stored procedures and user-defined functions, constants

Scenario / Steps Summary / Rejoins at
Basic Path /
  1. Production Coordinator uses Data Definition Client Services:Create Table to create tables necessary for storing results of Data Release Production.
  2. Production Coordinator loads constants into tables.

1.2.1.8Construct Global Catalog

WBS: 02C.06.01.01

Build an instance of the Global Catalog to track all information in the Science Data Archive (across Data Releases).

INPUT: Global Catalog schema, stored procedures and user-defined functions, constants and historical information

Scenario / Steps Summary / Rejoins at
Basic Path /
  1. Database Administrator uses Data Definition Client Services:Create Table to create tables necessary for tracking versions and provenance of all other catalogs; users and privileges; and stored procedures and user-defined functions.
  2. Database Administrator loads constants and historical information into tables.

1.2.1.9Construct Temporary Data Release Production Catalog

WBS: 02C.06.01.01

Build an instance of a Catalog to be used for temporary, intermediate storage during the execution of the Data Release Production.

INPUT: Temporary Data Release Production Catalog schema, stored procedures and user-defined functions, constants and historical information

Scenario / Steps Summary / Rejoins at
Basic Path /
  1. Production Coordinator uses Data Definition Client Services:Create Table to create tables necessary for intermediate storage during Data Release Production.
  2. Production Coordinator loads constants and historical information into tables.

1.2.1.10Construct Temporary Alert Production Catalog

WBS: 02C.06.01.01

Build an instance of a Catalog to be used for temporary, intermediate storage during the execution of the Alert Production.

INPUT: Temporary Alert Production Catalog schema, stored procedures and user-defined functions, constants and historical information

Scenario / Steps Summary / Rejoins at
Basic Path /
  1. Production Coordinator uses Data Definition Client Services:Create Table to create tables necessary for intermediate storage during Alert Production, including the Alert Production control database.
  2. Production Coordinator loads constants and historical information into tables.

1.2.1.11Construct Level 1 Database Catalog

WBS: 02C.06.01.01

Build an instance of the Level 1 Database Catalog that stores the results of Alert Production.

INPUT: Level 1 Database Catalog schema, stored procedures and user-defined functions, constants

Scenario / Steps Summary / Rejoins at
Basic Path /
  1. Production Coordinator uses Data Definition Client Services:Create Table to create tables necessary for storing results of Alert Production.
  2. Production Coordinator loads constants into tables.

1.2.2Image and File Archive

WBS: 02C.06.01.02

Web-based and command-line client interface to the file-based portion of the Science Data Archive. Uses the Image and File Services and Data Access Client Framework.

1.2.2.1Bulk Download Images and Files

WBS: 02C.06.01.02

Large numbers of images and/or files may be bulk downloaded rather than retrieved one by one.

INPUT: Identifying information for image/file set

Scenario / Steps Summary / Rejoins at
Basic Path /
  1. For each image/file in set:
  2. .....Image and File Archive invokes Image and File Services:Retrieve Released Image or File to retrieve image/file.
  3. .....Image and File Archive transmits image/file to Science User.
  4. done:

1.2.2.2List Released Images and Files

WBS: 02C.06.01.02

Images and files matching search criteria may be listed.

INPUT: Specific release; Search expression, including desired Dataset Type, date ranges, spatial region, and other criteria

Scenario / Steps Summary / Rejoins at
Basic Path /
  1. Image and File Archive displays all images/files that match search expression.

1.2.2.3Retrieve Released Image or File

WBS: 02C.06.01.02

When a specific Image or File has been identified, it can be retrieved.

INPUT: Identifying information for specific Image/File

Scenario / Steps Summary / Rejoins at
Basic Path /
  1. If Image/File Type is Calibrated Image: (see AltPath: Image/File Type is CutOut Image) (see AltPath: Image/File Type is Difference Image) (see AltPath: Image/File Type is Image Mosaic) (see AltPath: Image/File Type is RGB Image)
  2. Image and File Archive invokes Image and File Services:Retrieve Calibrated Science Image to retrieve image/file.
  3. Image and File Archive transmits image/file to Science User.

AltPath: Image/File Type is CutOut Image /
  1. Image and File Archive invokes Image and File Services: Retrieve Cutout Image to retrieve image/file
/ Basic Path step:3
AltPath: Image/File Type is Difference Image /
  1. Image and File Archive invokes Image and File Services:Retrieve Difference Image to retrieve image/file
/ Basic Path step:3
AltPath: Image/File Type is Image Mosaic /
  1. Image and File Archive invokes Image and File Services:Retrieve Image Mosaic to retrieve image/file
/ Basic Path step:3
AltPath: Image/File Type is RGB Image /
  1. Image and File Archive invokes Image and File Services:Retrieve RGB Image to retrieve image/file
/ Basic Path step:3
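
Because every alternate path rejoins at step 3, the use case amounts to selecting one retrieval operation by Image/File Type and then transmitting the result. A minimal sketch follows, assuming a lookup table keyed by type name; the `make_stub` helper and the returned strings are hypothetical placeholders for the Image and File Services operations named above.

```python
def make_stub(operation):
    """Build a hypothetical stand-in for one Image and File Services operation."""
    def retrieve(identifier):
        return f"{operation}:{identifier}"
    return retrieve

# One entry per Image/File Type named in the basic and alternate paths.
RETRIEVERS = {
    "Calibrated Image": make_stub("Retrieve Calibrated Science Image"),
    "CutOut Image": make_stub("Retrieve Cutout Image"),
    "Difference Image": make_stub("Retrieve Difference Image"),
    "Image Mosaic": make_stub("Retrieve Image Mosaic"),
    "RGB Image": make_stub("Retrieve RGB Image"),
}

def retrieve_released_image_or_file(image_type, identifier, transmit):
    """Select the retrieval operation for the type, then transmit the result."""
    retrieve = RETRIEVERS.get(image_type)
    if retrieve is None:
        raise ValueError(f"unknown image/file type: {image_type!r}")
    # Basic path step 3, shared by every alternate path.
    transmit(retrieve(identifier))

# Collect what would be sent to the Science User.
sent = []
retrieve_released_image_or_file("Difference Image", "visit-42", sent.append)
```

A lookup table keeps the shared transmit step in one place, which mirrors the way each alternate path rejoins the basic path.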

1.3Data Access Services

WBS: 02C.06.02

These services provide the ability to ingest, index, federate, query, and administer DMS data products on distributed, heterogeneous storage system and data server architectures. All services will be implemented to provide reasonable fault tolerance and autonomous recovery in the event of software and hardware failures.

1.3.1Data Access Client Framework

WBS: 02C.06.02.01

The Data Access Client Framework manages repositories of datasets. Each dataset has a type belonging to a class; a mapper class used by the framework understands how to persist and retrieve datasets of each class. Provenance information is maintained by the framework, and each dataset is signed to verify its integrity and authenticity.
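
The relationship between repositories, dataset types, mappers, and signatures can be sketched as follows. Everything here is an assumption made for illustration: the class names, the JSON serialization, the `calibrated_image` dataset type, and the use of an HMAC-SHA256 signature with a shared key are not taken from the LSST design, which does not specify a signature scheme in this section.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"example-key"  # hypothetical; key management is out of scope

class JsonMapper:
    """Hypothetical mapper that persists datasets of one class as JSON bytes."""
    def serialize(self, dataset):
        return json.dumps(dataset, sort_keys=True).encode("utf-8")

    def deserialize(self, payload):
        return json.loads(payload.decode("utf-8"))

class Repository:
    """In-memory stand-in for a repository of signed datasets."""
    def __init__(self, mappers):
        self._mappers = mappers  # dataset type -> mapper
        self._store = {}         # (type, identifying keys) -> (payload, signature)

    @staticmethod
    def _key(dataset_type, keys):
        return (dataset_type, tuple(sorted(keys.items())))

    @staticmethod
    def _sign(payload):
        return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

    def put(self, dataset_type, keys, dataset):
        payload = self._mappers[dataset_type].serialize(dataset)
        self._store[self._key(dataset_type, keys)] = (payload, self._sign(payload))

    def verify_signature(self, dataset_type, keys):
        payload, signature = self._store[self._key(dataset_type, keys)]
        # Constant-time comparison of the stored and recomputed signatures.
        return hmac.compare_digest(self._sign(payload), signature)

    def get(self, dataset_type, keys):
        if not self.verify_signature(dataset_type, keys):
            raise ValueError("dataset signature does not match its contents")
        payload, _ = self._store[self._key(dataset_type, keys)]
        return self._mappers[dataset_type].deserialize(payload)

repo = Repository({"calibrated_image": JsonMapper()})
ids = {"visit": 1234, "ccd": 5}
repo.put("calibrated_image", ids, {"exptime": 15.0, "filter": "r"})
loaded = repo.get("calibrated_image", ids)
```

A keyed signature such as an HMAC covers both integrity and authenticity for holders of the key; a plain checksum would cover integrity only.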

1.3.1.1Verify Dataset Signature

WBS: 02C.06.02.01

Verify the integrity of a Dataset in a Repository.

INPUT: Repository path; Dataset type; Values for Identifying Keys