
European Space Astronomy Centre
P.O. Box 78
28691 Villanueva de la Cañada
Madrid
Spain
Tel. (34) 91 813 1100
Fax (34) 91 813 1139

PLA Ingestion Procedure

Document control: Title / Issue / Revision / Author / Date / Approved by / Date

Change record: Reason for change / Issue / Revision / Date / Pages / Paragraph(s)

Table of contents:

Page 1/10

DPC-PLA ICD

Date Issue Rev

1 Introduction...... 4

1.1 Purpose...... 4

1.2 Scope...... 4

1.3 Applicable Documentation...... 4

1.4 Acronyms...... 4

2 Procedural Description...... 6

2.1 General workflow...... 6

2.2 Catalogues...... 6

2.2.1 Data preparation...... 6

2.2.2 Data ingestion...... 7

2.2.3 After the ingestion...... 7

2.3 Maps...... 7

2.3.1 Data preparation...... 7

2.3.2 Data ingestion...... 7

2.4 Operational Files...... 7

2.4.1 Data preparation...... 7

2.4.2 Data ingestion...... 7

2.5 POSH...... 7

2.5.1 Import POSH data into the PLA repository...... 8

2.5.1.1 ALL POSH...... 8

2.5.1.2 ONLY POSH CATALOGUE...... 8

2.5.1.3 POSH DOCUMENTS...... 8

2.5.2 Ingest POSH data into the DB...... 8

2.5.3 POSH...... 8

2.5.4 OD...... 9

2.5.5 Survey...... 9

2.5.6 Ring...... 9

2.5.7 State_vector...... 9

2.5.8 Event_type...... 9

2.5.9 Event_flag...... 9

2.5.10 Event...... 9

3 Appendix A: PLA ingestion analysis...... 10


1 Introduction

1.1 Purpose

The purpose of this document is to describe the procedure to follow when performing an ingestion into the PLA.

Due to the absence of a proper ICD developed and agreed by the DPCs, there are a number of actions that must be performed manually in order for the data to fit the specifications outlined in our internal ICD. See Appendix A.

1.2 Scope

This document is intended to assist with the ingestion of the following products:

  • Catalogues
  • Maps
  • Operational Files
  • POSH data

1.3 Applicable Documentation

  • AD-1: PLA URD v0.4 (Planck/PSO/2008-016)
  • AD-2: PLA DPC to PSO Product Delivery ICD (PGS-ICD-037)
  • AD-3: Planck IDIS Exchange Format Design Document v4.0 (PL-COM-SSD-IF-47)
  • AD-4: ICD-031 DPC-DPC Maps and Power Spectra Exchange v2.0 (PL-LFI-OAT-IC-002)
  • AD-5: Planck Internal Archive DPC to PSO Product Delivery ICD (PGS-ICD-104)

1.4 Acronyms

CMB       Cosmic Microwave Background
DPC       Data Processing Centre
DTD       Document Type Definition
ERCSC     Early Release Compact Source Catalog
ESAC      European Space Astronomy Centre
FITS      Flexible Image Transport System
HDU       Header and Data Unit
HFI       High Frequency Instrument
HK        Housekeeping
IAP       Institut d'Astrophysique de Paris
INAF-OAT  Osservatorio Astronomico di Trieste
LFI       Low Frequency Instrument
PLA       Planck Legacy Archive
PSO       Planck Science Office
RIMO      Reduced Instrument Model
SAT       Science Archives Team
TBC       To be confirmed
TBD       To be done
TM        Telemetry
UTF       Unicode Transformation Format
XML       Extensible Markup Language

2 Procedural Description

2.1 General workflow

The current ingestion process has the following phases:

  1. HFI/LFI and PSO upload the products, either to the PLA ftp server or manually via a hard disk.
  2. SAT manipulates those products manually in order to prepare them for the ingestion.
  3. SAT runs the automatic ingestion software, if it exists for the particular product; otherwise, the necessary tasks are performed manually.
  4. SAT notifies PSO that the ingestion has been completed.

2.2 Catalogues

2.2.1 Data preparation

  • Create a tar file per frequency channel with the list of postcards plus the descriptive index.html file.

  • Export the catalogue tables to “.csv” files using TOPCAT, adding the following synthetic columns:

    catalog_oid = [number]

    position = concat( concat("(", replaceAll(toString(RA), "E(\\d)", "E+$1"), "d,"), concat(replaceAll(toString(DEC), "E(\\d)", "E+$1"), "d)"))

    position_galactics = concat( concat("(", replaceAll(toString(GLON), "E(\\d)", "E+$1"), "d,"), concat(replaceAll(toString(GLAT), "E(\\d)", "E+$1"), "d)"))

  • Accommodate the data products into the stage area:

    • Copy catalogue/catalogs to stage/catalogs/tables.

    • Copy catalogue/documents to stage/catalogs/documents/documents.

    • Copy catalogue/qa/cutouts and catalogue/qa/skymaps to stage/catalogs/documents/documents.

    • Create a descriptor file per document to associate that document with the appropriate catalogues.

    • Copy catalogue/notes to stage/catalogs/notes.

    • Copy all postcards directly (without sub-directories) to stage/catalogs/postcards. To avoid duplicates, rename them as follows:

      ■ Postcards located under the psf directory must be renamed to postcardName_psf.jpg.

      ■ ECC/353 postcards must be renamed to postcardName_353.jpg.

      ■ ECC/545 postcards must be renamed to postcardName_545.jpg.

      ■ ECC/857 postcards must be renamed to postcardName_857.jpg.
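For reference, the exponent fix performed by the replaceAll expressions above (inserting a '+' after a bare 'E' so the resulting position literal parses correctly) can be reproduced outside TOPCAT. A minimal Python sketch with hypothetical function names:

```python
import re

def fix_exponent(value):
    # Mimic replaceAll(toString(x), "E(\d)", "E+$1"): insert a '+' after a
    # bare exponent marker, e.g. '1.5E3' -> '1.5E+3'; '1.5E+3' is untouched.
    return re.sub(r"E(\d)", r"E+\1", str(value))

def position_string(lon, lat):
    # Build the "(LONd,LATd)" literal produced by the synthetic
    # position / position_galactics columns.
    return "({}d,{}d)".format(fix_exponent(lon), fix_exponent(lat))
```

For example, position_string("83.633", "22.014") returns "(83.633d,22.014d)".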

2.2.2 Data ingestion

Use the PLA import software to automatically ingest notes, postcards and documents.

2.2.3 After the ingestion

Manually import into the DB the sources exported during the “Data preparation” phase.

2.3 Maps

2.3.1 Data preparation

Accommodate the data products into the stage area:

  • Copy map/maps to stage/maps/maps/[map_type].

  • Copy map/documents to stage/maps/documents/documents.

  • Create a descriptor file per document to associate that document with the appropriate maps.

  • Copy map/postcards to stage/maps/postcards.

2.3.2 Data ingestion

Use the PLA import software to automatically ingest maps, postcards and documents.

2.4 Operational Files

2.4.1 Data preparation

Accommodate the data products into the stage area:

  • Copy operational_files to stage/operational_files/[operational_file_type].
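The copy step above can be sketched as a small script. The type_of classifier is hypothetical here, since the rules for deriving [operational_file_type] from a file name are not specified in this document:

```python
import shutil
from pathlib import Path

def stage_operational_files(src, stage, type_of):
    # Copy each delivered file into stage/operational_files/<type>/,
    # where <type> is produced by a caller-supplied classifier function.
    for f in sorted(Path(src).iterdir()):
        if f.is_file():
            dest = Path(stage) / "operational_files" / type_of(f.name)
            dest.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, dest / f.name)
```

A classifier such as `lambda name: name.split("_")[0]` would group files by their leading name token; the real grouping rule must come from the internal ICD.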

2.4.2 Data ingestion

Use the PLA import software to automatically ingest the operational files.

2.5 POSH

The ingestion of the POSH is not currently included in the automated process, so all required tasks must be performed manually.

POSH data is distributed by PSO as a single zip file. To make that data accessible from the PLA, the individual files must be properly reorganized and accommodated into the repository. In addition, the associated metadata must be manually ingested into the PLA DB. See the steps below:

2.5.1 Import POSH data into the PLA repository

Three different POSH products are reachable from the PLA:

2.5.1.1 ALL POSH

Make the whole POSH .zip file delivered by PSO accessible from the PLA:

  • Copy the whole POSH package file into the repository:

e.g. cp POSH_v0_6_5_beta.zip /pla/plaps/others/misc/

2.5.1.2 ONLY POSH CATALOGUE

POSH fits file only:

  • Copy only the POSH “fits” file into the repository:

e.g. cp POSH_v0_6_5_beta/POSH_v0_6_5_beta.fits /pla/plaps/others/misc/

2.5.1.3 POSH DOCUMENTS

POSH documents:

Create the documentation package and copy it into the repository:

e.g. zip -r POSH_v0_6_5_beta_docs.zip POSH_v0_6_5_beta/Interface\ Functions/ POSH_v0_6_5_beta/poshEDD_v0_6_5_beta.pdf POSH_v0_6_5_beta/poshEDD_v0_6_5_beta.xml POSH_v0_6_5_beta/ReadMe.txt POSH_v0_6_5_beta/Thermistor\ Locations.pdf

cp POSH_v0_6_5_beta_docs.zip /pla/plaps/others/misc/

2.5.2 Ingest POSH data into the DB

The following metadata must be manually inserted into the DB, based on the POSH .zip package and other files delivered by PSO.

2.5.3 POSH

Create three entries in the Posh entity, corresponding to the three POSH products described above:

  • the whole POSH .zip file delivered by PSO, with type POSH_ALL;

  • the POSH fits file, with type POSH_CATALOG;

  • the POSH documents, with type POSH_DOCUMENTATION.

2.5.4 OD

Insert OD data manually into the DB from the OD_information.csv file delivered by PSO.

2.5.5 Survey

Insert survey data manually into the DB from the Survey_information.csv file delivered by PSO.
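Since the column layout of these CSV deliveries is dictated by PSO rather than by this document, a generic reader plus a parameterized INSERT builder is enough to sketch the manual insertion step. The target table name and the DB-API placeholder style (%s, as used by psycopg2) are assumptions:

```python
import csv

def rows_from_csv(path):
    # One dict per CSV row, keyed by whatever header line the delivery has.
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))

def insert_statement(table, row):
    # Build a parameterized INSERT for one row; execute it with a DB-API
    # cursor. No schema is assumed beyond the CSV header itself.
    cols = list(row)
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(cols), ", ".join(["%s"] * len(cols)))
    return sql, [row[c] for c in cols]
```

This keeps the DB work data-driven: a new delivery with extra columns needs no code change, only a matching table.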

2.5.6 Ring

Taken directly from the state_vector extension (part of the packaged POSH file). Add the following columns:

  • Ring_oid

  • Survey_oid

Survey linking is based on PREF:

update pla.ring as rg set survey_oid=(
    select survey_oid from pla.survey
    where rg.pref ilike '0' || to_char(number,'FM9') || '%')
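The PREF-based linking performed by the SQL above can be mirrored outside the database, e.g. to check the assignment before running the update. A sketch, assuming survey numbers 1 to 9 as the to_char(number,'FM9') format implies:

```python
def survey_for_ring(pref, surveys):
    # surveys maps survey number -> survey_oid. A ring belongs to the
    # survey whose zero-padded number ('01'..'09') prefixes its PREF,
    # matching: pref ILIKE '0' || to_char(number,'FM9') || '%'.
    for number, survey_oid in surveys.items():
        if pref.lower().startswith("0{}".format(number)):
            return survey_oid
    return None  # no survey prefix matches this ring
```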

2.5.7 State_vector

Taken directly from the state_vector extension.

2.5.8 Event_type

Ingested directly from this extension, taking EVENT_TYPE_ID to create the oid index.

2.5.9 Event_flag

Ingested from the EventIDs extension:

  • name: taken from the sBinaryNumber columns (s00000001, s00000010, ...).

  • event_attribute: binary identifier for the event_flag.

  • event_type_oid: taken from the event_type entity.
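Deriving the binary event_attribute from a sBinaryNumber column name can be sketched as below; interpreting the digits after the leading 's' as a base-2 value is an assumption based on the naming pattern shown above:

```python
def event_attribute(name):
    # 's00000010' -> 2: strip the leading 's' and read the rest as binary.
    if not (name.startswith("s") and set(name[1:]) <= {"0", "1"}):
        raise ValueError("not an event-flag column name: %r" % name)
    return int(name[1:], 2)
```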

2.5.10 Event

Ingested directly from the Events extension.

  • event_title: use the description.xml file delivered by PSO and cross-match it with the Events extension.

  • Bx: based on the SubType column of the Events extension, which shows the list of active sub types in binary format.

  • event_type: taken from the event_type entity.
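Decoding the SubType binary mask into the list of active sub types (for the Bx columns) can be sketched as follows; the bit order (lowest bit = sub type 1) is an assumption, since the document does not fix it:

```python
def active_subtypes(subtype):
    # Accept either a binary string ('0101') or an integer mask; return
    # the 1-based indices of the set bits, lowest bit first.
    mask = int(subtype, 2) if isinstance(subtype, str) else int(subtype)
    return [i + 1 for i in range(mask.bit_length()) if (mask >> i) & 1]
```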

3 Appendix A: PLA ingestion analysis

Current PLA ingestion is strongly conditioned by the status of the PLA ICD. The ICD developed by the PSO is a very detailed and well-structured document, but it is compromised by two factors:

a) The scope of the document is not wide enough: it does not cover the release procedure, the staging-area structure for uploads, or the additional naming conventions needed to fully automate the procedure.

b) The data structures described have sometimes not matched what was actually released, with the ICD being reactively updated to match what was received, instead of the other way round.

To solve these issues internally, a SAT ICD was developed, covering all the aspects missing from the PSO ICD. But since neither this nor the PSO ICD has been approved and signed by the DPCs, it only helps internally to keep track of what is already automated and how.

The impact of this situation is twofold:

a) The import/ingest software has to remain under development up to the release date, when the actual data is received, in order to correct mismatches between the data structures in the ICD and the data actually provided. This keeps development open and demands critical manpower precisely at the moment when delays must be avoided.

b) Since the data received does not follow any release procedure, it must be manually accommodated to the internal ICD, involving a lot of additional manual work by the archive developers: repackaging tar files, compiling documentation files, relating files that require correspondences, and so on. This, again, demands critical manpower precisely at the moment when delays must be avoided.

There are two other steps which are currently being performed through manual tasks:

a) Ingestion of the Source entities, i.e. the contents of the catalogue tables. This is currently performed using TOPCAT, creating synthetic columns and importing them directly into the database. It could easily be automated (a non-tested skeleton is already in the software) once the ICD is agreed and endorsed by the DPCs.

b) Ingestion of the POSH. This development depends strongly on the number of releases to be expected. If the number is high, the tasks now performed through TOPCAT may be directly translated into STILTS commands.

Development may now be mature enough, since there are product samples for almost all categories, to write a tight ICD, obtain DPC approval and, following that, develop the remaining software to fully automate the whole process.
