Design and Implementation of CoreWall’s Session Database

TABLE OF CONTENTS

1.INTRODUCTION

1.1CoreWall......

1.2Background & Need Identification......

1.3Related Work......

2.THE COREWALL SUITE: GEOSCIENCE CORE DATA INTEGRATION AND VISUALIZATION

2.1Core Workflow Database......

3.PROJECT DESCRIPTION AND WORK BREAK DOWN STRUCTURE

4.DATABASE

4.1Database Design [13]......

Annotation_Table......

Annotation_Relationship_Table......

Database_Table......

DataFile_Table......

Graph_Table......

Graph_Annotation_Mapper_table......

Graph_Image_Mapper_Table......

Group_Table......

Group_Database_Mapper_Table......

Image_Table......

Image_Annotation_Mapper_table......

Image_Manipulation_Non-Uniform_Scaling......

Image_Manipulation_Uniform_Scaling......

Manipulation_sequencer_table......

Session_Annotation_Mapper_table......

Session_table......

Session_Relationship_Table......

Track_table......

Track_Annotation_Mapper_table......

Track_Image_Mapper_table......

Track_Session_Mapper_Table......

Users_Table......

4.2XML DTD[8]......

5.TECHNOLOGIES USED / CONCEPTS APPLIED

5.1XML DOM Parsing [8,10]......

5.2XML to RDB Mapping [2]......

Introduction to the Problem of XML to RDBMS Mapping......

Pros And Cons Of Xml Over Relational Model......

Need To Convert Xml Data Into Relational Model......

Challenges And Complexities To Efficient Storing Of Xml Databases......

Storage......

Creating a XML structure that corresponds to a database schema......

Referential Integrity and storing of XML documents to a RDBMS......

5.3Java Prepared Statements 14]......

5.4Java Native Interface [4]......

6.WORKFLOW [7]

6.1Save Session:......

6.2Retrieve_Session......

6.3Queries......

UserSessions:......

SessionTracks:......

ImageSessions:......

SessionTrackImages:......

ImageAnnotations:......

ImageGraphs:......

6.4User Management [12,13]......

7.SAMPLE SCREENSHOTS

8.SYSTEM REQUIREMENTS

9.TABLE OF ACRONYMS

10.REFERENCES

1.INTRODUCTION

1.1CoreWall

CoreWall Suite(CWS) is a real-time stratigraphic correlation, core description (CD) and data visualization system to be used by the marine, terrestrial and Antarctic science communities. It is aimed to significantly alter and enhance the current approaches used for core description and analysis of sediment and rock cores by providing an integrated environment for these activities, for both field and repository environments.

1.2Background & Need Identification

A primary need for marine, lake, and polar studies of sediment and rock cores, is an integrated environment for stratigraphic correlation, visual core description, and data visualization. Within the last few years, digital images of cores, microfossils, thin sections, etc. are being routinely collected, replacing traditional photography. This dramatic change to digital imagery is creating a huge change and opportunity in the way scientific measurements are captured, collected, analyzed, and distributed by the scientific community.

These images provide the fundamental template for all sediment descriptive work, including annotations about structures, lithologic variation, macroscopic grain size variation, bioturbation intensity, chemical composition, and micropaleontology, among other features.The integration of core-section images with discrete data streams and ‘nested’ images (e.g., smear slides, photomicrographs, thin sections) provides a robust approach to the description of sediment and rock cores. Integration of value-added data, later acquired in a timely manner by individuals and teams of scientists is another major priority. The real-time and/or simultaneous display of multiple integrated databases, with all the data rectified (co-registered) to the fundamental template of the core image, is sorely needed to expedite the process that currently takes months to years.

The cost of wasted time and lost opportunity for science caused by the lack of integrated data and image management is enormous and the challenge exists for all current geoscience’s coring programs. Currently, visual core description (VCD) within IODP(Integrated Ocean Drilling Program) is primarily carried out in manual fashion with observations carried out with pencil and paper and entered into a variety of 3rd party software (e.g. AppleCore), and Excel template spreadsheets to create summary “barrel sheets” (see Figure 1 for example) of basic lithology, structures, color, and grain size, etc. This is perhaps the most labor-intensive activity with up to 8 to 10 people allotted per expedition. The VCD summaries are also now supplemented by digital line scans of the split-cores, color reflectance measurements, and other data sets.

However, the automated measurements are not captured by the “barrel sheets” and certainly not integrated into the summaries. Only the interest of individual scientific party members drives combining these data into an overall site summaries capture any of this analysis.

Essentially, there is no data visualization system within IODP or any other drilling program that allows easy, interactive data visualization and integration. The ability to easily map and analyze these diverse datasets onto multiple depth scales interchangeably has now become a requirement. The need for a data visualization system that is scaleable, integrated, linked to outside sources, and allows for collaborative research both onboard and onshore is absolutely a necessity to improve the overall science.

CWS will be an integrated interpretive environment to facilitate all scientists that use geosciences data as a component of their research.

1.3Related Work

LacCore is an integrated system that provides training and service to all researchers regardless of funding source, and that aims to function as a role model for handling of and primary analysis of lacustrine sediment cores. Its facilitates interdisciplinary studies of lake sediments and environmental history by bringing together researchers in an efficient workspace that provides standard and specialized equipment, knowledgeable staff, and training. A networked computer operates as a link to the collections and acts as a server to host the various databases. However, a quantum leap in data display capability is needed to better facilitate researchers who use the LacCore. CoreWall Suite will fill this glaring need.

Figure 1: Example Barrel Sheet

2.THE COREWALL SUITE: GEOSCIENCE CORE DATA INTEGRATION AND VISUALIZATION

The central component of the CWS is the Core Workflow Database (CWD) whose role is to intercept queries from end-users and retrieve the relevant pieces of data and meta-data, and stage it for rapid and frequent access by the users. Figure 2 depicts how CWD associates with other CoreWall Suite components.

Figure 2: Components of CoreWall Suite

2.1Core Workflow Database

The data required for a core interpretation session can be very large. The data for an entire IODP core, including core images, multi-sensor scans, smear slides and all related data, can amount to many gigabytes. If a scientist wants to simultaneously compare multiple cores, the size of the needed data can indeed be large. To compound this problem, many users will be interpreting at locations with slow Internet connections; in the field or drawing data from databases which themselves are on slow networks. In addition, users will be interpreting data from databases that are often designed as read-only archives and not designed to hold “works in progress” of investigators.

The Core Workflow database will be developed to address these two problems. First it will have interfaces to enable the CWD to retrieve user selected data from established databases such as JANUS, LacCore, dbSEABED, and PaleoStrat. It will also be able to pull data through the emerging portals such as CHRONOS which will allow much more diverse data to be accessible to the CWS than through any single database. This gives the user instant web-services based access to multiple data sources . Second, the CWD captures the results of analyses and interpretations made through accessing and correlating data retrieved from these databases while the user is working to analyze the cores. As the workflow is captured other collaborators can access it locally or remotely. For example, the work of one scientist can be shared from one shift to the next and biologists in a separate lab can watch as other labs in the ship interpret a specific core segment. In a higher bandwidth environment, such as a core lab or a university office, a group of collaborators could track the work of one-another as they work on the same cores.

A primary feature of the CWD is the co-registration of the data across multiple coordinate systems. For instance, once several wire length control points have been assigned depths, the remainder can be extrapolated, and depth is now stored as an alternate coordinate system for the well log. Networks of Core Workflow Databases can be connected to facilitate mirroring of data, and permissions can be set at each institution’s database to control the degree of sharing. A central Core Integration Database can also be established (for example, on a ship or a core lab on shore) to collect all contributions made by the community.

3.PROJECT DESCRIPTION AND WORK BREAK STRUCTURE

Corewall includes a database which stores data related to the work done by geoscientist in the form of sessions for e.g. core manipulations and observation related data . In general,
Session could be thought as an aggregated study done by a geoscientist upto the point of exiting Corelyzer, which is an integrated visualization tool for the study of lake and ocean sediment .
Corelyzer allows a geoscientist to load multiple cores and associate some text with them in the form of text(called as annotations). Apart from this textual information, a user's work in Corelyzer could also contain graphs, images and there associated annotations These annotations are notes about the core study. This study could contain the hierarchical associations between the annotations in terms of parent child relationship amongst themselves.

Currently geoscientists are not able to build on or reuse previous core studies . This restricts them in terms of retrieving their older work (old sessions), extending their work and/or integrating work with other geoscientists and analysis on the basis of different CoreWall specific data attributes e.g finding out what sessions referred a particular image, what users used a particular image in the session. All this data is beneficial to their core studies and research.

CoreWall database is the main focus of this project which allow the researchers to save, retrieve and query the session information. This information could then be used to go back in time to see what studies were performed or to analyze the studies at different points of time. In addition to this, it also allows execution of any ad-hoc queries on this session data

With the integration of database into Corelyzer , CoreWall system was separated into two independent components: Corelyzer rendered the core data and met other visualization needs of the geoscientist and session database acted as a repository for the core study data from which user could retrieve, save or query on specific data attributes.

Figure 3: CoreWall Demonstration

Their was a need of bidirectional data communication (see figure 4) between session database and Corelyzer. XML was chosen as the medium for data exchange due to the following reasons:

Figure 4: High Level CoreWall Design

Firstly, XML could represent the hierarchical structure of CoreWall's data better than a relational model and secondly it avoided CoreWall from getting tied up to a specific database back-end, which gave the flexibility to switch to any other proprietary databases with minimal efforts.

The work break down to build the session database included the following tasks:

  • Identification of database needs for the CoreWall suite (Section 1 and 2).
  • Design a database that meets those needs.(Section 4.1)
  • User Management Interface for CoreWall session database(6.4).
  • Design a DTD to simulate the flat structure of CoreWall session database(Section 4.2) .
  • Parse the XML state files to store information into the database. This involves intricacies of XML to RDBMS mapping(Section 5.2) .
  • Retrieve existing session information as small XML files (Section 6.2)
  • Generate query results as XML files(Section 6.3).
  • Integration of database into CoreWall as its backend. This database serves as a repository of data to meet the data integration objectives of CoreWall Suite.

4.DATABASE

4.1Database Design [13]

The database schema for CoreWall session database is depicted below. A detailed description about each table’s columns, their data types, data constraints and the integrity constraints imposed on each table is

described in the following pages.

Database Schema

Database Schema Cont….

Database Schema Contd…..

Annotation_Table

This table is used to store annotation related information. These annotations could belong to session, track, image or graph.

Column_Name / Null / Data Type / Description
Annotation_ID / No / bigint(20) / A unique ID assigned to every annotation.
User_ID / No / bigint(20) / ID of annotation’s owner
Annotation_Author / No / Varchar(25) / Name of the author
Annotation_Text / Yes / Text(500) / Annotation text
Annotation_Type / No / Varchar(25) / This tells if an annotation is related to an image, graph track or session.
Timestamp / No / Timestamp / Time when the annotation was last updated.

Primary Key : Annotation_ID

Foreign Key : User_ID REFERENCES track_table(track_ID) on DELETE CASCADE on UPDATE RESTRICT

Annotation_Relationship_Table

This table is used to store parent –child relationship between annotations.

Column_Name / Null / Data Type / Description
Annotation_Parent_ID / No / bigint(20) / A unique ID assigned to every annotation
Annotation_Child_ID / No / bigint(20) / A unique ID assigned to every annotation

Primary Key : Annotation_Parent_ID, Annotation_Child_ID

Foreign Key :

Annotation_Parent_ID REFERENCES Annotation_Table(Annotation_ID) on DELETE CASCADE on UPDATE RESTRICT

Annotation_Child_ID REFERENCES Annotation_Table(Annotation_ID) on DELETE CASCADE on UPDATE RESTRICT

Database_Table

The table is used to store information about external databases.

Column_Name / Null / Data Type / Description
Database_ID / No / bigint(20) / A unique ID assigned to every external database
Database_Name / No / Varchar(255) / Name of the database
IP_Address / No / Varchar(20) / IP address of the machine hosting the database
Admin_email / Yes / Varchar(25) / Email address of the database administrator

Primary Key : Database_ID

DataFile_Table

The table is used to store information about datafiles that are used to generate graphs

Column_Name / Null / Data Type / Description
DataFileID / No / bigint(20) / A unique ID assigned to every data file
File_Name_1 / No / Varchar(255) / Name of the first file
File_Name_2 / No / Varchar(255) / Name of the second file

Primary Key : DataFileID

Graph_Table

This table is used to store graph information.

Column_Name / Null / Data Type / Description
Graph_ID / No / bigint(20) / A unique ID assigned to every annotation
Data_File_ID / No / bigint(20) / Id of the file which stores data for the graph.
Field_Name_1 / No / Varchar(255) / Name of graph’s first field
Field_Name_2 / No / Varchar(255) / Name of graph’s second field

Primary Key : Graph_ID

Foreign Key :

Data_File_ID REFERENCES datafile_table(datafile_ID) on DELETE CASCADE on UPDATE RESTRICT

Graph_Annotation_Mapper_table

This table is used to store information that relates annotations with graphs

Column_Name / Null / Data Type / Description
Annotation_ID / No / bigint(20) / A unique ID assigned to every annotation
Graph_ID_1 / No / bigint(20) / ID of the first graph
Graph_ID_2 / Yes / bigint(20) / ID of the second graph
Session_ID / No / bigint(20) / Session with which this graph_annotation is associated with.
Position_X_1 / No / Decimal(10,4) / X Position where graph 1 is laid out.
Position_Y_1 / No / Decimal(10,4) / Y Position where graph 1 is laid out.
Position_X_2 / Yes / Decimal(10,4) / X Position where graph 2 is laid out.
Position_Y_2 / Yes / Decimal(10,4) / Y Position where graph 2 is laid out.
Position_X_Ann / No / Decimal(10,4) / X Position where annotation is laid out.
Position_Y_Ann / No / Decimal(10,4) / Y Position where annotation is laid out.
Timestamp / Yes / Timestamp / Time at which this graph_annotation was created.

Primary Key : Annotation_ID, Graph_ID_1, Graph_ID_2

Foreign Key :

Annotation_ID REFERENCES annotation_table(Annotation_ID) on DELETE CASCADE on UPDATE RESTRICT

Session_ID REFERENCES session_table(session_ID) on DELETE CASCADE on UPDATE RESTRICT

Graph_ID_1 REFERENCES Graph_table(Graph_ID) on DELETE CASCADE on UPDATE RESTRICT

Graph_ID_2 REFERENCES Graph_table(Graph_ID) on DELETE CASCADE on UPDATE RESTRICT

Graph_Image_Mapper_Table

The table is used to store information about mappings between graphs and images during a user session.

Column_Name / Null / Data Type / Description
Graph_ID / No / bigint(20) / Identification for the Graph
Image_ID / No / bigint(20) / Identification for the Image
Session_ID / No / bigint(20) / ID of the session
Position_X / No / Decimal(10,4) / X position of the graph in the image
Position_Y / No / Decimal(10,4) / Y position of the graph in the image

Primary Key : Graph_ID, Image_ID

Foreign Key :

Graph_ID references Graph_Table(Graph_ID) on DELETE CASCADE on UPDATE RESTRICT

Image_ID references Image_Table(Image_ID) on DELETE CASCADE on UPDATE RESTRICT

Session_ID references Session_Table(Session_ID) on DELETE CASCADE on UPDATE RESTRICT

Group_Table

The table is used to store information about the user groups of core wall. Each user is assigned to a group.

Column_Name / Null / Data Type / Description
Group_ID / No / bigint(20) / A unique ID assigned to every group
Group_Name / No / Varchar(255) / Name of the group
Description / Yes / Varchar(255) / A brief description about the group

Primary Key : Group_ID

Group_Database_Mapper_Table

The table is used to store associate groups and the databases that group have access to.

Column_Name / Null / Data Type / Description
Group_ID / No / bigint(20) / A unique ID assigned to every group
Database_ID / No / bigint(20) / A unique ID assigned to every database
User_login / No / Varchar(255) / Login for the user
User_password / No / Varchar(255) / Password for the user

Primary Key : Group_ID, Database_ID

Foreign Key :

Group_ID REFERENCES Group_Table(Group_ID) on DELETE CASCADE on UPDATE RESTRICT

Database_ID REFERENCES Database_Table(Database_ID) on DELETE CASCADE on UPDATE RESTRICT

Image_Table

This table is used to store information about images that are used in a corewall session.

Column_Name / Null / Data Type / Description
Image_ID / No / bigint(20) / A unique ID assigned to every image
Image_Name / No / Varchar(255) / Name of the image.
Image_URL / No / Varchar(255) / URL where the image is stored
Image_Description / Yes / Varchar(255) / A brief description about the image

Primary Key : Image_ID

Image_Annotation_Mapper_table

This table is used to store annotations associated with images.

Column_Name / Null / Data Type / Description
Annotation_ID / No / Bigint(20) / A unique ID assigned to annotation.
Session_ID / No / Bigint(20) / Session to which the image and annotation belong
Core_Image_ID_1 / No / Bigint(20) / ID of the first image
Core_Image_ID_2 / Yes / Bigint(20) / IDof the second image
Position_X_1 / No / Decimal(10,4) / X Position where image 1 is laid out.
Position_Y_1 / No / Decimal(10,4) / Y Position where image 1 is laid out.
Position_X_2 / Yes / Decimal(10,4) / X Position where image 2 is laid out.
Position_Y_2 / Yes / Decimal(10,4) / Y Position where image 2 is laid out.
Position_X_Ann / No / Decimal(10,4) / X Position where annotation is laid out.
Position_Y_Ann / No / Decimal(10,4) / Y Position where annotation is laid out.
Timestamp / Yes / Timestamp / The time when this annotation was last saved.

Primary Key : Annotation_ID,Core_Image_ID_1, Session_ID

Foreign Key :

Annotation_ID REFERENCES Annotation_table(Annotation_ID) on DELETE CASCADE on UPDATE RESTRICT

Session_ID REFERENCES session_table(session_ID) on DELETE CASCADE on UPDATE RESTRICT

Core_Image_ID_1 REFERENCES Image_table( Image_ID) on DELETE CASCADE on UPDATE RESTRICT

Core_Image_ID_2 REFERENCES Image_table( Image_ID) on DELETE CASCADE on UPDATE RESTRICT

Image_Manipulation_Non-Uniform_Scaling

This table is used to store non-uniform scaling related image manipulation information.

Column_Name / Null / Data Type / Description
Image_Manipulation_NUS_ID / No / bigint(20) / A unique ID for every Non uniform manipulation.
Image_ID / No / bigint(20) / ID of the image that is manipulated.
X_value / No / Decimal(10,4) / X value
Y_value / No / Decimal(10,4) / Y value
Width / No / Decimal(10,4) / Width
Height / No / Decimal(10,4) / Height
Scale_x / No / Decimal(10,4) / Scaling ratio X
Scale_y / No / Decimal(10,4) / Scaling ratio Y

Primary Key : Image_Manipulation_NUS_ID

Foreign Key :

Image_ID REFERENCES Image_Table(Image_ID) on DELETE CASCADE on UPDATE RESTRICT

Image_Manipulation_Uniform_Scaling

This table is used to store uniform scaling related image manipulation information.

Column_Name / Null / Data Type / Description
Image_Manipulation_US_ID / No / bigint(20) / This is a unique ID for every uniform scaling manipulation.
Image_ID / No / bigint(20) / ID of the image that is manipulated.
Scale_x / No / Decimal(10,4) / Scaling ratio X
Scale_y / No / Decimal(10,4) / Scaling ratio Y

Primary Key : Image_Manipulation_US_ID