System Proposal

for a

Cancer Patient Management Platform

Data Warehouse/Mining/Reporting

Prepared For / Prepared By
Glenn Bauman MD FRCPC
Citywide UWO Chair Oncology
790 Commissioners Road East
London, Ontario, Canada N6A 4L6
T: 685-8600 ext. 55962
E: / I-THINK Research
Lawson Health Research Institute
801 Commissioners Road, Suite B3041
London, ON, Canada N6C 5J1
T: 519-685-8500 ext. 42691
F: 519-667-6629
E:
W: www.ithinkresearch.com

Version 1.5

December 11, 2009

Innovative Translation of Health Informatics and Knowledge Research

Table of Contents

Table of Contents 2

1. Introduction 3

1.1 Purpose 3

1.2 Scope 3

1.3 Overview 4

2 Data Sources 5

3 Extraction Translation and Loading (ETL) Process 6

4 Data Warehouse 7

4.1 Design Considerations 8

4.2 User’s Portal 8

4.3 Administrator’s Portal 8

4.4 Extracts 9

4.5 Reports 10

5 Online Analytical Processing (OLAP) Cubes 11

5.1 Reports 11

5.2 Data Mining 13

5.3 External Programs 13

6 Other Advantages 14

6.1 Soft Delete 14

6.2 Data Security and Safety 14

7 Scalability Features (Possible Future Updates) 15

8 Timelines 16

8.1 Rapid Application Development (RAD) 16

9 Budget and Maintenance Fees 17

9.1 Budget 17

9.2 Maintenance and Technical Support Bundle 17

9.3 Other Costs 17

1.  Introduction

1.1  Purpose

The purpose of this document is to outline a combined database and web application designed for the tracking Cancer Patients built upon the base of existing database management systems (DBMS).

More specifically, this will include:

·  An outline of all the data sources to be accessed (including OPIS 2005, CERNER, ARIA, etc)

·  Outline a high level design on how the different data sources will be linked, stored and used to query data.

1.2  Scope

The scope of this document is to elaborate on the functionality of the system by:

·  Identifying the data sources

·  Identifying what data to extract from each data source

·  Providing a high level data warehouse layout

·  Providing functionality and scalability of proposed system

·  Provide a high level online analytical processing (OLAP) structure

o  revealing output of results

o  mining of datasets

o  other uses of these processes

·  Provide a high level report and extract structure

1.3  Overview

The vision is to implement the following solution:

·  Data sources represent all the sources from which data can be imported.

·  ETL stands for Extraction Translation and Loading and is used to populate the warehouse.

·  The warehouse will be designed to meet all current data extracts and reports. The data warehouse will expose its data via:

o  Extracts (primary function)

o  Reports

·  The OLAP Cubes provide additional reporting and data mining features. The OLAP will expose its data via:

o  Reports (primary function)

o  Data Mining

Note: The data warehouse and OLAP cubes have the ability to sanitize the data such that all reports can be used for anonymous studies.

2  Data Sources

The following data sources have been identified:

·  OPIS 2005 is an existing chemotherapy treatment database.

o  This database is currently being used and updated

Required From LRCP

o  Root level access to this system will be made available

o  Extracts in CSV or other compatible format will be made available

·  ARIA is an existing radiation therapy database.

o  This database is currently being used and updated

Required From LRCP

o  Root level access to this system will be made available

o  Extracts in CSV or other compatible format will be made available

·  CERNER is the main hospital patient demographics tracking system (including PIMS)

Required From LRCP

o  Specific data points will need to be specified in order to extract.

o  Develop a process in which new patient ID’s can be delivered to Cerner to include in their extract.

o  Determine how often the data should be exported

§  Schedule (e.g. once a day, week, month)

·  Other possible integration points:

o  ESAS: Survey for patient profiles – education system

3  Extraction Translation and Loading (ETL) Process

The purpose of the ETL is to move the data from the existing systems to the data warehouse.

Technical undertakings on the dataset will include (but not limited to):

·  Merge the existing local cancer DBMS sources while accounting for:

o  Missing and acquired data

o  Variations in field formats (date, names, etc)

o  Incompatible fields (ID’s, primary keys, query sets, etc)

o  Conflicting fields (the use of different standards – i.e. various metrics)

·  Apply various data mining techniques on the merged datasets in order to uncover trends that may lead to better diagnosis of cancer patterns and treatment plans include:

o  Correlation and regression

o  Decision tree’s, neural networks and other learning based algorithms

o  Multivariate statistics

o  Modeling

The ETL process can be summarized as follows:

·  The extraction process entails reading the data from the specified data sources.

·  The translation process will entail:

o  Cleaning various data according to priori rules

o  Translations conforming to standards e.g. change Fahrenheit to Celsius

o  Defining common names, labels and fields

o  Performing score calculation

o  Performing lookups

o  Data linking

o  Modifying layout

·  The loading process sends the prepared data to the data warehouse.

4  Data Warehouse

The purpose of the data warehouse is to store the data such that it’s simple to extract or report the data.

A data warehouse is a database that is optimized for reporting and data extractions. The data warehouse will not be updated in real time from the existing databases - however will be updated each time the ETL process is run.

The warehouse will be built using the following (but not limited to) specifications:

·  Underlying access controls on the warehouse will exist where each user will have a set of permissions allowing particular access to different areas within the database.

·  All data will be maintained within core tables or associated data-forms.

·  Each patient is given a universal identifier that tracks their particular details (since various patient data may need to be de-identified for trials or statistical analysis.

·  Redundancy will be built into the database, where each table is backed up on a nightly basis to a secondary warehouse.

·  MySQL will be the primary system used for the proposed system

·  Data warehouse will be constructed in a fashion that will facilitate the incorporation of additional sources of data in the future (not in current scope of this project), these may include:

o  imaging information from clinical PACs systems,

o  synoptic clinical notes,

o  etc.

The data-forms surrounding the core tables will contain information describing the measures and outcomes. These will include (but not limited to), for example:

·  Medication details

·  Procedural details

·  Radiotherapy treatment details (and associated results – post treatment-effects)

·  Chemotherapy treatment details (and associated results – post treatment-effects)

·  etc

4.1  Design Considerations

The data warehouse will address the following:

·  Data granularity – Extracts and reports will need to be created at:

o  Hospital level

o  Patient level

o  Disease level (assuming a patient could have multiple diseases)

o  Diagnosis level

o  Procedural level

·  Measures – Identifies the measures and scores.

·  Dimensions – Identify the dimensions the fields that describe the measures and scores.

4.2  User’s Portal

Characteristics of the user’s portal of the web platform include:

·  The system will provide a means for registering a new patient and have the ability to append or add data to any of the tables or data-forms (their details, demographics, progress of disease, etc).

·  For each visit, a clinical ticket will be created and can be attached to the patient record.

·  Users with the appropriate permissions will be able to view other patient records and see progression of disease.

·  Researchers and statisticians (for particular trials and study’s) will be given access to de-identified data to query.

4.3  Administrator’s Portal

Characteristics of the administrator’s portal of the web platform include:

·  Each and any record can be modified and removed from the system.

·  All permissions related to each field within a table can be granted and revoked from any user on the system (setting permissions for data rights).

·  Look up lists can be maintained from this portal.

·  Patients can be re-activated.

·  Messages regarding illegitimate entries within the database (based on validation rules) will be presented to the administrator in order to be processed.

·  Users can be added to the system and appropriate permissions can be assigned.

·  Logs can be accessed in order to ascertain the times of relevant updates and additions to the system.

·  Form responses can be maintained in this section for use in any free text fields.

4.4  Extracts

Extracts can be generated directly from the data warehouse. Extracts are authorized via the web-platform and can output the resulting tables a range of desired formats. These include but are not limited to, CSV, Excel, SQL. External Programs can be used to process these results output by the web platform. We can expose the extracts via third party programs - this provides a number of benefits:

o  Familiar user interface

o  Ability to massage the data beyond what the data warehouse is doing.

Furthermore, auxiliary applications that may be desired for research purposes, i.e. SPSS, SAS, etc.

Extracting data from the data warehouse reduces a lot of issues that are common when extracting data out of a database such as:

o  Complicated table linking

o  Cross products

o  Complicated calculations and filters.

Note: These complicated calculations and filters are performed by the ETL and stored in the data warehouse inherently.

4.5  Reports

The data warehouse can also be used for reporting. Through the web-platform, the system can generate customized reports based on the CCO databook requirements. Further discussion will entail to make sure the system generates these custom reports as desired. In general, the system inherently has the ability to generate PDF and MSWord compatible formats.

Required

·  An example CCO report which will be used as a template for system reporting.

·  Discussions with Dept of Oncology researchers would need to take place in order to assess the nature of data extracts and the reports to be generated.

5  Online Analytical Processing (OLAP) Cubes

An OLAP (Online analytical processing) cube is a data structure that allows fast analysis of data. It can also be defined as the capability of manipulating and analyzing data from multiple perspectives. The arrangement of data into cubes overcomes a limitation of relational databases. Relational databases are not well suited for near instantaneous analysis and display of large amounts of data.

The purpose behind the OLAP cubes in this system is to provide the user the ability to:

·  Customize measures to expose the dataset of trends

·  Define data mining and mathematical modeling structures

·  Create reports from these resulting sets

A cube can be optimized to report on data over time e.g. year over year change. This will all be controlled directly on the front end web-platform by the user defining the type and duration of the analysis desired. Furthermore cubes can be saved for other researchers to take advantage of specific measures by the system. Permissions to the cubes may also be defined by the creator of the cube.

5.1  Reports

The following are example screen shots of online reports that use an OLAP structure.

The above figures display user-defined and system predefined reports. The user is able to change what is being reported and is able to change the category to report by. The above report displays Number of ICU Admission by Operative over time.

5.2  Data Mining

The OLAP cubes also feature a number of data mining features such as:

·  Classification Patterns

·  Mathematical Modelling

·  Clustering Methods

·  Association

·  Regression

·  Forecasting

·  Sequence Analysis

Specific mining techniques may be desired, this should be outlined once initial prototype is delivered.

5.3  External Programs

OLAP cubes can produce files that will be understood by external applications (i.e. Excel).

There are a number of advantages to using external applications:

·  Familiar program interface

·  It is installed on a large number of hospital systems

6  Other Advantages

6.1  Soft Delete

The ability to mark a record as deleted. This means that the record is not really deleted from the database but just hidden from view. To keep the data extract easy we will create views of the data that only contain none deleted records. The advantage of a soft delete is that it allows you to remove records from extracts that are not relevant

6.2  Data Security and Safety

The data collected will be stored on secure servers at I-Think Research ((Innovative Translation of Health Informatics & Knowledge Research), a state-of-the-art knowledge translation laboratory. The I-Think Research platform has been developed to ensure secure storage and protection of confidentiality of patient data. All registries and databases hosted on the site:

·  are in compliance with government privacy laws, as well as with the privacy and confidential policies of Lawson Health Research Institute, London Health Sciences Centre and St. Joseph's Health Care, London;

·  are secure, reliable, isolated from other databases and hosted behind the Lawson firewall on a server that employs a certificate and encrypted transmission of data;

·  are backed-up nightly;

·  include an audit trail that tracks any changes made to records and by who;