DTI004ETL Procurement Project BriefV010321/04/2017

DTI004–ETL Procurement

Scope

The scope of this project is to assess, and if necessary procure, the use of a software tool for ETL (Extract-Transform-Load) within the EDW environment.

The tool will be used to create and maintain robust live technical processes to extract data from operational systems, to validate and transform the data and then load information into EDW (Enterprise Data Warehouse).

The project will fully comply with the universities procurement guidelines and policies.

A separate project will be created to install and productionise the chosen ETL tool.

This is part of the EDW programme, under the Digital Transformation portfolio.

Background

The creation of EDW will provide the data and information needed by decision makers across the University, giving a consistent view of data across multiple sources, adhering to consistent data definitions. This is in line with the University’s BI/MI Strategy, our reference architecture and the growing requirement for evidence-based planning and decision making.

ETL is a key component of the EDW programme, allowing data to efficiently flow from operational systems into EDW and be validated and transformed through the different layers within EDW. This is represented by the brown arrows in the diagram below.

EDW Landscape

(EDW Landscape V0101 07/04/2017 @ CJL)

ETL View

The brown arrows above can be further expanded as shown below.

ETL
Stage / Objective / Notes
ETL1 / Objective is to (a) populate the Data Foundation layer in a timely and accurate fashion (b) minimise performance impact on Operational Systems / Extract of source data; full or incremental.
Transform/prepare data prior to load into Data Foundation layer
Incremental load into Data Foundation layer
Monitoring of Data Foundation load progress/completion
Data quality alerts
ETL2 / Objective is to populate the Data Access layer from in a timely and accurate fashion / Transform data prior to load into Data Access Layer
Load from Data Foundation into Data Access
Monitoring of Data Access load progress/completion
Data quality alerts

The ETL marketplace is somewhat complex, and there are many vendors occupying particular niches.It also crosses boundaries into the Data integration, Data Analytics and Data Science areas. Cursory market research reveals over 20 vendors who may have a product suitable for use with EDW.

It is therefore important that sufficient pre-market engagement is done to properly assess and engage with the marketplace and to ensure we create a quality set of requirements that will enable us to segment the marketplace as efficiently as possible.

Objectives and deliverables

The deliverables are prioritised using the MoSCoW prioritisation method

(M=Must Have; S=Should Have; C=Could Have; W=Want

  • M= has to be satisfied for the final solution to be acceptable in terms of delivery dates, compliance, viability etc.
  • S= high-priority requirement that should be included if possible -workarounds may be available
  • C= a nice-to-have requirement
  • W= want but will not be part of this project)

Objective / Deliverable
/ Priority / Responsibility of
O1 To choose the best possible extract-transform-load process tool for EDW, according to the defined specification.
‘Best possible’ means that there must be a balanced choice made in terms of cost, benefit, ease of use, ease of implementation etc, over the life-cycle of the tool and EDW.
D1. The award of a contract to purchase an ETL tool,
or make a decision to use currently available tooling,
or to use no tooling at all / M / Project Team
O2Conduct sufficient analysis & Market research to ensure that most appropriate ETL and data integration tools are included within the Tender phase, by effective use of mandatory and quality requirements.
D2. Market analysis Document / M / ProjectTeam
O3 The creation of a set of technical requirements thatdefine what we expect an ETL to do. This will be written with MoSCoW ratings.
This will feed into the overall specification.
D3. Technical Requirement Document
/ M / Project Team
D4. Specification Document
/ M / Project Team
O4 The inclusion of technical requirements that are related to, but not core to an ETL EDW tool. For example, Metadata mgmt
D5. Technical Requirement Document (supplement)
/ S / Project Team
O5 The creation of an approved procurement strategy document.
D6. Procurement Strategy Document
/ M / Procurement
O6 The execution of a procurement process, in accordance with the guidance, chosen strategy, policy and procedures of the University.
D7. A conformant and well documentedprocurement process. / M / Project Team

Out of Scope

What / Details
OS01 / The implementation of the chosen ETL Tool. / This will form part of a separate implementation project.
OS02 / Any associated training and consultancy requirements. / This will form part of a separate implementation project.
OS03 / Considerations for the use of ETL tools by areas / processes not related to EDW. / The primary goal of the goal is to select a tool for use within the EDW environment.
It’s possible that the chosen tool could be used in other areas of the university, for example to support or enhance existing BI/MI processes. Whilst this may be a possible side benefit, the project will not include this within scope.

Benefits

The benefits have been accurately described within the Digital transformation EDW proposal.

See:

Success Criteria

What / Details
SC1 / To reach an agreed choice for an ETL tool.
SC2 / To reach an agreed choice of ETLwhilst respecting all the appropriate procurement policies and procedures.
SC3 / To reach an agreed choice of ETL broadly in line with the estimated budget (e.g. budget +/- 10%).
SC4 / To reach an agreed choice of ETL broadly within the time-line set by the project e.g. 6 months.

Project milestones

(See projects website)

Priority and Funding

This is a priority one project, funded as part of the EDW programme, within the Digital transformation initiative.

The initial internal estimate is 120 days, spread across IS applications. This excludes procurement resource.

In addition a nominal £5k is estimated for external consultancy services.

Impact and Dependencies

The project is dependent upon outputs from COM029 PofC.

A project to install and productionise the chosen ETL tool (DTI005) will follow on from DTI004.

A project to productionise EDW DTI006 can run in parallel with DTI004-DTI005.
However there is an implicit assumption that we must choose a ETL tool which is fully compatible with our chosen database platform e.g. Oracle.

The initial business project(s) will use the chosen ETL tool if it is available, or utilise a stopgap ETL solution. It is accepted that the first tranche of business data acquisition should not be held up by this procurement.

High Level Work Breakdown Structure

Phase / Activity / Notes
A / Development
A01 / Project Planning
A02 / Develop Strategy / Define procurement Strategy
A03 / Pre-Market Engagement / Conduct Market research
Conduct x Demos, Q&A sessions
A04 / Develop Documents / Create Technical Requirements
Create Specification
Create Selection and award citeria
Create PQQ (if applicable)
Create ITT
B / Tender
B01 / Supplier Selection / Issue PQQ if applicable.
B02 / Open & Evaluate Tender / Issue ITT
Presentations / site-Visits
Evaluations
C / Contract
C01 / Post Tender Clarification
C02 / Contract Award
D / Project Close

Risks

(See projects website)

Stakeholders Communication Plan

(To be added to projects website)

Who / Project
Stakeholder / ETL Tool
Evaluator / Role / Communication Plan
Alex Carter / X / Programme Owner
Service Owner / - Attends Monthly Programme meeting
- Receives regular updates from Project / Programme Manager
Alison Armstrong / X / DTI Portfolio manager / - Attends Monthly Programme meeting
- Receives regular updates from Project / Programme Manager
Alison Johnson / X / Procurement Manager resp. for all IS procurement / - Receives Monthly Project report
Ana Heyn / X / X / App Mgmt Lead / - Attends regular team meetings
- Member of evaluation panel
Chris Lawford / X / X / Programme / Project Manager / N/A
Craig Middlemass / X / MI/BI SME / - Receives Monthly Project report
Dave Berry / X / Sponsor / - Attends Monthly Programme meeting
- Receives regular updates from Project / Programme Manager
Geir Granum / X / X / Dev apps (BI) lead / - Attends regular team meetings
- Member of evaluation panel
George Reid / X / X / Procurement Manager / - Attends regular team meetings
- Member of evaluation panel
Gillian Henderson / X / X / Dev Tech lead / - Attends regular team meetings
- Member of evaluation panel
Heather Larnach / X / Tec h Mgmt Lead / - Receives Monthly Project report
Iain Fiddes / X / Senior resource manager / - Attends Monthly Programme meeting
- Receives regular updates from Project / Programme Manager
John Grzesinski / X / X / Service Mgmt Lead / - Attends regular team meetings
- Member of evaluation panel
Mark Ritchie / X / Senior Supplier / - Attends Monthly Programme meeting
- Receives regular updates from Project / Programme Manager
Rob Manley / X / X / Dev Apps lead / - Attends regular team meetings
- Member of evaluation panel
Wilbert Kraan / X / X / IA Lead / - Attends regular team meetings
- Member of evaluation panel
TBA / X / Service Operations Manager / - Receives Monthly Project report

Glossary of Terms

Term / Description
Access Layer / A conceptual space within EDW.
Data is transferred from the Foundation layer, using ETL tooling.
It is the layer that contains the solution specific data marts and start schemas, and is designed for targeted, quick and efficient access.
The access layer utilises the concept of dimensional modelling.
It is normally considered an externally focussed space, with many users accessing data via tools such as Business objects etc.
BI / Business intelligence (BI) can be described as “a set of techniques and tools for the acquisition and transformation of raw data into meaningful and useful information for business analysis purposes”.
Business Objects (BO) / Part of SAP BI Suite.
Toolset used to creates the semantic layer (universe) and reports.
Dashboard / In management information systems, a dashboard is “an easy to read, often single page, real-time user interface, showing a graphical presentation of the current status (snapshot) and historical trends of an organization’s or computer appliances key performance indicators to enable instantaneous and informed decisions to be made at a glance.
Data Mart / A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data mart is a subset of the data warehouse that is usually oriented to a specific business line or team. Data marts are small slices of the data warehouse.
A popular form of Data Mart is a Star Schema.
Data Model / A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world.
Data Warehouse (DW) / A data warehouse (DW or DWH) is a system used for reporting and data analysis. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating analytical reports for knowledge workers throughout the enterprise.
Dimensional Data Mart
/ See Star Schema
DW / See Data Warehouse
EDW / Enterprise Data Warehouse
Generic term for the Corporate Data Warehouse, storing data from across the University.
The Enterprise Data Warehouse service will:
  • provide a platform to store University data translated into University terminology for management information and business intelligence purposes
  • manage a service for the extraction and translation of data from source systems ensuring that data is consistent with the source system
  • provide high quality, performant, access to the data in the EDW for consumption by BI tools
  • curate data definitions and purpose for each of the EDW elements
  • advise and respond to changes required to support strategic decision making and University reporting needs
  • reduce the coupling with source systems and BI tools, thus simplifying the migration from one vendor to another

ETL / Extract, Transform and Load (ETL) refers to a process in database usage and especially in data warehousing that:
- Extracts data from operational systems
- Transforms the data for storage in a proper, long-term format
- Loads it into the final target subject area.
Foundation Layer / A conceptual space within EDW.
Data is transferred from the Staging layer, using ETL tooling.
It is the layer that contains the enterprise data model, and is at the very heart of EDW.
The enterprise data model utilises the concept of hyper generalised modelling.
It is normally considered an internal space, with only very specialist users able to gain access.
Hyper Generalised Modelling (HG) / A technique for modelling data and its relationships.
This is used within the foundation layer.
MI / A management information system (MI or MIS) focuses on the management of information systems to provide efficiency and effectiveness of strategic decision making. The concept may include systems termed decision support system, expert system, or executive information system.
Normalisation / De-normalisation / Database normalisation is the process of organizing the columns (attributes) and tables (relations) of a relational database to minimize data redundancy.
Normalization involves decomposing a table into less redundant (and smaller) tables without losing information; defining foreign keys in the old table referencing the primary keys of the new ones. The objective is to isolate data so that additions, deletions, and modifications of an attribute can be made in just one table and then propagated through the rest of the database using the defined foreign keys.
Typically used in OLTP systems, not designed for efficient reporting.
Implies that one piece of data (e.g. Building name) is only stored in 1 place.
De-normalisation is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data.
This is a technique typically used in Dimensional (Star Schema) Data marts and are designed for efficient reporting.
Implies that one piece of data (e.g. Building name) could be stored in many places.
OLAP / On line analytical processing, or OLAP is an approach to answering multi-dimensional analytical queries swiftly. OLAP is part of the broader category of business intelligence, which also encompasses relational database, report writing and data mining.
OLTP / On line transaction processing, or OLTP, is a class of information systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing. E.g. Archibus, eFinancials.
Semantic Layer / A semantic layer is a business representation of corporate data that helps end users access data autonomously using common business terms. It maps complex data into familiar business terms such as product, customer, or revenue to offer a unified, consolidated view of data across the organization.
This is the purpose of a Business objects universe.
Staging Layer / A conceptual space within EDW.
The staging layer is used for ETL activities such as loading, validating and transforming data.
It is normally considered an ‘internal’ space, not accessible to users.
Star Schema / Star schema is the simplest style of data mart schema and is the approach most widely used to develop data warehouses and dimensional data marts. The star schema consists of one or more fact tables referencing any number of dimension tables.
Also known as dimensional data marts.
Subject Area
/ The foundation layer within EDW will conceptually be made up of business specific containers known as subject areas.
These subject areas will be built over time, as and when business driven projects deliver data into EDW. The subject areas can be linked together in many different ways, to provide enterprise wide information.
Universe
/ A semantic layer provided by Business Objects

Page 1 of 11