Project: XML-Publishing - Implementation Strategy
Contract: Specific Contract 17101.2006.001-2006.457
Prepared by: VBE, CBO Reviewed by: CBO
Version 2.0 / Date Updated: 18/05/2007
Status: Company Approved Page 3/44
STIS
Statistical Information Systems / Consortium
INTRASOFT INTERNATIONAL S.A.
and
AGILIS S.A.

European Commission – EUROSTAT/B3

Framework Contract 14200/2005/007-2005/699 - Lot 1

Specific Contract 17101.2006.001-2006.457

‘XML-Publishing’

Implementation Strategy of an XML-based publishing in Eurostat

D1.2: Design of an XML-based production workflow for Eurostat publications and the related implementation strategy

May 2007

Document Service Data

Type of Document / Project deliverable
Reference: / D12 Design of XML-based workflow v2.0.doc
Issue: / 2 / Revision: / 0 / Status: / Company Approved
Created by: / Victorio Bentivogli, Christian Boudot / Date: / 18/05/2007
Distribution: / EU-Eurostat, Intrasoft International S.A.
Contract Full Title: / XML-Publishing - Implementation Strategy
Service contract number: / Specific Contract 17101.2006.001-2006.457
For Internal Use Only
Reviewed by: / Christian Boudot
Approved by: / Mario Fendler

Document Change Record

Issue/Revision / Date / Change
0.1 / 15/01/2007 / Draft document
0.2 / 26/02/2007 / Updated draft
0.3 / 02/03/2007 / Updated draft
0.4 / 07/03/2007 / Updated draft
1.0 / 01/04/2007 / Delivery version
1.1 / 10/04/2007 / Updated draft
1.2 / 12/04/2007 / Updated draft
1.3 / 25/04/2007 / Updated draft
1.4 / 02/05/2007 / Delivery version
1.5 / 08/05/2007 / Updated version incorporating PB comments (email 08/05/2007)
2.0 / 18/05/2007 / Delivery version


Table of contents

1 Introduction

1.1 Purpose

1.2 Scope

1.3 Structure

1.4 References

2 An XML based publishing solution for Eurostat

2.1 Solution components

3 Background technical information

3.1 Content management services / Enterprise Content Management

3.2 Workflow services / Workflow Management Systems

3.2.1 Types of workflows

3.2.2 Workflow Management Systems

3.3 Transformation services

3.3.1 Types of transformation applications

3.4 Collaborative services

3.4.1 Collaborative applications and Content Management

4 Content Management framework selection

4.1 Requirements

4.1.1 Non-functional requirements

4.1.2 Functional requirements

4.2 Selection criteria

4.3 Alternatives

4.3.1 Alfresco

4.3.2 Drupal

4.3.3 Jahia

4.3.4 Joomla!

4.3.5 Plone

4.3.6 Comparison

4.3.7 Documentum

4.3.8 Our experience

4.4 Conclusion

4.4.1 Alfresco at a glance

5 Workflow framework selection

5.1 Requirements

5.1.1 Non-functional requirements

5.1.2 Functional requirements

5.2 Selection criteria

5.3 Alternatives

5.3.1 JBoss jBPM

5.3.1.1 Key advantages of using JBoss jBPM

5.4 Conclusion

5.4.1 JBoss jBpm + Alfresco at a glance

5.4.2 Workflow example

5.4.3 A unified application environment

6 Transformation framework selection

6.1 Requirements

6.1.1 Non-functional requirements

6.1.2 Functional requirements

6.2 Selection criteria

6.3 Alternatives

6.3.1 OpenOffice

6.4 Conclusion

7 Collaboration framework selection

7.1 Requirements

7.1.1 Non-functional requirements

7.1.2 Functional requirements

7.2 Selection criteria

7.3 Alternatives

7.4 Conclusion

7.4.1 A unified user experience

8 Requirements correlation matrix

8.1 Objectives

8.2 Requirements

8.3 Functionalities

9 Implementation strategy (pilot)

9.1 Prototypes

9.1.1 Prototype 0

9.1.2 Prototype 1

9.1.3 Prototype 2

9.2 Construction of pilot application

9.3 Coverage

9.4 The complete set of products

9.5 Planning

9.6 Acceptance criteria

9.7 Cost and effort estimation

Table of Figures

Figure 1 – JBoss jBpm process designer

Figure 2 – Alfresco Workflow definition

Figure 3 – Alfresco publication workflow (diagram)

Figure 4 – Alfresco publication workflow (source)

1  Introduction

1.1  Purpose

This document constitutes the deliverable “D1.2 Design Document” and is the outcome of Task 1 “Design of an XML-based production workflow for Eurostat publications and the related implementation strategy”.

The main objectives of this work are:

·  Design of an XML-based production workflow;

·  Strategy for the implementation of a prototype in order to deploy the proposed design.

1.2  Scope

The overall project scope covers the tasks and sub-tasks listed below.

The current document focuses on Task 1 – Implementation Strategy, more specifically on sub-task 1.2, in which an XML-based workflow is proposed:

1.3  Structure

The structure of this document is as follows:

Section 2 introduces the XML based publishing solution and its required components.

Section 3 reviews some key background elements such as “Enterprise Content Management Systems” and “Document oriented workflows”.

In Sections 4, 5, 6 and 7 we evaluate different alternatives and propose the key building components to support the XML based publication workflow of Eurostat.

Section 8 presents the requirements correlation matrix.

Finally, in Section 9, we present an implementation strategy for deploying a prototype that can be used to validate the overall design of the proposed solution.

1.4  References

This document references:

Reference / Document/Resource Name / Filename
[R1] / D0.1: Minutes of the Project Kick-Off Meeting on 13. Nov. 2006 / Kickoff Meeting 2006-11-13 v1.2.doc
[R2] / Eurostat publishing system (EPS) – Working document – 06/12/2006 / 2006 12 06 XML based publication system.doc
[R3] / D1.1: Analysis of the publications programme, dissemination process and data life cycle / D11 Analysis XML-Publishing v1.1.doc

2  An XML based publishing solution for Eurostat

As stated previously, the purpose of this project is to design a strategy that allows the gradual introduction of an XML based publishing process for Eurostat publications.

A key component for the project is an XML based publishing “solution” or software application designed to help Eurostat to achieve its goal.

2.1  Solution components

The XML based publishing solution for Eurostat can be seen as a set of cooperating modules or components designed to support Eurostat’s business process.

In principle, these components should provide:

·  Content management services, in order to store, secure and manage the different kinds of documents (publications) produced by Eurostat.

·  Workflow services, in order to enforce the business rules required by the publication production process.

·  Transformation services, in order to facilitate the conversions required by Eurostat’s distribution process, including HTML, Web optimized PDF and Print optimized PDF versions of publications.

·  Collaboration services, in order to facilitate the collaborative production of publications by Eurostat’s contributor base (employees, contractors, etc.).

It is important to highlight that the XML based publishing solution will be a “key” system of Eurostat, but not the only one involved in the production process. Statpub, a custom developed application that manages the administrative process associated with the production of publications, will have to cooperate with the proposed solution.

Eurostat specifically emphasised the importance of providing a “user friendly application environment” for its community of users.

3  Background technical information

This chapter will present background information about the different technologies required in order to construct an XML publishing solution to support Eurostat’s publishing workflow.

3.1  Content management services / Enterprise Content Management

Enterprise Content Management (ECM) is any of the strategies and technologies employed in the information technology industry for managing the capture, storage, security, revision control, retrieval, distribution, preservation and destruction of documents and content. ECM especially concerns content imported into or generated from within an organization in the course of its operation, and includes the control of access to this content from outside of the organization's processes.

Enterprise Content Management services typically include among others:

·  Versioning (check-in/check-out)

·  Metadata management (properties that describe documents)

·  Rendition management (instances of a document in different formats)

·  Lifecycle management

·  Workflow

·  Security
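As an illustration only, the versioning, metadata and rendition services listed above can be sketched in a few lines of Python. The class and method names (ManagedDocument, check_out, check_in), the sample file name and the “1.n” version labels are assumptions made for this sketch; they do not correspond to the API of any particular ECM product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    label: str      # hypothetical label scheme: "1.0", "1.1", ...
    content: bytes  # document body stored for this version


@dataclass
class ManagedDocument:
    name: str
    metadata: dict = field(default_factory=dict)    # properties describing the document
    versions: list = field(default_factory=list)    # ordered version history
    renditions: dict = field(default_factory=dict)  # format name -> rendered bytes
    locked_by: Optional[str] = None                 # user currently holding the check-out

    def check_out(self, user):
        """Lock the document for one user and return the latest content."""
        if self.locked_by is not None:
            raise RuntimeError(f"{self.name} is already checked out by {self.locked_by}")
        self.locked_by = user
        return self.versions[-1].content if self.versions else b""

    def check_in(self, user, content):
        """Store a new version, discard stale renditions and release the lock."""
        if self.locked_by != user:
            raise PermissionError(f"{user} does not hold the check-out on {self.name}")
        label = f"1.{len(self.versions)}"
        self.versions.append(Version(label, content))
        self.renditions.clear()  # renditions were produced from the previous version
        self.locked_by = None
        return label


# Illustrative usage with made-up names and content.
doc = ManagedDocument("yearbook.xml", metadata={"language": "en"})
doc.check_out("editor_a")
first_label = doc.check_in("editor_a", b"<publication/>")
doc.renditions["pdf"] = b"%PDF-placeholder"  # a rendition of the first version
```

The check-out lock is what prevents two contributors from overwriting each other's work, and clearing the renditions on check-in reflects the fact that a rendition is always tied to one specific version.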

There is a great variety of Content Management Systems available, including IT applications that were not created to manage content but evolved in order to support this process.

! / After an initial review (during the kick-off meeting) of the business process associated with Eurostat publications (see [R3]), it is clear that an application supporting this process would benefit from ECM services.

3.2  Workflow services / Workflow Management Systems

Workflows are a popular means of coordinating or composing activities inside and across organizations. This section will introduce this concept in the context of the publication process.

3.2.1  Types of workflows

Workflows have long supported the document-oriented exchange of information in an enterprise. There are several types of workflow:

·  A common type of workflow is the one based on business-component scripting. Flow in such systems is represented as business component interfaces. Each business component encapsulates its internal business operations, and access to them is provided through the invocation of operations. This contrasts with a document-oriented system because the “document”, or actual data, in a business component system is modified by the encapsulating entity. Furthermore, the workflow in such systems is managed as a process model.

·  Another form of workflow is based on messaging. In message-oriented systems, the individual business entities collaborate by sending messages to and receiving messages from other member entities. These messages may be conveyed directly, or through a publish/subscribe mechanism.

·  In a document-oriented workflow system, participants access a document and update the part that they are concerned with. This document is then circulated to all interested parties.

An Enterprise Content Management System usually implements some variation of document-oriented workflow.
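The document-oriented pattern described above can be illustrated with a minimal Python sketch. It assumes a document made of named sections and a fixed circulation route; the participant names, section names and the circulate function are hypothetical and serve only to show the principle that each participant updates its own part before the document moves on.

```python
from dataclasses import dataclass, field


@dataclass
class CirculatingDocument:
    sections: dict = field(default_factory=dict)  # section name -> text
    history: list = field(default_factory=list)   # (participant, section) audit trail


def circulate(document, route):
    """Pass the document to each participant in turn.

    Each route entry is (participant, section, edit), where `edit` is a
    function that receives the current text of that section and returns
    the updated text. A participant touches only its own section.
    """
    for participant, section, edit in route:
        document.sections[section] = edit(document.sections.get(section, ""))
        document.history.append((participant, section))
    return document


# Illustrative route with made-up participants and content.
route = [
    ("author", "text", lambda old: old + "Draft chapter."),
    ("statistician", "tables", lambda old: old + "Table 1."),
    ("editor", "text", lambda old: old.replace("Draft", "Final")),
]
doc = circulate(CirculatingDocument(), route)
```

After circulation, doc.sections holds the combined contributions and doc.history records who touched which part, which is the kind of traceability a publication process typically needs.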

! / A document-oriented workflow system is clearly the best option to support the publication process of Eurostat.

3.2.2  Workflow Management Systems

A workflow management system (WfMS) is a system that completely defines, manages, and executes workflows through the execution of software whose order of execution is driven by a computer representation of the workflow logic. In other words, the WfMS is responsible for implementing the workflow model.

A process instance is an execution of the workflow model and is composed of logical units of work, usually known as activities.
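The distinction between a workflow model and a process instance can be made concrete with a small Python sketch. The model below is a plain dictionary in which each activity names its successor; the activity names (draft, review, approve) and the ProcessInstance class are assumptions chosen for illustration and are not taken from any specific WfMS.

```python
from dataclasses import dataclass, field
from typing import Optional

# Workflow model: the computer representation of the workflow logic.
# Each activity maps to the next one; None marks the end of the process.
PUBLICATION_MODEL = {
    "start": "draft",
    "transitions": {"draft": "review", "review": "approve", "approve": None},
}


@dataclass
class ProcessInstance:
    model: dict
    current: Optional[str] = None                  # activity awaiting completion
    completed: list = field(default_factory=list)  # activities already executed

    def __post_init__(self):
        self.current = self.model["start"]

    def complete_current(self):
        """Mark the current activity as done and advance as the model dictates."""
        if self.current is None:
            raise RuntimeError("process instance has already finished")
        self.completed.append(self.current)
        self.current = self.model["transitions"][self.current]
        return self.current


# One execution of the model; several instances can share the same model.
instance = ProcessInstance(PUBLICATION_MODEL)
while instance.current is not None:
    instance.complete_current()
```

Because the order of execution lives in the model rather than in the code, changing the publication process would, in this sketch, mean editing the dictionary and not the ProcessInstance class.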

3.3  Transformation services

Transformation services can be seen as a set of tools and technologies that facilitate the exchange of information or documents among different applications or between an application and a consumer (person).

3.3.1  Types of transformation applications

There are different approaches to content transformation:

·  The programmatic approach, where a source file is translated into a target format by interpreting its binary content. This approach requires a complete understanding of the source and target formats; it is usually quite complex, but offers better performance.

·  The “helper approach”, where existing applications that can manage the source and target formats are “driven” to cope with the transformation process. This approach is simpler and widely adopted in the field of Enterprise Content Management (for example: Documentum Content Transformation Services and Alfresco’s OpenOffice integration).

The helper approach has the advantage that it can easily be adapted to support new versions of existing formats, since it relies not on the binary formats but on public APIs, which tend to change less quickly.
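To give an idea of what “driving” a helper application can look like, the following Python sketch builds the command line for a headless office suite. It assumes a LibreOffice-style soffice executable and its --headless, --convert-to and --outdir options; this is an assumption made for illustration and is not necessarily how Documentum or Alfresco integrate their helpers. The function names and the sample file name are hypothetical.

```python
from pathlib import Path


def build_conversion_command(source, target_format, output_dir, helper="soffice"):
    """Return the argument list asking the helper application to convert `source`."""
    return [
        helper,
        "--headless",                    # run without a user interface
        "--convert-to", target_format,   # e.g. "pdf" or "html"
        "--outdir", str(output_dir),     # directory receiving the converted file
        str(source),
    ]


def expected_output(source, target_format, output_dir):
    """Path where the helper is expected to write the converted file."""
    return Path(output_dir) / (Path(source).stem + "." + target_format)


# Illustrative usage: the command is only built here, not executed.
command = build_conversion_command("report.odt", "pdf", "out")
target = expected_output("report.odt", "pdf", "out")
```

On a machine where the helper is installed, the list could be passed to subprocess.run(command, check=True). The transformation logic itself stays inside the helper, so supporting a new format version is a matter of upgrading the helper rather than rewriting a parser.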

! / A “helper” approach is in principle the best fit for the XML based publishing solution.

3.4  Collaborative services

The publication process is inherently collaborative. The XML based publishing solution will have to offer a “collaborative user experience”, either by using an existing framework (Wiki, dedicated portal) or through a customised user interface.

3.4.1  Collaborative applications and Content Management

Enterprise Content Management Systems are not the only applications that manage content. Some collaborative applications have evolved in order to let contributors work on content assets. The most popular kind of application in this area is the Wiki.

Even though Wikis are a great collaborative tool and could be adapted to cope with XML content, they have certain disadvantages in the context of this project compared to an Enterprise Content Management System:

·  Difficulty in managing content in a structured way,

·  Difficulty in formalising the collaboration process using a workflow,

·  Lack of the “Content type” concept,

·  Limited transformation capabilities,

·  Difficulty in integrating them with custom applications that operate on content.

! / Even though a publication process can be seen as a collaborative task, the “freedom” provided by a “Wiki” could produce a negative impact on important issues like overall control and security, two key requirements of Eurostat’s publication process.

4  Content Management framework selection

This chapter will:

·  summarise the requirements of the XML based publishing application for Eurostat,

·  present the selection criteria,

·  enumerate and compare available frameworks/technologies/products, and

·  finally propose a candidate application.

4.1  Requirements

This section will present each functional and non-functional requirement. The list of requirements will be completed with additional non-functional requirements deduced from the technical sections of this document. The requirements will be grouped into categories, and a priority will be assigned to each.

4.1.1  Non-functional requirements

Architecture

Id / Priority / Name / Description
C-001 / High / Service Oriented Architecture (SOA) / The solution should support a flexible Service Oriented Architecture (preferably including support for: SOAP, WSRP, WebServices and Spring Remoting).

Infrastructure

Id / Priority / Name / Description
C-002 / High / RDBMS support / The solution should use a current version of Oracle (10g) as its RDBMS.
C-003 / High / Operating Systems support / The solution should support multiple Operating Systems (Windows, Solaris).
C-004 / Medium / Container support / The solution should be deployed on a JSP container (preferably Tomcat) without requiring a J2EE Application Server.
C-005 / High / High availability / The solution should be capable of providing zero downtime through the use of techniques like Hardware/Software Load Balancing, HTTP Failover, Session Replication, and Distributed Caching.
C-006 / High / Clustering / The solution should support multi-tier clustering, using any combination of the following tiers: presentation, service, business logic and database.

Standards