Introduction

Information retrieval in the area of chemical legislation is nowadays a relatively complicated and time-consuming process. It is very important for the general public as well as for the scientific community to be able to find and organize the required information quickly. That is why we decided to create a freely accessible web site, in English, concerning chemical legislation and dealing especially with the new European Union Regulation No 1907/2006 concerning the Registration, Evaluation and Authorisation of Chemicals (REACH).

REACH is due to come into force on June 1, 2007 and will significantly change the current situation concerning chemicals. It obliges manufacturers and importers of chemicals to minimize the potential negative impact of their production by requiring them to use the best available technologies (BAT), with the aim of improving human health and the environment.

Our goal was to transform the REACH Regulation into an interactive electronic form and to create a web site dealing with REACH-related issues. This web site should provide additional features such as keyword search and cross references, which make the Regulation easier to work with and save time.

This publication documents the whole process of the transformation, describes all the additional features and concludes with the possible contribution of the project to the general public and the scientific community.

Summary

The main purpose of my work was to create a freely accessible web portal concerning the new European Union Regulatory Framework REACH (Registration, Evaluation and Authorisation of Chemicals) and to transform the REACH Regulation into electronic form.

The transformation into electronic form was successfully performed using XML, XSLT and Python. Within my bachelor's thesis I focused on the legal part of the Regulation, without processing the Annexes.

The electronic form of the REACH Regulation is now part of our international law database Law-Ref, which features keyword search and cross references and attracts about one thousand visitors per day.

The REACH project is still in progress; the current work consists mainly of transforming the remaining Annexes and gathering additional information and references on the subject.

Souhrn (Summary)

The main aim of the work was to create a freely accessible web portal dealing with the new European control system REACH (Registration, Evaluation and Authorisation of Chemicals) and to transform this Regulation into an interactive electronic form.

XML, XSLT and Python were used for the transformation into electronic form. Within my bachelor's thesis I focused only on the legal part of the REACH Regulation; the Annexes to the Regulation will be processed later.

The electronic version of the REACH Regulation is for now incorporated into our international law database Law-Ref, which can be searched by keywords and cross references and which has a very good attendance of around one thousand visitors per day.

The REACH project is still evolving, mainly through the ongoing processing of the remaining Annexes and the gathering of additional information on this topic.

1 Registration, Evaluation and Authorisation of CHemicals (REACH)

1.1 REACH in general

Registration, Evaluation and Authorisation of Chemicals (REACH) is the European Union Regulation (EC) No 1907/2006, adopted by the European Union Council of Ministers on 18 December 2006. REACH aims to improve the protection of human health and the environment while maintaining the competitiveness and enhancing the innovative capability of the EU chemicals industry. The European Chemicals Bureau (ECB) is responsible for developing the methodologies, tools and technical guidance needed for REACH through a number of REACH Implementation Projects (RIPs) [1, 2].

1.2 History of REACH

On 13 February 2001 the European Commission adopted a White Paper setting out the Strategy for a future Chemicals Policy. This White Paper was subsequently modified, developed and extensively discussed with major stakeholders, resulting in the release of the Commission's proposal (REACH) on 29 October 2003. Since then REACH has been amended many times. It passed the first reading in the European Parliament on 17 November 2005 and the Council of Ministers approved it on 13 December 2005. After the second reading, which was held on 13 December 2006, REACH was formally adopted by the Council of Ministers on 18 December 2006 [1, 2].

1.3 REACH corner-stone objectives

From the very beginning, REACH was proposed with two principal aims: to improve the protection of human health and the environment from the risks of chemicals, while enhancing the competitiveness of the EU chemicals industry [3].

The White Paper on the Strategy for a future Chemicals Policy, published in February 2001, set seven objectives that had to be achieved within the overall framework of sustainable development. These objectives were the key elements of the creation of REACH.

The objectives, as stated in the White Paper [4], are as follows:

Protection of human health and promotion of a non-toxic environment

Maintenance and enhancement of the competitiveness of the EU chemical industry

Prevention of fragmentation of the internal market

Increased transparency

Integration with international efforts

Promotion of non-animal testing

Conformity with EU international obligations under the WTO

The Regulation proposed by the Commission on 29 October 2003 achieved all the objectives identified in the White Paper and thus represented a model of sustainable development by pursuing its three main goals: economic (industrial competitiveness), social (health protection and jobs) and environmental [3].

1.4 REACH crucial ideas

The REACH system is based on the idea that industry itself is best placed to ensure that the chemicals it manufactures and puts on the market in the EU do not adversely affect human health or the environment. This requires that industry has certain knowledge of the properties of its substances and manages potential risks.

The basic elements of REACH are described below as they are set out in the Council’s Common Position [3]:

  1. All substances are covered by this regulation unless they are explicitly exempted from its scope.
  2. Registration requires manufacturers and importers of chemicals to obtain relevant information on their substances and to use that data to manage them safely.
  3. To reduce testing on vertebrate animals, data sharing is required for studies on such animals. For other tests, data sharing is required on request.
  4. Better information on hazards and risks and how to manage them will be passed down and up the supply chain.
  5. Downstream users are brought into the system.
  6. Evaluation is undertaken by the Agency to evaluate testing proposals made by industry or to check compliance with the registration requirements. The Agency will also co-ordinate substance evaluation by the authorities to investigate chemicals with perceived risks. This assessment may be used later to prepare proposals for restrictions or authorization.
  7. Substances with properties of very high concern will be made subject to authorization; the Agency will publish a list containing such candidate substances. Applicants will have to demonstrate that risks associated with uses of these substances are adequately controlled or that the socio-economic benefits of their use outweigh the risks and there are no suitable alternative substitute substances or technologies.
  8. The Restrictions provide a procedure to regulate that the manufacture, placing on the market or use of certain dangerous substances shall be either subject to conditions or prohibited. Thus, restrictions act as a safety net to manage Community wide risks that are otherwise not adequately controlled.
  9. The European Chemicals Agency (ECHA) will manage the technical, scientific and administrative aspects of the REACH system at Community level, aiming to ensure that REACH functions well and has credibility with all stakeholders.
  10. A classification and labelling inventory of dangerous substances will help promote agreement within industry on classification of a substance. For some substances of high concern there may be a Community wide harmonization of classification by the authorities.
  11. Access to information rules combine a system of publicly available information over the Internet, the current system of requests for access to information and REACH specific rules on the protection of confidential business information.

REACH creates a bridge between existing chemicals (listed in EINECS, the European INventory of Existing Commercial chemical Substances) produced in quantities over 1 tonne and new chemicals (listed in ELINCS, the European LIst of Notified Chemical Substances). It simplifies EU legislation by repealing approximately 40 existing Directives and Regulations and creating a single system for all chemicals. REACH will provide information on both the acute and the long-term effects of these chemicals.

For the chemical industry, there will be an incentive to produce safer substances, which will lead to the use of modern, best available technologies, while flexibility is maintained for chemicals used for research and development.

Overall, REACH will help to reduce air, water and soil pollution as well as the pressure on biodiversity. Improved control of substances of high concern and of persistent, bioaccumulative and toxic substances will help prevent these substances from polluting the environment.

2 Transformation technologies

2.1 eXtensible Markup Language (XML)

MOTTO: XML isn't always the best solution, but it is always worth considering [5].

2.1.1 History of XML

The eXtensible Markup Language, abbreviated XML, was developed by an eleven-member XML Working Group (originally known as the SGML Editorial Review Board) [6], in which James Clark served as Technical Lead. This group was supported by an Interest Group of approximately 150 members, both operating under the World Wide Web Consortium (W3C). The development of XML took two years, from 1996 to 1998. On February 10, 1998 XML 1.0 was finally standardized and became a W3C Recommendation [7].

It might seem that XML is a rather young technology, but in fact its roots are considerably older. Before XML there was the Standard Generalized Markup Language (SGML), which was developed in the early 1980s and became the ISO 8879 standard on October 15, 1986. SGML was originally used to enable the sharing of machine-readable documents in large governmental, legal or aerospace projects, whose documents have to remain readable for decades. It was extensively used in the printing and publishing industry, but the primary intention was to use SGML for text and database publishing [5, 8, 9, 10].

In the early 1990s Sir Timothy John Berners-Lee (later the director of W3C) from CERN (European Organization for Nuclear Research) invented the World Wide Web and the Hypertext Markup Language (HTML), originally used for sharing and updating information among researchers. Nowadays HTML is the predominant markup language for the creation of web pages. On May 15, 2000 HTML became the ISO/IEC 15445 standard and HTML 4.01 became a W3C Recommendation [5, 11, 12, 13, 14].

2.1.2 XML in general

Generally speaking, the eXtensible Markup Language (XML) is a set of rules and recommendations for designing text formats that let you structure your data. It is not a programming language, and you do not have to be a programmer to use or learn it. XML makes it easy for a computer to generate data, read data, and ensure that the data structure is unambiguous [5]. XML is a human-readable, text-based markup language, so users can, if necessary, open their data in any available text editor and check it for errors without needing the program that produced it [5]. A text format is also much easier for developers to read, which makes debugging more transparent and effective.

2.1.3 Features of XML

The principal purpose of the eXtensible Markup Language (XML) is to facilitate the sharing and transfer of data across different information systems, particularly systems connected via the Internet [15]. A great advantage is that XML supports Unicode, allowing almost any information in any written human language to be communicated.
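The Unicode support described above can be illustrated with a short Python sketch that uses only the standard library module xml.etree.ElementTree. The element names (words, word), the lang attribute and the sample strings are hypothetical and were chosen only for this illustration; they are not part of the REACH transformation itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample words in three different scripts (illustration only).
words = {"cs": "Nařízení", "el": "Κανονισμός", "ja": "規則"}

# Build a small XML document holding the words.
root = ET.Element("words")
for lang, text in words.items():
    item = ET.SubElement(root, "word", {"lang": lang})
    item.text = text

# Serialize to UTF-8 bytes, the form in which a document
# would typically travel between two systems.
payload = ET.tostring(root, encoding="utf-8")

# Parse the bytes back and recover the original strings.
restored = {w.get("lang"): w.text for w in ET.fromstring(payload)}
```

After the round trip, restored holds the same strings as words, which is the property that makes XML suitable for exchanging multilingual legal texts.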

XML has also become popular for its robustness, its hierarchical structure, which is suitable for most types of documents, and its logically verifiable format. It is increasingly used on many different systems because it is platform-independent, which makes it relatively immune to technological change [7].

2.1.4 Primary goals of XML

When the designers of XML started, they had ten design goals, as stated on the official W3C site [15]:

  1. XML shall be straightforwardly usable over the Internet.
  2. XML shall support a wide variety of applications.
  3. XML shall be compatible with SGML.
  4. It shall be easy to write programs which process XML documents.
  5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
  6. XML documents should be human-legible and reasonably clear.
  7. The XML design should be prepared quickly.
  8. The design of XML shall be formal and concise.
  9. XML documents shall be easy to create.
  10. Terseness in XML markup is of minimal importance.

2.1.5 Basic XML syntax

The basic syntax of an XML document is relatively simple to understand. Each document must contain at least one element, and most documents contain more. Names of elements are case sensitive, which means that the name in the end tag must be exactly the same as the name in the start tag. This may cause some trouble for users who are accustomed to HTML.

Examples of simple XML documents are shown in the schemas below:

<simple_element>Hello, world!</simple_element>

Schema 2.1: Simple well-formed XML document with only one element.

<simple_element>Hello, world!</SIMPLE_ELEMENT>

Schema 2.2: Simple not well-formed XML document (mismatched case).

<simple_element>
  <xxx>Hello, world!</xxx>
  <yyy>How are you?</yyy>
</simple_element>

Schema 2.3: Simple well-formed XML document, having one root element (simple_element) and two child elements (xxx, yyy) containing text.

Each element, even the root element, may contain one or more attributes. Each attribute must have a name and a value. The attribute value must be enclosed in quotation marks (either double or single).

<friends count="3">
  <person age="20">Mark</person>
  <person age="25">John</person>
  <person age="18">Paul</person>
</friends>

Schema 2.4: Well-formed XML document, having one root element (friends) with an attribute (count) and three child elements (person), each having an attribute (age).

As is obvious from the previous examples, each document must contain exactly one root element, sometimes called the document element. The root element is the top element of the document tree, in which all the other elements are nested.
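The structure of Schema 2.4 can be explored programmatically. The following is a minimal sketch in Python, assuming the standard library parser xml.etree.ElementTree; the variable names are illustrative only. It reads the root element, its attribute and the nested child elements.

```python
import xml.etree.ElementTree as ET

# The document from Schema 2.4, written with straight quotes.
document = """<friends count="3">
  <person age="20">Mark</person>
  <person age="25">John</person>
  <person age="18">Paul</person>
</friends>"""

# Parsing returns the single root element of the document tree.
root = ET.fromstring(document)
root_name = root.tag

# Attribute values are always strings, so they are converted explicitly.
declared_count = int(root.get("count"))

# Child elements are nested inside the root element.
people = [(p.text, int(p.get("age"))) for p in root.findall("person")]
```

Here root_name is "friends" and people contains one (name, age) pair per person element. Note that the count attribute is ordinary data: the parser does not check that it agrees with the number of child elements.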

2.1.6 Rules of XML

Even though XML is relatively easy to learn, there are still some unavoidable rules governing what an XML document may look like. The two main criteria determining whether a document is correct or not are well-formedness and validity.

A well-formed document conforms to all of XML's syntax rules. There are only two possibilities: the document either is well-formed or it is not; no other option is admissible. That is one of the main features that distinguish XML from HTML. HTML allows a document to contain unclosed or mismatched elements, illegal characters and many other faults which are not allowed in XML. This strictness is what makes XML so stable, reliable and well suited to storing and manipulating data.
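The all-or-nothing nature of well-formedness can be demonstrated with a short Python sketch. It assumes the standard library parser xml.etree.ElementTree, which raises ParseError for any document that is not well-formed; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Schema 2.1: matching start and end tags.
schema_2_1 = "<simple_element>Hello, world!</simple_element>"
# Schema 2.2: the end tag differs in case, so the tags do not match.
schema_2_2 = "<simple_element>Hello, world!</SIMPLE_ELEMENT>"
# A document with two top-level elements has no single root element.
two_roots = "<a/><b/>"

results = [is_well_formed(doc) for doc in (schema_2_1, schema_2_2, two_roots)]
```

Only the first document is accepted. The parser does not try to repair the other two, unlike a typical HTML browser.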

A valid XML document must be well-formed and must also conform to a particular schema, such as a DTD (Document Type Definition), Relax NG or W3C XML Schema. These schemas carry the information about the document type and structure. They define the names, content, types and number of elements allowed in the document, as well as the element attributes, their values and many other fundamental rules [7].

2.1.6.1 Illustration of different types of schemas

Document Type Definition (DTD)

The oldest, but still widely used, schema language for the validation of XML documents is the Document Type Definition (DTD), inherited from SGML. DTD is quite simple to understand, and its main advantage is that DTD code is relatively short and easy to write, which saves time.

Because DTD is the oldest schema language and was originally developed for validating SGML, it has some limitations, especially in conjunction with XML 1.0 and 1.1 [7]:

It has no support for newer features of XML, most importantly namespaces.

It lacks expressivity. Certain formal aspects of an XML document cannot be captured in a DTD.

It uses a custom non-XML syntax, inherited from SGML, to describe the schema.

Relax NG

A newer and more powerful schema language for validating XML documents, the Regular Language for XML Next Generation (Relax NG), is one of the most widely used in the XML community.

The key features of Relax NG are that it [16, 17]:

is simple

is easy to learn

has both an XML syntax and a compact non-XML syntax

does not change the information set of an XML document

supports XML namespaces

treats attributes uniformly with elements so far as possible

has unrestricted support for unordered content

has unrestricted support for mixed content

has a solid theoretical basis

can partner with a separate datatyping language (such as W3C XML Schema Datatypes)

Relax NG is based on TREX, designed by James Clark, and RELAX, designed by MURATA Makoto. The Relax NG specifications have been developed within OASIS by the Relax NG Technical Committee [16].

2.2 Extensible Stylesheet Language Transformations (XSLT)

MOTTO: Process like a tree, think like a document, and you will be fine … [18]

2.2.1 XSLT in general

Extensible Stylesheet Language Transformations (XSLT) is an XML-based language designed to transform XML documents into other XML documents or into HTML, XHTML or many other types of documents [19].

2.2.2 Why XSLT

There are many ways to transform an XML document. You may create your own program that cooperates with applications whose purpose is to parse the XML code, but this requires writing your own program code. By choosing XSLT you may achieve the same results without conventional programming. Instead of writing your own code in, for example, Visual Basic, Java or C++, you manipulate the content of an XML document in a different way: in XSLT you simply declare what you want to do, and the XSLT processor does it on your behalf. That is the essence of XSLT. Thanks to its simplicity, XSLT has become very important in the world of XML [20].
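For contrast, the hand-written alternative mentioned above might look like the following Python sketch. It is not XSLT and is not the code used in this work; it only illustrates how a small XML-to-HTML transformation has to be spelled out step by step when programmed by hand. The input reuses Schema 2.4, while the function name friends_to_html and the shape of the HTML output are hypothetical.

```python
import xml.etree.ElementTree as ET

# Input document (Schema 2.4).
source = """<friends count="3">
  <person age="20">Mark</person>
  <person age="25">John</person>
  <person age="18">Paul</person>
</friends>"""


def friends_to_html(xml_text: str) -> str:
    """Turn each <person> element into an HTML list item."""
    friends = ET.fromstring(xml_text)
    ul = ET.Element("ul")
    for person in friends.findall("person"):
        li = ET.SubElement(ul, "li")
        li.text = f"{person.text} ({person.get('age')})"
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(ul, encoding="unicode")


html = friends_to_html(source)
```

Every step (parsing, looping over the elements, building the output tree and serializing it) is the programmer's responsibility here. An XSLT stylesheet would instead declare which output corresponds to which input element and leave the traversal to the XSLT processor.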

2.2.3 XSL vs. XSLT

There may be some confusion about the real difference between XSL and XSLT. The answer is relatively simple, as you can see from the scheme below: