Annex: Description of the Data Domains to Be Modelled (Data Model / Object Model)

Annex X.2 Refined object models for C+E Data

Ch. U. Germeier, Julius Kühn Institute

Introduction

Materials and Methods

Results and Discussion

A Pattern-Oriented Approach to Scientific Information Systems

The Domain Model

A) Credentials: Partner – Person – Address – User

B) Workingcollection – Transaction – Seedtransfer

C) Seed Multiplication and Seed Stock

D) Germination testing: GerminationMethod – GerminationTest – Germination – GerminationSample

E) Planning of evaluation experiments

F) Evaluation methodology: Methodology – Descriptor – Method

G) Field experiments: FieldExperiment – Treatment – ExperimentSite

H) Fieldplan: Experiment – Lane – Plot

I) Observation data: EvaluationArchive – Measurement – Rawdata – Observation

Examples for patterns used throughout the application

1) A data access pattern as generated by SeamGen amended to serve the identity checking pattern

2) The Sorter and Comparator Pattern

3) The Identity Checking Pattern

Introduction

An information system is being developed whose primary task is to support coordinated action in field and lab work. All observations should be documented on a plot basis, which will allow for calculation by various biometrical approaches (e.g. mixed model analysis of augmented designs, neighbour analysis and geo-statistical approaches). Services should be provided especially for the setup of field experiments, to ensure orthogonal sets of accessions and a uniform randomisation approach at the different sites.

Materials and Methods

The European Avena Database is available in a relational design of 75 tables on an Oracle Relational Database Management System (RDBMS). An additional project database has been developed and set up on a MySQL RDBMS. Web applications are developed in Java Enterprise Edition 5 technology, based on the frameworks Hibernate (Bauer and King, 2007), JSF (Schalk and Burns, 2007) and SEAM (Allen, 2009). SeamGen was used to generate an application framework.

The Unified Modeling Language (UML) is a standard for graphical representation of object-oriented software systems. It contains 13 types of diagrams. Class diagrams are very close to object-oriented code, which consists of classes, their attributes and relationships. Software tools are available to create code from class diagrams (forward engineering) and class diagrams from code (reverse engineering). Class diagrams were generated with Sparx Systems Enterprise Architect, which supports forward and reverse engineering for multiple programming languages. The UML thus gives a graphical representation of the object-oriented code blocks (classes), and the UML tool can generate the code structure in any of its supported object-oriented programming languages.

Results and Discussion

A Pattern-Oriented Approach to Scientific Information Systems

Reusability is important for reducing software development effort and for basing development on well-debugged and stable components. The main strategy is horizontal and vertical differentiation into small, well-defined and independently exchangeable components. Horizontal differentiation leads to a multi-tier system with the user interface (view) at the top, a tier of backing and controller (control) classes below, then a tier of domain classes and the business logic (model), and a tier managing database access (DAO) at the bottom. Vertical differentiation represents the various domain objects: scientific concepts like accessions, field observations, observation traits and descriptors, experiments and plots.

One of the important gains of modern software engineering approaches is making software communicable. The object-oriented programming paradigm (OOP) divides a software system into encapsulated parts (objects), which are instances of user-defined types (classes) and contain attributes and methods. These objects fulfil certain technical tasks or mimic objects of the real world. The design pattern approach (Gamma et al. 1995) groups objects with defined functionality into patterns, which represent typical solution strategies for typical requirements. Design patterns have been described on various levels, e.g. architectural patterns describing large subsystems of a software system, classical design patterns as patterns of interacting objects, and idioms as low-level patterns.

The concept of different tiers in a software system is a typical architectural pattern. Objects representing different types of functionality are grouped into different layers, e.g. the layer of data access (DAO), the layer of business or domain logic (model) and the layer of presentation (view and backing controllers). The model layer ideally represents a scientifically sound model of the domain and should be based on a domain ontology. Entity relationship modelling (data model) and domain modelling (classes in the model tier) will in most cases correspond closely. Bauer and King (2007) describe strategies for cases of so-called object-relational mismatch (e.g. concepts of object orientation that are not represented in relational modelling, like extension). The presentation layer is largely determined by the features and design of the graphical user interface.

A general pattern is seen in information systems dealing with scientific objects: comparing such objects and deciding whether they represent different or identical concepts or objects in the real world. Typical examples come from taxonomy, where each organism found in nature raises the question of whether it represents an already described taxon or a new one. In characterisation and evaluation it is of interest whether the observed trait has already been observed and whether the observation methodology is identical to that used in other studies. This is normally not known when the data are entered. To make observations comparable, the information system should be able to trigger the detection of such identities at data entry. Patterns used throughout the application are described in the last section.

The Domain Model

The following paragraphs focus on the data model in comparison with the classes in the model (domain logic) tier. Each paragraph treats a group of domain objects that should be considered a separate module in the information system.

A) Credentials: Partner – Person – Address – User

This module is currently used only for user authentication. It could be expanded into an address and contacts management system, but this was not considered to be in the focus of the AVEQ project. Person represents a natural person with attributes such as first names and last name. Address represents a postal address with attributes like country, zip code, city, street and organisation name. Partner represents affiliations of persons to addresses. One person can have multiple affiliations, and multiple persons can be affiliated to one address (n:m relationship). Attributes like department, office, telephone, telefax and email belong to a certain affiliation of a person and are thus attributes of partner. User represents an account of a person in an information system with the attributes user account, password and role. The role determines the privileges a user has in the information system. Figures 1 and 2 show the data and object model respectively.

Figure 1: Data model of a contacts and user management module

Figure 2: Object model of a contacts and user management module
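The relationships described for this module can be sketched in Java as follows. This is a minimal illustration and not the project's code: the class names follow the text, but all field names and constructors are assumptions, persistence annotations are left out, and the password attribute of User is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; field names are assumptions, not the project schema.
class Person {
    final String firstNames;
    final String lastName;
    final List<Partner> affiliations = new ArrayList<>();

    Person(String firstNames, String lastName) {
        this.firstNames = firstNames;
        this.lastName = lastName;
    }
}

class Address {
    final String organisation;
    final String city;
    final String country;
    final List<Partner> affiliations = new ArrayList<>();

    Address(String organisation, String city, String country) {
        this.organisation = organisation;
        this.city = city;
        this.country = country;
    }
}

// Partner resolves the n:m relationship between Person and Address and
// carries the attributes that belong to one specific affiliation.
class Partner {
    final Person person;
    final Address address;
    final String department;
    final String email;

    Partner(Person person, Address address, String department, String email) {
        this.person = person;
        this.address = address;
        this.department = department;
        this.email = email;
        person.affiliations.add(this);   // register on both sides
        address.affiliations.add(this);
    }
}

// User is an account of a person; the role determines the privileges.
class User {
    final Person person;
    final String account;
    final String role;

    User(Person person, String account, String role) {
        this.person = person;
        this.account = account;
        this.role = role;
    }
}
```

Keeping department and email on Partner rather than on Person lets one person hold different contact details at each affiliation, as the text requires.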

B) Workingcollection – Transaction – Seedtransfer

The project working collection is part of an accession table of the central crop database. It represents those accessions, which have been selected for potential use in the project and only those passport attributes, which have been considered relevant for the project. Figures 3 and 4 show data and objects for the working collection and the distribution of accessions.

The transaction models the relationship between a sender and a receiver. It normally involves a list of accessions (seed transfers). Seed transfers relating to a project contain accessions from the working collection of this project.

Figure 3: Data model of a working collection and seed distribution module

Figure 4: Object model of a working collection and seed distribution module

C) Seed Multiplication and Seed Stock

To avoid effects of seed provenance and of differing seed quality among accessions coming from different genebanks, seed multiplication at one or a few multiplication sites should precede the evaluation experiments. In the workflow of multiplication or evaluation, a planning phase needs to be separated from the realisation phase. The multiplication represents a plan for seed multiplication at a certain organisation in a certain year for a certain project. It may involve a list of multiple accessions (seed multiplications). The realisation of a multiplication is a (multiplication) field experiment (see H).

Seed multiplications have to start from initial seed stocks originating from seed acquisition by seed transfers and they result in setting up new or increasing seed stock. A seed stock is a sample of seed in a store. Normally harvests from different years or even from different multiplication plots are kept separate in the store. Data and object model are shown in Figures 5 and 6.

Figure 5: Data model of a multiplication planning module

Figure 6: Object model of a multiplication planning module

D) Germination testing: GerminationMethod – GerminationTest – Germination – GerminationSample

To adjust sowing densities so that stand densities are uniform over all plots of a field experiment with different accessions, the germinability of each accession should be known before the field experiment is sown. The germination module holds information on germination methodology, germination tests, germination samples and germination results (Figures 7 and 8). This module also serves for planning and managing germination tests.

A germination test is defined here as one unit of work, e.g. samples to be tested that are set up on the same day, which also means they have to be counted on the same days. It contains multiple accessions, which have to be characterised for their germination. This is done by counting multiple (eight according to ISTA rules) germination samples. Results of these samples are averaged to determine the germination.

Figure 7: Data model of a germination testing module

Figure 8: Object model of a germination testing module
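The averaging of germination samples can be illustrated with a short Java sketch. It assumes that each germination sample records the number of seeds set up and the number germinated, and that the result for an accession is the mean of the per-sample percentages. Both the field names and this averaging rule are assumptions made for illustration; tolerance checks between replicates are not covered.

```java
import java.util.List;

// Hypothetical sketch: one counted replicate of a germination test.
class GerminationSample {
    final int seedsSetUp;
    final int seedsGerminated;

    GerminationSample(int seedsSetUp, int seedsGerminated) {
        if (seedsSetUp <= 0 || seedsGerminated < 0 || seedsGerminated > seedsSetUp) {
            throw new IllegalArgumentException("inconsistent seed counts");
        }
        this.seedsSetUp = seedsSetUp;
        this.seedsGerminated = seedsGerminated;
    }

    // Germination of this single sample in percent.
    double percentGerminated() {
        return 100.0 * seedsGerminated / seedsSetUp;
    }
}

class Germination {
    // Mean of the per-sample percentages for one accession in one test.
    static double meanPercent(List<GerminationSample> samples) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("no samples counted");
        }
        double sum = 0.0;
        for (GerminationSample s : samples) {
            sum += s.percentGerminated();
        }
        return sum / samples.size();
    }
}
```

Calling Germination.meanPercent with the counted samples of one accession yields the value that would then be used when adjusting sowing density.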

E) Planning of evaluation experiments

As already described for multiplication, the workflow of all field experimental work contains stages of planning and realisation. The evaluation represents a plan for the evaluation at a certain organisation in a certain year for a certain project. It may involve a list of multiple accessions (accession evaluations). The realisation of an evaluation is an (evaluation) field experiment (see H). Again, accession evaluations have to start with seed coming from a seed stock.

Figure 10 shows how multiplication and evaluation extend a superclass plan, and how seed multiplication and accession evaluation implement a common interface entry. The entry is a seed sample representing an accession that is planned to be used for a multiplication or evaluation.

Figure 9: Data model of an evaluation planning module

Figure 10: Object model of an evaluation planning module
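The inheritance structure described for the planning classes can be sketched in Java as below. Only the class and interface names are taken from the text; the method names, the fields and the use of a generic type parameter to tie each plan to its kind of entry are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Common interface of seed samples that can be entered into a plan.
// Method names are assumptions made for this sketch.
interface Entry {
    String accessionNumber();
    String seedStockId();
}

// Superclass: a plan for one organisation, year and project.
abstract class Plan<E extends Entry> {
    final String organisation;
    final int year;
    final String project;
    private final List<E> entries = new ArrayList<>();

    Plan(String organisation, int year, String project) {
        this.organisation = organisation;
        this.year = year;
        this.project = project;
    }

    void add(E entry) { entries.add(entry); }
    List<E> entries() { return entries; }
}

class SeedMultiplication implements Entry {
    private final String accessionNumber;
    private final String seedStockId;
    SeedMultiplication(String accessionNumber, String seedStockId) {
        this.accessionNumber = accessionNumber;
        this.seedStockId = seedStockId;
    }
    public String accessionNumber() { return accessionNumber; }
    public String seedStockId() { return seedStockId; }
}

class AccessionEvaluation implements Entry {
    private final String accessionNumber;
    private final String seedStockId;
    AccessionEvaluation(String accessionNumber, String seedStockId) {
        this.accessionNumber = accessionNumber;
        this.seedStockId = seedStockId;
    }
    public String accessionNumber() { return accessionNumber; }
    public String seedStockId() { return seedStockId; }
}

// The two concrete plans differ only in the kind of entry they accept.
class Multiplication extends Plan<SeedMultiplication> {
    Multiplication(String organisation, int year, String project) {
        super(organisation, year, project);
    }
}

class Evaluation extends Plan<AccessionEvaluation> {
    Evaluation(String organisation, int year, String project) {
        super(organisation, year, project);
    }
}
```

Code that only needs the accession and its seed stock, such as field plan generation, can work against Entry and Plan without knowing whether it handles a multiplication or an evaluation.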

F) Evaluation methodology: Methodology – Descriptor – Method

The classification of characterisation and evaluation data follows a hierarchical approach, which makes results easy to find and compare while remaining flexible enough to cover varying methodological approaches. The highest level is called trait and refers to classes of descriptors such as disease resistance, stress tolerance, phenology, agronomy, habit, or morphological features of certain plant organs (stem, leaf, flower, seed etc.). The second level is called descriptor and refers to published descriptor lists. Descriptor lists from the genetic resources domain (IBPGR or IPGRI, Comecon) cover a comprehensive set of descriptors relevant for plant breeding and for identification of the genotype. UPOV descriptors focus on the latter. A third level covers the specific methodology used for observing the descriptors. It is strongly affected by differing methodological approaches and by scientific progress in observation methodology.

The second (descriptor) and third (method) levels are modelled in different classes or entities, while the first level is covered as an attribute of descriptor. A descriptor can be measured with various observation methods, while a certain method may be applicable to several descriptors (n:m relation). A specific combination of descriptor and method is called methodology.

In most cases observations should be made at a certain stage of development of the plant. A universal coding scheme for plant development is given in BBCH (Meier 2001). Stages are explained for various crops (genus stage).

Figure 11: Data model of an evaluation and characterisation methodology module

Figure 12: Object model of an evaluation and characterisation methodology module
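The three classification levels can be sketched in Java as follows: the trait is kept as an attribute of Descriptor, and Methodology represents one specific combination of a descriptor and a method, which resolves the n:m relation. The field names and the value-based equality are assumptions made for illustration; the development stage is left out.

```java
import java.util.Objects;

class Descriptor {
    final String trait;  // first level, kept as an attribute of the descriptor
    final String name;   // second level, e.g. from a published descriptor list

    Descriptor(String trait, String name) {
        this.trait = trait;
        this.name = name;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Descriptor)) return false;
        Descriptor d = (Descriptor) o;
        return Objects.equals(trait, d.trait) && Objects.equals(name, d.name);
    }

    @Override public int hashCode() { return Objects.hash(trait, name); }
}

class Method {
    final String name;   // third level: how the descriptor is observed

    Method(String name) { this.name = name; }

    @Override public boolean equals(Object o) {
        return o instanceof Method && Objects.equals(name, ((Method) o).name);
    }

    @Override public int hashCode() { return Objects.hashCode(name); }
}

// One specific combination of descriptor and method.
class Methodology {
    final Descriptor descriptor;
    final Method method;

    Methodology(Descriptor descriptor, Method method) {
        this.descriptor = descriptor;
        this.method = method;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Methodology)) return false;
        Methodology m = (Methodology) o;
        return descriptor.equals(m.descriptor) && method.equals(m.method);
    }

    @Override public int hashCode() { return Objects.hash(descriptor, method); }
}
```

With this shape the same Method object can appear in several methodologies, and the same Descriptor can be combined with several methods.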

G) Field experiments: FieldExperiment – Treatment – ExperimentSite

The field experiment is defined as a randomisation unit on one field at one time. The randomisation process itself is explained in more detail in paragraph H.

Figure 13: Data model of a field experiment description module

An important factor for the interpretation of field experiments is their geographic location, described in the experiment site. Minimum descriptors for the experiment site would be the geographic coordinates (latitude, longitude, altitude), which would allow sites to be layered onto pedologic, geologic and climatologic maps. Ideally these would be supplemented with a reference to geo-statistical units (NUTS and LAU in Europe) to facilitate searching by geographic names as well. The experiment description itself should refer to specific data on weather and on biotic and abiotic stresses during the experiment.

An experimental treatment refers to agronomic measures, which may be applied differently to parts of the plots, like inoculation, fertilisation, irrigation, spraying etc., in a multi-factorial field experiment to reveal genotype x environment interactions. Treatments should be orthogonally applied to the tested genotypes.

Figure 14: Object model for a field experiment description module

H) Fieldplan: Experiment – Lane – Plot

As a randomisation unit the field experiment involves a randomised list of plots, which contain seed multiplications (cf. C) or accession evaluations (cf. E), each of them representing the seed stock of one accession. In case of replicated or partly replicated experiments one accession from the planning phase (seed multiplication or accession evaluation) may go into multiple plots. Currently block designs and augmented block designs can be randomised.

The matrix forming a field experiment can be described in terms of lanes (the tracks of the sowing device) and plots lined up along each lane.

Figure 15: Data model of a field plan generation module

Figure 16: Object model of a field plan generation module
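As an illustration of how a randomised list of plots could be mapped onto lanes, the following Java sketch randomises a complete block design and numbers the resulting plots lane by lane. It is a simplified sketch under stated assumptions, not the application's randomisation routine: the class and parameter names are hypothetical, entries are represented by plain strings, lanes are simply filled in sequence, and augmented designs are not covered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// One plot in the field matrix, addressed by lane and position in the lane.
class Plot {
    final int lane;
    final int position;
    final int block;
    final String entry;

    Plot(int lane, int position, int block, String entry) {
        this.lane = lane;
        this.position = position;
        this.block = block;
        this.entry = entry;
    }
}

class FieldPlan {
    // Randomised complete block design: every entry occurs once per block,
    // each block is shuffled independently, plots are laid out lane by lane.
    static List<Plot> randomise(List<String> entries, int blocks,
                                int plotsPerLane, long seed) {
        if (blocks < 1 || plotsPerLane < 1) {
            throw new IllegalArgumentException("blocks and plotsPerLane must be positive");
        }
        Random rng = new Random(seed);  // fixed seed makes the plan reproducible
        List<Plot> plots = new ArrayList<>();
        int index = 0;                  // running plot number, starting at 0
        for (int block = 1; block <= blocks; block++) {
            List<String> order = new ArrayList<>(entries);
            Collections.shuffle(order, rng);
            for (String entry : order) {
                int lane = index / plotsPerLane + 1;
                int position = index % plotsPerLane + 1;
                plots.add(new Plot(lane, position, block, entry));
                index++;
            }
        }
        return plots;
    }
}
```

Passing the same seed and entry list at every site would give identical plans; using a different seed per site gives independent randomisations produced by the same procedure.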

I) Observation data: EvaluationArchive – Measurement – Rawdata – Observation

Observation data result from measurements. A measurement applies an observation methodology to a field experiment. Raw data will normally be available in Excel files. This is the file type preferred by many field workers and exported by hand-held computers and by the software of automated analysers. These files usually use a spreadsheet format in which observations for different descriptors are listed in different columns. This has to be converted to a single-observation format before it can be included in the database. Though Excel files can be downloaded from the web application, a generalised approach has been chosen for import, which allows arbitrary naming of columns.

Data import has to follow a mapping of spreadsheet columns to database objects. This mapping is laid down in an evaluation archive structure.

Figure 17: Data model of spreadsheet import module

Figure 18: Object model of spreadsheet upload and mapping to database objects

Figure 19: Object model of spreadsheet import
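The conversion from a spreadsheet with one column per descriptor to single observations, driven by a column mapping, can be sketched in Java as follows. The sketch assumes the sheet has already been read into lists of strings (reading the Excel file itself is not shown) and that the mapping is a plain map from column name to a methodology identifier. All names are hypothetical and do not reflect the evaluation archive classes of the application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// One value observed on one plot with one methodology.
class Observation {
    final String plot;
    final String methodology;
    final String value;

    Observation(String plot, String methodology, String value) {
        this.plot = plot;
        this.methodology = methodology;
        this.value = value;
    }
}

class SpreadsheetImport {
    // header:     column names exactly as found in the sheet
    // rows:       cell values, one list per spreadsheet row
    // plotColumn: name of the column that identifies the plot
    // mapping:    spreadsheet column name -> methodology identifier
    static List<Observation> toObservations(List<String> header,
                                            List<List<String>> rows,
                                            String plotColumn,
                                            Map<String, String> mapping) {
        int plotIdx = header.indexOf(plotColumn);
        if (plotIdx < 0) {
            throw new IllegalArgumentException("plot column not found: " + plotColumn);
        }
        List<Observation> result = new ArrayList<>();
        for (List<String> row : rows) {
            if (row.size() <= plotIdx) continue;  // row without plot identifier
            for (int c = 0; c < header.size() && c < row.size(); c++) {
                String methodology = mapping.get(header.get(c));
                String cell = row.get(c);
                // Skip unmapped columns and empty cells.
                if (methodology == null || cell == null || cell.trim().isEmpty()) continue;
                result.add(new Observation(row.get(plotIdx), methodology, cell.trim()));
            }
        }
        return result;
    }
}
```

Because columns are resolved through the mapping rather than by fixed names, a sheet may label its columns arbitrarily as long as the mapping names them.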

Examples for patterns used throughout the application

Patterns considered applicable to many domain objects throughout the application are described here using the example of observation methodology. The data and domain models for observation methodology are described in paragraph F.

1) A data access pattern as generated by SeamGen amended to serve the identity checking pattern

Figure 20 shows the classes making up the data access tier for the domain objects describing observation methodology. Objects of the “Query” classes, which extend the EntityQuery class (part of the Seam application framework, used here on top of Hibernate), compile select statements, fetch result sets and assemble them into single objects or lists of objects. Objects of the “Home” classes, which extend EntityHome, are responsible for database updates and inserts.

Figure 20: Classes for data access and communication with the database as created by SeamGen using Hibernate

The Home objects are controlled by UpdateControllers, which have been added to avoid duplication of database entries. Using Comparators, these check whether an entered object already exists in the database and, if so, start a user interaction to clarify the identity of the entered object in relation to the existing data.
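The role of such a controller can be sketched in plain Java. The sketch is deliberately independent of Seam and Hibernate: an in-memory list stands in for the database query, and the class names, the fields of Method and the normalisation rule used for comparison are all assumptions, not the application's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class Method {
    final String name;
    final String unit;

    Method(String name, String unit) {
        this.name = name;
        this.unit = unit;
    }
}

// Treats two methods as identical if name and unit match after trimming
// and lower-casing (an assumed rule, chosen only for this sketch).
class MethodIdentity implements Comparator<Method> {
    public int compare(Method a, Method b) {
        int c = normalise(a.name).compareTo(normalise(b.name));
        return c != 0 ? c : normalise(a.unit).compareTo(normalise(b.unit));
    }

    private static String normalise(String s) {
        return s == null ? "" : s.trim().toLowerCase(Locale.ROOT);
    }
}

// Hypothetical controller: checks for identical entries before an insert.
class UpdateController<T> {
    private final List<T> existing;        // stands in for a database query
    private final Comparator<T> identity;  // result 0 means "same object"

    UpdateController(List<T> existing, Comparator<T> identity) {
        this.existing = existing;
        this.identity = identity;
    }

    // All stored entries that the comparator regards as identical.
    List<T> findDuplicates(T candidate) {
        List<T> hits = new ArrayList<>();
        for (T e : existing) {
            if (identity.compare(e, candidate) == 0) hits.add(e);
        }
        return hits;
    }

    // Stores the candidate only if no identical entry exists. On false the
    // caller is expected to ask the user how to resolve the identity.
    boolean persistIfNew(T candidate) {
        if (!findDuplicates(candidate).isEmpty()) return false;
        existing.add(candidate);
        return true;
    }
}
```

In the application the list would be replaced by a query against the database and the boolean result by the user dialogue described above.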

2) The Sorter and Comparator Pattern

Sorting by various attributes is a typical requirement in a tabular representation of data and a first step in checking the similarity of entries. It partly requires the same functionality as identity checking, which is described below. Sorted representation of data is a requirement of the user interface, which is controlled by objects of the “Backing” classes. To have the sorting performed, they communicate with Sorters. Sorting in Java is normally implemented with classes that implement Comparator. These compare objects by their attributes according to the implemented sorting criteria and decide on their order in a sorted list. In this case comparators for a hierarchy of related objects (stage – descriptor – methodology; method – methodology) must be included. The domain object methodology is additionally wrapped so that further user interface operations (showing additional details) can be performed within the list.

Figure 21: Classes for sorting by comparison

Figure 22: Overview over classes for identity checking of an observation methodology
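Sorting along a hierarchy of related objects can be sketched with chained comparators. The sketch flattens the hierarchy into one row class and uses the Comparator combinators of Java 8 and later, which are newer than the Java EE 5 platform named above. The class names, the fields and the case-insensitive ordering are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Flattened row for display; in the application these would be related objects.
class MethodologyRow {
    final int stage;          // development stage code, assumed numeric
    final String descriptor;
    final String method;

    MethodologyRow(int stage, String descriptor, String method) {
        this.stage = stage;
        this.descriptor = descriptor;
        this.method = method;
    }
}

class MethodologySorter {
    static final Comparator<MethodologyRow> BY_STAGE =
            Comparator.comparingInt((MethodologyRow r) -> r.stage);
    static final Comparator<MethodologyRow> BY_DESCRIPTOR =
            Comparator.comparing((MethodologyRow r) -> r.descriptor,
                                 String.CASE_INSENSITIVE_ORDER);
    static final Comparator<MethodologyRow> BY_METHOD =
            Comparator.comparing((MethodologyRow r) -> r.method,
                                 String.CASE_INSENSITIVE_ORDER);

    // Hierarchical order: stage first, then descriptor, then method.
    static final Comparator<MethodologyRow> BY_HIERARCHY =
            BY_STAGE.thenComparing(BY_DESCRIPTOR).thenComparing(BY_METHOD);

    // Returns a sorted copy and leaves the list held by the caller untouched.
    static List<MethodologyRow> sorted(List<MethodologyRow> rows,
                                       Comparator<MethodologyRow> order,
                                       boolean ascending) {
        List<MethodologyRow> copy = new ArrayList<>(rows);
        copy.sort(ascending ? order : order.reversed());
        return copy;
    }
}
```

A backing object would pass the comparator that matches the column clicked by the user, together with the sort direction, and redisplay the returned list.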

3) The Identity Checking Pattern

A pattern for identity checking can be used in all cases where the user interface for entering or editing an object gives a full representation of all its attributes. Thus the identity checking pattern can be established for each domain class residing in the model tier, and it affects the respective objects in the data access and presentation tiers. It is shown in Figure 22 with the more complex example of observation methodology, which itself consists of two related classes: trait or descriptor in the sense of IPGRI or UPOV descriptor lists, and method, which describes the technical details of how the descriptor is observed (measured or estimated).