WORKING DRAFT

9-19-04

LegalXML Integrated Justice Technical Committee

Methods and Options for Building Reference Documents

What are our goals?

Goal #1 Define Reference Documents & Exchanges to facilitate Justice Integration across domains (organizations, systems)…

Goal #2 Expand/refine the GJXDM via applied experience feedback

Goal #3 Identify governance strategy for determining when a Reference Document becomes a standard

What are our Current Strategies?

Utilize the Justice XML Data Dictionary to apply and test GJXDM common definitions and terms.

Model existing data exchanges and physical documents (versus theoretical) to:

Identify proposed additions/deletions/changes to GJXDM contents

Identify Best Design Practices for constructing Reference Documents

Recommend standards for documenting Reference Documents

Identify Quality Control Steps including:

Review extension elements and ensure they are not already in GJXDM under another Element name

Eliminate or accommodate synonym Elements in GJXDM

Refine GJXDM definitions as needed for existing GJXDM elements.

What Scenarios must we address in mapping business data/documents to GJXDM?

Scenario #1 - Every property I need is in the GJXDM namespace where I need it

Scenario #2 - I need additional “property” elements within a currently defined GJXDM Type

Scenario #3 - I need to create new relationships between GJXDM types or elements by:

Inclusion (defining a “relational” Element to extend GJXDM type)

Reference (use GJXDM ReferenceType Elements or add ReferenceType Elements)

GJXDM Relationship Element (explicit named referencing)

Scenario #4 - I need a new type that doesn’t inherit or extend any elements of GJXDM

Scenarios #2 and #3 represent methods for extending or restricting GJXDM components, elements and relationships to define the content of a specific document or data exchange.

What are our Options for Scenario(s) #2-#3 for extending/restricting the GJXDM elements we require for our Reference Documents?

-  Add elements via extension methods

-  Delete elements via restriction methods

-  Update elements through substitution

What comparison(s) can be made between GJXDM (Global Justice XML Data Model) and traditional Relational Data Modeling employing Entity Relationship (E/R) models?

(See Attachment 1 – Comparing GJXDM to traditional Relational Data Modeling)

What Methods or Design Patterns are there for extensions to GJXDM components?

Although there are probably more, the following seven methods, alone or in combination, need to be discussed with OASIS members as to their applicability to creating GJXDM-compliant Reference Documents and Data Exchanges:

Method 1: Extension using “Type Substitution”

Method 2: Extension using a “Cascading Extensions” construction to “Avoid Type Substitution”

Method 3: Extension using ReferenceType elements for Many-to-Many relationships

Method 4: Extensions using GJXDM Relationship Element for Many-to-Many relationships

Method 5: Extension using “Element Substitution” for synonyms, alternate languages

Method 6: Extension using <Redefine> Schema element to “Avoid Type Substitution”

Method 7: Extension using <Any> Schema element

Methods 1, 2, 6 and 7 address ways to modify GJXDM components; Methods 3, 4 and 5 address the choice of technique for resolving many-to-many relationships and element synonyms.

(See Attachment 2 – “Methods or Design Patterns for Extending GJXDM Components” for more detailed discussion of Method-1 to Method-7)


What schema methods are available to prevent XML/Schema developers from extending, restricting and/or using type or element substitution on our Reference Schema component(s) or element(s)?

XML Schema provides the “block” attribute, which can be set on Element declarations and on ComplexType definitions to block the use of extensions, restrictions and/or substitutions in place of the declared Element or Component. (SimpleType definitions do not take “block”; the companion attribute “final” is what prevents a type from being derived from in the first place.) The value is either “#all” or a space-separated list, and “substitution” applies to Elements only. Following is the block attribute syntax which would be used in Schema at the element or component level as deemed appropriate:

block = “#all” No extension, restriction or substitution allowed

block = “restriction” No restriction allowed

block = “extension” No extension allowed

block = “substitution” No substitution allowed

block = “restriction extension” No restriction or extension allowed (but substitution okay)

block = “restriction substitution” No restriction or substitution (but extension okay)

block = “extension substitution” No extension or substitution (but restriction okay)
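To make the syntax above concrete, the following is a minimal sketch (not a GTRI or OASIS tool) that reads a schema fragment and reports which mechanisms each named component blocks. The my: names, the example.org namespace URI and the function name block_settings are all hypothetical; only the block attribute and its values come from XML Schema itself.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical reference schema fragment; names and namespace are illustrative only.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:my="http://example.org/my-reference-doc"
            targetNamespace="http://example.org/my-reference-doc">
  <xsd:complexType name="CitationType" block="extension restriction">
    <xsd:sequence>
      <xsd:element name="CitationNumber" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="Citation" type="my:CitationType" block="#all"/>
  <xsd:element name="CitationNote" type="xsd:string"/>
</xsd:schema>
"""

def block_settings(schema_text):
    """Return {(kind, name): set of blocked mechanisms} for named components.

    This sketch ignores a schema-level blockDefault; a missing block
    attribute is simply reported as an empty set.
    """
    allowed = {
        "complexType": {"extension", "restriction"},
        "element": {"extension", "restriction", "substitution"},
    }
    root = ET.fromstring(schema_text)
    settings = {}
    for kind, values in allowed.items():
        for node in root.iter(XSD + kind):
            name = node.get("name")
            if name is None:
                continue  # anonymous types and ref= elements carry no name
            raw = node.get("block", "")
            # "#all" expands to every value legal for this kind of component;
            # otherwise the value is a whitespace-separated list.
            blocked = set(values) if raw == "#all" else set(raw.split())
            unknown = blocked - values
            if unknown:
                raise ValueError(f"{kind} {name}: illegal block value(s) {sorted(unknown)}")
            settings[(kind, name)] = blocked
    return settings

for key, blocked in sorted(block_settings(SCHEMA).items()):
    print(key, sorted(blocked))
```

A check like this could be run over a Reference Schema to confirm that the blocking policy the TC settles on has actually been applied to every component.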

What are the design tools and where are they best applied?

GJXDM model viewer / GTRI Spreadsheet / Wayfarer

Mapping Spreadsheet

UML Class Diagram/ Entity-Relationship diagram/Powerpoint Graphic

Use of XSLT in Constraint Schema Construction

Extension Schema Construction

Document Schema Construction (GTRI Document Template)

Observations made on comparing Protection Order, Field Interview Report, Sentence Order, Court Disposition, Citation (Mapping only) and NIST Rapsheet:

Reviewed each document for the following:

(Note: The review is general, and the numbers/elements listed are representative, not exact.)

Design Method(s) Used:

Number of Spreadsheet Elements defined:

Number of Extension Elements defined:

Set of columns of information used in the Spreadsheet Mappings:

Elements on Graphical Display of Document models in ppt or UML diagrams:

Comments / Observations on Extension Elements:

Comments / Observations on GJXDM Elements selected:

(See Attachment 3 for Observations on work-in-progress Reference Documents)

Observations on Level of Complexity of GJXDM

In programming practice and in Object Oriented design, a rule of thumb is not to nest IF-statements more than 3 levels deep and not to use more than 3 levels of inheritance, because “even programmers get confused after 3 levels.” The GJXDM has roughly 101 components (ComplexTypes) at the basic Level 1 or “Parent Level”, 80 components at Level 2 (“Child Level”), 12 components at Level 3 (“GrandChild Level”), and 1 component at Level 4 (“GreatGrandChild Level”).

Note: “Limit type hierarchy to a maximum of 3 levels: I have seen people building type hierarchies 6, 7, 8 levels deep. Understanding the type at the bottom requires understanding everything above it. In my experience anything beyond 3 levels becomes unintelligible.” Quote taken from reference http://www.xfront.com/composition-versus-subclassing.html

Overall, GJXDM appears to keep to the conventional 3-level maximum. Less than 7% of the total components are defined at Level 3 and only one component is defined at Level 4.
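The percentages can be checked with a few lines of arithmetic. This is only a sketch using the approximate per-level counts quoted above (the text itself calls them rough); the function name level_shares is made up for illustration.

```python
# Approximate component counts per inheritance level, as quoted in the text.
LEVEL_COUNTS = {1: 101, 2: 80, 3: 12, 4: 1}

def level_shares(counts):
    """Return each level's share of all components, as a percentage (1 decimal)."""
    total = sum(counts.values())
    return {level: round(100.0 * n / total, 1) for level, n in counts.items()}

shares = level_shares(LEVEL_COUNTS)
for level, pct in shares.items():
    print(f"Level {level}: {LEVEL_COUNTS[level]:3d} components, {pct}% of total")
```

With these counts the total is 194 components, so the 12 components at Level 3 come to a little over 6% and the single Level 4 component to about half a percent.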

Therefore, Method 2 will likely require only 1 to 2 levels of “Cascading Extensions” back up the tree to the document container. For example, since SubjectType is at Level 2, a Method-2 extension will require a my:SubjectType and then the data-modeled parent type, say my:CitationType, before it makes it back to the document container my:CitationDocument.
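The cascade in this example can be sketched as follows. The schema fragment is hypothetical: the j: namespace URI is a placeholder rather than the real GJXDM namespace, the element names SubjectLocalCode, CitationSubject and Citation are invented, and treating my:CitationType as an extension of j:CitationType is an assumption. The helper cascade_path is likewise illustrative; it simply walks from the extended type up through whatever component includes it.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical Method-2 extension schema (placeholder namespaces and names).
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:j="http://example.org/gjxdm-placeholder"
            xmlns:my="http://example.org/my-citation"
            targetNamespace="http://example.org/my-citation">
  <xsd:complexType name="SubjectType">
    <xsd:complexContent>
      <xsd:extension base="j:SubjectType">
        <xsd:sequence>
          <xsd:element name="SubjectLocalCode" type="xsd:string" minOccurs="0"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:complexType name="CitationType">
    <xsd:complexContent>
      <xsd:extension base="j:CitationType">
        <xsd:sequence>
          <xsd:element name="CitationSubject" type="my:SubjectType"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="CitationDocument">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Citation" type="my:CitationType"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""

def cascade_path(schema_text, start):
    """List the my: components touched, from the extended type up to the container."""
    root = ET.fromstring(schema_text)
    container_of = {}
    for component in root:  # top-level components only
        owner = "my:" + component.get("name")
        for element in component.iter(XSD + "element"):
            used_type = element.get("type", "")
            if element is not component and used_type.startswith("my:"):
                container_of[used_type] = owner
    path = [start]
    while path[-1] in container_of:
        path.append(container_of[path[-1]])
    return path

print(" -> ".join(cascade_path(SCHEMA, "my:SubjectType")))
```

Under these assumptions the path has three entries, which matches the two cascading steps described above before the document container is reached.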

Conclusions/Guidelines for defining Reference Documents

A.  Description of Current Process (Scott Came already completed)

B.  Proposed Standards (author’s opinion only, TC to debate)

Naming Standards:

Require all Complex/SimpleType names to end in the suffix Type for consistency with GJXDM naming standards. Use my:objectType for your derived Complex/SimpleType names instead of just my:object

Require that sample data be provided via an XML instance. It may be possible to generate the XML instance from XMLSpy or to extract sample data from the spreadsheet or UML, whichever is easiest to develop.

Require a naming standard be used for all filenames involved in the Reference Documents or Reference DataExchange. (Catherine Plummer / Scott Came Action Item) For example:

Organization_ReferenceDocName_DocReference_versionNumber_date.ext

Where .ext could be .xsd, .xml, .doc, .xcl, .uml, .ppt, .xsl or .zip, to name a few of the file types we have defined for the standard Reference Package.
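A naming rule like this is easy to enforce mechanically. The sketch below builds a filename from its parts and rejects parts that would break the underscore-delimited pattern. The function name, the YYYYMMDD date format, the version string format and the particular subset of extensions checked are all assumptions; the draft does not yet fix any of them.

```python
from datetime import date

# A subset of the file types mentioned in the text (assumed list, not final).
ALLOWED_EXTENSIONS = {".xsd", ".xml", ".doc", ".uml", ".ppt", ".xsl", ".zip"}

def reference_filename(organization, doc_name, doc_reference, version, on_date, ext):
    """Build Organization_ReferenceDocName_DocReference_versionNumber_date.ext."""
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unexpected file type: {ext}")
    # Assumed date format: YYYYMMDD, so that filenames sort chronologically.
    parts = [organization, doc_name, doc_reference, version, on_date.strftime("%Y%m%d")]
    for part in parts:
        # Underscores and spaces inside a part would make the name ambiguous.
        if not part or "_" in part or " " in part:
            raise ValueError(f"invalid filename part: {part!r}")
    return "_".join(parts) + ext

print(reference_filename("OASIS", "ProtectionOrder", "Schema", "0.1",
                         date(2004, 9, 19), ".xsd"))
# prints OASIS_ProtectionOrder_Schema_0.1_20040919.xsd
```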

Require an <xsd:annotation> in all schemas, including a definition for any new extension elements and reference(s) to spreadsheets/diagrams/supplemental docs, using <xsd:annotation> and <xsd:documentation source=”url/filename.ext”/>, where .ext is .ppt, .xcl, .doc etc.
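As an illustration of what such a requirement might look like in practice, the sketch below holds a small hypothetical schema with annotations and checks it: it lists the documentation source references and flags any element declared without a definition. The element names, the example.org URLs and the function documentation_report are invented for this example.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical extension schema; names and URLs are placeholders.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/my-reference-doc">
  <xsd:annotation>
    <xsd:documentation source="http://example.org/docs/citation-model.ppt">
      Graphical model for this Reference Document.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:element name="SubjectLocalCode" type="xsd:string">
    <xsd:annotation>
      <xsd:documentation>A locally assigned code for the subject.</xsd:documentation>
    </xsd:annotation>
  </xsd:element>
  <xsd:element name="SubjectOtherCode" type="xsd:string"/>
</xsd:schema>
"""

def documentation_report(schema_text):
    """Return (source URLs found, names of elements lacking a definition)."""
    root = ET.fromstring(schema_text)
    sources = [doc.get("source")
               for doc in root.iter(XSD + "documentation")
               if doc.get("source")]
    missing = []
    for element in root.iter(XSD + "element"):
        doc = element.find(f"{XSD}annotation/{XSD}documentation")
        if doc is None or not (doc.text or "").strip():
            missing.append(element.get("name"))
    return sources, missing

sources, missing = documentation_report(SCHEMA)
print("supplemental docs:", sources)
print("elements without a definition:", missing)
```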

Standardize spreadsheet columns, colors, and the minimum scope of information to provide for the spreadsheet and diagrams (UML Class, Powerpoint, Visio etc). tbd

Design Guidelines:

Use Method-2 as the default method for extensions/restrictions to GJXDM components. (comment: “Type Substitution” is a barrier to adoption because current vendor tools don’t support it (even though XML schema defines it) and extensions are performed in the XML instance outside the control/view of the reference schema)

Only use Method-3, References, when a Many-to-Many relationship needs to be resolved. (Default is to use “Inclusion” as described in Attachment 1 for relationships between classes (entities).) Action Item: Jim Cabral and company to expand on when to use References instead of Inclusion.

Don’t use Method-4, Relationship element, to resolve a Many-to-Many relationship; instead use Method-3 with the GJXDM reference elements or new “user-named ReferenceType elements”, i.e. elements with your own element name whose content type=j:ReferenceType. Reference elements are the construct most familiar to relational developers, being similar to the concept of “foreign” keys and “primary” keys.

Debate / discuss whether Reference Documents & Reference Data Exchange schemas should disable “substitution” using the xsd:block attribute (specifically “type substitution”.)

Debate / discuss whether Element substitutionGroups (Method-5) for defining synonyms should be supported by GJXDM only, Reference Documents/Data only, both or neither.

Don’t use Method-1 (type substitution), Method-5 (element substitution), Method-6 (<redefine>) or Method-7 (<any>) for extensions/restrictions/substitutions. (Debate / discuss this recommendation)

Utilize the GTRI recommended document container template for all Reference Document schema

Debate whether we discourage using the “Russian Doll” design for Reference Documents versus the “Salami Design” recommended by GTRI. (See Attachment 3 NIST Rapsheet section for reference on Russian Doll and Salami Design models). Use the “Salami Design” schema design method as recommended by GTRI when applying Method-2 extensions to GJXDM components and any new user-defined components (see reference http://www.xfront.com/GlobalVersusLocal.pdf for description of “Salami Design”).

Requests:

Request creative developers to design automated tools to generate schema from our Spreadsheets or from UML syntax, taking into account the user’s M:M versus 1:M relationships and the difference between an entity (Class) element name, a property element within a Class and a relationship element within the Class. (see Attachment 1 for more on these terms)

Have GTRI add ability to specify constraints within the subset schema generator so that developers don’t have to edit the constraint schema each time with the min/max occurs clauses. (Scott Came has already submitted this request to XML Committee chair Mike Hulme, based on OJP’s request to the committee for prioritization of GTRI potential activities)

Develop a tool to extract from the extension schema the names/types/definitions into a format that XSTF, GTRI or other vetting group can Quality Control as to whether the extensions are duplicates of existing dictionary elements, new elements to be added to GJXDM or local elements to remain as local.
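A first cut at such an extraction tool might look like the sketch below. It pulls name, type and definition out of an extension schema and marks exact-name collisions with a dictionary list. The sample schema, the two-entry dictionary sample, the status labels and the function names are all hypothetical; a real tool would load the full GJXDM element list, and detecting synonyms under a different name would still need human review.

```python
import csv
import io
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Tiny stand-in for the dictionary's element names (assumption for the example).
GJXDM_SAMPLE_NAMES = {"PersonName", "PersonResidence"}

# Hypothetical extension schema to be vetted.
EXTENSION_SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/my-reference-doc">
  <xsd:element name="SubjectLocalCode" type="xsd:string">
    <xsd:annotation>
      <xsd:documentation>
        A locally assigned code for the subject.
      </xsd:documentation>
    </xsd:annotation>
  </xsd:element>
  <xsd:element name="PersonName" type="xsd:string"/>
</xsd:schema>
"""

def extension_rows(schema_text, dictionary_names):
    """Return (name, type, definition, status) for each named element."""
    root = ET.fromstring(schema_text)
    rows = []
    for element in root.iter(XSD + "element"):
        name = element.get("name")
        if name is None:
            continue  # ref= elements declare nothing new
        doc = element.find(f"{XSD}annotation/{XSD}documentation")
        definition = " ".join((doc.text or "").split()) if doc is not None else ""
        status = "possible duplicate" if name in dictionary_names else "candidate"
        rows.append((name, element.get("type", ""), definition, status))
    return rows

def to_csv(rows):
    """Render the rows as CSV text for a vetting group to review."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "type", "definition", "status"])
    writer.writerows(rows)
    return buffer.getvalue()

print(to_csv(extension_rows(EXTENSION_SCHEMA, GJXDM_SAMPLE_NAMES)))
```

The decision whether a “candidate” should be proposed for GJXDM or stay local remains a judgment for the vetting group; the sketch only prepares the list.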

References (will go here which include)

GTRI Atlanta Training Material

OASIS IJTC Denver Workshops material

Xfront.Com reference site articles, schema tutorial

Scott Came, Seattle

Other References which I’ll look for on review of document

(Need to add a Title page, page numbers, update header to final after TC member beats this paper up)

Attachment 1

Comparing GJXDM to traditional Relational Data Modeling

Relational Models in GJXDM

ComplexType and SimpleType definitions (components) in GJXDM are most similar to Entities in E/R diagrams. For example, the GJXDM component PersonType is like an Entity named Person in relational data models.

E/R Terminology

PERSON ENTITY                               RESIDENCE ENTITY

Person                                      Residence
(PersonID = PrimaryKey “xxx”)    (1:M)      (ResidenceID = PrimaryKey “yyy”)
Name                                        StreetAddress
Age                                         City
EyeColor                                    State

The Entity names are Person and Residence, with a one-to-many relationship from Person to Residence(s). The database designer would have a Person table with field names Name, Age, EyeColor and a Residence table with field names StreetAddress, City, State. The designer would then add a unique key for Person (e.g. PersonID or PersonRecordID) as a primary key and a similar unique key for Residence (e.g. ResidenceID). These primary keys are represented in XML by an attribute of type ID, ID=”unique key value”, such as <Person j:id=”xxx”>. The “foreign key” to relate the Person table to the Residence table would be the ResidenceID. The “foreign key” pointer to the unique key in the table you have the (1:M) relationship to is represented in XML with an attribute of type IDREF. Basically, IDREF=”reference to unique key in related table” (<Person j:ref=”yyy”>).
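The key/foreign-key analogy can be tried out on a small instance. In this sketch the j:id and j:ref attribute names follow the text, but the namespace URI is a placeholder, the element PersonResidenceReference is a made-up reference element, and the function residences_of is illustrative only. Resolving a j:ref is just a lookup in an index keyed by j:id, much like following a foreign key to a primary key.

```python
import xml.etree.ElementTree as ET

J = "http://example.org/gjxdm-placeholder"  # placeholder, not the real GJXDM URI
ID = f"{{{J}}}id"
REF = f"{{{J}}}ref"

# Hypothetical instance: one person pointing at two residences.
INSTANCE = f"""
<Doc xmlns:j="{J}">
  <Person j:id="P1">
    <PersonName>Pat Doe</PersonName>
    <PersonResidenceReference j:ref="R1"/>
    <PersonResidenceReference j:ref="R2"/>
  </Person>
  <Residence j:id="R1"><City>Seattle</City></Residence>
  <Residence j:id="R2"><City>Denver</City></Residence>
</Doc>
"""

def residences_of(instance_text, person_id):
    """Follow each j:ref under the given person to the element with that j:id."""
    root = ET.fromstring(instance_text)
    # The "primary key index": every element that carries a j:id.
    by_id = {el.get(ID): el for el in root.iter() if el.get(ID)}
    person = by_id[person_id]
    # The "foreign keys": every j:ref found beneath the person.
    return [by_id[el.get(REF)].findtext("City")
            for el in person.iter() if el.get(REF)]

print(residences_of(INSTANCE, "P1"))
```

A j:ref that points at no j:id would raise a KeyError here, which is the instance-level equivalent of a broken foreign key.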

GJXDM Terminology

PersonType (Component Name)          ResidenceType (Component Name)

Person
    PersonResidence
        type=”ResidenceType”
        minOccurs=”1”
        maxOccurs=”unbounded”
    PersonName
        type=”NameType”…
    PersonAge…
    PersonEyeColor…

The PersonResidence element states that the PersonType component has a mandatory 1:M relationship to the ResidenceType component, relating Person to Residence(s) via the minOccurs and maxOccurs attribute values. (Note: the default for both minOccurs and maxOccurs is 1.) The elements Person and Residence have PersonType and ResidenceType as their content and would be representative of the Person table name and Residence table name in the E/R diagram.
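The mapping from occurrence constraints to E/R cardinality can be written down directly. The sketch below is only an illustration (the function name cardinality is made up); it applies the XML Schema default of 1 for a missing minOccurs or maxOccurs and returns the 0:1, 1:1, 0:M, 1:M shorthand used in this attachment.

```python
import xml.etree.ElementTree as ET

def cardinality(min_occurs="1", max_occurs="1"):
    """Map minOccurs/maxOccurs to the 0:1, 1:1, 0:M, 1:M shorthand."""
    if max_occurs == "0":
        raise ValueError("maxOccurs=0 means the element is not allowed at all")
    lower = "0" if int(min_occurs) == 0 else "1"   # optional vs. mandatory
    upper = "M" if max_occurs == "unbounded" or int(max_occurs) > 1 else "1"
    return f"{lower}:{upper}"

# The PersonResidence declaration from the diagram, as a schema fragment.
DECLARATION = (
    '<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" '
    'name="PersonResidence" type="ResidenceType" '
    'minOccurs="1" maxOccurs="unbounded"/>'
)

element = ET.fromstring(DECLARATION)
print(cardinality(element.get("minOccurs", "1"), element.get("maxOccurs", "1")))
# prints 1:M
```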

Similarly, in GJXDM, the elements that would represent E/R entities (table names) are the Elements whose object types contain all of the elements of a component, such as the elements Case, Charge, Subject, Citation, Residence, Person, Vehicle… The Case element has object type CaseType, so its content is all of the elements of the CaseType component. GJXDM links tables together through relationship-like elements called PersonCharge, CaseCharge, CitationSubject etc.

This example relates two GJXDM components (1:M) through “Inclusion”, as opposed to the relational approach of primary key/foreign key table pairs or, equivalently, ID/IDREF within XML instance documents. The “Inclusion” method of our Scenario #3 works fine for a variety of relationships, including mandatory one-to-many (1:M) and one-to-one (1:1) relationships and optional zero-to-many (0:M) and zero-to-one (0:1) relationships. To resolve many-to-many relationships, relational designers create a new intersection table of the key pairs from each table. In Justice XML you would use ID/IDREF pairs supported by the ReferenceType element(s) or the Relationship element.
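To show how reference elements play the role of an intersection table, the sketch below reads a hypothetical instance in which two persons share a residence and derives the (person, residence) key pairs a relational designer would store. The placeholder namespace, the ResidenceReference element name and the function intersection_rows are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

J = "http://example.org/gjxdm-placeholder"  # placeholder, not the real GJXDM URI
ID = f"{{{J}}}id"
REF = f"{{{J}}}ref"

# Hypothetical many-to-many instance: P1 has two residences, R1 has two residents.
INSTANCE = f"""
<Doc xmlns:j="{J}">
  <Person j:id="P1">
    <ResidenceReference j:ref="R1"/>
    <ResidenceReference j:ref="R2"/>
  </Person>
  <Person j:id="P2">
    <ResidenceReference j:ref="R1"/>
  </Person>
  <Residence j:id="R1"/>
  <Residence j:id="R2"/>
</Doc>
"""

def intersection_rows(instance_text):
    """Return (person id, residence id) pairs, i.e. the intersection-table rows."""
    root = ET.fromstring(instance_text)
    rows = []
    for person in root.findall("Person"):
        for child in person:
            target = child.get(REF)
            if target:
                rows.append((person.get(ID), target))
    return rows

rows = intersection_rows(INSTANCE)
print(rows)
# Reading the pairs in the other direction answers "who lives at R1?"
print([person for person, residence in rows if residence == "R1"])
```

Each Residence appears once in the instance no matter how many persons refer to it, which is the property that Inclusion cannot provide for many-to-many data.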

To summarize, Entity Relationships are expressed as named mandatory|optional arcs on an E/R diagram representing one-to-many (1:M), one-to-one (1:1) and many-to-many (M:M) relationships between the entities. In GJXDM, a component’s element(s) carry minOccurs and maxOccurs values to convey optional|mandatory 1:1 and 1:M relationships to other GJXDM components.

Entity elements that are not labeled as relationship arcs to other entities are typically called attributes or fields, which depend on the primary key and nothing but the key, “so help me Codd”, in Relational Technology parlance. PersonAge, PersonSex and PersonEyeColor are attributes of a person and not a relationship to another entity type like PersonResidence. In GJXDM, PersonAge/Sex/EyeColor all have SimpleType(s) for their content, which basically means data such as text, numeric or boolean values. Elements with SimpleType content tend to be attributes of the component, while Elements with ComplexType content such as PersonResidence tend to be “relationship” type elements, the arcs/arrows connecting entities in E/R diagrams.