Chapter 2 Representation for Regulation Codes

2.1 Introduction

2.1.1 Legal Framework for Hazardous Waste Regulation Compliance

There are two major sources of federal law for hazardous waste compliance checking, namely statutes and regulations [Kindschy, Draft and Carpenter, 1997]. The federal statutory law is codified in the United States Code (USC), and federal regulations are compiled in the Code of Federal Regulations (CFR). In addition, each state has its own codified statutes and regulations for hazardous waste compliance.

Hazardous waste statutory laws are passed by a legislative body, federal or state. The enacted federal statute is often referred to as the Resource Conservation and Recovery Act (RCRA), and the actual text can be found in the U.S. Code (USC). Statutory laws are often drafted in general terms, leaving the relevant agency responsible for developing specific and detailed regulations. Typically, statutory law is broad and vague in order to cope with unforeseen eventualities and to avoid frequent rewriting [Zeleznikow and Hunter, 1994; Kindschy, Draft and Carpenter, 1997]. The Environmental Protection Agency (EPA) is responsible for developing and enforcing the hazardous waste regulations. At the federal level, the hazardous waste regulations are set out in 40 CFR Parts 260-281. For each state, the state EPA provides detailed regulations according to both RCRA and state statutes. State regulations may differ slightly from the federal regulations; for example, hazardous waste regulations in California are generally stricter than the federal regulations.

RCRA is the root legal source for hazardous waste regulation. State regulations are drafted based on RCRA, the state statute, and the federal regulations. The relations among the different hazardous waste regulations are depicted in Figure 2.1.


The organization and representation of hazardous waste regulation codes play a significant role in hazardous waste regulation compliance checking. In this chapter, we discuss the representation of the regulation codes for hazardous waste management. The discussion focuses on Part 262, Title 40 of the Code of Federal Regulations (40 CFR 262), which contains the regulations applicable to hazardous waste generators, and the related Parts 260 and 261 (40 CFR 260 and 40 CFR 261) of the code.

Figure 2.1: Legal resources for hazardous waste management in the U.S.

2.1.2 Organization of Regulation Codes

This section provides an overview of the organization of the regulation codes and their characteristics. We focus our discussion on the provisions in 40 CFR 262 that apply to generators for determining hazardous waste, as shown in Figure 2.2.

TITLE 40--PROTECTION OF ENVIRONMENT CHAPTER I--ENVIRONMENTAL PROTECTION AGENCY (CONTINUED)

……

PART 262--STANDARDS APPLICABLE TO GENERATORS OF HAZARDOUS WASTE

Subpart A--General

Section 262.10 Purpose, scope, and applicability.

……

Section 262.11 Hazardous waste determination.

A person who generates a solid waste, as defined in 40 CFR 261.2, must determine if that waste is a hazardous waste using the following method:

(a) He should first determine if the waste is excluded from regulation under 40 CFR 261.4.

(b) He must then determine if the waste is listed as a hazardous waste in subpart D of 40 CFR part 261. Note: Even if the waste is listed, the generator still has an opportunity under 40 CFR 260.22 to demonstrate to the Administrator that the waste from his particular facility or operation is not a hazardous waste.

(c) For purposes of compliance with 40 CFR part 268, or if the waste is not listed in subpart D of 40 CFR part 261, the generator must then determine whether the waste is identified in subpart C of 40 CFR part 261 by either:

(1) Testing the waste according to the methods set forth in subpart C of 40 CFR part 261, or according to an equivalent method approved by the Administrator under 40 CFR 260.21; or

(2) Applying knowledge of the hazard characteristic of the waste in light of the materials or the processes used.

(d) If the waste is determined to be hazardous, the generator must refer to parts 261, 264, 265, 266, 268, and 273 of this chapter for possible exclusions or restrictions pertaining to management of the specific waste.

Section 262.12 EPA identification number

……

Figure 2.2: Parts of Provisions from 40 CFR 262

There are three distinct characteristics in the regulation code:

  1. There is an explicit organization in the regulation code. For instance, Part 262 has Subparts A through H, Subpart A has Sections 262.10 through 262.12, Section 262.11 has Subsections 262.11(a) through 262.11(d), and Subsection 262.11(c) has two provisions, 262.11(c)(1) and 262.11(c)(2).
  2. Provisions are described as sentences. The sentences may be vague in meaning or contain vague terms. For example, provision 262.11(c)(2) contains the phrase “knowledge of the hazard characteristic of the waste”. This phrase is vague and indeterminate, and the information contained in the sentence is incomplete within its own context. That is, a decision on whether a waste is hazardous cannot be made by relying on the content of this provision alone, even though the purpose of using 40 CFR part 262 is to determine the hazardous waste for the generators.
  3. There exist many embedded reference links in the text. A provision may reference provisions in other parts, subparts, sections, or subsections. For instance, provision 262.11(c)(1), which reads "Testing the waste according to the methods set forth in subpart C of 40 CFR part 261, or according to an equivalent method approved by the Administrator under 40 CFR 260.21", has references to 40 CFR part 261 and to Section 260.21. Typically, a reference link is made to a certain part or section, but not to an individual provision. In the above example, 262.11(c)(1) makes reference to section 40 CFR 260.21, but gives no information as to which provisions of section 260.21 it refers to. As another example, 40 CFR 262.11(d) refers to 40 CFR parts 264, 265, 266, 268, and 273, but makes no clear reference to the exact provisions.
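The third characteristic can be illustrated with a small sketch that pulls the embedded references out of a provision's text. This is only an illustration under stated assumptions: the pattern `REF_PATTERN` and the function `extract_references` are hypothetical names, and the pattern covers only the citation forms that appear in 262.11(c)(1), not a complete grammar of CFR citations.

```python
import re

# Hypothetical pattern: matches citations such as "40 CFR 260.21",
# "40 CFR part 261", and "subpart C of 40 CFR part 261".
REF_PATTERN = re.compile(
    r"(?:subpart\s+(?P<subpart>[A-Z])\s+of\s+)?"
    r"40\s+CFR\s+(?:part\s+)?(?P<number>\d+(?:\.\d+)?)"
)

def extract_references(text):
    """Return (part_or_section_number, subpart_or_None) pairs in text order."""
    return [(m.group("number"), m.group("subpart"))
            for m in REF_PATTERN.finditer(text)]

provision_c1 = (
    "Testing the waste according to the methods set forth in subpart C of "
    "40 CFR part 261, or according to an equivalent method approved by the "
    "Administrator under 40 CFR 260.21"
)
refs = extract_references(provision_c1)
print(refs)
```

Note that the extracted targets are a subpart and a section, not individual provisions, which mirrors the observation that reference links rarely identify the exact provision.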


Figure 2.3: Representation for a part of a regulation code

The representation for the organization of 40 CFR 262 is graphically depicted in Figure 2.3. The solid directed lines represent the containing relations, that is, each regulation part contains subparts, each subpart contains sections, and each section contains subsections. In addition, the subsections contain the regulation code content as well as references that point to other portions of the regulation. All the other internal nodes in the figure, such as section and subpart nodes, function as containers. The dotted directed lines represent the reference relations, that is, the subsections usually make reference to other parts, subparts, sections, or subsections of the regulation code. Cross references make the structure of the regulation code complex. Without cross references in the subsections, the organization of the regulation code can be structured as a tree. With cross references made in the subsections, the structure becomes a graph.

Following the explicit organizational layout of the original code, we can define several formal information structures. More specifically, we define the following object classes to represent the regulation code: a part of a regulation code is represented by an object class called RegulationPart; similarly, the subparts, sections, and subsections are represented by the object classes RegulationSubpart, RegulationSection, and RegulationSubsection, respectively. These object classes are related by a “containing” relationship, as shown for 40 CFR 262 in Figure 2.4. In addition, a RegulationSubsection may contain one or more RegulationProvision objects. A RegulationProvision is the object that contains the text content and the embedded references within the content.


Figure 2.4: Relations among parts, subparts, sections and subsections of CFR

While the above containing relationship is sufficient to represent the well-structured hierarchy of the code, it cannot represent the reference relationship among the provisions. The references in the provisions introduce less-structured inter-provision linkages into the information structure of the regulation code. This reference relationship can be represented by a directed graph structure. The containing relationship mentioned above can also be represented by edges in a directed graph. For example, Figure 2.5 illustrates a directed graph representation for part 260, part 261, and part 262 of the regulation codes in 40 CFR.


Figure 2.5: A directed graph for the organization of a regulation code 40 CFR
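The difference between the two kinds of edges can be sketched as follows. This is a minimal illustration, not the representation used in the figures: the edge list, the labels "contains" and "refers_to", and the function name `has_single_parent` are assumptions introduced here. The check tests only one necessary property of a tree, namely that no node has more than one incoming edge.

```python
from collections import defaultdict

# Each edge is (source, target, kind). Node names are illustrative labels.
edges = [
    ("40CFR262", "40CFR262.subpartA", "contains"),
    ("40CFR262.subpartA", "40CFR262.11", "contains"),
    ("40CFR262.11", "40CFR262.11(a)", "contains"),
    ("40CFR262.11", "40CFR262.11(c)", "contains"),
    ("40CFR262.11(c)", "40CFR262.11(c)(1)", "contains"),
    ("40CFR261", "40CFR261.4", "contains"),
    ("40CFR260", "40CFR260.21", "contains"),
    # Cross references taken from the text of 262.11(a) and 262.11(c)(1).
    ("40CFR262.11(a)", "40CFR261.4", "refers_to"),
    ("40CFR262.11(c)(1)", "40CFR260.21", "refers_to"),
]

def has_single_parent(edge_list, kinds):
    """True if no node has more than one incoming edge of the given kinds."""
    incoming = defaultdict(int)
    for _source, target, kind in edge_list:
        if kind in kinds:
            incoming[target] += 1
    return all(count <= 1 for count in incoming.values())

tree_like = has_single_parent(edges, {"contains"})
with_refs = has_single_parent(edges, {"contains", "refers_to"})
print(tree_like, with_refs)
```

With only the containing edges every node has a single parent; once the reference edges are added, 40 CFR 261.4 and 40 CFR 260.21 each gain a second incoming edge, so the structure is no longer a tree.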

2.2 A Directed Graph Model for Organization of Regulation Codes

To represent the organization of a regulation code, we employ a model based on a directed graph structure for organizing the information in the regulation. The issue is closely related to the use of directed graphs for representing semi-structured information [Nestorov, Ullman, Wiener, and Chawathe, 1996, Chawathe, Abiteboul, and Widom, 1997]. For semi-structured data, the information organization has some structure, but may be irregular and does not necessarily conform to a fixed schema [Chawathe, Abiteboul, and Widom, 1997]. For a regulation code, the "containing" relationship among parts, subparts, etc. is regular, but the "reference" relations among provisions are irregular and have no fixed schema. In this section, we discuss a directed graph model for the information representation of the provisions in a regulation code.

2.2.1 Definition of Vertices and Their Relations In a Directed Graph Structure

A node in a directed graph for the representation of a regulation code falls into one of two categories: a node that represents a part, a subpart, a section, or a subsection, and a node that represents a provision.

In the first category, the nodes carry a "containing" relation, and each node is defined by (1) the index of the node, which is the numbering of individual parts and sections in a regulation code, such as the part "40 CFR 262" or the section "40 CFR 262.11", (2) the title of the part, subpart, etc., represented by the node, and (3) a list of the nodes it contains. A node that carries the "containing" relationship can thus be represented by:

CNode(i) := (i, n, L)

where CNode(i) represents a node that is indexed by the numbering i, n is the title name, L is the list of nodes it contains. For example, part "40 CFR 262" can be represented by:

CNode("40CFR262") := <
    i = "40CFR262",
    n = "Standards applicable to generators of hazardous waste",
    L = {CNode("40CFR262.subpartA"),
         CNode("40CFR262.subpartB"),
         CNode("40CFR262.subpartC"),
         CNode("40CFR262.subpartD"),
         CNode("40CFR262.subpartE"),
         CNode("40CFR262.subpartF"),
         CNode("40CFR262.subpartG"),
         CNode("40CFR262.subpartH")}
>.

For the above example, the part 40 CFR 262 has eight subparts: 40 CFR 262 subparts A through H. The "containing" relation defined in this example is illustrated in Figure 2.6.

Figure 2.6: The relation among an internal parent node and its child nodes
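As a minimal sketch, the triple (i, n, L) can be written as a small record type. The field names `index`, `title`, and `children` are naming assumptions made for this illustration, and the subpart titles are left empty because they are not listed in the text.

```python
from typing import List, NamedTuple

class CNode(NamedTuple):
    index: str                # i: numbering of the part, subpart, or section
    title: str                # n: title of the part, subpart, or section
    children: List["CNode"]   # L: the nodes contained by this node

# Eight child nodes for subparts A through H (titles intentionally blank).
subparts = [CNode(f"40CFR262.subpart{letter}", "", []) for letter in "ABCDEFGH"]

part_262 = CNode(
    "40CFR262",
    "Standards applicable to generators of hazardous waste",
    subparts,
)

def count_nodes(node):
    """Count a node together with everything it transitively contains."""
    return 1 + sum(count_nodes(child) for child in node.children)

print(count_nodes(part_262))
```

Because the containing relation is a tree, a plain recursive walk such as `count_nodes` visits every node exactly once.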

A node representing a provision consists of three basic attributes, PNode(i) := (i, t, L), where PNode(i) represents a provision node for index i, t is the regulation code text, and L is the set of reference linkages in the provision. For example, provision 40 CFR 262.11(a) can be expressed as:

PNode("40CFR262.11(a)") := <
    "40CFR262.11(a)",
    "He should first determine if the waste is excluded from regulation under 40 CFR 261.4",
    {"40 CFR 261.4"}
>.

Provisions in a regulation code contain vague terms and may carry incomplete information. The above definition of a node object does not provide a mechanism for resolving the issue of incomplete information, so we need an additional attribute field for holding explanations. In addition, since the directed graph model will be employed for both information retrieval and rule-based compliance checking, we need attribute fields for embedding the rules related to the provision and for holding the classification information about the provision. As a result, a provision node is defined as

PNode(i) := (i, t, L, e, r, c)

where the additional field e represents the interpretation of the provision, which may be provided by an expert; r represents the related rules for this provision; and c represents the classification information for the provision. For the example of provision 40CFR262.11(a), we have:

PNode("40CFR262.11(a)") := <
    "40CFR262.11(a)",
    "He should first determine if the waste is excluded from regulation under 40 CFR 261.4",
    {"40 CFR 261.4"},
    "The generator must use the related excluded provisions 40CFR261.4(a) through (f) to see if the waste is excluded from being a hazardous waste",
    "isExcludedFrom40CFR261.4",
    "normative provision"
>.
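The step from the three-field node to the six-field node can be sketched with a record type whose extra fields default to empty. This is an illustration only: the field names are assumptions, and the explanation string below is a shortened paraphrase written for this sketch, not an expert interpretation from the source.

```python
from typing import FrozenSet, NamedTuple

class PNode(NamedTuple):
    index: str                 # i: numbering of the provision
    text: str                  # t: regulation code text
    links: FrozenSet[str]      # L: reference linkages in the provision
    explanation: str = ""      # e: interpretation, e.g. from an expert
    rule: str = ""             # r: name of the related rule
    classification: str = ""   # c: classification of the provision

# The basic three-attribute form.
basic = PNode(
    index="40CFR262.11(a)",
    text=("He should first determine if the waste is excluded from "
          "regulation under 40 CFR 261.4"),
    links=frozenset({"40 CFR 261.4"}),
)

# The extended form adds e, r, and c without touching i, t, and L.
extended = basic._replace(
    explanation="Check whether an exclusion under 40 CFR 261.4 applies.",
    rule="isExcludedFrom40CFR261.4",
    classification="normative provision",
)
print(extended.rule, extended.classification)
```

Keeping the added fields optional means that a provision for which no interpretation or rule has been written yet is still a well-formed node.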

The representation of a regulation code partially employs the representation commonly used in semi-structured data models. Specifically, the general purpose object-exchange model (OEM) [Nestorov, Ullman, Wiener, and Chawathe, 1997] for dealing with semi-structured, hierarchical data can be employed for the representation of a regulation code. In OEM, an object is identified by a unique object identifier and a value, written as <object_id, value>. The value is either an atomic quantity, such as a string, or a set of object references, denoted by a set of <label, subobject> pairs. We can formally map the relations of the nodes in the above directed graph model of code organization into the object-exchange model. The mapping can be constructed as shown in Figure 2.7. The six rules described in Figure 2.7 can be used to construct the directed graph structure in an OEM representation of the semi-structured, hierarchical data model for a regulation code.

1. Splitting a code node, CNode := (i, n, L), into two parts, (i, n) and (L),

2. Mapping (i, n) to the object_id identifier in OEM,

3. Mapping (L) to the value in OEM, which is a set of references to the contained nodes of the code,

4. Splitting the provision node, PNode := (i, t, L, e, r, c), into two parts, (i, t, e, r, c) and (L),

5. Indexing (i, t, e, r, c) using the object_id in OEM,

6. Mapping (L) to the value in OEM, which is a set of references to the referenced subsections in the code.

Figure 2.7: The mapping of interconnection of regulation code objects to Object-exchange model
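The six rules can be sketched as two small functions, each returning an <object_id, value> pair. This is one possible reading of the rules, not a definitive implementation: the function names and the edge labels "contains" and "refers_to" used in the <label, subobject> pairs are assumptions made for this illustration.

```python
def cnode_to_oem(index, title, children):
    """Rules 1-3: (i, n) becomes the object_id and L becomes the value."""
    object_id = (index, title)
    value = {("contains", child) for child in children}
    return object_id, value

def pnode_to_oem(index, text, links, explanation, rule, classification):
    """Rules 4-6: (i, t, e, r, c) becomes the object_id and L the value."""
    object_id = (index, text, explanation, rule, classification)
    value = {("refers_to", target) for target in links}
    return object_id, value

section_id, section_value = cnode_to_oem(
    "40CFR262.11",
    "Hazardous waste determination",
    ["40CFR262.11(a)", "40CFR262.11(b)"],
)
provision_id, provision_value = pnode_to_oem(
    "40CFR262.11(a)",
    "He should first determine if the waste is excluded from regulation "
    "under 40 CFR 261.4",
    {"40 CFR 261.4"},
    "",
    "isExcludedFrom40CFR261.4",
    "normative provision",
)
print(section_value)
print(provision_value)
```

In both cases the value is a set of labelled references, so containing edges and reference edges end up in the same OEM form and differ only in their label.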

2.2.2 Representation of Interconnection by Extensible Markup Language

One possible way to implement a directed graph model for representing the semi-structured information in a regulation code is to use hyperlinks to connect the related regulations. For example, when subsection A refers to another subsection B, a directed link can be created from subsection A to subsection B. This hyperlink model, however, leads to the well-known problem of being “lost in hyperspace” [Cleary and Bareiss, 1996; Kent and Neuss, 1997]. Without a global organization of the unstructured hyperlinks, the state of the user’s “activities” is lost during the search and traversal of a regulation code.

The eXtensible Markup Language (XML) provides a means to model the organization of the semi-structured regulations [Light, 1998]. XML is a textual language for information representation. Nested and tagged elements are employed to represent semi-structured information content. An XML-based information representation can be accompanied by a Document Type Definition (DTD), which can be regarded as a schema that restricts the tags and structure of an XML representation. An XML document satisfying a DTD schema is said to be valid.

<!-- XML DTD for the representation of a part of the Code of Federal Regulations (CFR) for Hazardous Waste Management. -->

<!-- the organization of the "containing" relation of a regulation code -->
<!ELEMENT part (part_id, part_name, subpart+)>
<!ELEMENT part_id (#PCDATA)>
<!ELEMENT part_name (#PCDATA)>
<!ELEMENT subpart (subpart_id, subpart_name, section+)>
<!ELEMENT subpart_id (#PCDATA)>
<!ELEMENT subpart_name (#PCDATA)>
<!ELEMENT section (section_id, section_name, subsection*)>
<!ELEMENT section_id (#PCDATA)>
<!ELEMENT section_name (#PCDATA)>
<!ELEMENT subsection (subsection_id, provision*)>
<!ELEMENT subsection_id (#PCDATA)>

<!-- the representation for provisions of a regulation code, where ? means zero or one element, * means zero or more elements. -->
<!ELEMENT provision (pro_id, text, explanation?, rule_name?, link*)>
<!ELEMENT pro_id (#PCDATA)>
<!ELEMENT text (#PCDATA)>
<!ELEMENT explanation (#PCDATA)>
<!ELEMENT rule_name (#PCDATA)>
<!ELEMENT link (#PCDATA)>

Figure 2.8: An XML DTD schema for the regulation code 40 CFR 262 without the provision classification

The organization of a regulation code can be defined by a Document Type Definition (DTD) schema as shown in Figure 2.8, which maps the directed graph organization of the part 40 CFR 262 shown in Figure 2.5 into an XML representation.

In the DTD schema, we explicitly model the relations among a regulation part and its subparts, sections, subsections, and provisions. For example, a regulation part contains exactly one "part_name", which holds the name of the part, and one or more subparts, as denoted by "subpart+". In addition, we explicitly model the links in the provisions, i.e., the provision element has link subelements, denoted by "link*", that can be used to index zero or more referenced provisions. The link gives a provision the capability to make reference to other regulations; as a result, the encoded links can be used to establish a chain of cross-referenced provisions in a compliance checking procedure. In the DTD schema, the number of occurrences of each element is explicitly defined. For example, "zero or one" element is denoted by "?", "one or more" elements are denoted by "+", and "zero or more" elements are denoted by "*".
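To make the link mechanism concrete, the following sketch parses a small XML instance laid out according to the element order described above and collects the links of each provision. The sample document is an assumption written for this illustration, including the link value "40CFR261.4". Python's standard `xml.etree.ElementTree` parser is used, which reads the document but does not validate it against a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document following the part/subpart/section/
# subsection/provision nesting described in the text.
SAMPLE = """
<part>
  <part_id>40CFR262</part_id>
  <part_name>Standards applicable to generators of hazardous waste</part_name>
  <subpart>
    <subpart_id>40CFR262.subpartA</subpart_id>
    <subpart_name>General</subpart_name>
    <section>
      <section_id>40CFR262.11</section_id>
      <section_name>Hazardous waste determination</section_name>
      <subsection>
        <subsection_id>40CFR262.11(a)</subsection_id>
        <provision>
          <pro_id>40CFR262.11(a)</pro_id>
          <text>He should first determine if the waste is excluded from regulation under 40 CFR 261.4.</text>
          <rule_name>isExcludedFrom40CFR261.4</rule_name>
          <link>40CFR261.4</link>
        </provision>
      </subsection>
    </section>
  </subpart>
</part>
"""

root = ET.fromstring(SAMPLE.strip())

# Map each provision index to the list of indices it refers to.
links_by_provision = {
    provision.findtext("pro_id"): [link.text for link in provision.findall("link")]
    for provision in root.iter("provision")
}
print(links_by_provision)
```

Following the entries of `links_by_provision` from one provision to the next is one way to build the chain of cross-referenced provisions mentioned above.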

An XML DTD has the flexibility to augment an existing model. For instance, if we decide to integrate the classification meta information into the DTD as shown in Figure 2.8, we can simply replace the statement

<!ELEMENT provision (pro_id, text, explanation?, rule_name?, link*)>

by the following

<!ELEMENT provision (pro_id, text, explanation?, rule_name?, classification?, link*)>

<!ELEMENT classification (#PCDATA)>

The new DTD, as shown in Figure 2.9, holds the information about the classification of a regulation code in addition to the basic organization of the regulations. The new DTD schema remains compatible with all the information representations modeled by the previous DTD schema, although the classification information is not modeled in the previous DTD. This is because the provision element can contain zero or one classification subelement, as denoted by "?". An XML document written against the previous DTD contains no classification subelement, and it is therefore still a valid representation according to the new DTD.
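The compatibility argument can be checked with a small sketch that turns a flat DTD sequence into a regular expression over child element names. This is an illustrative simplification rather than a DTD validator: the helper names are hypothetical, and only comma-separated names with the "?", "*", and "+" indicators are handled.

```python
import re

def content_model_to_regex(model):
    """Translate a flat sequence such as 'a, b?, c*' into a compiled regex
    that matches space-terminated child element names in order."""
    pieces = []
    for item in model.split(","):
        item = item.strip()
        suffix = item[-1] if item[-1] in "?*+" else ""
        name = item[:-1] if suffix else item
        pieces.append(f"(?:{re.escape(name)} ){suffix}")
    return re.compile("".join(pieces))

def conforms(children, model):
    """True if the ordered list of child names satisfies the content model."""
    sequence = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None

OLD_MODEL = "pro_id, text, explanation?, rule_name?, link*"
NEW_MODEL = "pro_id, text, explanation?, rule_name?, classification?, link*"

old_style = ["pro_id", "text", "rule_name", "link"]
new_style = ["pro_id", "text", "rule_name", "classification", "link"]

print(conforms(old_style, OLD_MODEL), conforms(old_style, NEW_MODEL))
print(conforms(new_style, OLD_MODEL), conforms(new_style, NEW_MODEL))
```

A provision written for the earlier schema satisfies both content models, while a provision that carries a classification satisfies only the extended one, so the extension is compatible in one direction only.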

<!-- XML DTD for the representation of a part of the Code of Federal Regulations (CFR) for Hazardous Waste Management. -->

<!-- the organization of the "containing" relation of a regulation code -->
<!ELEMENT part (part_id, part_name, subpart+)>
<!ELEMENT part_id (#PCDATA)>
<!ELEMENT part_name (#PCDATA)>
<!ELEMENT subpart (subpart_id, subpart_name, section+)>
<!ELEMENT subpart_id (#PCDATA)>
<!ELEMENT subpart_name (#PCDATA)>
<!ELEMENT section (section_id, section_name, subsection*)>
<!ELEMENT section_id (#PCDATA)>
<!ELEMENT section_name (#PCDATA)>
<!ELEMENT subsection (subsection_id, provision*)>
<!ELEMENT subsection_id (#PCDATA)>

<!-- the representation for provisions of a regulation code, where ? means zero or one element,