SPARCE: Superimposed Pluggable Architecture for Contexts and Excerpts

Sudarshan Murthy, David Maier

Department of CSE, OGI School of Science & Engineering at OHSU

{smurthy, maier}@cse.ogi.edu

May 5, 2003


Abstract

People often impose new interpretations onto existing information. In the process, they work with information in two layers: a base layer, where the original information resides, and a superimposed layer, where only the new interpretations reside. Referring to base information rather than copying it reduces redundancy and preserves context. Superimposed information management (managing layered information) preserves these layers of information. It facilitates superimposed applications in which people can interact with heterogeneous information sources.

Superimposed application developers like to interact with base layers using a common interface regardless of base-layer types or access protocols. Also, users like to perform operations such as navigation and querying with any base layer. Unfortunately, base layers differ in their support for these operations. Appropriate architectural abstractions placed between these layers can ease communication between the two layers and make up for some of the deficiencies of base layers. They can also allow independent evolution of components in these layers.

An architecture called SLIM (Superimposed Layer Information Management) has already been defined to demarcate and revisit information inside base layers. However, some superimposed applications need to do more; they also need to access content and context of information inside base layers. We have defined a successor to SLIM, called SPARCE (Superimposed Pluggable Architecture for Contexts and Excerpts), for this purpose. In this paper, we describe the goals, design and implementation of SPARCE. We also present results of evaluating SPARCE.

Keywords: Information modeling, superimposed information, software architecture, hypertext, compound documents, excerpts, context, SLIM, SPARCE.

1. Motivation and Introduction

The United States Department of Agriculture, Forest Service (USFS) routinely makes decisions to solve (or prevent) problems concerning forests. The public may appeal any USFS decision [6]. The appeal process begins when an appellant sends in an appeal letter. An appeal letter raises issues with a USFS decision and the decision-making process. It frequently cites documents such as the Decision Notice (DN) and Environmental Assessment (EA). A USFS editor processes all appeal letters pertaining to a decision and prepares an appeal packet for a reviewing officer. An appeal packet contains all documents a reviewing officer might need to recommend a decision about issues raised in appeal letters. The RID letter (RID stands for Records, Information, Documentation) is one of the documents in an appeal packet. This letter lists the issues raised in appeal letters and a summary response for each issue. An editor synthesizes a RID letter using information in documents such as the DN, EA, and specialists’ reports. In this letter, the editor presents information from other documents in a variety of forms such as excerpts, summaries, and commentaries. In addition, the editor documents the location and identity of the information sources used in synthesizing the letter.

Composing a RID letter requires an editor to maintain large working sets. Since it is not unusual for an editor to be charged with preparing appeal packets for several decisions simultaneously, the editor may need to maintain several threads of organization. Though using documents in electronic form can be helpful, such use does not necessarily alleviate all problems. For example, the editor still needs to document the identity and location of information. In using electronic documents, the editor has to cope with multiple document windows open at once. Also, locating information in large documents with the traditional search utilities available in applications can be tedious.

We hypothesize that the USFS appeal process (and similar processes in other domains) can benefit from superimposed information management. Superimposed information is information placed over existing sources (called base layers). Word-processor documents, spreadsheets, presentations, databases, and web pages are examples of base layers. Delcambre and others [8] have described the Superimposed Layer Information Management architecture (SLIM) to manage superimposed information. SLIM facilitates addressing of information inside base layers, but it does not facilitate retrieval of content or context from base layers. Some superimposed applications (such as those that might be developed to assist in preparation of RID letters) need such information.

We have defined the Superimposed Pluggable Architecture for Contexts and Excerpts (SPARCE) to allow access to content and context information in base layers. We have built two applications, RIDPad and Schematics Browser, to exercise this new architecture. RIDPad is intended to help a USFS editor collect and organize material for preparing RID letters. The Schematics Browser allows USFS personnel to view past appeal decisions as instances of entities and relationships in a structured data model. In this paper, we describe the goals for SPARCE, its design and implementation, and the two SPARCE applications. We also present the results of evaluating the architecture.

The rest of this paper is organized as follows: Section 2 gives an overview of superimposed information and SLIM. Section 3 demonstrates the need for excerpts and contexts. In Section 4 we describe the goals, design, and an implementation of SPARCE. In Section 5 we describe two SPARCE applications and present results of evaluating SPARCE. We present related work, future work, and a summary in Sections 6, 7, and 8 respectively.

2. Superimposed Information and Applications

Maier and Delcambre [11] define superimposed information as “data placed over existing information sources to help organize, access, connect and reuse information elements in those sources.” They call existing information sources base layers, and data placed over the base layers the superimposed layer. Figure 1 shows these two layers of information. An application that manipulates base information is called a base application; an application that manipulates superimposed information is called a superimposed application.

Maier and Delcambre note that superimposed information has been used since before the advent of computers (for example, concordances and commentaries), but that there is a new need to reexamine this concept in the electronic setting. Improved addressability and accessibility of base layers, and increased digitization of information are some reasons for the renewed interest.

The rest of this section briefly describes an earlier architecture and application for superimposed information management. Complete details are available in the work of Delcambre, Maier, and others [7, 8, 11].

2.1 SLIM and SLIMPad

Delcambre and others [8] have defined an architecture called Superimposed Layer Information Management (SLIM) for management of superimposed information and applications that use it. SLIM defines an abstraction called a mark to represent an occurrence of information inside a base layer. Figure 1 shows how a superimposed layer uses marks to address base information. This figure is adapted from a description of SLIM [8].

Marks provide a uniform interface to address base information, regardless of base-layer type or access protocol. Several mark implementations, typically one per base-layer type, exist in SLIM. A mark implementation decides the addressing scheme appropriate to the base layer it supports. For example, a mark for a selection in a spreadsheet might contain the row and column numbers for the first and last cell in the selection, whereas a mark for a selection in an XML document might contain an XLink.
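The idea of one interface with one implementation per base-layer type can be sketched as follows. This is only an illustrative sketch, not the SLIM interface itself: the class names `Mark`, `SpreadsheetMark`, and `XmlMark`, the `address` method, and the address string formats are all assumptions made for this example.

```python
from abc import ABC, abstractmethod


class Mark(ABC):
    """Hypothetical uniform interface shared by all mark implementations."""

    def __init__(self, mark_id, document):
        self.mark_id = mark_id
        self.document = document

    @abstractmethod
    def address(self):
        """Return an address in the scheme chosen by the implementation."""


class SpreadsheetMark(Mark):
    """Addresses a selection by its first and last cell as (row, column)."""

    def __init__(self, mark_id, document, first_cell, last_cell):
        super().__init__(mark_id, document)
        self.first_cell = first_cell
        self.last_cell = last_cell

    def address(self):
        (r1, c1), (r2, c2) = self.first_cell, self.last_cell
        # The R..C.. notation is an assumed format for illustration.
        return f"{self.document}!R{r1}C{c1}:R{r2}C{c2}"


class XmlMark(Mark):
    """Addresses a selection with a link string (a stand-in for an XLink)."""

    def __init__(self, mark_id, document, link):
        super().__init__(mark_id, document)
        self.link = link

    def address(self):
        return f"{self.document}#{self.link}"


def describe(mark):
    """A caller uses only the shared interface, never the concrete type."""
    return f"{mark.mark_id} -> {mark.address()}"
```

In this sketch, `describe` works unchanged for any mark type, which is the property the uniform interface is meant to provide.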

Figure 1: Marks in SLIM architecture.

Figure 2: SLIM reference model.

Superimposed applications may represent superimposed information in any model, regardless of the base-layer types they interact with. They only need to store IDs of marks, which they can exchange for mark objects (from a Mark Management module as we will see shortly). Superimposed applications work seamlessly with any base-layer type due to the uniform interface supported by all marks.

Figure 2 shows a reference model for SLIM as a UML Conceptual Diagram. The Mark Management module (subsystem) is responsible for creating and storing marks, retrieving marks (given a mark ID), and for navigating back to the base layer. The Superimposed Information Management module provides storage service to superimposed applications. Use of this module is optional. The Clipboard is used for inter-process communication (at mark creation time).
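The responsibilities of the Mark Management module (create, store, retrieve by ID, and navigate back to the base layer) might be sketched as below. The class `MarkManager`, its method names, the ID format, and the dictionary representation of a mark are hypothetical. A real resolver would drive the base application, whereas this sketch only returns a description of the navigation step.

```python
import itertools


class MarkManager:
    """Hypothetical in-memory stand-in for a Mark Management module."""

    def __init__(self):
        self._marks = {}                 # mark ID -> mark record
        self._ids = itertools.count(1)   # source of fresh ID numbers

    def create_mark(self, base_type, address):
        """Create and store a mark, returning the ID an application keeps."""
        mark_id = f"mark-{next(self._ids)}"
        self._marks[mark_id] = {"base_type": base_type, "address": address}
        return mark_id

    def retrieve(self, mark_id):
        """Exchange a mark ID for the stored mark record."""
        return self._marks[mark_id]

    def resolve(self, mark_id):
        """Describe the navigation back to the base layer (no real launch)."""
        mark = self.retrieve(mark_id)
        return f"open {mark['base_type']} at {mark['address']}"
```

A superimposed application in this sketch stores only the returned ID and calls `retrieve` or `resolve` when it needs the mark.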

SLIMPad is a superimposed application that employs SLIM. It allows users to create marks in a variety of base layers and paste them as scraps in a SLIMPad document. Scraps are superimposed information elements associated with a mark ID and a label. Users can also create bundles, which are named groups of scraps and other bundles. Figure 3 shows a SLIMPad document. The rectangles within the workspace denote bundles. The first label inside a bundle (in its top-left corner) is its name. Other contents of the workspace are scraps. A user can activate (double click) a scrap to return to the corresponding base layer.
Figure 3: A SLIMPad document.
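The scrap and bundle structure described above can be modeled as a small recursive data type. The sketch below is an assumption for illustration only: the names `Scrap`, `Bundle`, and `mark_ids`, and the sample labels, do not come from SLIMPad.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Scrap:
    """A superimposed element: a label plus the ID of the mark it refers to."""
    label: str
    mark_id: str


@dataclass
class Bundle:
    """A named group of scraps and other bundles."""
    name: str
    items: List[Union[Scrap, "Bundle"]] = field(default_factory=list)

    def mark_ids(self):
        """Collect mark IDs of all scraps, descending into nested bundles."""
        ids = []
        for item in self.items:
            if isinstance(item, Bundle):
                ids.extend(item.mark_ids())
            else:
                ids.append(item.mark_id)
        return ids


# Hypothetical document: one top-level bundle containing a nested bundle.
inner = Bundle("Issue 1", [Scrap("EA excerpt", "mark-2")])
pad = Bundle("Appeal", [Scrap("DN summary", "mark-1"), inner])
```

Because a scrap holds only a mark ID, the structure stays independent of the base-layer types involved.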

2.2 Limitations of SLIM

Mark management in SLIM consists of mark creation, retrieval, and resolution (navigation) operations. These operations are necessary, but not sufficient for superimposed applications such as those we envision for the USFS appeal process. As we will show in Sections 3 and 4, users sometimes like to see contents and context of a mark (such as the containing paragraph and section heading), from within a superimposed application. SLIM does not support these operations.

The SLIM implementation also has some packaging and deployment limitations. For example, the mark management module must run within the process of any superimposed application. As a result, several instances of the mark management module may be loaded at once (one for each instance of a superimposed application) even though one instance would suffice.

Some usability issues also exist in the SLIM implementation. For example, the mark creation process is modal. That is, a user must create a scrap immediately after creating a mark in a base layer. This limitation is mainly due to the way SLIM uses the shared Clipboard service. The USFS editors we have interacted with would like to create multiple marks in a base layer, without having to use them immediately. Our representative users have also asked for new facilities (such as reusing marks) in superimposed applications. Providing some of these facilities requires new architecture and application capabilities.

3. Excerpts and Contexts

An excerpt is the content of a marked base-layer element. An excerpt can be of various types. For example, it may be plain text, formatted text, or an image. An excerpt of one type could be transformed into other types. For example, formatted text in a word processor could also be seen as plain text, or as a graphical image. Context is information concerning a base-layer element. Presentation information such as font name and color are examples of context. The containing paragraph and section heading are also examples of context. Because we use the same mechanism to support both excerpts and contexts, we will often use the term “context” to refer to both kinds of information about a mark.

3.1 Need for excerpts and contexts

A USFS editor excerpts contents from base layers in the process of preparing a RID letter. In addition, the editor sometimes examines features related to excerpts. For example, the editor may want to determine the heading of the section an excerpt is in or examine the information (such as previous or next sentence) surrounding an excerpt in the base layer. USFS editors would like to see both excerpts and contexts from within a superimposed application.

Fulfilling some user needs might require superimposed applications to access context information that a user might not explicitly access. We demonstrate such needs using an example. We have noted some variations in how editors use excerpts in RID letters. An editor sometimes reproduces excerpts exactly as they are in base layers. At other times, the editor retains only parts of the original formatting, or no formatting at all. Compliance with documentation guidelines, consistency, and personal preferences are some of the factors that influence the choice. Displaying an excerpt formatted exactly or partly as in its base layer requires a superimposed application to examine the excerpt’s context.

Consider the fragment of a web page as displayed by a web browser shown in Figure 4(a). The highlighted portion of the fragment is the marked region. Figure 4(b) shows three possible HTML markups corresponding to the marked region. The first markup specifies the font name and size, the second markup specifies only the font size, and the third markup does not specify any font characteristic. Font characteristics not specified are inherited from an enclosing element (transitively) in the second and third markups. If no enclosing element specifies the necessary font characteristic, the web browser uses a default value, possibly from an application setting. That is, a superimposed application might need to examine markup all the way up to the start of the web page, or even examine an application setting to display the marked region.

Fragment text: "Cheatgrass, Bromus tectorum, grows near many caves in this project area."
(a) Web-page fragment. (b) Three possible HTML markups for the marked region.
Figure 4: A fragment of a web page and some possible markups.
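The lookup described above (use the value on the marked element if present, otherwise walk up the enclosing elements, otherwise fall back to an application setting) can be sketched as follows. This is a deliberately simplified model, not how a web browser resolves styles. The `Element` class, the `effective_style` function, and the default values are hypothetical.

```python
class Element:
    """Minimal stand-in for a markup element with optional style settings."""

    def __init__(self, tag, parent=None, **style):
        self.tag = tag
        self.parent = parent   # enclosing element, or None at the top
        self.style = style     # only the characteristics set on this element


# Assumed application settings used when no element specifies a value.
APPLICATION_DEFAULTS = {"font_name": "Times New Roman", "font_size": 12}


def effective_style(element, key, defaults=APPLICATION_DEFAULTS):
    """Return the nearest specified value for key, else the default."""
    node = element
    while node is not None:
        if key in node.style:
            return node.style[key]
        node = node.parent     # inherit transitively from the enclosing element
    return defaults[key]


# Analogue of the second markup: only the font size is set on the region.
body = Element("body", font_name="Arial")
paragraph = Element("p", parent=body)
region = Element("font", parent=paragraph, font_size=10)
```

Here the font size comes from the marked region itself, while the font name is found two levels up, which is why the application may need to examine markup well outside the marked region.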

3.2 Architectural considerations in supporting excerpts and contexts

We observe that several kinds of context are possible for a mark. The following is a representative list of context kinds along with examples for each kind. We also note that the context of a mark may include elements of many kinds.

· Content: Text, graphics.

· Presentation: Font name, color.

· Placement: Line number, section.

· Sub-structure: Rows, sentences.

· Topology: Next sentence, next paragraph.

· Container: Containing paragraph, document.

· Application: Options, preferences.

Contexts can vary across base-layer types. For example, the context of a mark to a region on a painting (in a graphics-format base layer) includes background color and foreground color, but it does not include font name. However, the context of a mark to a selection in a web page includes all three of these elements. Contexts can also vary between marks of the same base-layer type. For example, an MS Word mark to text situated inside a table may have a “column heading,” but a Word mark to text not situated in a table does not include that kind of context. Lastly, the context of a mark itself may change with time. For example, the context of a mark to a figure inside a document includes a “caption” only as long as a caption is attached to that figure.
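One way to accommodate contexts that differ by base-layer type, by mark, and over time is to treat a context as an open-ended collection of named elements grouped by kind, so that a missing element is simply absent. The sketch below assumes a nested-dictionary representation and a hypothetical `lookup` helper. The sample values are invented for illustration.

```python
def lookup(context, kind, name):
    """Return a context element's value, or None if this mark lacks it."""
    return context.get(kind, {}).get(name)


# A mark to a region on a painting: colors, but no font name.
painting_region = {
    "Presentation": {"background color": "white", "foreground color": "black"},
}

# A mark to a selection in a web page: all three presentation elements.
web_selection = {
    "Presentation": {
        "background color": "white",
        "foreground color": "black",
        "font name": "Verdana",
    },
}

# A mark to a figure: the caption exists only while one is attached.
figure_mark = {"Container": {"document": "report.doc"}}
figure_mark.setdefault("Content", {})["caption"] = "Figure 7"   # caption attached
caption_while_attached = lookup(figure_mark, "Content", "caption")
del figure_mark["Content"]["caption"]                           # caption removed
```

With this representation, an application asks for an element by kind and name and must be prepared for the answer to be missing.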