Display-agnostic Hypermedia

Unmil P. Karadkar, Richard Furuta, Selen Ustun, YoungJoo Park,

Jin-Cheon Na*, Vivek Gupta, Tolga Ciftci, Yungah Park

Center for the Study of Digital Libraries and Department of Computer Science
Texas A&M University

College Station, TX 77843-3112

Phone: +1-979-845-3839

*Division of Information Studies,

School of Communication & Information,

Nanyang Technological University,

31 Nanyang Link, Singapore 637718

Phone: +65-6790-5011

ABSTRACT

In the diversifying information environment, contemporary hypermedia authoring and filtering mechanisms cater to specific devices. Display-agnostic hypermedia can be flexibly and efficiently presented on a variety of information devices without any modification of their information content. We augment context-aware Trellis (caT) by introducing two mechanisms to support display-agnosticism: development of new browsers and architectural enhancements. We present browsers that reinterpret existing caT hypertext structures for a different presentation. The architectural enhancements, called MIDAS, flexibly deliver rich hypermedia presentations coherently to a set of diverse devices.

Categories and Subject Descriptors

H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia – architectures.

General Terms

Design, Human Factors

Keywords

Display-agnostic Hypermedia, Multi-device Integrated Dynamic Activity Spaces (MIDAS), context-aware Trellis (caT)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Conference’04, Month 1–2, 2004, City, State, Country.

Copyright 2004 ACM 1-58113-000-0/00/0004…$5.00.

1.  INTRODUCTION

Over the last decade, the characteristics of information access devices have grown dramatically more diverse. We have seen the emergence of small, mobile information appliances on one hand and the growth of large, community-use displays like SmartBoard [SMART 2003] and Liveboard [Elrod 1992] on the other. Desktop computer displays also sport a variety of display resolutions. While PDAs and cell phones are widely used for Web access, several other devices like digital cameras [Nikon 2004] and wristwatches [Raghunath 2002] are acquiring network interfaces to become viable options for information access. These devices vary in terms of characteristics such as their display real estate, network bandwidth, processing power and storage space. Organic LED (OLED) displays that can be tailored for individual applications and embedded into various daily-use items will soon be widely available [Howard 2004], thus further diversifying the display properties of information appliances.

Despite the diversity in appliance characteristics, most Web pages are created and optimized for viewing from desktop computers. To address this issue, a significant body of research has focused on developing methods to tailor this information for presentation on mobile devices. Projects like WebSplitter [Han 2000], Power Browser [Buyukkokten 2002], Proteus [Anderson 2001] and the Content Extractor [Gupta 2003] filter Web content to facilitate its presentation on mobile devices. Popular Web portals like Yahoo! [Yahoo 2004], The Weather Channel [Weather 2004], and CNN [CNN 2004] also provide interfaces and services for mobile devices. Typically, the Web and mobile services are based on independent architectures and retrieve information from a common data store. While this approach caters to mobile devices, it requires the service providers to maintain multiple system architectures and synchronize content across these architectures. These services must be periodically reconfigured to accommodate new devices and changes to the characteristics of existing devices, or risk losing their users. Furthermore, these mobile services, much like Web site design practices, focus on delivering information to specific classes of devices.

The pro-desktop bias of the Web information access model is not limited to technology alone. As most desktop computers are located in home or office environments, this model inherently assumes that Web access clients browse the information from these environments. Mobile service architectures, whether they filter information or replicate services, focus solely upon the technological issues regarding information delivery. However, the needs and expectations of mobile users are different from those of desktop users. Few present-day models tailor the modality of delivery based on characteristics of the surrounding environment without explicit action from the user.

In this paper we present two approaches to separating the information content of context-aware Trellis (caT) [Na 2001] hypertexts from their mode of presentation. The first approach involves development of new browsers that reorient and repurpose the hypertext content for novel presentations. The second approach, called MIDAS (Multi-device Integrated Dynamic Activity Spaces), enhances the caT architecture to support dynamic integration and co-use of devices with different characteristics for rich information interactions. To accommodate differences in the strengths of the devices that render them, MIDAS separates the information content from the mode of its presentation. MIDAS-based hypertexts take the form that their rendering devices can best present and are thus display-agnostic.


MIDAS supports co-use of the various devices available to a user, including devices that users carry with them, such as cell phones, PDAs, pagers, and notebook and tablet computers, and, in some cases, publicly available desktop computing resources in airports and malls, to augment the information delivery environment. While the smaller mobile devices may be individually restricted by their physical characteristics, MIDAS can use them in combination with others to overcome their individual limitations and make feature-rich presentations. For instance, a user who carries a cell phone and a networked PDA may view annotated illustrations even when neither of these devices has enough display space to visually render this information. The cell phone may aurally render the annotation while the PDA displays the corresponding images. Textual annotations could easily be rendered as audio via freely available software, such as the Festival Speech Synthesis System [FSSS 2004], in order to overcome the lack of display space. While MIDAS jointly uses the cell phone and PDA for the presentation, this association is temporary and extends only for the duration of the presentation.
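The device-assignment idea underlying this scenario can be sketched as follows. This is an illustrative Python sketch, not the MIDAS implementation; the function and data-structure names are our own.

```python
# Illustrative sketch (not the MIDAS implementation): assign each media
# element of a presentation to the device best suited to render it.

def assign_media(elements, devices):
    """Map each media element to a capable device."""
    assignment = {}
    for elem in elements:
        # Prefer a device that renders the element's native medium;
        # fall back to one that can transform it (e.g., text -> audio).
        candidates = [d for d in devices if elem["medium"] in d["renders"]]
        if not candidates:
            candidates = [d for d in devices
                          if elem["medium"] in d.get("transforms", {})]
        if candidates:
            assignment[elem["id"]] = candidates[0]["name"]
    return assignment

# The cell-phone/PDA scenario from the text: the annotation is spoken
# on the phone while the PDA shows the corresponding image.
devices = [
    {"name": "cell-phone", "renders": {"audio"},
     "transforms": {"text": "audio"}},
    {"name": "pda", "renders": {"image"}},
]
elements = [
    {"id": "annotation-1", "medium": "text"},
    {"id": "figure-1", "medium": "image"},
]
print(assign_media(elements, devices))
```

The temporary device association described above corresponds to the lifetime of the returned assignment: it holds only for the current presentation and is recomputed when the device set changes.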

The rest of the paper is organized as follows: in the following section we review the work this research builds upon. The next section presents our approaches to tackle the issues involved in presenting hypermedia content effectively over multiple devices. We then describe the MIDAS architecture and discuss how it connects to other relevant research projects. We conclude this paper with directions for continuing our work.

2.  context-aware Trellis (caT)

The context-aware Trellis (caT) hypertext model [Na 2001], an extension of Trellis [Stotts 1988], affords simultaneous rendering of diverse, multi-modal presentations of information content via a variety of browsers. caT documents can be presented differently to various users, based on their preferences, characteristics, and a wide variety of environmental factors such as their location, time of access and the actions of other users perusing the document [Furuta 2002]. The caT model differs from that of the Web in several respects. We highlight the salient differences between these models as we describe the caT interaction model.

In Figure 1, two users, John and Bob, are simultaneously browsing a hypertext from a caT server. John is browsing sections of the document from his desktop computer via two different browsers and from his notebook computer via yet another browser. Bob is accessing parts of this document from his notebook computer via two browsers. While each browser may present different but related sections of the document, it is equally likely that John is viewing the same section of the document via two of his browsers. Unlike Web browsers, caT allows its browsers a great deal of flexibility in presenting information. While all Web browsers render a given document identically, caT browsers present documents differently based on the properties of the browser. The caT server only tells the browsers what to present but leaves the finer aspects of the presentation to the browsers. Browsers have some flexibility in deciding how to present this information. Thus, John may actually be viewing a part of the document in multiple media formats; while browser A displays images, browser C may present information textually, and browser B may only present information that can be tabulated. caT also supports synchronized information presentation to a set of browsers. Bob may thus watch a video of a product demo in browser D, while browser E presents the salient points about each feature as it is presented in the video.

The other interesting aspect of this interaction is that user actions are reflected in all browsers currently viewing the information, even if they belong to different users. When John follows a link in one of his browsers, the caT server, unlike a Web server, propagates the effects of this action to all five browsers connected to it. While this action will almost certainly affect the display of John's other browsers, it also has the potential to influence Bob's browsing session. If John and Bob were instead browsing a Web document and John followed the link in browser C, the effect of his action would be reflected in browser C alone. Typically, it would not affect other browsers, whether they belong to the same user or another, and whether they run on the same computer or a different one.
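The propagation behavior can be sketched as a simple publish-subscribe loop. This is a toy Python sketch with names of our own choosing; it is not the caT protocol.

```python
# Toy sketch of caT-style propagation: a state change triggered by any
# browser is pushed to every connected browser (illustrative, not caT's API).

class CaTServer:
    def __init__(self):
        self.browsers = []
        self.active_pages = {"intro"}

    def connect(self, browser):
        self.browsers.append(browser)
        browser.render(self.active_pages)  # new browser reflects current state

    def follow_link(self, target_pages):
        # Unlike a Web server, the effect is propagated to all browsers,
        # regardless of which browser (or user) followed the link.
        self.active_pages = set(target_pages)
        for b in self.browsers:
            b.render(self.active_pages)

class Browser:
    def __init__(self, name):
        self.name, self.shown = name, None

    def render(self, pages):
        self.shown = set(pages)

server = CaTServer()
browsers = [Browser(n) for n in "ABCDE"]  # John's A-C, Bob's D-E
for b in browsers:
    server.connect(b)
server.follow_link({"section-2"})         # John follows a link in browser C
```

After the `follow_link` call, all five browsers, including Bob's, show the new set of active pages, mirroring the scenario described above.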

The browsing experiences of caT users may vary depending upon a variety of environmental factors, for example, location. If Bob is working from home he may see the document differently than John, who may be working from his office. The document may also be shown differently depending upon personal characteristics such as the roles they play in their organization. John, being a developer, may see the technical details of a project, while Bob, a project manager, may get a quick overview of the status of various project deliverables.

In the WWW hypertext model, individual browsers maintain the state of browsing via techniques that are generally reliant upon cookies. Closing a Web browser often results in loss of state and the user must restart the browsing session. In contrast, the caT model maintains the browsing state for all users on the server. Browsers may connect or leave the server without affecting their browsing state. If John were to open another browser to view this document, it would instantly reflect his current state in the new browser as well. In fact, John could close all his browsers, return the next day and continue browsing from where he left off today.
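The contrast with cookie-based state can be sketched as follows; this is an illustrative Python sketch of server-held state, with invented names, not the actual caT interface.

```python
# Illustrative sketch: in a caT-style model the browsing state lives on
# the server, keyed by user, so browsers may come and go freely.

class CaTSession:
    def __init__(self):
        self._state = {}  # user -> set of currently active pages

    def open_browser(self, user):
        # A newly opened browser immediately sees the user's existing
        # state; a first-time user starts at the document root.
        return self._state.setdefault(user, {"root"})

    def follow_link(self, user, pages):
        self._state[user] = set(pages)  # state changes live on the server

    # Closing a browser needs no cleanup: the state survives on the
    # server, so the user can return later and resume where she left off.
```

Opening a second browser, or reopening one the next day, simply calls `open_browser` again and receives the current state, which is John's scenario above.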

caT allows users to view hypertext materials from different devices in a variety of modes. This flexibility makes caT an ideal vehicle for building display-agnostic hypermedia.

3.  APPROACH

Display-agnostic hypermedia structures naturally lend themselves to multiple forms of presentation. We include display-agnosticism in hypertexts in two different ways: by developing browsers that can interpret information content in diverse ways and via architectural enhancements to caT.

3.1  Browser Development

We have expanded caT’s repertoire of browsers with an audio-video [Ustun 2004] and a spatial browser. Before we developed these browsers, caT supported textual and image browsers and a Web interface that presents text and image composites [Na 2001]. The audio-video browser renders textual information aurally, thus providing a different rendering for existing information. The spatial browser renders a hypertext’s information contents as widgets on a canvas.

3.1.1  Audio-Video Browser

The audio-video browser serves a two-fold purpose: it renders audio and video information associated with caT hypertexts; it also renders textual materials aurally [Ustun 2004]. Auditory browsing serves as the primary browsing interface for visually impaired users; sighted users can also use it in conjunction with other browsers to avail themselves of an additional mode. This browser uses the Xine multimedia player for audio playback [Xine 2004]. It generates audio from text via the Festival Speech Synthesis System [FSSS 2004], which includes English as well as Spanish voices and supports the integration of MBROLA [MBROLA 2004] voices for greater versatility.

The user interface employs a simple keyboard-based interaction mechanism and confirms user input via audio prompts if the user so chooses. The user interface works in two modes—navigation and audio content. Initially the browser starts up in the navigation mode. This mode supports users in browsing the hypertext by following links and selecting the pages to visit. Once a user visits a page, she has the option to listen to the contents of the page. The audio content mode is initiated when the information associated with the page is presented to her. This mode provides an interface for controlling the content presentation to suit her preferences. She can play or pause the rendering and skip or skim the contents in either direction. Ending the audio content mode returns her to the navigation mode.
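The two-mode interaction can be summarized as a small state machine. The mode names come from the text; the action names and transition function are our own simplification.

```python
# Simplified state machine for the audio-video browser's two UI modes
# (mode names from the text; action names are illustrative).

NAVIGATION, AUDIO_CONTENT = "navigation", "audio content"

def next_mode(mode, action):
    """Return the browser's next mode given a user action."""
    if mode == NAVIGATION and action == "select-page":
        return AUDIO_CONTENT  # the page's contents begin playing
    if mode == AUDIO_CONTENT and action == "end-content":
        return NAVIGATION     # back to following links and picking pages
    return mode               # play/pause/skip etc. keep the current mode
```

The browser starts in `NAVIGATION`; selecting a page enters `AUDIO_CONTENT`, and ending the content presentation returns to `NAVIGATION`, as described above.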

The selection of sounds and voices is crucial for helping users differentiate between audio prompts, user action confirmations and content presentation. The interface employs recorded sounds to confirm user actions and a synthesized voice for audio prompts and content rendering.

Frequently used actions are mapped to the numeric keypad so that the most common functions are grouped together. Other actions are mapped to the letters that best represent them. Some inputs are available to the user regardless of the mode. Table I displays the inputs available to users in both the navigation and audio content modes. The 'T' key toggles audio feedback of user actions. The escape key breaks out of the active audio stream, whether it is file contents, the help menu, or information options. The "help" and "information" actions both return context-sensitive information. The help feature reminds users of the key mappings available in the current mode. The information command presents a brief summary of the user's current context. In the navigation mode, it describes the user's location in the hypertext and the actions available to her. In the audio content mode, it returns information about the page contents, for example, the name of the associated file being presented, the duration of its audio presentation, and the user's current position in the file.

Table II displays the commands available to a user in the navigation mode. The up and down arrow keys cycle through the list of available links. The right and left arrow keys let the user cycle through the list of active pages. The 'P' and 'L' keys present complete listings of the available pages and links. The 'S' key returns the description or summary associated with the current page. The user may select a page or link from the presented list by its number via the numeric keys located above the alphabetic characters. The "Return" key selects the current link or page. If the user selects a link, she navigates to the next set of pages and the interface presents her with the corresponding information. If, on the other hand, a page is selected, the system switches to the audio content mode and the key mappings change to those shown in Table III.
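The layered key handling, with mode-specific bindings on top of mode-independent ones, can be sketched as two lookup tables. The 'T', escape, arrow, 'P', 'L', 'S' and "Return" bindings follow the text; the concrete letters for "help" and "information" are assumptions, since Tables I-III are not reproduced here.

```python
# Sketch of the audio-video browser's key dispatch. Bindings marked
# "assumed" are our guesses; the text names the action but not the key.

SHARED = {
    "t": "toggle-audio-feedback",   # from the text
    "escape": "stop-active-audio",  # from the text
    "h": "help",                    # key choice assumed
    "i": "information",             # key choice assumed
}

NAVIGATION = {
    "up": "previous-link", "down": "next-link",
    "left": "previous-page", "right": "next-page",
    "p": "list-pages", "l": "list-links",
    "s": "page-summary", "return": "select",
}

def dispatch(mode_map, key):
    # Mode-specific bindings take precedence; shared keys work in any mode.
    return mode_map.get(key, SHARED.get(key, "unmapped"))
```

Switching to the audio content mode would amount to passing a different mode map to `dispatch` while `SHARED` remains in effect.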