Using Map-Based Visual Interfaces to Facilitate Knowledge Discovery in Digital Libraries

Olha Buchel
Faculty of Information and Media Studies
University of Western Ontario
Kamran Sedig
Department of Computer Science
Faculty of Information and Media Studies
University of Western Ontario

ABSTRACT

In recent years there has been growing interest in supporting knowledge discovery activities using map-based visual interfaces. The goal is promising and ambitious, but not easy to achieve because the cognitive factors involved in how information is transformed into knowledge are poorly understood. In this paper we present a map-based visual interface, VICOLEX (VIsual COLlection Explorer), aimed at facilitating and supporting knowledge discovery and users’ cognitive activities by means of integrated visual representations coupled with interactions.

Keywords

Map-based visual interfaces, design, knowledge discovery, visual representation of georeferenced collections, digital libraries.

INTRODUCTION

MacEachren et al. (1999) suggested the possibility of integrating geographic visualizations with knowledge discovery (KD). The possibility of bringing these two research areas together sounds ambitious, yet promising. The two disciplines can both contribute insight to the joint venture. One of the possible areas of investigation that can emerge from this joint venture is the development of map-based visual interfaces (MVIs) that can support KD activities. At the outset of such investigation, it seems that the main concern of both research areas is the following: discovering useful knowledge within given information[1]. This concern involves such tasks as discovering patterns in large volumes of data, identifying new patterns of data distribution and dispersion, formulating hypotheses based on observed patterns and trends, and finding new unsuspected correlations and relationships (Fayyad, Grinstein, & Wierse, 2002; MacEachren et al., 1999). This shared concern, however, does not result in a simple borrowing of ideas and methods from the two areas, and does not translate to ready-made, easy design choices.

Fayyad et al. (2002) and MacEachren et al. (1999) emphasize the interactive and iterative nature of KD. In KD humans need to interpret information and make many decisions in the process of refining knowledge. From the point of view of cognition, information interpretation and decision making are complex activities in their own right. In feature interpretation, for example, users have to link abstract representations of data with their own prior knowledge. Although a great portion of interpretation takes place in the human mind, people often help their thinking by performing small external actions with information such as selecting, filtering, rearranging, reformulating, and simplifying representations (Kirsh, 2009). At first glance, such actions might seem superfluous, but their value can be better understood when they are considered in the context of some activity (e.g., in the context of performing KD activities using highly cluttered maps). To cope with complexity and to interpret encoded information, users’ visual systems sample visual information on maps by inherently selective perceptual acts that direct attention to restricted regions of the visual field. By processing maps selectively, people visually extract a pool of hot spots (Amit & Geman, 1999; Yang, Yuan, & Wu, 2007), suppress the distractors, apply spatial filters, group similar items, and perform operations on entire groups (e.g., reject, classify) (Luck & Hillyard, 1994). To compute all these operations in the mind by relying solely on vision is difficult. For this reason, when people work with paper maps they often perform many external actions: they fold maps in order to better focus on hot spots; they annotate them; they mark locations of interest; and they carry out other actions (Knapp, 1995). This example demonstrates that with external actions people help their vision and prepare information for higher-level cognitive activities such as interpretation, decision making, and KD. Therefore, MVIs that are intended to facilitate KD activities should provide users with mechanisms by which they can act upon visual representations (MacEachren et al., 1999).

The goal of this paper is to examine the role of interactions with visual representations in KD activities. In particular, we explain the value of interactions in MVIs as front ends to complex digital libraries (DLs).

BACKGROUND

An MVI is made of interactive representations that provide access to information and facilitate and support a KD activity. A representation here refers to an integrated set of visual encodings of entities (such as documents and locations) and their properties. Such representations can take various forms: maps, graphs, tables, and so on. The main representation in a georeferenced collection is a map. However, as an information space can be very complex, a map can only encode a subset of the space’s entities and relationships. As a result, other information elements and relationships can be encoded and communicated using different representations. These representations can then be integrated to work as a unit at the interface level. For example, in georeferenced DLs, other representations, such as tables and graphs, can be placed on top of a map to communicate other aspects and properties of information. Even though representations encode information elements and their relationships and properties, all static representations have limitations: they can support only certain tasks and can provide answers to only certain questions. Finally, due to the amount and complexity of encoded information, a representation may become cluttered and dense, and hence ineffective at communicating information. This is certainly true of maps representing complex DL collections.

To compensate for some of the limitations of static representations and to increase their utility, MVIs should provide support for users’ actions by means of computer interactions. Computer interactions have two components: actions and reactions (Fast & Sedig, Accepted). A user acts upon a representation, and the representation reacts and gives a response. An interface can reduce its complexity and density by making certain representations of information latent. Interactions, then, allow users to perform physical actions on the interface that bring latent information to a more observable level, thereby simplifying the mental unpacking and elaboration associated with representations (Kirsh, 2003). More specifically, interactions enable different properties, relations, and layers of static visual representations to be probed and made available on demand, thereby making the information representations better suited to the individual and contextual needs of users; this can potentially enhance users’ ability to explore, navigate, and transform different elements and features of map-based visual representations, all important cognitive tasks involved in KD activities.

Besides information latency, information context also plays an important role in KD. Any particular object, document, data, or event can be informative only under certain circumstances depending on the inquiry and on the expertise of the inquirer (Buckland, 1991). It follows that designers of visual representations whose goal is to facilitate KD have to anticipate situations of information use. In this paper we take the position that situations can be predicted for particular contexts, and that interactions can support situational use of information. Interactions serve as a glue that binds a series of low-level actions to support different situational tasks that can be performed with representations. In this sense, interactions allow information to behave dynamically and situationally so that it can serve users’ needs more effectively. This in turn plays an important role in transforming information into personal knowledge in KD situations.

PROTOTYPE COLLECTION

Our testbed collection is about the local history of Ukraine. It comprises 349 MAchine Readable Cataloguing (MARC) book records from the Library of Congress Catalogue. All these records have call numbers that belong to the DK508 class of the Library of Congress Classification. This class contains placenames and has many MARC records linked to them via call numbers. Among the selected MARC records are the entire collections of records for 32 Ukrainian cities, which we treat as sub-collections of the whole collection. This collection is highly contextual: documents in this collection are interconnected by subjects and have similarities in bibliographic descriptions, forms/genres, languages, and places of publication. Context in this collection is inferred from the ontological properties of documents in the collection such as physical descriptions, languages, subjects, and authors. All of the above-mentioned properties were chosen to be visually represented.

VICOLEX

In this section we present our prototype MVI, VICOLEX (VIsual COLlection Explorer). VICOLEX is designed to allow users to explore georeferenced collections. It is designed with close attention to representations and interactions with the purpose of making collection structure more salient; providing users with multiple perspectives on the data; and therefore facilitating KD and sense making.

Representations

We chose a variety of representations, each of which presents a collection from a different perspective. More specifically, all metadata records were mapped onto Google Maps (GM) (see Figure 1 below). Each GM marker represents the number of metadata records in the corresponding sub-collection. Since some collections for individual locations have quite a large number of records (e.g., Lviv – 78, Kyyiv – 92), additional graphical representations were used to represent ontological properties of sub-collections. An example is shown in Figure 1. The scatter plot is utilized for showing book heights, number of pages, and languages (Figure 1.a); the pie chart, for displaying languages (Figure 1.b); the histogram, for showing years of publication (Figure 1.c); the embedded map, for visualizing places of publication (Figure 1.d); the Kohonen map, for representing subjects (Figure 1.e); and the tag cloud, for displaying authors (Figure 1.f).

Figure 1. Representing collections on Google Maps.

Overall, VICOLEX has 193 representations. These representations encode the entities in the prototype collection and their properties. These representations help users gain insight into the various aspects of the collection which are hidden from view on the main map. Each representation encodes only small portions of the information about the collection and supports only specific tasks, hence making the main map in VICOLEX less cluttered. Some representations assign additional meaning to data (e.g., histogram of the years explains years of publication in terms of historical periods). Each set of representations for each location encodes storybooks about that sub-collection, related to subjects, years of publication, languages, book sizes, authors, and where the sub-collection was published.
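The per-location aggregation described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the VICOLEX implementation: the `Record` fields, the `summarize` helper, and the sample records are all hypothetical. The sketch only computes the numbers that a marker label, a language pie chart, and a publication-year histogram would display.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified stand-in for one MARC book record."""
    place: str     # location the book is about (the sub-collection key)
    language: str
    year: int      # year of publication


def summarize(records):
    """Group records by place and derive the data behind each marker's charts."""
    by_place = defaultdict(list)
    for rec in records:
        by_place[rec.place].append(rec)

    summary = {}
    for place, recs in by_place.items():
        summary[place] = {
            "marker_count": len(recs),                                # number shown on the marker
            "languages": Counter(r.language for r in recs),           # pie chart input
            "decades": Counter(r.year // 10 * 10 for r in recs),      # histogram bins by decade
        }
    return summary


# Invented sample data, for illustration only.
sample = [
    Record("Lviv", "Ukrainian", 1995),
    Record("Lviv", "Polish", 1988),
    Record("Lviv", "Ukrainian", 1999),
    Record("Kyyiv", "Russian", 1975),
]
result = summarize(sample)
print(result["Lviv"]["marker_count"], dict(result["Lviv"]["languages"]))
```

Keeping the grouped summaries separate from the map layer mirrors the idea that each representation encodes only a small portion of the collection and is shown on demand.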

Interactions

Despite obvious advantages, the above approach of using different representations to communicate properties and entities in the collection still has shortcomings. In particular, it is difficult to understand how properties are related to each other; how they are distributed spatially and temporally in the collection; how properties of the collections can be combined and viewed together; and how people can adapt VICOLEX’s MVI to their own needs. To overcome these shortcomings in VICOLEX, we augment representations with interactions, particularly linking, filtering, selecting, and grouping, which we discuss next.

Filtering

Filtering allows users to sift out document properties. Users can query the ontological properties of a collection by ticking checkboxes and by setting limits on timelines that show time of acquisition and publication (shown in Figure 1). Property-based filtering reduces the complexity of high-dimensional data, reduces clutter, gives users flexibility in selecting properties, and generates a number of easy-to-understand displays, each focused clearly on a particular aspect of the underlying data. In general, filtering helps inhibit the processing of task-irrelevant information, increases the speed and accuracy of information processing, and reduces the cognitive effort required to complete property-related tasks (Enns & Akhtar, 1989). In VICOLEX, the results of filtering can be observed not only on the surface of the map, but also on the representations of ontological properties of individual sub-collections that are linked to markers. Because of filtering on the map, the representations of properties in sub-collections become more legible and easier to understand. Such filtering allows completing tasks not only at the level of information entities, but also at the level of properties.
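Property-based filtering of this kind can be sketched as follows. The example is an illustrative assumption rather than the actual VICOLEX code: the `Book` type, the `filter_books` function, and the sample records are hypothetical. A set of ticked languages stands in for the checkboxes, and a year range stands in for the timeline limits; the filtered list then drives both the marker counts on the map and the linked per-location representations.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    """Hypothetical, simplified book record."""
    title: str
    place: str      # location the book is about
    language: str
    year: int       # year of publication


def filter_books(books, languages=None, year_range=None):
    """Keep books matching the ticked languages and the timeline limits.

    A value of None means that the property is not restricted.
    """
    kept = []
    for book in books:
        if languages is not None and book.language not in languages:
            continue
        if year_range is not None and not (year_range[0] <= book.year <= year_range[1]):
            continue
        kept.append(book)
    return kept


# Invented sample data, for illustration only.
books = [
    Book("A", "Lviv", "Ukrainian", 1995),
    Book("B", "Lviv", "Polish", 1988),
    Book("C", "Kyyiv", "Ukrainian", 1979),
    Book("D", "Kyyiv", "Ukrainian", 2001),
    Book("E", "Odesa", "Russian", 1993),
]

# Checkbox "Ukrainian" ticked, timeline limited to 1991-2010.
visible = filter_books(books, languages={"Ukrainian"}, year_range=(1991, 2010))

# The same filtered list updates the marker counts on the map.
marker_counts = Counter(book.place for book in visible)
print([book.title for book in visible], dict(marker_counts))
```

Because the filter returns a plain list, every linked representation can be recomputed from the same filtered records, which is one simple way to keep the map and the per-marker charts consistent.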

Selecting

Variable selection and feature extraction are regarded as crucial steps in KD (Fayyad, Grinstein, & Wierse, 2002). Selecting objects with certain properties from unnamed geographic areas (e.g., north or south of some region) in MVIs can be quite challenging because such regions are rarely described explicitly in systems. To facilitate this type of selection, VICOLEX allows selecting regions with markers by drawing a bounding box around markers with a drag-and-drop rectangle corner technique (Figure 2.b).

This selection is intended to give the MVI a sandbox-like feel, with the capability to dynamically adjust properties of objects, since such a selection can be performed both on an entire collection and on a filtered collection. For example, a user can make visible only books about history and select only those from Western Ukraine using the bounding box (Figure 2.a and b). Properties that are suppressed by filtering cannot be selected with the bounding box. Moreover, the area selection mechanism in VICOLEX is coupled with a grouping interaction, which results in representing the selected documents with the same set of additional representations as documents that are linked to individual markers (Figure 2.c). Such selections with groupings can be useful for answering questions such as the following: a) In which area of Ukraine do collections have more illustrations? b) Are places of publication in collections about small locations different from places of publication about large locations? c) Is there a difference in subjects in collections about different parts of Ukraine?

Figure 2. Example of selection with filtering and grouping.
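Bounding-box selection coupled with grouping can be sketched as below. This is a hedged illustration only: the `BoundingBox` and `Marker` types, the `select_and_group` function, the approximate coordinates, and the sample language lists are assumptions made for the example, not details taken from VICOLEX. Each marker is assumed to carry only the records that survive the current filter, so suppressed properties cannot be picked up by the selection.

```python
from collections import Counter
from dataclasses import dataclass
from typing import NamedTuple, Tuple


class BoundingBox(NamedTuple):
    """Rectangle in degrees; the sketch ignores boxes that cross the antimeridian."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


@dataclass(frozen=True)
class Marker:
    """Hypothetical map marker for one sub-collection."""
    place: str
    lat: float
    lon: float
    languages: Tuple[str, ...]  # one entry per record still visible after filtering


def select_and_group(markers, box):
    """Return the places inside the box and one merged language distribution."""
    selected = [m for m in markers if box.contains(m.lat, m.lon)]
    grouped = Counter()
    for marker in selected:
        grouped.update(marker.languages)
    return [m.place for m in selected], grouped


# Approximate coordinates and invented record data, for illustration only.
markers = [
    Marker("Lviv", 49.84, 24.03, ("Ukrainian", "Polish")),
    Marker("Ternopil", 49.55, 25.59, ("Ukrainian",)),
    Marker("Kharkiv", 49.99, 36.23, ("Russian",)),
]

# A box drawn roughly over the western part of the map.
west_box = BoundingBox(south=48.0, west=22.0, north=51.5, east=27.0)
places, language_totals = select_and_group(markers, west_box)
print(places, dict(language_totals))
```

Merging the selected markers into a single distribution is what lets the grouped selection reuse the same kinds of charts that are linked to individual markers.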

KNOWLEDGE DISCOVERY USING VICOLEX

In this section, we briefly discuss how representations of the entities in the prototype collection along with the implemented interactions in VICOLEX support KD. We report a number of discoveries that we made using VICOLEX, particularly with regard to changes in the collection during the 1980s and 1990s. In general, the discoveries can be classified as quantitative and qualitative.

Quantitative

First, we discovered that more than half of the entire collection was published after 1991. This is evident from filtering the main map by the years of publication: before 1990 and after 1991. Second, we found that in publications prior to 1990 maps were rarely included in books. Moreover, books with maps published before 1990 are about large cities only, whereas books with maps published after 1991 are about both small and large places. Third, the number of publications in Ukrainian increased significantly after 1991. Fourth, books in Polish about Ukraine were nonexistent before 1981, but beginning in 1981 the number of publications in Polish started increasing gradually, especially about Lviv. Fifth, after 1991 certain subjects started to demonstrate significant growth (e.g., Biographies, Archaeological Excavations).

Qualitative

The majority of qualitative changes are associated with subjects. Just as the number of published books increased after 1991, the variability of subjects became greater after 1991 too. Subjects that emerged after 1991 are “Ethnic Relations,” “Ukrainian, Nationalism,” “Minorities,” “Jews,” “Economic conditions,” “International Executive Service Corps,” “Vinnytsia Massacre, Vinnytsia, Ukraine, 1937-1938,” “Political Prisoners,” “Rehabilitation,” “Political Prosecutions,” “Prisoners of War,” “Massacres,” and others. Many of these subjects were banned during the period when Ukraine was part of the Soviet Union, and therefore they do not appear in books published before 1990. Second, books about locations with populations smaller than 200,000 people appear to be smaller in size and fewer in total than books about larger locations. Third, Russian-language books are distributed more in the East and South than in the West. In addition, we were able to discover a few sub-collections with unusual language distributions other than Ukrainian and Russian.

CONCLUSIONS

In this paper, we have presented VICOLEX, a prototype front-end interface that provides ample support for users to perform KD by means of interacting with MVIs of library collections. A few of the representations used in VICOLEX included maps, pie charts, scatter plots, and tag clouds. Multiplicity of representations is intended to keep information latent, not to overwhelm users with too much information at once. The latent information remains hidden, waiting for users’ interactions. A few of the interactions presented in this paper were linking, filtering, and selecting. These interactions are intended to support KD activities. As such, they simplify interpretation and understanding of information in various situations, facilitate transformation of information into personal knowledge for users, and ultimately support higher-level KD activities. The VICOLEX conceptualization can be utilized in the design of front ends to complex DLs with georeferenced collections. With numerous representations coupled with interactions, DLs will become more suitable for KD.

REFERENCES

Amit, Y., and Geman, D. (1999). A Computational Model for Visual Selection. Neural Computation, 11, 7, 1691-1715.

Buckland, M. (1991). Information and information systems. New York, NY: Greenwood Publishing Group, Inc.

Enns, J. T., and Akhtar, N. (1989). A Developmental Study of Filtering in Visual Attention. Child Development (60), 1188-1199.

Fast, K., and Sedig, K. (Accepted). Interaction and the epistemic potential of digital libraries. International Journal of Digital Libraries.

Fayyad, U., Grinstein, G., and Wierse, A. (2002). Information visualization in data mining and knowledge discovery. London, UK: Academic Press.

Kirsh, D. (2009). Interaction, External Representations and Sense Making. Proceedings of the 31st Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society.

Knapp, L. (1995). A task analysis approach to the visualization of geographic data. In T. L. Nyerges et al. (Eds.), Cognitive aspects of human-computer interaction for geographic information systems (pp. 355-371). Springer Verlag.

Luck, S. J., and Hillyard, S. A. (1994). Spatial Filtering During Visual Search: Evidence From Human Electrophysiology. Journal of Experimental Psychology, 20, 5, 1000-1014.

MacEachren, A. et al. (1999). Constructing Knowledge From Multivariate Spatiotemporal Data: Integrating Geographic Visualization (GVis) with Knowledge Discovery in Database (KDD) Methods. International Journal of Geographical Information Science, 13 (4), 311-334.

Swanson, L. (1986). Organization of mammalian neuroendocrine system. In V. Mountcastle, F. E. Bloom, & S. Geinger (Eds.), Handbook of physiology. Sec. 1, The nervous system, Vol. IV, Intrinsic regulatory systems of the brain (pp. 317-363). Bethesda, MD: American Physiological Society.

Yang, M., Yuan, J., and Wu, Y. (2007). Spatial selection for attentional visual tracking. Computer Vision and Pattern Recognition, 1-8.

[1] In this paper, we use the terms data and information interchangeably.