Tag Wrangling for Information Professionals:

Suggested Criteria for Evaluating Folksonomies

Ellen MacInnis

Rutgers University

Digital Libraries

17:610:553

Fall 2014

December 1, 2014

Abstract

With the growing prevalence of folksonomies comes a growing need for information professionals to understand and evaluate these user-created taxonomies. A survey of literature determines how folksonomies compare with professionally created controlled vocabularies, why users choose to contribute metadata by tagging content, and what past evaluations of folksonomies have revealed. This paper then proposes criteria for evaluating folksonomies and concludes by suggesting future research.

Introduction

The past decade has seen increasing interest in the nature of folksonomy. Also known as user-developed classification (Shirky 2005), user-created taxonomies (Smith 2007), collaborative tagging (Macgregor & McCulloch 2006; Golder & Huberman 2006), social tagging (Daglas et al. 2012; Gupta et al. 2010), and collective tagging (Golder & Huberman 2006), folksonomies occur where resource users rather than professional catalogers assign metadata to digital objects. Often this metadata takes the form of tags, hence the proliferation of terms like “social tagging.” Though individuals may use these tags for personal information management, when those tags are viewed in the context of other users’ tags they provide a snapshot of a document or object’s aboutness. Users collaboratively and asynchronously decide on the best language with which to describe documents. While some authors describe folksonomy in stark opposition to professionally created controlled vocabularies, others see the two as complementary pieces of a larger information organization puzzle.

This paper examines literature surrounding folksonomy and digital library evaluation to satisfy three points of inquiry: how folksonomies compare with traditional controlled vocabularies, what makes users willingly participate in the construction of folksonomies, and how one might evaluate folksonomies. To determine evaluation criteria for folksonomy, this paper draws from past evaluations by Zubiaga et al. (2013), Daglas et al. (2012), Yoo et al. (2013), Choi (2014), and Smith (2007) as well as evaluations of digital libraries in general by Saracevic (2000) and Xie (2006). The primary evaluation criteria this paper suggests are usability, metadata content and structure, collection quality, and user community. The discussion of each criterion includes examples of possible evaluation questions. Finally, this paper concludes and suggests avenues for future research.

Literature Review

Folksonomy versus controlled vocabularies

It is important to note that many authors reject, or at least question, the idea that folksonomy must be inherently opposed to controlled vocabulary. Smith (2007) acknowledges past discourse has framed this “as an antagonistic either/or debate” (p. 3). Zubiaga et al. (2013) offer the reminder that professionals such as writers and librarians may very well tag on sites like LibraryThing and Goodreads (p. 1803). Gupta et al. (2010) assert that folksonomies are a compromise between traditional classification systems and no classification system at all, and that they are certainly better than nothing (p. 59). Guy & Tonkin (2006) agree that social tagging cannot replace the function and necessity of formal classification systems but believe this is a feature rather than a bug of folksonomy.

However, Shirky (2005) sees in collaborative tagging “a radical break with previous categorization strategies, rather than an extension of them.” While ontological classification, of which controlled vocabularies are an example, is necessary for organizing books on physical shelves, its hierarchical structure begins to fall apart in a digital environment where hyperlinks can “shelve” books in multiple places at once. Shirky (2005) does acknowledge the strengths of ontological classification, suggesting that it performs best in an environment with a small, stable body of work involving formal categories that can be cleanly defined, created, and used by experts and authoritative sources; he cites the DSM-IV as an example of successful ontological classification. However, in environments with a large, ill-defined body of constantly shifting work created and maintained by amateurs with no central authority - in other words, the internet - collaborative tagging thrives.

Macgregor & McCulloch (2006) characterize controlled vocabularies as having the ability to control synonyms, discriminate between homonyms, minimize “noise” from grammatical variations, and unite related terms in a hierarchy (p. 292). A controlled vocabulary has accounted for the fact that some resource users prefer the term “car” to “automobile” or may search for “recipes” while others prefer the singular “recipe.” One common issue with folksonomies is that no such structure is in place; items tagged with the singular form of a noun could be kept fully separate from those tagged with the plural form. However, the structure of controlled vocabularies is not necessarily a boon when it comes to online research; there is simply too much online content proliferating too quickly for any single controlled vocabulary to keep up. The authors predict that a “dichotomous co-existence” between folksonomy and controlled vocabularies will emerge wherein information seekers can choose the method most appropriate for their information need: controlled vocabulary for formal research requiring precise searching and folksonomy for informal research involving the exhaustive browsing of broader subject areas (p. 296).
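To make the singular/plural and synonym problems concrete, the following minimal sketch (written in Python for illustration) shows the sort of term control a vocabulary editor builds in and a flat folksonomy lacks; the normalize_tag function, the SYNONYMS map, and the suffix rule are hypothetical simplifications, not part of any system cited here.

```python
# A minimal sketch of the normalization a controlled vocabulary performs
# and a flat folksonomy omits: folding synonyms onto a preferred term
# and collapsing plural forms. The synonym map and the suffix rule are
# illustrative assumptions, not a real thesaurus.

# Hypothetical preferred-term mapping ("car" and "automobile" unified).
SYNONYMS = {"car": "automobile", "cars": "automobile", "auto": "automobile"}

def normalize_tag(tag: str) -> str:
    """Lowercase a tag, fold synonyms, and strip a trailing plural 's'."""
    tag = tag.strip().lower()
    if tag in SYNONYMS:
        return SYNONYMS[tag]
    # Crude singularization: "recipes" -> "recipe" (a real system would
    # use a stemmer or lemmatizer rather than this one rule).
    if tag.endswith("s") and not tag.endswith("ss"):
        tag = tag[:-1]
    return tag

print(normalize_tag("Recipes"))  # -> "recipe"
print(normalize_tag("cars"))     # -> "automobile"
```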

This “dichotomous co-existence” is an attitude shared by Daglas et al. (2012), who suggest that metadata culled from social tagging may be used to supplement the controlled vocabulary of a library’s subject index (p. 247). The authors propose a methodology of examining the value of tags in context with related bibliographic records, with an aim to preserve the functionality of a library’s OPAC (p. 249). The authors caution against the unchecked impact of tags on libraries’ pre-existing authority files and hold that the two taxonomic approaches are naturally in opposition to each other. Even so, they believe that both approaches would benefit by operating in tandem, with each covering the other’s weaknesses (p. 257-258).

Reasons for user engagement

One of the defining characteristics of folksonomy is the participation of users to create and share metadata, but why should users want to participate at all? Several authors have noted the incredibly low “cost” of entry into folksonomic systems versus those with controlled vocabularies, both in terms of viewing and creating metadata. McShane (2011) writes on the low entry costs of Web 2.0 technologies in general and believes they explicitly “encourage participatory culture” (p. 384), while Gupta et al. (2010) note the low cognitive costs associated with social tagging (p. 59). This comparatively low cost of entry may be partially ascribed to the fact that folksonomies are by necessity built around users’ own linguistic preferences while controlled vocabularies are not. As Guy & Tonkin (2006) state, “The strength of a folksonomic approach is often described to be its openness, the ability of any given user to describe the world as he or she sees it.”

Because controlled vocabularies do not, by their nature, use searchers’ own linguistic preferences, Shirky (2005) believes there is an inherently higher cost associated with ontological classification than with folksonomy. Users must guess how information has been previously organized by the cataloger, and institutions must make an effort to educate those users on their pre-determined classification schemes; both absorb time and resources. Gupta et al. (2010) refer to users’ cognitive cost as “post activation analysis paralysis,” in which technology users are presented with too many possible information retrieval options, or too little education for choosing among those options, and are temporarily unable to determine a path forward at all (p. 59).

Spiteri (2006) believes that users are fully aware of the shortcomings of folksonomy and yet are willing to tolerate them precisely because of these low barriers to entry (p. 80). Though acknowledging the potential for frustration when navigating and negotiating linguistic and cognitive difficulties, Golder & Huberman (2006) point out that collective tagging also provides the opportunity for users to learn from each other (p. 201) and is useful in environments where no trained information professionals are present (p. 198). Smith (2007) wonders whether the high recall and low precision that folksonomies provide might be good enough for hobbyists who prefer social tagging to controlled vocabulary (p. 15).

Finally, users may tag resources simply because it is enjoyable. Macgregor & McCulloch (2006) characterize folksonomy as an exciting and interactive information service that gives users a direct role in knowledge organization (p. 296). Spiteri (2006) notes the inherently social aspect of folksonomy, where the very act of tagging also creates virtual communities connecting users with shared interests (p. 76).

Past evaluations of folksonomies

Zubiaga et al. (2013) examine the resemblance between user-created tags and professional categorization. Through support vector machines (SVM), a “state-of-the-art classification algorithm,” the authors analyze three data sets derived from social tagging systems Delicious, LibraryThing, and Goodreads (p. 1801). This analysis shows social tagging to consistently outperform other data sources, namely the content of the object being tagged and user reviews, in classifying content. Tagging mechanisms and the presence of suggested annotations have an effect on this performance; for example, because Goodreads does not require users to tag books added to their collections and because tagging is a multi-step process, Goodreads tags showed poorer performance in subject analysis than either LibraryThing or Delicious when compared with other data sources (p. 1809-1810). The authors conclude there is “great potential of social tags” in classifying resources, especially in conjunction with other data sources such as reviews (p. 1812).
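Zubiaga et al. (2013) do not reproduce their classifier in detail, but the general shape of SVM-based subject classification from tag text can be sketched with scikit-learn; the tag documents and subject labels below are invented toy data, not the authors’ Delicious, LibraryThing, or Goodreads sets.

```python
# A sketch of SVM-based subject classification from social tags, in the
# spirit of Zubiaga et al. (2013); the data are invented toy examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Each resource is represented by its aggregated tags as one string.
tag_docs = [
    "python programming code tutorial",
    "recipes cooking baking dessert",
    "javascript webdev programming",
    "cooking dinner vegetarian recipe",
]
subjects = ["computing", "food", "computing", "food"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(tag_docs, subjects)

# Classify a newly tagged resource by its tags alone.
print(model.predict(["baking bread sourdough"]))  # -> ['food']
```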

Daglas et al. (2012) offer another methodology for evaluating folksonomy by analyzing overlap among three variables: tags presented within a tag cloud, an authority file describing subject headings, and the vocabulary of user queries. Tags that are not deemed “worthy,” or those that do not “enrich” the subject descriptions within the authority file, are discarded while the remainder are compared with bibliographic data found in Google Books and LibraryThing to create a weighted metric. This metric seeks to assist information professionals in selecting user-generated tags which enhance the subject description of the authority file (p. 249-250). The results indicated that some tags, such as “urban planning,” matched subject headings within the authority file while other tags, such as “Jung,” needed closer evaluation to determine if they overlapped with the authority file. These results suggest that even with the implementation of this methodology, “information professionals still face the problem of assessing the quality of the recommended tags” (p. 256).
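The paper describes this weighted metric only in outline, so the sketch below is a loose illustration of the overlap step rather than the authors’ actual formula: each candidate tag is scored by whether it matches the authority file and external bibliographic data, with the 0.7/0.3 weights and the substring matching rule as placeholder assumptions.

```python
# A loose sketch of the tag-selection idea in Daglas et al. (2012):
# compare tag-cloud terms against an authority file and external
# bibliographic data, and score each candidate. The 0.7/0.3 weights
# and the matching rule are placeholder assumptions.
def score_tag(tag, authority_headings, external_terms):
    tag = tag.lower()
    in_authority = any(tag in h.lower() for h in authority_headings)
    in_external = tag in {t.lower() for t in external_terms}
    return 0.7 * in_authority + 0.3 * in_external

authority = ["Urban planning", "Psychoanalysis"]
external = ["urban planning", "jung", "cities"]  # e.g., Google Books / LibraryThing data

for tag in ["urban planning", "jung", "mystuff"]:
    print(tag, round(score_tag(tag, authority, external), 2))
# "urban planning" matches the authority file outright; "jung" matches
# only the external data, so it needs closer evaluation; "mystuff"
# scores zero and would be discarded.
```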

Yoo et al. (2013) suggest that the inherently dynamic and “overloaded” nature of online environments requires equally dynamic classification schemes to support information indexing and retrieval (p. 594). In flat folksonomy, which completely lacks hierarchical structure, all tags have the same value regardless of breadth or specificity. There have been attempts to create a hybrid of the chaotic equality of tags in flat folksonomy and the rigid, pre-determined structure of controlled vocabularies. The authors propose a “categorized tag knowledge organization system” (CTKOS), which offers an uncontrolled vocabulary from which users may select any terms they wish while also creating a hierarchical structure within that vocabulary that dynamically adapts to each new tag introduced. The primary method of evaluation was the Technology Acceptance Model (TAM), which has been used previously to measure information technology acceptance among users. The primary criteria of the CTKOS evaluated were perceived usefulness, perceived ease of use, and behavioral intention to use (p. 603).
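Yoo et al. (2013) describe the CTKOS only at a conceptual level, so the following is a very rough, hypothetical sketch of what a “categorized tag” structure might look like: tags remain uncontrolled strings, but each is filed under a broader category chosen at tagging time, so a shallow hierarchy accumulates dynamically.

```python
# A rough, hypothetical sketch of a "categorized tag" structure in the
# spirit of CTKOS: tags stay uncontrolled (any string), but each new
# tag is filed under a broader category, so a shallow hierarchy grows
# as users tag. Not the authors' actual implementation.
from collections import defaultdict

hierarchy = defaultdict(set)  # category -> set of free-form tags

def add_tag(category: str, tag: str) -> None:
    """File an uncontrolled tag under a user-chosen broader category."""
    hierarchy[category].add(tag)

add_tag("cooking", "sourdough")
add_tag("cooking", "weeknight-dinners")
add_tag("programming", "python")

print(dict(hierarchy))
# e.g. {'cooking': {'sourdough', 'weeknight-dinners'},
#       'programming': {'python'}}
```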

Another method of evaluating folksonomy involves comparing tagged terms with those in controlled vocabularies. Choi (2014) points out that consistency studies meant to determine the quality of subject indexing have routinely included only a small number of professionals while excluding users, thereby precluding an analysis between professionals’ terms and those of the users themselves (p. 1). In a comparison of taggers’ consistency with that of professional catalogers, evaluation showed “consistency over all subjects among taggers while there was inconsistency over all subjects between two groups of professionals” (p. 18). Smith (2007) reports similarly in her analysis of LibraryThing user-created tags compared with Library of Congress Subject Headings (LCSH). The amateur metadata generated for Harry Potter and the Half-Blood Prince more accurately reflected that book’s aboutness than did its associated LCSH data (p. 10). There was further discrepancy with subject indexing of the graphic novel Persepolis, which received only a single LCSH entry; Smith (2007) suggests this is an example of the timeliness issues that make controlled vocabulary less capable of reacting to current terms and interests. LibraryThing may be the more “linguistically forgiving” option in subject searching because the synonym redundancy present in its folksonomy improves retrieval by natural language search (p. 15).

Evaluating Digital Libraries

To determine important criteria to include in a method of evaluating folksonomies, this paper consulted literature on the evaluation of digital libraries in general. Saracevic (2000) typifies such an evaluation as “a complex undertaking” rife with challenges (p. 352), partially exacerbated by the “broadly and vaguely” defined meaning of “digital library” for research purposes (p. 353). Citing Borgman’s (1999) definition of digital libraries as those which “are constructed, collected, and organized by (and for) a community of users, and their functional capabilities support the information needs and uses of that community” (p. 230), Saracevic (2000) offers the following possible criteria for evaluation:

  1. electronic resources - digital data in any medium;
  2. technical capabilities for creating, searching, and using information;
  3. information retrieval;
  4. metadata; and
  5. community of users - their information needs and uses (p. 361).

These criteria seem obviously relevant to the evaluation of folksonomies. However, there is a second definition of digital libraries, offered by the Digital Library Federation (1999), which defines digital libraries as organizations staffed by professionals who select, interpret, distribute, and actively maintain digital works. This definition led Saracevic (2000) to suggest another list of potential evaluation criteria:

  1. professional staff;
  2. collection of digital works;
  3. selection, structure, and access;
  4. interpretation and distribution;
  5. preservation; and
  6. use and economic availability for a defined community (p. 362).

Though this second set of criteria seems less attuned to the lack of structure and professionalism in folksonomy, it does raise important questions regarding a folksonomy’s value to its users. For example, when considering the “professional” staff of a folksonomy, one could inquire about a website’s user base, or perhaps the individuals responsible for creating and maintaining the website that aggregates tagged data.

Xie (2006) writes that “the ultimate goal of the development of digital libraries is to serve users and to facilitate their effective use of information and services” (p. 434), and thus conducts a study to determine users’ own criteria for what makes a good or poor digital library. From her results she created the following list of criteria types:

  1. Usability
  2. Collection quality
  3. Service quality
  4. System performance efficiency
  5. User opinion solicitation (p. 440).

Of these criteria, nearly all participants in the study chose usability as the most important for evaluating digital library use. Interface usability was especially important in the areas of search and browse functions, which 71% of participants considered “essential” in evaluation (p. 439). There was also a stated desire for digital libraries to perform both traditional library services and those services found nowhere else (p. 442-443). Participants seemed to prefer precision over recall, as “they only need enough information to solve their problems,” problems that needed to be addressed “efficiently” (p. 445). Finally, participants valued some sort of community associated with digital libraries for the exchange of ideas (p. 445), as well as the opportunity to provide feedback as “a critical element for digital library evaluation” (p. 448).

Suggested Evaluative Criteria for Folksonomies

After considering the digital library evaluation criteria suggested by Saracevic (2000) and Xie (2006), I propose the following criteria for evaluating folksonomies: usability, metadata content and structure, collection quality, and user community. I will explore each of these criteria in detail and offer specific questions for use in future evaluations.

Usability

Xie (2006) notes that, overwhelmingly, the participants in her study rated usability as the most important criterion for evaluating digital libraries. Folksonomy literature has shown that a key aspect of folksonomies is their low cost of entry and thus high usability. Therefore, a website or application that relied on folksonomy but did so without usability as a primary goal, such as a website with poor navigation, would not be able to benefit from one of folksonomy’s strongest attributes. Because navigation so often is information retrieval in folksonomy, I include that criterion from Saracevic (2000) here, as well as that of “use and economic availability for a defined community” (p. 362). Zubiaga et al. (2013) also note that successfully representing tagged data depends on the technical aspects of the tagging system, particularly whether that system offers suggested annotations to users as they tag (p. 1801-1802).

Evaluative questions regarding a folksonomy’s usability might include:

1) What is the cost of participating in this folksonomy?

a) Do users need to make an account to create or collect tags?

b) Can participation be done anonymously?

2) Is the folksonomy system connected with other sites (e.g., logging into Pinterest or Goodreads through one’s Facebook account)?

3) Does the system suggest terms to users as they tag?

4) Do users have access to other data sources besides tags (e.g., content reviews)?

5) Do users have the opportunity to easily visualize tag data, as with a tag cloud (see the sketch after this list)?
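As a small illustration of the last question above, tag-cloud visualization typically reduces to scaling each tag’s display size by its relative frequency; the linear scaling rule and the font-size bounds in this sketch are common choices rather than any standard.

```python
# A minimal tag-cloud sketch: scale each tag's font size by its
# relative frequency. The linear scaling rule is one common choice.
from collections import Counter

tags = ["python", "recipes", "python", "travel", "python", "recipes"]
counts = Counter(tags)
max_count = max(counts.values())

MIN_PT, MAX_PT = 10, 32  # smallest and largest font sizes
for tag, n in counts.most_common():
    size = MIN_PT + (MAX_PT - MIN_PT) * n / max_count
    print(f"{tag}: {n} uses -> {size:.0f}pt")
```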

Metadata content and structure

As folksonomy is user-derived metadata, it is important to analyze both the content of tags and how they are created and displayed. In their examination of tag usage patterns, Golder & Huberman (2006) identify several functions tags perform on Delicious URLs: identifying the subject of the URL, identifying what it is (such as “article” or “book”), identifying ownership, refining categories, identifying subjective qualities (such as “funny” or “inspirational”), self-reference (such as “mystuff” or “mycomments”), and task organization (such as “toread” or “jobsearch”) (p. 203). Tags may be only narrowly applicable, as with the self-referential tags that are presumably of primary importance to the individuals who created them, or may have broader implications, as with those identifying subject matter and ownership. After a certain number of Delicious users have bookmarked a particular URL, a “nascent consensus seems to form” regarding the persistence of tag proportions (p. 206). Choi (2014) seems to agree, stating that “the average number of tags per document by subject categories can help demonstrate the exhaustivity of indexing across different subjects” (p. 6). Therefore, determining a tag’s aboutness should aid in determining its related document’s aboutness.
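Golder & Huberman’s (2006) observation about stabilizing tag proportions lends itself to a simple computational check: track each tag’s share of all tag applications as bookmarks of a single URL accumulate. The bookmark stream in this sketch is invented for illustration.

```python
# A sketch of the stability check behind Golder & Huberman's (2006)
# observation: as bookmarks of one URL accumulate, each tag's share
# of all tag applications tends to settle. The stream is invented.
from collections import Counter

bookmarks = [  # tags applied to the same URL by successive users
    ["python", "tutorial"],
    ["python", "programming"],
    ["python", "code", "tutorial"],
    ["python", "programming"],
]

running = Counter()
for i, tags in enumerate(bookmarks, start=1):
    running.update(tags)
    total = sum(running.values())
    shares = {t: round(n / total, 2) for t, n in running.items()}
    print(f"after bookmark {i}: {shares}")
# If a consensus forms, these proportions change less and less with
# each additional bookmark.
```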