Folksonomy – the significance of the least effort

Unpublished Working Paper

Timme Bisgaard Munk
PhD Student, New Media
Copenhagen University / Kristian Moerk
Editor, DR IT
National Danish broadcasting

Purpose: To discuss theoretically, and draw conclusions about, the value of folksonomies on the basis of a literature review and an empirical study of a large number of tags in the social bookmarking system del.icio.us, focusing on possible theoretical explanations of the imitation patterns found in the analysis and of the stability of the users' tags.

Design/methodology: Literature review, theory discussion and empirical statistical analysis.

Empirical findings: The existence of a statistical power law and a number of common stable patterns in the user-created metadata in del.icio.us.

Practical implications: The need to teach the users how to tag in order to realize the potential of folksonomies.

Originality/value: Identifies cognitive economizing and information cascades as the central dynamic behind the power law in the user-created metadata, a dynamic which makes it impossible to realize the full benefit of the system.

Keywords: Folksonomy, Social tagging, Classification, Collaborative tagging, Knowledge organization

Paper type: Empirical analysis and theoretical discussion

Abstract:

Today it is possible to find programs on the Internet that allow users to freely mark up information by creating their own personal metadata. This is called folksonomy. This article discusses the categorizing concept of folksonomies, taking as its empirical point of departure an analysis of the social bookmarking system del.icio.us. The essence of folksonomies is user-created descriptive metadata, as opposed to the traditional sender-determined descriptive metadata in taxonomies and faceted classification. Its supporters perceive user-created metadata as a better, cheaper and more realistic means of creating descriptive metadata than that which can be created in taxonomies and faceted classification. Descriptive metadata is the determining prerequisite for better sharing of knowledge, because it is the key to creating better semantic relations and search possibilities in the exploding mass of data on the Internet. From a reception-critical deconstruction of the premises of the debate on folksonomy and a series of empirical analyses of the production of descriptive metadata in del.icio.us, a pattern in the user-created descriptions emerges. The pattern is the classic power law, which in many complex systems is seen to unfold as an imitation dynamic that creates an asymmetry, where a few descriptive metadata are reproduced often and the majority are reproduced seldom. In del.icio.us, it is the very broad and basic subject headings (called cognitive basic categories in cognitive psychology) that are often reproduced and achieve power in the system, while the small, more specific subject headings are seldom reproduced. The imitation dynamic underlying the power law in del.icio.us is explained from the perspective of two theoretical paradigms, i.e. market and cognition.
The conclusion is that the power law and the asymmetry chiefly come into being as a consequence of cognitive economizing, through a simplification principle in the users' construction of the descriptive metadata. The users predominantly choose broad basic categories, because that requires the least cognitive effort. The consequence is that folksonomy is not necessarily a better, more realistic and cheaper method of creating metadata than that which can be generated through taxonomies, faceted classification or search algorithms, e.g. Google. Folksonomy as a self-organizing system cannot, by virtue of its immanent dynamics alone, create better and cheaper descriptive metadata. This can only be achieved if the users learn to mark up better and invest more time and more cognitive resources in the practice. The least effort gives the least benefit.

What is folksonomy?

The so-called folksonomies are concentrations of user-generated categorization principles that are both private and public. The term was coined in 2003 by information architect Thomas Vander Wal (Smith 2004). It is a neologism consisting of a combination of the words folk and taxonomy. Taxonomy is from the Greek taxis and nomos. Taxis means classification and nomos means management. Literally, it may be translated as "people's classification management". Folksonomy may be said to be metadata from/for the masses (Merholz 2004). In the etymological and connotative meaning of the term, the intention of folksonomy is to create a better, more popular and thus more democratic alternative to the elitist and undemocratic taxonomy.

Folksonomies are thus created by the people for the people, on the premise that the categorizing people can create a categorization that will better reflect the people's conceptual model, contextualizations and actual use of the data. With folksonomy, it would in this way be possible to create a more representative, natural, comprehensive, diversified, up-to-date and dynamic categorization than through the classic taxonomy.

Folksonomies are an expression of a paradigm shift away from classic cataloging, for better or for worse (Gorman 2004), with people everywhere beginning to tag information on the Internet with their own words. Folksonomies are spreading exponentially on the Internet and are now created by millions of users in a system which is no longer restricted by language or geography.

As mentioned above, the categorizing of information used to be a specialist job for a producer or a librarian; now it is no longer a professional job for the few, but has become a job for everyone and no one. The separation between user and producer implodes with user-generated metadata, when the active, co-creating user contextualizes and categorizes the information in a new dual role as both producer and consumer, where meaning is created through self-reflective use and co-creating construction.

Data explosion, digitalization and democratization of information have thus led to privatization, socialization and individualization of the cataloging of data through metadata. This has resulted in new systems for categorizing information, which are spreading as a complement to, in parallel with, or as a substitute for the classic semantic and hierarchical classification principles.

Folksonomy may be seen both as an individual act and as an expression of many people's collective but independent recording of metadata. As an act, it is the individual person who categorizes and thus tags information with his or her own metadata by adding personal keywords. In this way, a personomy of individual tags is created. As an expression, folksonomy is a function of the total sum of personomies, where the individual users collect and tag in order to explore, remember and retrieve their own knowledge, thus creating a shared opportunity to explore and retrieve.
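The relation between personomy and folksonomy can be illustrated with a minimal Python sketch. The user names, URLs and tags are invented for illustration and are not taken from del.icio.us; the sketch only assumes that each personomy maps a user's bookmarked pages to that user's own tags, and that the folksonomy is the sum of these.

```python
from collections import Counter

# Hypothetical personomies: user -> {bookmarked URL -> that user's tags}.
personomies = {
    "anna": {"http://example.org/a": {"web", "design"}},
    "ben": {"http://example.org/a": {"web", "css"},
            "http://example.org/b": {"music"}},
    "carl": {"http://example.org/a": {"web"}},
}

def build_folksonomy(personomies):
    """Sum the individual personomies into tag counts per URL."""
    folksonomy = {}
    for bookmarks in personomies.values():
        for url, tags in bookmarks.items():
            folksonomy.setdefault(url, Counter()).update(tags)
    return folksonomy

folksonomy = build_folksonomy(personomies)
# The most frequently chosen tag for the first page across all users.
print(folksonomy["http://example.org/a"].most_common(1))
```

Each user tags only for his or her own retrieval, yet the aggregated counts are available to everyone, which is the shared opportunity described above.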

This sharing and copying of knowledge is of no great cost to the tagger; however, it is of great benefit to all. The individual taggers will invest their time and effort in helping themselves, and without further cost they will implicitly also help other taggers. In this sense, folksonomy is based on a more or less explicit social contract, where the individual tagger invests his or her personomy in order to relate his or her own personal categorization to that of others. Everyone can see and use the folksonomy, but if you choose to be excluded from the contract and not contribute with your own metadata, you will be forced to follow other people's keywords at random. However, if you choose to register as a user and start tagging pages as an active and contributing participant in the folksonomy universe, you will be rewarded, as the tags and links that you add are contextualized dynamically in a user-generated system and become transparent. Metaphorically, it may be said that the benefit of participating in the social system of the network is that you can always see an updated "exchange rate" of your individualized world picture relative to other people's world pictures. Here, the "exchange rate" is an infinite, relational social semiosis where everyone associates further on each other's associations.

Since Thomas Vander Wal coined the term folksonomy, there has been a consensus that folksonomy is the collective term for a type of social classification on the Internet. However, this is an imprecise and incorrect definition. The distinctive feature of folksonomies is that they are not classification in a strict sense, but loose, horizontal social categorizations (Jacob 2004). Folksonomy consists of disconnected and loosely related keywords, which ideal-typically exist in a coordinated horizontal universe, connected only by associative relations. Here, there is no hierarchy between superior and subordinate concepts. No keywords are children, parents, twins or synonyms as in classic taxonomy. In theory, the user may thus choose freely without considering the hierarchy. In practice, however, there are forms of categorization in the folksonomies where the relations between the individual tags are hierarchical, because the user chooses tags that are not coordinate but subsets of each other. All words in the folksonomy are thus in theory unrelated; the only relations between keywords are the associative ones generated by the collaborative recording of tags. Contrary to formal taxonomy and classification, folksonomy thus lacks explicit relations with predefined, consistent, descriptive and shared terms expressed as a controlled vocabulary (Mathes 2004). A distinctive feature of folksonomies is thus the possibility of adding one's own keywords unsupervised and viewing the keywords added by other users unsupervised. By using popular keywords, you make your own tags visible and have the opportunity to follow the tags of others who use the same popular keywords.

In this sense, folksonomy may be regarded as a popular shift away from the hierarchical, controlled and authoritarian ways of categorizing information, where the user chooses not to learn a hierarchy but instead releases his or her own personal association chain in a common social forum (Quintarelli 2005). This is based on the notion that this forum makes it easier to find the relevant information, provides more transparency in respect of other people's knowledge, contains more representative, rational knowledge due to the number of participants, and that the knowledge in this system is more up-to-date, because it is more dynamic/social and is created through widespread collaboration over the Internet (Surowiecki 2005; Sunstein 2006).

As mentioned, the principle in folksonomies is that the user adds information about his or her own and other people's information sources. This type of data about data is called metadata, and the process of adding metadata is called tagging. The user thus tags information with metadata and can subsequently use the generated metadata to organize the data.

Metadata are the pillars of all taxonomies and categorizations. Metadata are the guiding principle of many content management systems, in which the pages are compiled and placed according to the dimensions of the metadata. In connection with searches, metadata often also play an important part as a categorization tool in combination with free-text search.

There are three types of metadata: inherent, administrative and descriptive metadata. The inherent or structural metadata are information used to describe the nature of the information. This may include file type, size etc. The administrative metadata are information describing the handling of the file. This may be a publication date, date of latest change, status etc. Finally, there are the descriptive metadata, which are information on content: subjects, context, related information, recipient etc. On the Internet, these three types of metadata are often used in combination for organizing content, where the descriptive metadata organize the general structure (typically a classification of subjects and sub-subjects), while the administrative and, to some extent, the inherent metadata are used to organize list material and overviews (typically the most recently added content within a subject, other content by the same author etc.).
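The division of labour between the three types of metadata can be sketched in Python. The record layout, field names and sample items below are assumptions made for illustration only: subjects (descriptive) provide the general structure, while the publication date (administrative) orders the list within each subject.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

@dataclass
class Item:
    title: str
    file_type: str    # inherent (structural) metadata
    size_bytes: int   # inherent (structural) metadata
    published: date   # administrative metadata
    subjects: tuple   # descriptive metadata

# Hypothetical content items.
items = [
    Item("Tagging basics", "html", 12000, date(2005, 3, 1), ("tagging",)),
    Item("Facets explained", "pdf", 250000, date(2004, 11, 15), ("classification",)),
    Item("Tags vs. taxonomies", "html", 18000, date(2006, 1, 20),
         ("tagging", "classification")),
]

def organize(items):
    """Group by subject, then list the newest items first within each subject."""
    by_subject = defaultdict(list)
    for item in items:
        for subject in item.subjects:
            by_subject[subject].append(item)
    for group in by_subject.values():
        group.sort(key=lambda item: item.published, reverse=True)
    return dict(by_subject)

for subject, group in organize(items).items():
    print(subject, [item.title for item in group])
```

Only the descriptive field requires a judgment about meaning; file type, size and dates can be read off the file or the publishing system.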

Descriptive metadata are very different from the other two types, as they relate to the content of the information, i.e. to the meaning that may be deduced, and for this reason they are very hard to deduce automatically. This is a problem, since for most people the descriptive metadata are the most natural entry point to large volumes of information, where information about subject, context and content is vital. At the same time, the organization of the descriptive metadata forms the basis of many of the taxonomies and classification principles that we use to arrange information. Consider the subject index in a library or the sections in a newspaper as classic examples of descriptive metadata categorizations. Basically, there are three strategies for creating descriptive metadata and thus organizing content: hierarchical, polyhierarchical or horizontal (Quintarelli 2005).

In its pure form, taxonomy is vertically constructed in a hierarchical structure, in which descriptive data are assigned on the basis of predefined rules. Different types of information fit into different places in the often very comprehensive hierarchy of classes, such as supercategories, subcategories or synonyms. Everything has its place, and if you know the system, it is fairly easy to retrieve the information. Because integrity and consistency are its strength, this strategy requires a comprehensive overview, consistency and methodical knowledge, and thus professional metadata administrators (normally librarians) who will assign the information to its rightful place in the system. With a polyhierarchical strategy, it is possible to go across the hierarchical structures in a kind of faceted classification, where the same information unit may be assigned different facets which may then be used for searching. A facet is a category (Ranganathan 1962; Ranganathan 1964; Taylor 1992; Alvesson and Karreman 2001). The notion behind faceted classification is to create a higher degree of multidimensionality in the metadata (Wynar 2002). For each search, the different facets are filtered and selected until the user reaches a manageable and limited set of facets meeting the entered search criteria. Instead of navigating through a predefined hierarchy, the organization is determined on a current basis by the user's searches and thus by the dynamic polyhierarchies. Thus, the search is dynamic but limited, since the facets have been defined centrally and as a finite repertoire updated through guided navigation (Vickery 1966), in contrast to folksonomy, where the repertoire is infinite and decentralized. The difference between faceted classification and folksonomy is that in the former there are many categories of links, while in folksonomy every link is a category. Folksonomy can clearly be distinguished from the two other principles in all dimensions as a radically different and innovative way of creating metadata, because it is the user who creates the structure. Taxonomy is centralized, hierarchical and structuralist, and faceted navigation is polyhierarchical structuring with a predefined set of facets. In both cases, the user's role is more limited and structured than in folksonomy.
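The contrast between a centrally defined, closed repertoire of facets and an open repertoire of tags can be made concrete with a small Python sketch. The facet names, facet values, records and tags are hypothetical; the sketch assumes only that facet values must come from a predefined list, whereas any string a user types is a valid tag.

```python
# Hypothetical facets, defined centrally as a closed repertoire.
FACETS = {
    "format": {"article", "book"},
    "language": {"en", "da"},
}

def check_facets(facets):
    """Accept only facet names and values from the predefined repertoire."""
    for name, value in facets.items():
        if value not in FACETS.get(name, set()):
            raise ValueError(f"not in the predefined repertoire: {name}={value}")
    return facets

records = [
    {"title": "A", "tags": {"web", "toread"},
     "facets": check_facets({"format": "article", "language": "en"})},
    {"title": "B", "tags": {"web", "classification"},
     "facets": check_facets({"format": "book", "language": "en"})},
    {"title": "C", "tags": {"folksonomi"},
     "facets": check_facets({"format": "article", "language": "da"})},
]

def faceted_search(records, **selected):
    """Narrow the result set by every selected facet value."""
    return [r["title"] for r in records
            if all(r["facets"].get(k) == v for k, v in selected.items())]

def tag_search(records, tag):
    """Free tags: no repertoire is consulted, every tag is its own category."""
    return [r["title"] for r in records if tag in r["tags"]]

print(faceted_search(records, language="en"))                    # ['A', 'B']
print(faceted_search(records, language="en", format="article"))  # ['A']
print(tag_search(records, "toread"))                             # ['A']
```

Each additional facet narrows the result set within a structure that somebody else has defined; the tag search has no such structure to fall back on, so its usefulness depends entirely on which words the users happened to choose.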

Literature:

Since this is a new phenomenon, the scientific literature on folksonomies on the Internet is limited to fewer than 10 to 15 articles, which have been produced within the past two to three years. Please refer to Macgregor and McCulloch for an overview (Macgregor and McCulloch 2006).

Sharing bookmarks has only recently become possible, with the creation and dissemination of social bookmarking software. Literature on this subject falls into three general groups or schools. Firstly, there is the rather narrow and apolitical information science approach, which is based on library science. Here, the phenomenon is analyzed on the basis of the theories, methods and standards that apply within the library field. The central problem is the categorization of information and the relation between the new folksonomy and the classic taxonomy. The ideal-typical representatives of this perspective are Mathes and Quintarelli, who both conceptualize folksonomy as opposed to taxonomy. Secondly, there is the much wider media studies approach. Here, folksonomy is placed in a larger social policy framework and connected with the Internet's paradigmatic change of the communication and power structure in society. The ideal-typical representative of this perspective is Clay Shirky (Shirky 2004; Shirky 2005) from Columbia University. Thirdly, we have the scientifically and mathematically inspired computer science approach. Here, the focus has been on how to analyze folksonomies on the basis of a number of statistical and mathematical models. The ideal-typical representatives of this perspective are Golder and Huberman (Golder and Huberman 2005). The theoretical perspective is a mathematical modeling of folksonomy in order to derive a number of patterns and mathematical rules by analysis. Folksonomy is essentially an illustrative example of how many network phenomena may be described and modeled. The most significant contributions are the demonstration of a mathematical power law in del.icio.us (Shen and Wu 2005), as in other complex systems, and the existence of stability over time in the tags used for a given website.
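What a power law in tag usage means can be illustrated with a rough Python sketch. The tag counts below are invented and constructed so that frequency falls as 1200 divided by rank; they are not the del.icio.us data. A power law of the form f(r) proportional to r^(-a) shows up as a straight line with slope -a when the logarithm of frequency is plotted against the logarithm of rank, and the sketch estimates that slope by ordinary least squares. This is a simple diagnostic only, not the method used in the cited studies.

```python
import math

# Hypothetical tag counts (NOT del.icio.us data), built as 1200 / rank.
tag_counts = {"web": 1200, "design": 600, "software": 400,
              "music": 300, "blog": 240, "css": 200}

def loglog_slope(counts):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted(counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

slope = loglog_slope(tag_counts.values())
top_share = max(tag_counts.values()) / sum(tag_counts.values())
print(f"slope: {slope:.2f}, share of the most used tag: {top_share:.0%}")
```

With these constructed counts the slope is close to -1, and the single most used tag accounts for roughly two fifths of all tag assignments, which is the kind of asymmetry between a few frequently reproduced tags and many rarely reproduced ones that a power law describes.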