
The most highly cited Library and Information Science articles: Interdisciplinarity, first authors and citation patterns

Jonathan M. Levitt, Mike Thelwall

School of Computing and Information Technology, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1SB, UK. E-mail:

Tel: +44 1902 321470 Fax: +44 1902 321478

Highly cited articles are interesting because of the potential association between high citation counts and high quality research. This study investigates the 82 most highly cited ‘Information Science & Library Science’ (IS&LS) articles (the top 0.1%) in the Web of Science from the perspectives of disciplinarity, annual citation patterns, and first author citation profiles. First, these 82 articles were proportionally far less common among articles classified solely in IS&LS than among those classified in IS&LS and at least one other subject, suggesting that the promotion of interdisciplinary research in IS&LS may be conducive to improving research quality. Second, two thirds of the first authors had an h-index in IS&LS of less than eight, showing that much significant research is produced by researchers without a high overall IS&LS research productivity. Third, there is a moderate correlation (0.46) between citation ranking and the number of years between peak year and year of publication. This indicates that high quality ideas and methods in IS&LS are often deployed many years after being published.

Introduction

This study identifies the most highly cited 0.1% of the 82,409 articles in the Web of Science (WoS) subject category of ‘Information Science & Library Science’ (IS&LS) published prior to 2007. These 82 articles, listed in Table 6 of the Appendix, are used to investigate characteristics of both the highly cited articles and their first authors. The rationale for investigating highly cited articles is that high citation is associated with research quality; consequently, findings on highly cited articles could increase understanding of the quality of research.

This study examines disciplinarity, first authors and citation patterns. One investigation of disciplinarity focuses on the link between multi-disciplinarity and high citation. A reason for examining this topic is that if multi-disciplinary research is cited on average significantly more often than research in a single discipline, it may be worthwhile to encourage multi-disciplinary research. The investigation of first authors examines the citation profiles of the first authors of the most highly cited articles. Such profiles provide information that can be used to help identify and target resources to those who are more likely to produce highly cited research. The investigation of citation patterns examines the prevalence of late citation amongst the 82 articles. If citation is an indication of research influence, then citation patterns indicate how this influence has changed over time.

Previous research has addressed the issue of interdisciplinarity in IS&LS research. Rice & Crawford (1992) identified some possible areas of convergence between the fields of communication and library and information science. Meyer & Spencer (1996) found that library science articles were cited in computer science, medicine, psychology, the social sciences, and general sciences. Tang (2004a) found that Information and Library Science “attracts a significant wide spectrum of disciplines from the domains of science, social science, and the humanities, and that the kinds of disciplines interested in the field vary by year.” Other research on disciplinarity and Information and Library Science includes Carlin (2003) and Tang (2004b). Whilst these articles point to considerable overlaps between Information and Library Science and other disciplines, they do not examine this overlap for highly cited articles. The current study quantifies the disciplinary overlap for a collection of highly cited articles, and compares it with the overlap for the complete set of articles classified as IS&LS.

In terms of citation profiles, Cronin & Meho (2007) examined the patterns of creative output of renowned information scientists, and Cronin & Meho (2006) and Oppenheim (2007) evaluated the h-indexes of influential information scientists. The h-index of a set of documents is defined as the largest number h such that h of the documents have each been cited h or more times (Hirsch, 2005). Several studies, including Hirsch (2005), Batista, Campiteli, & Kinouchi (2006), Braun, Glanzel & Schubert (2006) and van Raan (2006), also use the h-index in various informetric investigations. Whilst previous research has focused on notable researchers in information science, it has not examined the citation profiles of the first authors of the most highly cited IS&LS articles. The current study examines this aspect with a view to identifying whether all the first authors have high h-indexes in IS&LS.
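The h-index definition above can be illustrated with a minimal sketch. The function name and the example citation counts are hypothetical and are not taken from the study's data; the sketch simply applies the definition of Hirsch (2005) to a list of per-document citation counts.

```python
def h_index(citation_counts):
    """Return the largest h such that h documents have at least h citations each."""
    # Rank documents from most cited to least cited.
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` documents all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one author's documents (illustration only).
example_counts = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example_counts))
```

To obtain an author's h-index in IS&LS in the sense used in this study, the input list would be restricted to the author's documents classified as IS&LS.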

Patterns of annual citation have also been previously researched. Aversa (1985) examined the patterns of citation of 400 papers published in 1972 that were cited 30 or more times between 1972 and 1977. She found that an early rise in citation is associated with a more rapid decline in citation and a lower citation total, whereas a delayed rise in citation is associated with a less rapid decline in citation and a higher citation total. Cano & Lind (1991) compared the annual citation patterns of ten highly cited papers with ten infrequently cited papers in medicine and biochemistry. They found two types of citation patterns, Types A and B. For articles of Type A the number of citations in the first six years, as a percentage of total citations, was typically 75%, whereas for Type B the figure was typically 33%. All the infrequently cited papers and some of the highly cited papers were of Type A, whereas only highly cited papers were of Type B. Aksnes (2003) examined the patterns of annual citation of 137 highly cited papers in Norwegian science published between 1981 and 1989 and found that 33% of the papers in Physical, Chemical and Earth Sciences had the citation pattern of ‘Early rise & rapid decline’, whereas none of the papers in Biology and Environmental Sciences had this citation pattern. Levitt & Thelwall (2007) found that the levels of citation of 11 of the 20 most highly cited documents in IS&LS rose sharply between 2001 and 2005.

Other investigations of citation patterns have also produced interesting results. Garfield (1985a) presents a graph that compares patterns of three highly cited papers and Garfield (1985b) presents a graph that compares patterns of four highly cited papers. Garfield’s graphs contain at least two different citation patterns: (a) Rising to a peak and then a steady decline, and (b) Continuing increase in citation level. Glanzel, Schlemmer & Thijs (2003) and van Raan (2004) established the frequencies of some unusual citation patterns of highly cited articles that they termed respectively ‘delayed recognition’ and ‘sleeping beauties’. Glanzel, Schlemmer and Thijs found that 0.3% of papers in the 1980 Science Citation Index that were cited more than 15 times in total were not cited between 1980 and 1984; van Raan found that only 41 of the articles from the Institute for Scientific Information Citation Indexes published in 1988 received at most ten citations during the first ten years after publication and subsequently between 21 and 30 citations in the next four years. Other studies of delayed recognition include those of Garfield (1980) and Glanzel and Garfield (2005), and other studies of sleeping beauties include those of van Dalen and Henkens (2005) and Burrell (2005). Garfield (1975) attributed the concept of ‘obliteration phenomenon’ to Merton (1968) and used this phrase to describe basic findings that have become so widely used that they are used without citing their source. Garfield (1993) used the phrase ‘obliteration by incorporation’, in which “discoveries or ideas become so fully incorporated into canonical knowledge that their source is no longer cited or even alluded to.” Whilst previous research has investigated lateness of citation in different subject areas, none has examined late citation amongst the most highly cited articles in IS&LS. The current study examines this aspect of IS&LS.

In summary, this study addresses the following issues.

1.  How does the level of disciplinarity of the most highly cited articles in IS&LS compare with the level of all the articles in IS&LS and to what extent are the disciplinary frequencies of the most highly cited articles mirrored in the frequencies of all the articles in IS&LS?

2.  What is the distribution of the h-indexes of the first authors of the most highly cited articles, and do all of these first authors have high h-indexes?

3.  Defining an author’s h-index in IS&LS as the h-index of all documents published by the author and classified as IS&LS, how do the citation profiles of first authors with high h-indexes in IS&LS differ from the citation profiles of first authors with low h-indexes in IS&LS?

4.  How widespread is late citation amongst the most highly cited articles?
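As an illustration of the quantity underlying the fourth issue, the number of years between an article's year of publication and its peak citation year can be computed as in the following minimal sketch. The function name, the tie-breaking rule (earliest peak year) and the example data are assumptions made for illustration; they are not taken from the study.

```python
def years_to_peak(publication_year, citations_by_year):
    """Return (peak_year, delay), where delay = peak_year - publication_year.

    `citations_by_year` maps a calendar year to the citations received that year.
    Assumption: if several years share the maximum, the earliest one is the peak.
    """
    if not citations_by_year:
        raise ValueError("no annual citation data supplied")
    # Highest citation count first; ties resolved in favour of the earlier year.
    peak_year = min(citations_by_year, key=lambda y: (-citations_by_year[y], y))
    return peak_year, peak_year - publication_year


# Hypothetical annual citation counts for an article published in 1990.
example = {1990: 1, 1991: 4, 1992: 3, 2003: 9, 2004: 9}
print(years_to_peak(1990, example))
```

A large delay for a highly ranked article would be an instance of the late citation that this study investigates.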

Method and Data

This investigation defines ‘Information Science & Library Science’ as the set of all documents published prior to 2007 held in the three WoS online databases (Science Citation Index, Social Science Citation Index, and Arts and Humanities Citation Index) within the IS&LS subject category. The earliest article in IS&LS was published in 1946, which is nearly thirty years before the publication of the first Journal Citation Report (JCR) in 1975.

The wide usage of JCRs in informetric investigations is reflected by the fact that currently 52 documents published in Scientometrics contain JCR* or Journal Citation Report* in the title, abstract or keywords. With the addition of the ‘Refine your results’ and ‘Analyse’ facilities to WoS online in 2006, users can now obtain the set of all journals in a WoS subject category. This method was used here to delineate IS&LS, instead of the JCRs, because it is more comprehensive: (a) The IS&LS subject category contains over 16,000 articles published prior to the year of the earliest JCR (1975), and (b) For the period 2000 to 2006 the set of journals obtained by this method contains all the journals obtained from the JCRs and 10 additional titles (see Table 7 of the Appendix).

Several previous studies have also delineated subject categories on the basis of the Web of Science database as opposed to JCRs. For example: (a) Aksnes (2003) used WoS subject categories to investigate high citation in Norwegian science, (b) Adams (2005) used subject categories on the WoS when investigating the correlation between early and late citation ranking in UK science, (c) Zitt, Ramanana-Rahary and Bassecoulard (2005) used the entire SCI database for 1998 in their investigation of field normalisation, and (d) Porter, Cohen, Roessner and Perreault (2007) in their investigation of researcher interdisciplinarity used WoS subject categories as ‘key units of analysis’. These investigations delineated subjects through direct access to the WoS database, whereas the current investigation delineates subjects through WoS online searches.

The Thomson Scientific (formerly: Institute for Scientific Information, ISI) citation database was used for the raw data with the Web of Science interface and the ISI Information Science & Library Science category. It is not possible to obtain a list of the most highly cited IS&LS articles directly from the WoS and so an indirect method had to be devised. First, the complete set of all ISI publications that contain one or more IS&LS articles was obtained.

The WoS only allows a subject category to be specified within the results of searches, and the results can list no more than 100,000 items. Hence, in order to identify publications containing IS&LS articles, the complete set of over 36 million WoS documents must be systematically searched, with the results of each search yielding no more than 100,000 items. For each search the time period and the letter(s) with which the publication titles begin were selected in such a way that the search results did not exceed 100,000 items. Examples of searches yielding fewer than 100,000 items include: (a) For the Science Citation Index, Year 2006 and SOURCE TITLE B*, (b) For the Social Science Citation Index, Year 2006 and SOURCE TITLE A* OR B* OR C* OR D* OR E* OR F* OR G* OR H* OR I* OR J* OR K* OR L* OR M* OR N* OR O* OR P* OR Q* OR R* OR S* OR T* OR U* OR V* OR W* OR X* OR Y* OR Z*. The results were searched for journal titles in IS&LS not identified in previous searches; this was implemented using a Boolean ‘NOT’ in an Advanced Search to exclude journals found in previous searches. The total number of search sequences was over 650, and the sequences on average produced 55,000 documents. Separate tests were conducted to identify journal titles that began with a number. A total of 148 journals that contained one or more IS&LS documents were identified by this method. These 148 journals (presented in Table 8 of the Appendix) are all the past and present journals that contain articles classified as IS&LS.
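The partitioning strategy described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run against the WoS: the per-letter document counts are hypothetical, the function names are invented for the example, and the query strings merely mirror the ‘SOURCE TITLE A* OR B*’ wording used in this section rather than reproducing exact WoS Advanced Search syntax.

```python
RESULT_LIMIT = 100_000  # maximum number of items one search may list (from the text)


def batch_prefixes(counts, limit=RESULT_LIMIT):
    """Greedily group consecutive title prefixes so that each group's
    combined document count stays within `limit`."""
    batches, current, total = [], [], 0
    for prefix, n in counts.items():
        if n > limit:
            # A single letter is too large; the time period would need narrowing.
            raise ValueError(f"prefix {prefix!r} alone exceeds the limit")
        if current and total + n > limit:
            batches.append(current)
            current, total = [], 0
        current.append(prefix)
        total += n
    if current:
        batches.append(current)
    return batches


def source_title_query(prefixes, exclude_titles=()):
    """Build an illustrative query string; titles already found are excluded with NOT."""
    query = "SOURCE TITLE " + " OR ".join(f"{p}*" for p in prefixes)
    if exclude_titles:
        excluded = " OR ".join(f'"{t}"' for t in exclude_titles)
        query = f"({query}) NOT (SOURCE TITLE {excluded})"
    return query


# Hypothetical document counts per initial letter for one database and year.
counts = {"A": 60_000, "B": 30_000, "C": 50_000, "D": 45_000, "E": 70_000}
for batch in batch_prefixes(counts):
    print(source_title_query(batch, exclude_titles=["EXAMPLE JOURNAL"]))
```

In practice the document counts are not known in advance, so the grouping of letters and time periods would be adjusted by trial whenever a search exceeded the limit.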

Note that identifying only the journals that contain one or more IS&LS articles would have required considerably fewer search sequences than identifying the journals that contain one or more IS&LS documents; however, it was decided to conduct the more extensive search because the resulting list of journals can be used for the investigation of all IS&LS documents. If the method had been applied only to identify journals with IS&LS articles in the Social Science Citation Index, only 39 search sequences would have been needed (the search terms used in these sequences are listed in Table 9 of the Appendix).

This list of 148 journals was used to investigate the 82 most highly cited articles in IS&LS and also to identify some general properties of IS&LS that are relevant to this investigation. The 148 publications were processed using four search sequences (categories) that collectively cover every publication in IS&LS, with each category containing fewer than 100,000 documents. So few categories suffice because the Advanced Search can search a Boolean combination of up to fifty publications in a single search. The 82,409 articles published in IS&LS prior to 2007 were isolated using the Advanced Search facility on the Boolean sum of the four categories restricted to ‘articles’; these articles were ranked in decreasing order of citation using the ‘Times Cited’ menu item of the ‘Sort By:’ menu on the ‘Search Results’ page. Similar features were used to obtain general findings on other highly cited IS&LS documents.
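The final selection step can be illustrated with a short sketch. It shows how a journal list might be split into groups of at most fifty titles, how records might be ranked by citation count, and how the figure of 82 follows from taking the top 0.1% of 82,409 articles. The function names and the sample records are hypothetical, and the grouping shown does not reproduce the four categories actually used, which were also constrained to contain fewer than 100,000 documents each.

```python
MAX_TITLES_PER_SEARCH = 50  # limit on publications per Boolean search (from the text)


def chunk(items, size=MAX_TITLES_PER_SEARCH):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def top_share_count(total, per_thousand=1):
    """Number of items in the top `per_thousand` per mille, rounded down."""
    return total * per_thousand // 1000


def rank_by_times_cited(records):
    """Sort (title, times_cited) records in decreasing order of citation."""
    return sorted(records, key=lambda record: record[1], reverse=True)


journals = [f"Journal {i}" for i in range(1, 149)]  # 148 placeholder titles
print([len(group) for group in chunk(journals)])     # sizes of the groups
print(top_share_count(82_409))                       # size of the top 0.1%

# Hypothetical records; real data would come from the WoS 'Times Cited' sort.
records = [("Article A", 12), ("Article B", 340), ("Article C", 57)]
print(rank_by_times_cited(records)[0])
```

Integer arithmetic is used for the 0.1% threshold so that the count is rounded down without floating-point error.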