Cascading Citation Indexing in Action
T. Folias1, D. Dervos2, G. Evangelidis1, N. Samaras1
1 Dpt. of Applied Informatics, University of Macedonia, Thessaloniki, Greece
Tel: +30 2310891844, Fax: +30 2310891800, E-mail: {folias,samaras,gevan}@uom.gr
2 Dpt. of Information Technology, Alexander Technology Educational Institute (ATEI), Thessaloniki, Greece
Tel: +30 2310791295, Fax: +30 2310791290, E-mail:
Abstract
In this paper we present the cascading citation indexing framework algorithm (c2IF algorithm, for short) and a set of experimental results obtained by applying the algorithm to real data. The cascading citation indexing framework was first introduced in (Dervos and Kalkanis, 2005) and further elaborated on in (Dervos, Samaras, Evangelidis, and Folias, 2006). Given a collection of articles and their citation graph, our algorithm considers citations at the article level. Each article is uniquely identified by its Digital Object Identifier (DOI). In addition to the citations made directly to a given article, citation paths that target each citing article are also considered. The c2IF algorithm uses a relational database management system (RDBMS) both to represent the citation graph and to store the results (citation paths). For a given positive integer k, the algorithm computes, for each DOI, all the 1-gen, 2-gen, …, k-gen citations and identifies self-citations by simple author name comparison (in the absence of a Universal Author Identifier System). The algorithm also identifies cycles present in the citation graph and finally produces a citation standings table with one row per DOI. To test our approach, we use six years of citation data (1999-2005) from the ISI Science Citation Index Expanded (ISI SCIE), made available by Thomson Scientific along the lines of the Cascading Citations Analysis Project (C-CAP). The dataset registers 7,364,211 research article records involving 165,822,522 citation instances. Following the data cleaning/preparation stage, 35,503,513 citation instances were found to satisfy the requirement that the cited article is present in the dataset considered.
Here we present the c2IF algorithm we developed to calculate all the direct and indirect citations for the above dataset, taking into consideration citation results up to level 3, i.e. up to 3-gen (self-)citations. For the provided dataset we identified 291,238,196 2-gen citations and 1,164,952,784 3-gen citations, as well as interesting statistical results regarding highly cited papers.
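The k-gen computation described above can be illustrated with a minimal sketch. It is not the c2IF implementation; it only shows one way an RDBMS can hold the citation graph and the resulting citation paths. Everything in it is an assumption made for illustration: the use of SQLite, the table names (citation, author, path), the function names, counting one k-gen citation per citation path of length k, flagging a path as a self-citation when its first and last articles share an author name, and reporting a cycle whenever a path of length at most k returns to its starting article.

```python
import sqlite3


def make_db(citations, authors):
    """Build an in-memory database from (citing, cited) and (doi, name) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE citation (citing TEXT, cited TEXT)")
    conn.execute("CREATE TABLE author (doi TEXT, name TEXT)")
    conn.executemany("INSERT INTO citation VALUES (?, ?)", citations)
    conn.executemany("INSERT INTO author VALUES (?, ?)", authors)
    return conn


def build_paths(conn, k):
    """Store one row per citation path of length 1..k in table `path`.

    Each row (gen, src, dst) is a path of `gen` citations that starts at
    the citing article `src` and ends at the cited article `dst`.
    """
    conn.execute("DROP TABLE IF EXISTS path")
    conn.execute("CREATE TABLE path (gen INTEGER, src TEXT, dst TEXT)")
    # 1-gen paths are the direct citations themselves.
    conn.execute("INSERT INTO path SELECT 1, citing, cited FROM citation")
    for gen in range(2, k + 1):
        # Extend each (gen-1)-path by one citation that targets its source.
        conn.execute(
            """INSERT INTO path
               SELECT ?, c.citing, p.dst
               FROM citation AS c JOIN path AS p ON c.cited = p.src
               WHERE p.gen = ?""",
            (gen, gen - 1),
        )
    conn.commit()


def standings(conn):
    """Return {doi: {gen: (total_citations, self_citations)}}."""
    rows = conn.execute(
        """SELECT dst, gen, COUNT(*), SUM(is_self)
           FROM (SELECT p.dst AS dst, p.gen AS gen,
                        EXISTS (SELECT 1
                                FROM author AS a1
                                JOIN author AS a2 ON a1.name = a2.name
                                WHERE a1.doi = p.src AND a2.doi = p.dst)
                        AS is_self
                 FROM path AS p)
           GROUP BY dst, gen
           ORDER BY dst, gen"""
    ).fetchall()
    result = {}
    for dst, gen, total, self_cites in rows:
        result.setdefault(dst, {})[gen] = (total, self_cites)
    return result


def dois_on_cycles(conn):
    """DOIs that reach themselves through a stored path (cycle length <= k)."""
    rows = conn.execute("SELECT DISTINCT src FROM path WHERE src = dst")
    return {doi for (doi,) in rows}


if __name__ == "__main__":
    # Hypothetical toy data: the DOIs and author names are placeholders.
    demo = make_db(
        [("B", "A"), ("C", "B"), ("D", "B"), ("D", "C")],
        [("A", "Smith"), ("B", "Jones"), ("C", "Smith"), ("D", "Lee")],
    )
    build_paths(demo, 3)
    for doi, per_gen in standings(demo).items():
        print(doi, per_gen)
    print("on cycles:", dois_on_cycles(demo))
```

Because each generation is built only from the previous one, the loop terminates after k steps even when the graph contains cycles. A dataset of the size reported here would additionally need indexes on the join columns and batched inserts, which the sketch leaves out.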
Keywords: citation analysis, citations graph, impact factor, research evaluation
REFERENCES
Dervos, D.A., and Kalkanis, T. (2005). cc-IFF: A Cascading Citations Impact Factor Framework for the Automatic Ranking of Research Publications. Proceedings of the 3rd IEEE International Workshop on Intelligent Data Acquisition and Advanced Computer Systems: Technology and Applications (IDAACS), pp. 668-673, Sofia, Bulgaria, 5-7 September, 2005. Postprint version available from DLIST. Retrieved on June 19, 2006 from:
Dervos, D., Samaras, N., Evangelidis, G., and Folias, T. (2006). A New Framework for the Citation Indexing Paradigm. Proceedings of the 2006 Annual Meeting of the American Society for Information Science and Technology (ASIS&T). Retrieved on February 23, 2007 from: