An Annotated Bibliography

DATA MINING

Thesis statement. Data mining is the search for patterns within large sets of data, which opens many possibilities for business managers and decision makers. By analyzing those patterns, businesses can make better decisions and achieve greater financial and entrepreneurial success.

Keywords: data mining, knowledge discovery in databases (KDD), data mining technologies (DMT), decision support systems (DSS).

Association for Computing Machinery. (2007). SIGKDD. ACM Special Interest Group on Knowledge Discovery and Data Mining. Retrieved February 12, 2007 from [Authoritative website].

This website belongs to one of the special interest groups (SIGs) of the ACM, one of the first academic societies to promote computational research. The site is simple in appearance but contains all necessary information about the group and lists the people involved in it along with their affiliations, most of which are academic organizations. The links all work. A newsletter available from the site carries the latest news in the field of data mining and knowledge discovery. Overall, this is a very useful website.

Bramer, M.A. (1999). Knowledge discovery and data mining. London, UK: The Institution of Electrical Engineers. [Book]

This book addresses issues of data mining within different research subjects such as chemistry, medical diagnosis, and electric load prediction. Part 1 examines a broad spectrum of technical issues in knowledge discovery and data mining; part 2 contains articles on practical applications in fields such as health-information analysis, meteorology, chemistry and the electricity-supply industry. This book can be helpful to researchers within the fields it discusses, as well as knowledge professionals in general. However, the editor emphasizes that knowledge of the discipline in which data mining is applied is required in order to conduct a successful data mining experiment.

Ganguly, A.R., Gupta, A., & Khan, S. (2006). Data mining and decision support for business and science. In Encyclopedia of data warehousing and mining (Vol. I, pp. 233-238). Hershey, PA; London, UK: Idea Group Reference. [Reference book]

This article introduces the field of data mining for business and science. The authors are affiliated with academic and research institutions in the United States, such as the University of Arizona and the University of South Florida, which leads the reader to believe that they have extensive knowledge of data mining. The article begins with an introduction, where readers can get acquainted with the subject of data mining and the technologies and applications that are involved in data processing. The article’s main idea is presented in the “Main thrust” section (a typical element of entries in this particular reference source), where the authors discuss scientific and business applications, present an overview of emerging technologies and previous approaches, and discuss common features of data mining for science and technology. This source contains many references to scientific and technical literature, including journals, books and authoritative web sites (NASA’s, for example). The article also has an extensive list of references at the end. The intended audience is academics and business professionals who are interested in data mining applications and want to find quick information that will direct them to further resources on this subject. The article contains two tables that present analytical information technologies (data mining and decision support systems) and examples of their applications.

Guernsey, L. (2003, October 16). Digging for nuggets of wisdom. The New York Times, p. G1. [Popular print article].

Written by a journalist, this article is very informative and explains data mining applications in various fields of science. Emphasizing that the amount of information available on the web and in print is overwhelming and difficult to analyze, the author turns to practitioners who have already figured out how to search through vast amounts of data. For example, Dr. Liebman uses statistical software called SPSS for text mining, a technique derived from the idea of data mining. The main idea of the article is that it is possible to deal with the vast amounts of information that are out there as long as one approaches them intelligently. The language of the article is popular and easily understood by general readers. The New York Times is one of the most widely read newspapers in the country, so the article can be of use to readers who have never heard of data or text mining. Those readers should find the subject practical and interesting, if not fascinating.

Hu, J., & Zhong, N. (2006). Organizing multiple data sources for developing intelligent e-business portals. Data Mining and Knowledge Discovery, 12(2-3), 127-150. [Print scholarly article].

Both authors are affiliated with Maebashi Institute of Technology in Japan. This article addresses applications of data mining in business. It is organized into several parts, beginning with the introduction of its main subject: creating and managing e-portals that serve as gateways to personalized information. The authors present a three-tier work-flow model whose levels are data-flow, mining-flow, and knowledge-flow. All three contribute to the model of a multi-layered grid, which is essential for creating an e-portal. The article begins with a review of the literature and previous experience, and then proceeds to a discussion of the main subject with graphs, tables and computations. It is a scholarly article, written for professionals in data mining and knowledge discovery, so its language is fairly technical. However, the average person can make sense of the concept by reading the introduction. It is a useful article for those involved in scientific research of data mining for business applications.

Kohavi, R., & Provost, F. (2001). Applications of data mining to electronic commerce. Data Mining and Knowledge Discovery, 5, 5-10. [Secondary source].

This article is a rather critical analysis of the current state of data mining in e-commerce. The authors discuss many problems and issues in this particular field, paying particular attention to its utilization. “High potential reward, accompanied by high risk” seems to be the main theme of this article. Written in clear, understandable language, it could be useful to business managers and information specialists with broad interests. It is also a literature review that tries to summarize what has been written and what the current problems are. (One theme that seems to be present in every reviewed paper is problem-specific knowledge and how to incorporate it into the knowledge discovery process.) At the same time, it is a philosophical essay rather than simply a technical article. There are no formulas, graphs, or charts, just analysis and critical opinions, which differentiates this article from the majority of others written by scientists. At the end, the authors express cautious optimism about future studies of data mining in e-commerce, pointing out that there are many issues still to be solved. This is a secondary source because it addresses previous research instead of proposing new and original methods or ideas.

Kutz, G.D. (2003). Data mining: Results and challenges for government program audits and investigations. Testimony before the Subcommittee on Technology, Information Policy, Intergovernmental Relations and the Census, Committee on Government Reform, House of Representatives. Washington, D.C.: United States General Accounting Office. Retrieved January 30, 2007, from [Government document]

This document covers the issues of internal control within certain government agencies, such as the Department of Defense (DOD). The use of government credit cards was traced using data mining techniques in order to scrutinize the vendors and the appropriateness of the expenses by government employees. This process helped to uncover many abuses and waste of government funds, and helped to improve control over travel spending. Even though this document is written in somewhat bureaucratic language, it is easy to understand for the student or lay person. A summary at the beginning of the document and conclusions at the end help readers to get a clear picture of the problem and its solution. A list of related publications by the General Accounting Office (GAO) is available at the end of the paper. This source is very helpful for those who want to learn about the practical applications of data mining.

Lee, J.H., & Park, S.C. (2003). Agent and data mining based decision support system and its adaptation to a new customer-centric electronic commerce. [Electronic version]. Expert Systems with Applications, 25(4), 619-635. [Online journal]

Electronic commerce (e-commerce, EC) is a rapidly developing means of conducting business. In order to be competitive, manufacturing companies use the Internet not only for promotion, but also to buy and sell. This article is devoted to a customer-centric e-commerce model using a concept called process transparency. “Transparency is a knowledge-based concept that implies participants have intelligence about market around them” according to the authors. It’s crucial for manufacturers to learn about their potential customers’ buying behaviors and preferences in order to market their products. The data mining process was successfully integrated into the proposed EC model for the generation of an optimal sampling method.

Mukherjee, S., Chen, Z., & Gangopadhyay, A. (2006). A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier-related transforms. [Electronic version]. VLDB Journal, 15, 293-315. [Primary source].

This article is a good example of a primary source. It is written by researchers from the University of Maryland, Baltimore. The authors propose their own algorithm for preserving privacy in data mining, an important issue when dealing with large amounts of data. The problem is that data is often stored in one place and analyzed in another, with a third party responsible for the analysis. This means that data should be stripped of personal characteristics in order to preserve the privacy of customers’ information. The authors build their original idea on already existing Fourier-related transforms. The article is intended for professional researchers in the field of data mining, hence its language is technical. There are many charts and mathematical algorithms that prove and illustrate the idea of the proposed method. Published in an academic journal, this article is a good example of a primary source in the sciences.
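
The following is a minimal sketch of one general property that makes Fourier-related transforms a plausible fit for Euclidean distance-based mining: an orthonormal transform, such as the orthonormal discrete cosine transform (DCT-II), leaves the Euclidean distance between two records unchanged. It is an illustration under stated assumptions, not the authors' algorithm; the function names and the sample records are hypothetical, and the sketch says nothing about how the article achieves privacy beyond transforming the data.

```python
import math


def dct_orthonormal(values):
    """Orthonormal DCT-II of a list of floats (illustrative helper, not from the article)."""
    n = len(values)
    coefficients = []
    for k in range(n):
        total = sum(
            values[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)
        )
        # The scaling below is what makes the transform orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coefficients.append(scale * total)
    return coefficients


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric records."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Hypothetical customer records (made-up numbers, for illustration only).
record_a = [52.0, 38.5, 7.0, 120.0]
record_b = [47.0, 41.0, 9.5, 98.0]

original_distance = euclidean(record_a, record_b)
transformed_distance = euclidean(
    dct_orthonormal(record_a), dct_orthonormal(record_b)
)

print("distance before transform:", original_distance)
print("distance after transform: ", transformed_distance)
```

Because the two distances agree up to floating-point rounding, a distance-based method such as clustering or nearest-neighbor search would group the transformed records the same way it groups the originals.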

SPSS, Inc. (2007). SPSS. Data mining. Retrieved February 11, 2007 from [Authoritative website].

This is information about data mining provided by a company that introduced pioneering software for statistical analysis (SPSS stands for Statistical Package for the Social Sciences). SPSS is now considered one of the leading companies in data mining research. Its product, Clementine, was one of the first data mining tools back in 1994. This web site is well organized, and the address and contact information are clearly shown. The list of related business problems that can be addressed by SPSS products makes the search clear and straightforward. The links all work and the only advertisement present is within the links to the company’s products.