Textbook
Software
Methods of Instruction
Evaluation
Student Responsibilities
Attendance Policy
Academic Dishonesty
ADA Accommodation Notice
Instructor: Dr. Vladimir Zanev
Office Location/Phone Number: CCT 442/ 569-3056
Office Hours: Mon-Thu, 3:00 p.m. - 4:00 p.m., Fri: 10:00 a.m.-11:00 a.m.
E-mail: WebCT class e-mail or
Website:
This course is offered as an online class in the Spring semester 2008. Class meets 100% online at
( )
Online Interface:
WebCT Vista will be the primary method of online interaction in this course. Course materials (course outline, schedule, assignments, projects, course notes, datasets, discussions, resources, and grading) will be available through WebCT Vista. You can access WebCT Vista at:
or
At this page, click on the "Log-in" link to activate the WebCT Vista logon dialog box, which will ask for your WebCT Vista username and password. Your WebCT Vista username and password are:
Username: lastname_firstname
Password: DDMMYY
where DDMMYY is the student birth date. (Example - Birthday of Oct. 25, 1978 is 251078)
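The DDMMYY rule above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `initial_password` is hypothetical and is not part of WebCT Vista; it simply formats a birth date as day, month, and two-digit year.

```python
from datetime import date


def initial_password(birth_date: date) -> str:
    """Format a birth date as DDMMYY (hypothetical helper, not a WebCT API)."""
    # %d = zero-padded day, %m = zero-padded month, %y = two-digit year
    return birth_date.strftime("%d%m%y")


# Example from the syllabus: Oct. 25, 1978
print(initial_password(date(1978, 10, 25)))
```

Note that the day comes first, so a birthday of March 7, 1985 would be entered with leading zeros for both the day and the month.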
If you try the above and WebCT Vista will not let you in, please use the "Comments/Problems" link at the bottom of the WebCT Vista home page to request help. If you are still having problems gaining access a day or so after the class begins, please e-mail me. Once you have clicked on the course's name and accessed the course itself, you will find a home page with links to other sections and tools, and a menu on the left-hand side. This course homepage and the left-hand menu will give you access to all course materials.
Course Description and Objectives
Course Description:
Prerequisites: CPSC 5115 Algorithm Analysis and Design, CPSC 5138 Advanced DBMS.
These prerequisites will not be enforced. Consider them suggested background that will make this course much easier to pass. You are not required to have taken the courses above. However, completing them and/or having a working knowledge of the respective areas will greatly help you succeed in this class.
This course is an introduction to data mining. Recent advances in database technology, along with the phenomenal growth of the Internet, have resulted in an explosion of data collected, stored, and disseminated by various organizations. Because of the massive size of this data, it is difficult for analysts to sift through it, even though it may contain useful information. Data mining holds great promise to address this problem by providing efficient techniques to uncover useful information hidden in large data repositories. Data mining is a modern area of computer science concerned with the automated or convenient extraction of patterns that represent previously unknown knowledge implicitly stored in large databases, data warehouses, and other massive information repositories.
In this course we will approach the data mining problem from the position of database design and programming. We will discuss suitable data models, data preparation, and finally the different methods and algorithms one can implement to discover new knowledge from raw data. The key objectives of this course are two-fold: (1) to teach the fundamental concepts of data mining and (2) to provide extensive hands-on experience in applying the concepts to real-world applications. The core topics to be covered in this course include:
- data and exploring/preprocessing data
- classification data mining algorithms and methods
- association analysis data mining algorithms and methods
- cluster data mining algorithms and methods
- SQL Server 2005 data mining environment
Expected Outcomes
At the completion of this course, students will have an understanding and knowledge of:
- What is data mining?
- Data and exploring data: sampling, data cleaning, feature selection, and dimensionality reduction
- Classification: basic concepts, decision trees, model evaluation
- Classification: naive Bayes, time series, neural networks
- Association analysis: basic concepts and algorithms, Apriori algorithm
- Cluster analysis: basic concepts and algorithms, partitional and hierarchical clustering methods
- SQL Server 2005 environment, tools, and algorithms
- How to use SQL Server 2005 for data mining
Textbook
Textbooks - required
Title: Introduction to Data Mining
Authors: Pang-Ning Tan, Michael Steinbach and Vipin Kumar
Edition: 2006
Publisher: Addison-Wesley
ISBN: 0-321-32136-7
Title: Data Mining with SQL Server 2005
Authors: ZhaoHui Tang
Edition: 2005
Publisher: Wiley Publishing Inc.
ISBN: 0-471-46261-6
Software
To complete all lessons, the data mining project, assignments, discussions, and exams, you will need a computer with:
- Windows 2000/XP, Internet Explorer, PowerPoint, and Word
- SQL Server 2005 (see the Resources Web page for details on how to obtain SQL Server 2005)
- Access to WebCT Vista at CSU
Methods of Instruction
Methods of Instruction:
- Online Study
- Assignments
- Data Mining Project
- Discussions
- Midterm Exam
- Final Exam
Online Study
Each student is expected to complete all readings from the textbooks following the course schedule. Make your own notes. You can use your own notes during the exams.
Assignments
Four to six assignments will be given that build upon the concepts covered in the textbooks and have to be completed on your own time. Assignment deadlines are not flexible for any reason. Late assignments are not accepted for credit. Assignment submissions are usually via WebCT Vista email.
Data Mining Project
The purpose of this project is to give you experience with a data mining implementation. It is an opportunity to apply the concepts, techniques, and tools studied in class to real data. Each student develops the project individually. The objective is to implement and run a data mining algorithm that analyzes real data sets. You can use SQL Server 2005 as the implementation tool, or other data mining software (see the Resources Web page).
Discussions
A special Discussion Board with three discussions will be opened on the course WebCT site. Online discussions will be based on the discussion questions posted by the instructor (threaded discussions). Your participation will be evaluated through your contributions (questions, answers, remarks, and essays) in the three discussions. For details, see the Discussion area on the WebCT class Web site.
Exams
Your performance in this class will be measured by two online exams: the Midterm Exam and the Final Exam. No make-up tests will be given unless an exam was missed due to a documented emergency. The exams will be closed-textbook, but you can use your own notes. Questions on the exams may include the following:
- problem solving
- essay questions
- multiple choice answer selection
- filling in the blanks
Evaluation
The final grade will be obtained from the following:
Assignments / 20%
Discussions / 15%
Project / 20%
Midterm Exam / 20%
Final Exam / 25%
The letter grade will be assigned as follows:
Grade / Points
A / 90-100
B / 80-89
C / 70-79
D / 60-69
F / 0-59
Grading Example:
Assignments / 85, 90, 80, 70
Discussions / 85, 90, 90
Project / 95
Midterm Exam / 80
Final Exam / 94
G = (85+90+80+70)/4*0.2 + (85+90+90)/3*0.15 + 95*0.2 + 80*0.2 + 94*0.25 = 16.25 + 13.25 + 19.00 + 16.00 + 23.50 = 88.00
It is a B.
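The same calculation can be reproduced with a short Python sketch, which may be useful for estimating your standing during the semester. The function and variable names are illustrative only. Two assumptions are made here: the 20% weight for assignments is taken from the example formula above, and letter-grade cutoffs are applied to the unrounded score (the syllabus does not say how scores between 89 and 90 are treated).

```python
# Category weights; the assignment weight (0.20) comes from the example formula.
WEIGHTS = {
    "assignments": 0.20,
    "discussions": 0.15,
    "project": 0.20,
    "midterm": 0.20,
    "final": 0.25,
}


def mean(scores):
    """Arithmetic mean of a non-empty list of scores."""
    return sum(scores) / len(scores)


def final_grade(assignments, discussions, project, midterm, final):
    """Weighted sum of the category scores, each on a 0-100 scale."""
    return (
        mean(assignments) * WEIGHTS["assignments"]
        + mean(discussions) * WEIGHTS["discussions"]
        + project * WEIGHTS["project"]
        + midterm * WEIGHTS["midterm"]
        + final * WEIGHTS["final"]
    )


def letter_grade(points):
    """Map points to a letter; cutoffs applied without rounding (assumption)."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if points >= cutoff:
            return letter
    return "F"


# Scores from the grading example above
g = final_grade([85, 90, 80, 70], [85, 90, 90], project=95, midterm=80, final=94)
print(round(g, 2), letter_grade(g))
```

With the example scores, the result falls in the B range.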
Student Responsibilities
- Each student is responsible to manage his/her time and maintain the discipline required to meet the course requirements.
- Each student is responsible to read from the textbooks all topics covered in the class.
- Each student is responsible to read from the textbook all chapter topics, bibliographic notes, and summaries.
- Each student is responsible to complete the data mining project and participate in all discussions.
- Each student is responsible to adhere to all course deadlines.
- Each student is responsible to take the exams as they are scheduled in the course schedule.
“I didn’t know” is not an acceptable excuse for failing to meet the course requirements. Students who fail to meet their responsibilities do so at their own risk.
Attendance Policy
Attendance at all classes and other activities (lecture periods, laboratory sessions, tests, examinations, or other scheduled meetings) is required of every student at Columbus State University. The attendance record begins with the first meeting of the class, and one who registers late is responsible for class work missed. Students should note that the Computer Science Faculty does not initiate "class drops". A student wishing to drop should complete the official procedure before the deadline. Those who violate the attendance policy after that deadline may receive an "F" at the discretion of the instructor. After the midpoint of the quarter, no drop slip will be signed by the Dean unless extreme circumstances can be proved.
Academic Dishonesty
Academic dishonesty includes, but is not limited to, activities such as cheating and plagiarism (Misconduct). It is a basis for disciplinary action. Any work turned in for individual credit must be entirely the work of the student submitting the work. All work must be your own. You may share ideas, but submitting identical assignments (for example) will be considered cheating. You may discuss the material in the course and help one another with debugging; however, any work you hand in for a grade must be your own.
A simple way to avoid inadvertent plagiarism is to talk about the assignments, but not to read each other's work or write solutions together unless otherwise directed. For your own protection, keep scratch paper and old versions of assignments to establish ownership until after the assignment has been graded and returned to you. If you have any questions about this, please see me immediately.
For assignments, access to notes, the course textbooks, books, and other publications is allowed. All work that is not your own MUST be properly cited. This includes any material found on the Internet. Stealing, giving, or receiving any code, diagrams, drawings, text, or designs from another person (CSU or non-CSU, including the Internet) is not allowed. Having access to another person’s work on the computer system or giving another person access to your work is not allowed. It is your responsibility to keep your work confidential.
No cheating in any form will be tolerated. Penalties for academic dishonesty may include:
- a zero grade on the assignment or exam/quiz
- a failing grade for the course
- suspension from the Computer Science program
- dismissal from the Computer Science program.
All instances of cheating will be documented in writing with a copy placed in the Department’s files. Students will be expected to discuss the academic misconduct with the faculty members and the chairperson. For more details, see the Faculty Handbook and the Student Handbook.
ADA Accommodation Notice
If you have a documented disability as described by the Rehabilitation Act of 1973 (P.L. 93-112 Section 504) and the Americans with Disabilities Act (ADA) and would like to request academic and/or physical accommodations, please contact the Office of Disability Services in the Center for Academic Support and Student Retention, Tucker Hall 100, or at (706) 568-2330, as soon as possible. Course requirements will not be waived, but reasonable accommodations may be provided as appropriate.