CSCE4200 Spring 2018

Information Retrieval and Web Search

  • Days & Times: Monday, Wednesday 4:00PM - 5:20PM
  • Room: NTDP B190
  • Instructor: Dr. Wei Jin
  • Office: NTDP F292
  • Phone No.: (940) 369-7172
  • Email:
  • Office Hours: Monday, Wednesday 3:00PM - 4:00PM

TA: Shaik, Arshad

Email:

TA Office Hours: Tuesday, Thursday 2:00PM - 3:30PM

Prerequisites: Programming experience in C, C++, or Java, and introductory courses in data structures and algorithms, linear algebra, and probability theory.

Textbook

Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze, Cambridge University Press

Note: an online version of this book is available at

Reference Book:

Modern Information Retrieval, by Ricardo Baeza-Yates and Berthier Ribeiro-Neto

Course Description:

This course will introduce students to text-based information retrieval (IR) techniques, the technology behind search engines such as Google and Yahoo. Various IR models, including the Boolean model, the vector space model, and probabilistic models, will be studied. Efficient indexing techniques for both general document collections and specialized collections (strings, digital libraries) will be examined. The course will also cover web search engine techniques, such as hyperlink analysis (e.g., PageRank, as used in Google search), Web technologies and representations, and query languages, with a focus on techniques that can be used to access, retrieve, organize, and present information. Students will work on programming projects to gain hands-on experience in building an IR system.

Grading Policy

The letter grade will be assigned based on the following scale:

Grade / CSCE4200 Total Score
A / 85 and Above
B / [75-85)
C / [65-75)
D / [55-65)
F / Below 55

CSCE4200

Items / Weight
Homework / 10 %
Programming Assignment 1 / 10 %
Programming Assignment 2 / 15 %
Programming Assignment 3 / 15 %
Paper Reading & Presentation / 10 %
Midterm Exam / 20 %
Final Exam / 20 %
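To illustrate how the item weights and the letter-grade scale combine, here is a minimal Python sketch. It assumes every item is scored on a 0-100 scale and that no rounding is applied before comparing against the cutoffs; the syllabus specifies neither, and the names WEIGHTS, weighted_total, and letter_grade are hypothetical, used only for illustration.

```python
# Hypothetical grade calculator; weights (in percent) copied from the CSCE4200 table.
# Assumption: each item score is on a 0-100 scale and no rounding is applied.
WEIGHTS = {
    "Homework": 10,
    "Programming Assignment 1": 10,
    "Programming Assignment 2": 15,
    "Programming Assignment 3": 15,
    "Paper Reading & Presentation": 10,
    "Midterm Exam": 20,
    "Final Exam": 20,
}

# Sanity check: the weights in the table add up to 100 %.
assert sum(WEIGHTS.values()) == 100


def weighted_total(scores):
    """Combine per-item scores (0-100) into a 0-100 course total.

    Raises KeyError if any graded item is missing from `scores`.
    """
    return sum(WEIGHTS[item] * scores[item] for item in WEIGHTS) / 100


def letter_grade(total):
    """Map a course total onto the half-open ranges of the grading scale."""
    if total >= 85:
        return "A"
    if total >= 75:
        return "B"
    if total >= 65:
        return "C"
    if total >= 55:
        return "D"
    return "F"


if __name__ == "__main__":
    example = {
        "Homework": 90,
        "Programming Assignment 1": 80,
        "Programming Assignment 2": 85,
        "Programming Assignment 3": 70,
        "Paper Reading & Presentation": 95,
        "Midterm Exam": 75,
        "Final Exam": 80,
    }
    total = weighted_total(example)
    print(total, letter_grade(total))  # prints: 80.75 B
```

Because the ranges are half-open (for example, B covers [75-85)), a total of exactly 85 maps to A while 84.99 maps to B.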

Course Details

  1. You are expected to attend all lectures and to complete all readings on time. If a student misses a lecture due to unavoidable circumstances, the student is still responsible for the material covered in class and for finding out about any in-class announcements from classmates or the instructor.
  2. Homework assignments will be based on the required readings and will consist of short-answer questions or problems involving some computation.
  3. There will be three programming assignments in this course. Together they form a single project: each assignment is one phase in building a simple search engine.
  4. All students registered for the course are required to sign up for Blackboard. Class notes will be posted there before each class, along with projects and announcements.

Technical Paper Reading & Presentation

Students, working individually or in groups of two, are required to read a recent research paper in the area of information retrieval and Web search. The paper list will be given later. Each student or group will prepare a presentation on the main content of the paper.

Academic Policies

No cheating or plagiarism is allowed in assignments, projects, and exams. Academic dishonesty will result in a final course grade of “F”. "Sharing/reuse" of solutions to assignment problems and projects is strictly prohibited. All work turned in with your name on it must be your own work.

Tentative List of Topics

01 / Boolean retrieval
02 / The term vocabulary & postings lists
03 / Dictionaries and tolerant retrieval
04 / Index construction
05 / Scoring, term weighting & the vector space model
06 / Computing scores in a complete search system
07 / Evaluation in information retrieval
08 / Relevance feedback & query expansion
09 / Probabilistic information retrieval
10 / Language models for information retrieval
11 / Web search basics
12 / Web search and retrieval
13 / Web link analysis