Data warehousing (CS614)
Assignment # 3 (Non-Graded)
Total marks = 20
Deadline Date = 15-01-2014
Please carefully read the following instructions before attempting the assignment.
Rules for Marking
It should be clear that your assignment would not get any credit if:
§ The assignment is submitted after due date.
§ The submitted assignment does not open or file is corrupt.
§ The assignment is copied. Note that strict action would be taken if the submitted assignment is copied from any other student. Both students will be punished severely.
1) You should consult recommended books to clarify your concepts as handouts are not sufficient.
2) You are supposed to submit your assignment in .doc format. Any other formats like scan images, PDF, Zip, rar, bmp, docx etc will not be accepted
3) You are advised to upload your assignment at least two days before Due date.
4) This assignment file comprises of Four (4) pages.
5) This is non-graded assignment and marks will not be included in final Grading (CGPA)
6) You can also discuss the assignment in online skype session to be held on Wednesday, Jan 08, 2014.
Important Note:
Assignment comprises of 20 Marks. Note that no assignment will be accepted after due date via email in any case (whether it is the case of load shedding or emergency electric failure or internet malfunctioning etc.). Hence, refrain from uploading assignment in the last hour of the deadline, and try to upload Solutions at least 02 days before the deadline to avoid inconvenience later on.
For any query please contact:
Reference:
Please refer the case study given in assignment no. 1
Scenario:
Consider the following table structures:
ProjectBacklog
BackLogID / Integer / Primary KeyCreatedBy / String / NOT NULL
DateCreated / Date / Format: dd/mm/yyyy , NOT NULL
ProjectID / Date / Foreign Key
Story:
StoryID / Integer / Primary KeyStoryName / String / NOT NULL
StoryDescription / NOT NULL
StoryStatus / Valid values: C , IP, IQ, NO,D
BackLogID / Integer / Foreign Key (REFERS BackLogID in “ProjectBackLog” table)
Developer:
DeveloperID / Integer / Primary KeyDeveloperName / String / NOT NULL
DeveloperEmail / String / Format: , NOT NULL
DeveloperJoiningDate / Date / Format: dd/mm/yyyy, NOT NULL
DeveloperCurrentSalary / Integer
Now consider the following data in these tables:
ProjectBacklog
BackLogID / CreatedBy / DateCreated / ProjectIDBLG01 / Ali / 24/03/2013 / INV785
BLG113 / Asim / 07/04/2013 / MIS341
BLG124 / Waseem / 01/02/2013 / DSS478
Story:
SRCH98 / Search_Inv / User should be able to search any inventory item / C / BLG01
LGN12 / Login / User should be able to login / IP / BLG113
DRW17 / Draw_SH / User should be able to physically draw shape / C / BLG127
LGOUT145 / LogOut / User should be able to logout / IQ / BLG124
DISPM76 / Disp_Menu / A popup menu should be displayed on right click / NO / BLG134
SESSM23 / Session_Maintain / Session should be maintined properly / Deleted / BLG113
Developer:
DeveloperID / DeveloperName / DeveloperEmail / DeveloperJoiningDate / DeveloperCurrentSalary01 / Ali / / 25/12/2008 / 50000
02 / Waseem / / 07/18/2010 / 40000
03 / Raza / / 23/04/2011 / 30000
04 / Asim / / 13/05/2009 / 35000
05 / Naeem / / 21/03/2008 / 50000
You have studied four (4) data quality validation techniques i.e.
· Referential Integrity (RI)
· Attribute domain
· Using Data Quality Rules
· Data Histograming
Question no. 1:
Apply the Referential Integrity (RI) and Attribute Domain Validation techniques to identify the records that do not conform to these rules.
Question no. 2:
After analyzing the quality of data you have decided to publish a set of data quality rules so that they could be followed by the staff to measure/identify quality of data. On the basis of data provided above, derive some data quality rules (at least two) that you think, can guide them to find/identify incorrect data. For each rule you will provide historical data problem, Rule Type, Generic Rule Set and SQL Query to identify the non-conforming records for that rule.
Note: please refer to table 22.1 to solve question no. 2.