IT 241 Information Discovery Fall 2012 Exam 1
Thursday, Sept. 20, 2012
Name ______
1. Below is one of the visualization pipelines from the text. [18 pts]
a. Excluding the Internet, describe three sources where the raw data may reside. [4]
b. Describe three data transformations that might be used to generate a useful data table for input to a visualization tool. [6]
c. Give an example of a visual mapping (color, line, dot, position, etc.) you might use to represent each of the following attribute types in a scatterplot: [8]
Nominal data of 4 distinct categories:
Nominal data of 10 categories that can be ranked:
Ordinal data:
Relational data (connections among data points):
2. Why is Minard’s map of Napoleon’s march to Moscow and back considered a good visualization example? [6 pts]
Is Minard’s map an exploratory visualization, an explanatory visualization, or an example of visual art?
3. Why is Nightingale’s rose petal visualization of causes of death a worthy visualization example? [6 pts]
Was this visualization informative or persuasive, or an example of visual art? Explain your choice.
4. Data coding [20 pts]
a. The value 0110 1101 in binary is ______ in decimal
and its corresponding hexadecimal digits are ______.
Decimal 55 converted to binary is ______.
If the 8-bit ASCII codes in hex for “A” and “a” are 41 and 61, respectively,
then the hex codes for the string “Bad” are ______.
If we want to store 366 unique values, we would need a minimum of ______ bits to represent those values.
b. A 1250 x 800 pixel color image coded in RGB (+alpha) format requires ______ Mbytes.
c. Why is the .gif format considered a “lossy” form of image compression?
d. A 30-second mono sound clip sampled at 48000 samples per second with a 16-bit depth requires storing ______ bytes.
e. What are the colors for these RGB hexadecimal encodings?
FFFFFF = ______    888888 = ______
00FF00 = ______    FFFF00 = ______
f. If your data is simply a table with rows and columns, which editable file structure is appropriate? ______ (choose from XML, CSV, BMP, XLS)
If your data contains hierarchical relationships, which editable file structure is appropriate? ______ (choose from XML, CSV, BMP, XLS)
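The conversions and storage sizes in question 4 are standard positional-notation and byte-count arithmetic. A minimal Python sketch for checking them, assuming 4 bytes per pixel for RGB plus alpha, 1 Mbyte = 10^6 bytes, and uncompressed data with no file headers (the variable and function names are illustrative only):

```python
# a. Base conversions
value = int("01101101", 2)          # parse the binary digits as an integer
hex_text = format(value, "02X")     # two uppercase hexadecimal digits
binary_55 = format(55, "08b")       # decimal 55 as an 8-bit binary string

# ASCII letters are numbered consecutively, so ord() gives each code directly.
bad_hex = " ".join(format(ord(ch), "02X") for ch in "Bad")

# Smallest n with 2**n >= 366: bit length of the largest index (365).
bits_for_366 = (366 - 1).bit_length()

# b. Assumption: 4 bytes per pixel (R, G, B, alpha), uncompressed.
image_bytes = 1250 * 800 * 4
image_mbytes = image_bytes / 1_000_000   # assumption: 1 Mbyte = 10**6 bytes

# d. seconds * samples per second * bytes per sample * channels (mono = 1)
audio_bytes = 30 * 48000 * (16 // 8) * 1

# e. Split an RRGGBB code into its red, green, and blue intensities (0-255).
def split_rgb(code):
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

print(value, hex_text, binary_55, bad_hex, bits_for_366)
print(image_bytes, image_mbytes, audio_bytes)
for code in ("FFFFFF", "888888", "00FF00", "FFFF00"):
    print(code, split_rgb(code))
```

If a Mbyte is instead taken as 2^20 bytes, the image size comes out slightly smaller than the figure printed here.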
5. Plot this set of 15 univariate numbers {65, 85, 93, 77, 48, 65, 50, 63, 44, 80, 55, 87, 47, 92, 73} then superimpose a Tukey box plot representing the median and 25th and 75th percentiles. [12 pts]
┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼
40   45   50   55   60   65   70   75   80   85   90   95
Without calculating, do you expect the mean of this set of numbers to be greater or smaller than the median? ______
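The box plot statistics for question 5 can be checked with Python's standard `statistics` module. This is only a sketch: quartile conventions differ between textbooks and tools, so both of the module's methods are shown, and which one matches the course text is an assumption left open here.

```python
import statistics

data = [65, 85, 93, 77, 48, 65, 50, 63, 44, 80, 55, 87, 47, 92, 73]
ordered = sorted(data)

median = statistics.median(data)   # middle (8th) value of the 15 sorted numbers
mean = statistics.mean(data)

# "inclusive" treats the data as the whole population; for this data set it
# agrees with Tukey's hinges (medians of each half, each including the median).
q1_inc, q2_inc, q3_inc = statistics.quantiles(data, n=4, method="inclusive")

# "exclusive" (the module default) treats the data as a sample.
q1_exc, q2_exc, q3_exc = statistics.quantiles(data, n=4, method="exclusive")

print("sorted:", ordered)
print("median:", median, "mean:", round(mean, 2))
print("inclusive quartiles:", q1_inc, q3_inc)
print("exclusive quartiles:", q1_exc, q3_exc)
```

Either pair of quartiles gives a defensible box; the median is the same under both conventions.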
6. We saw the following relational database SQL query in class. Fill in the blanks below regarding the query. [4 pts]
SELECT S.lastName, S.firstName, S.major
FROM Student S, Enroll E, Class C, Faculty F
WHERE F.name='Byrne' AND F.facId=C.facId
AND C.classNumber=E.classNumber AND E.stuId=S.stuId
There are ______ (number) tables participating in the query. The combining of the tables is called a(n) ______ operation. There are ______ (number) attributes in the result. The result will have ______ (more/fewer) tuples than the Student relation has.
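To see the query above run, it can be executed against a small in-memory SQLite database from Python. This is a sketch under stated assumptions: the column types, the sample rows, and every name other than 'Byrne' are hypothetical, and only the columns the query references are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (stuId INTEGER, lastName TEXT, firstName TEXT, major TEXT);
    CREATE TABLE Faculty (facId INTEGER, name TEXT);
    CREATE TABLE Class   (classNumber TEXT, facId INTEGER);
    CREATE TABLE Enroll  (stuId INTEGER, classNumber TEXT);

    -- Hypothetical sample rows, for illustration only.
    INSERT INTO Student VALUES (1, 'Smith', 'Tom', 'History'),
                               (2, 'Chin', 'Ann', 'Math'),
                               (3, 'Lee', 'Perry', 'Art');
    INSERT INTO Faculty VALUES (101, 'Byrne'), (102, 'Tanaka');
    INSERT INTO Class   VALUES ('ART103A', 101), ('MTH101B', 102);
    INSERT INTO Enroll  VALUES (1, 'ART103A'), (2, 'MTH101B'), (3, 'ART103A');
""")

# The query from the exam, unchanged.
query = """
    SELECT S.lastName, S.firstName, S.major
    FROM Student S, Enroll E, Class C, Faculty F
    WHERE F.name='Byrne' AND F.facId=C.facId
    AND C.classNumber=E.classNumber AND E.stuId=S.stuId
"""

rows = sorted(conn.execute(query).fetchall())
student_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]

for row in rows:
    print(row)
print(len(rows), "result rows;", student_count, "rows in Student")
```

With these sample rows, only the students enrolled in a class taught by Byrne are returned. The query has no DISTINCT, so a student enrolled in two of Byrne's classes would appear twice.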
7. Describe three possible ways to handle missing data in a data set. [6 pts]
a.
b.
c.
8. What is normalizing data in an attribute? Give a concrete example of normalizing. [5 pts]
9. The following attributes are found in a daily weather data set in some state. [14 pts]
a. Assign the best descriptor to each attribute. If you do not understand the semantics of an attribute, please ask for clarification. Not every descriptor need be used.
Choices: Nominal-Categorical (NC), Nominal-Ranked (NR), Nominal-Arbitrary (NA),
Ordinal-Continuous (OC), Ordinal-Discrete (OD), Ordinal-Statistical (OS),
Spatial/geometric (Sp), Temporal (T)
Date
Day of Week
Latitude-Longitude
County
Low temperature
High temperature
Rainfall
Snowfall
Number of highway fatalities
Prominent cloud type
b. Give two attributes that are independent: ______ and ______
c. Give two attributes that are dependent: ______ and ______
10. Data mining concept views. Refer to the above weather data set attributes. [9 pts]
a. Give a classical concept view from the data set above. That is, come up with a property in an IF condition THEN assertion ELSE assertion pattern. Be specific using attributes.
Do something more creative than IF cloudy THEN rainfall>0 ELSE rainfall=0!
b. Recast your classical concept as probabilistic concept views, e.g., when there was rainfall, it was cloudy.
c. Give two exemplar views of your classical concept (Examples of the concept).