IT 241 Information Discovery Fall 2012 Exam 1 Page 1

Thursday, Sept. 20, 2012

Name ______

1.  Below is one of the visualization pipelines from the text. [18 pts]

a.  Excluding the Internet, describe three sources where the raw data may reside. [4]

b.  Describe three data transformations that might be used to generate a useful data table as input to a visualization tool. [6]

c.  Give an example of a visual mapping (color, line, dot, position, etc.) you might use to represent each of the following attribute types in a scatterplot: [8]
Nominal data of 4 distinct categories:
Nominal data of 10 categories that can be ranked:
Ordinal data:
Relational data (connections among data points):

2.  Why is Minard’s map of Napoleon’s march to Moscow and back considered a good visualization example? [6 pts]

Is Minard’s map an exploratory visualization, an explanatory visualization, or an example of visual art?

3.  Why is Nightingale’s rose petal visualization of causes of death a worthy visualization example? [6 pts]

Was this visualization informative, persuasive, or an example of visual art? Explain your choice.

4.  Data coding [20 pts]

a.  The value 0110 1101 in binary is ______ in decimal,
and its corresponding hexadecimal digits are ______.
Converting decimal 55 to binary gives ______.
If the 8-bit ASCII codes in hex for “A” and “a” are 41 and 61, respectively,
then the hex codes for the string “Bad” are ______.
If we want to store 366 unique values, we would need at least ______ bits to represent them.
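The conversions in part a can be checked with a few lines of Python. This is a minimal sketch using only built-ins (`int` with a base, `format`, `ord`, and `int.bit_length`); it relies on ASCII codes coinciding with Unicode code points for these letters.

```python
# Binary -> decimal: parse the bit string in base 2; then show it as hex digits.
value = int("01101101", 2)
print(value)                      # 109
print(format(value, "X"))         # 6D

# Decimal -> binary.
print(format(55, "b"))            # 110111

# Hex ASCII code of each character ("A" is 41, so "B" is 42; "a" is 61, so "d" is 64).
bad_hex = " ".join(format(ord(ch), "02X") for ch in "Bad")
print(bad_hex)                    # 42 61 64

# n distinct values need codes 0..n-1, so the minimum bit count is the bit length of n - 1.
n_values = 366
bits_needed = (n_values - 1).bit_length()
print(bits_needed)                # 9, since 2**8 = 256 < 366 <= 512 = 2**9
```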

b.  A 1250 x 800 pixel color image coded in RGB (+alpha) format requires ______ Mbytes.
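A sketch of the storage arithmetic for part b. The question does not state the channel depth, so this assumes 8 bits (1 byte) per channel, which makes RGB plus alpha 4 bytes per pixel. Whether "Mbytes" means 10**6 or 2**20 bytes changes the final figure slightly, so both are shown.

```python
width, height = 1250, 800
bytes_per_pixel = 4                       # assumption: 8-bit R, G, B, and alpha channels
image_bytes = width * height * bytes_per_pixel
print(image_bytes)                        # 4000000
print(image_bytes / 10**6)                # 4.0 (decimal megabytes)
print(round(image_bytes / 2**20, 2))      # 3.81 (binary mebibytes)
```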

c.  Why is the .gif format considered “lossy” image compression?

d.  A 30-second mono sound clip sampled at 48,000 samples per second with a 16-bit depth requires ______ bytes of storage.
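The audio size follows the same pattern as the image: duration times sample rate times bytes per sample times channels. A minimal sketch that ignores any header overhead a real file format (such as WAV) would add:

```python
seconds = 30
sample_rate = 48_000      # samples per second
bit_depth = 16            # bits per sample
channels = 1              # mono
audio_bytes = seconds * sample_rate * (bit_depth // 8) * channels
print(audio_bytes)        # 2880000
```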

e.  What are the colors for these RGB hexadecimal encodings?
FFFFFF = ______    888888 = ______
00FF00 = ______    FFFF00 = ______
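Each six-digit hex code is three two-digit hex bytes for red, green, and blue. The helper below (`hex_to_rgb` is a hypothetical name, not a library function) splits a code into its 0 to 255 components; equal components always give a shade of gray, ranging from black (000000) to white (FFFFFF).

```python
def hex_to_rgb(code):
    """Split a six-digit hex color such as 'FFFF00' into (R, G, B) values 0-255."""
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

for code in ("FFFFFF", "888888", "00FF00", "FFFF00"):
    print(code, hex_to_rgb(code))
```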

f.  If your data is simply a table with rows and columns, what editable file format is appropriate? ______ (choose from XML, CSV, BMP, XLS)
If your data contains hierarchical relationships, what editable data file format would be appropriate? ______ (XML, CSV, BMP, XLS)
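To see why the flat-table versus hierarchy distinction matters, here is a sketch using Python's standard `csv` and `xml.etree.ElementTree` modules. The county, station, and reading values are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Flat table: one row per record, one column per attribute, so CSV fits naturally.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["county", "date", "rainfall"])
writer.writerow(["Adams", "2012-09-20", 0.25])
print(buf.getvalue())

# Hierarchy: a state contains counties, counties contain stations, stations contain
# readings. XML expresses this nesting directly with child elements.
state = ET.Element("state", name="Example")
county = ET.SubElement(state, "county", name="Adams")
station = ET.SubElement(county, "station", id="S1")
ET.SubElement(station, "reading", date="2012-09-20", rainfall="0.25")
xml_text = ET.tostring(state, encoding="unicode")
print(xml_text)
```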

5.  Plot this set of 15 univariate numbers {65, 85, 93, 77, 48, 65, 50, 63, 44, 80, 55, 87, 47, 92, 73}, then superimpose a Tukey box plot showing the median and the 25th and 75th percentiles. [12 pts]

┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼
40   45   50   55   60   65   70   75   80   85   90   95

Without calculating, do you expect the mean of this set of numbers to be greater or smaller than the median? ______
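A sketch for checking a hand-drawn plot. It assumes Tukey's hinges: the medians of the lower and upper halves, where both halves include the overall median when n is odd. Other quartile conventions (for example the default "exclusive" method of `statistics.quantiles`) can give slightly different values. The mean is computed for comparison with the median.

```python
from statistics import mean, median

def tukey_summary(values):
    """Return (min, lower hinge, median, upper hinge, max) using Tukey's hinges."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2            # half size; includes the median when n is odd
    lower, upper = xs[:half], xs[n - half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

data = [65, 85, 93, 77, 48, 65, 50, 63, 44, 80, 55, 87, 47, 92, 73]
lo, q1, med, q3, hi = tukey_summary(data)
print(lo, q1, med, q3, hi)           # 44 52.5 65 82.5 93
print(round(mean(data), 2), med)     # 68.27 65
```

The long upper half (hinge to maximum spans 65 to 93 versus 44 to 65 below) pulls the mean above the median.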

6.  We saw the following relational database SQL query in class. Fill in the blanks below regarding the query. [4 pts]

SELECT S.lastName, S.firstName, S.major
FROM Student S, Enroll E, Class C, Faculty F
WHERE F.name='Byrne' AND F.facId=C.facId
AND C.classNumber=E.classNumber AND E.stuId=S.stuId

There are ______ (number) tables participating in the query. Combining the tables is called a(n) ______ operation. There are ______ (number) attributes in the result. The result will have ______ (more/fewer) tuples than the Student relation has.
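The query can be run against a toy in-memory database with Python's built-in `sqlite3` module. The schema contains only the columns the query mentions, and every row is invented for illustration. The comma-separated FROM list filtered by WHERE equalities acts as an (implicit) join. With this data, Byrne teaches one class in which two of the three students are enrolled.

```python
import sqlite3

# Hypothetical minimal schema and rows (only the columns the query touches).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (stuId TEXT, lastName TEXT, firstName TEXT, major TEXT);
CREATE TABLE Enroll  (stuId TEXT, classNumber TEXT);
CREATE TABLE Class   (classNumber TEXT, facId TEXT);
CREATE TABLE Faculty (facId TEXT, name TEXT);
INSERT INTO Faculty VALUES ('F1', 'Byrne'), ('F2', 'Smith');
INSERT INTO Class   VALUES ('C101', 'F1'), ('C202', 'F2');
INSERT INTO Student VALUES ('S1', 'Lee', 'Ann', 'Math'),
                           ('S2', 'Diaz', 'Bo', 'Art'),
                           ('S3', 'Kim', 'Cy', 'CS');
INSERT INTO Enroll  VALUES ('S1', 'C101'), ('S2', 'C202'), ('S3', 'C101');
""")

query = """
SELECT S.lastName, S.firstName, S.major
FROM Student S, Enroll E, Class C, Faculty F
WHERE F.name='Byrne' AND F.facId=C.facId
  AND C.classNumber=E.classNumber AND E.stuId=S.stuId
"""
cur = conn.execute(query)
rows = cur.fetchall()
print(len(cur.description))   # 3 attributes in the result
print(sorted(rows))           # two students taught by Byrne
```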

7.  Describe three possible ways to handle missing data in a data set. [6 pts]

a.

b.

c.
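Three common strategies, sketched in plain Python on a made-up list of daily rainfall readings where `None` marks a missing value: discard the incomplete records, impute the mean of the known values, or carry the last observed value forward. Which one fits depends on why values are missing and how the data will be used.

```python
from statistics import mean

# Hypothetical daily rainfall readings (inches); None marks a missing value.
rainfall = [0.0, 0.5, None, 1.25, None, 0.25]

# 1. Discard records that have missing values.
dropped = [x for x in rainfall if x is not None]

# 2. Impute the attribute mean, computed from the known values only.
avg = mean(dropped)
mean_filled = [avg if x is None else x for x in rainfall]

# 3. Carry the last known value forward (plausible for ordered time series).
carried = []
last = None
for x in rainfall:
    last = x if x is not None else last
    carried.append(last)

print(dropped, mean_filled, carried, sep="\n")
```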

8.  What does it mean to normalize the data in an attribute? Give a concrete example of normalization. [5 pts]
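One common meaning of normalizing an attribute is min-max rescaling to the range [0, 1]; z-score standardization is another technique often grouped under the same name. A minimal sketch of both on made-up daily high temperatures:

```python
from statistics import mean, pstdev

# Hypothetical daily high temperatures (degrees F).
highs = [50, 60, 70, 80, 90]

# Min-max normalization: (x - min) / (max - min) maps the attribute onto [0, 1].
low, high = min(highs), max(highs)
min_max = [(x - low) / (high - low) for x in highs]
print(min_max)            # [0.0, 0.25, 0.5, 0.75, 1.0]

# z-score standardization: subtract the mean, divide by the population standard deviation.
mu, sigma = mean(highs), pstdev(highs)
z_scores = [(x - mu) / sigma for x in highs]
print([round(z, 3) for z in z_scores])
```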

9.  The following attributes are found in a daily weather data set for a particular state. [14 pts]

a.  Associate the best descriptor with each attribute. If you do not understand the meaning of an attribute, please ask for clarification. Some descriptors may not be used.

Choices: Nominal-Categorical (NC), Nominal-Ranked (NR), Nominal-Arbitrary (NA)

Ordinal Continuous (OC), Ordinal Discrete (OD), Ordinal Statistical (OS)

Spatial/geometric (Sp), Temporal (T)

Date
Day of Week
Latitude-Longitude
County
Low temperature
High temperature
Rainfall
Snowfall
Number of highway fatalities
Prominent cloud type

b.  Give two attributes that are independent: ______and ______

c.  Give two attributes that are dependent: ______and ______

10.  Data mining concept views. Refer to the weather data set attributes in question 9. [9 pts]

a.  Give a classical concept view drawn from this data set. That is, express a property in an IF condition THEN assertion ELSE assertion pattern. Be specific about the attributes you use.
Do something more creative than IF cloudy THEN rainfall>0 ELSE rainfall=0!

b.  Recast your classical concept as probabilistic concept views (e.g., when there was rainfall, it was cloudy).

c.  Give two exemplar views of your classical concept (that is, examples of the concept).
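Parts a through c can be tied together in a small sketch. The rule IF Snowfall > 0 THEN Low temperature <= 32 (°F) is a hypothetical example, its ELSE branch is left open for brevity, and the four records are invented. The classical view treats the rule as strictly true or false; the probabilistic view reports how often it held; the records that satisfy it serve as exemplars.

```python
# Hypothetical daily records (assumed units: inches of snow, degrees F).
records = [
    {"date": "2012-01-03", "snowfall": 2.0, "low": 20},
    {"date": "2012-01-09", "snowfall": 0.5, "low": 34},
    {"date": "2012-01-15", "snowfall": 0.0, "low": 40},
    {"date": "2012-02-02", "snowfall": 1.0, "low": 28},
]

def classical_rule(r):
    """IF snowfall > 0 THEN low <= 32; ELSE makes no claim (returns None)."""
    if r["snowfall"] > 0:
        return r["low"] <= 32
    return None

def probabilistic_view(rows):
    """Count snowy days, and how many of them had a low at or below freezing."""
    snowy = [r for r in rows if r["snowfall"] > 0]
    hits = [r for r in snowy if r["low"] <= 32]
    return len(hits), len(snowy)

hits, total = probabilistic_view(records)
print(f"P(low <= 32 | snowfall > 0) = {hits}/{total}")    # 2/3
exemplars = [r["date"] for r in records if classical_rule(r)]
print(exemplars)                                          # ['2012-01-03', '2012-02-02']
```

The 2012-01-09 record violates the classical rule outright, which is exactly the situation a probabilistic view is meant to absorb.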