Assignments:

1.  A. To discover the average number of children per family, Fred, a young college professor at a large Midwestern university, conducted a survey in which a random sample of 1,000 students in Psychology 101 were asked how many children (including themselves) were in their family. Fred added up all the responses, divided by 1,000, and obtained the answer: 3.5. The most recent census found that the average number of children in a U.S. family is 2.5.

a.  Why are these two numbers different?

b.  Should Fred have designed his study differently to get the right answer?

c.  How?
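A minimal sketch of the arithmetic behind question A. It assumes a hypothetical distribution of family sizes; the counts in the example are invented for illustration and are not census data. Averaging over families counts each family once. Asking students counts each family once per child, so a family of size k is reported k times.

```python
from fractions import Fraction

def family_mean(counts):
    """Mean children per family; `counts` maps family size -> number of families."""
    families = sum(counts.values())
    children = sum(k * n for k, n in counts.items())
    return Fraction(children, families)

def child_weighted_mean(counts):
    """Mean family size when each *child* is sampled: a family of size k is
    counted k times, giving sum(k^2 * n) / sum(k * n)."""
    children = sum(k * n for k, n in counts.items())
    return Fraction(sum(k * k * n for k, n in counts.items()), children)

# Hypothetical population: one family each with 1, 2, 3 and 4 children.
population = {1: 1, 2: 1, 3: 1, 4: 1}
print(family_mean(population))          # 5/2
print(child_weighted_mean(population))  # 3
```

Families with no children (size 0) add nothing to either sum in `child_weighted_mean`. This matches the fact that such families can never turn up in a survey of students.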

B. Prenatal screening for Down syndrome is usually recommended for mothers over the age of 35. A non-invasive test detects Down syndrome about 95% of the time: if the fetus has Down syndrome, the test will say so 95% of the time. If the fetus does not have Down syndrome, the test will correctly say so 80% of the time. Down syndrome is not very common, affecting only about one in every 200 fetuses whose mothers are over age 35.

a. If the test says the fetus has Down syndrome, what is the probability that the test is correct?

b. If the test says the fetus does not have Down syndrome, what is the probability that the test is correct?
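One way to organize the calculation is Bayes' rule, written as counts of true and false results per unit of population. The sketch below uses the three figures stated in the problem (prevalence 1/200, 95% detection, 80% correct negatives). The function name is only illustrative.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (P(condition | positive test), P(no condition | negative test))."""
    true_pos = prevalence * sensitivity               # has it, test says yes
    false_pos = (1 - prevalence) * (1 - specificity)  # lacks it, test says yes
    true_neg = (1 - prevalence) * specificity         # lacks it, test says no
    false_neg = prevalence * (1 - sensitivity)        # has it, test says no
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

ppv, npv = predictive_values(1 / 200, 0.95, 0.80)
print(f"P(Down syndrome | positive test)    = {ppv:.4f}")
print(f"P(no Down syndrome | negative test) = {npv:.4f}")
```

Because the condition is rare, false positives from the large unaffected group dominate the positive results.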

C. A survey of hospitals found that the hospitals with the highest proportion of female births also tended to have the fewest births. The Jones family, having already had a son, decided to boost their chances of a daughter by going to one of the hospitals that, so far this year, had the highest proportion of female births.
a. Is this a sensible strategy?
b. If so, why? If not, why not?
c. How does this shed light on why the best-performing mutual funds are usually small?
d. Should this guide our investment strategy? If so, why? If not, why not?
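A small simulation can make the hospital pattern concrete. It assumes that every hospital has the same chance of a girl on every birth, taken here as 0.5, which is a simplification because the true share of girls is slightly below one half. The hospital sizes of 15 and 500 births are hypothetical. Any differences between the simulated hospitals therefore come from chance alone.

```python
import random
from statistics import pstdev

def simulate_hospitals(births_per_hospital, n_hospitals, p_girl=0.5, seed=1):
    """Proportion of girls at each simulated hospital; every birth is an
    independent draw with the same probability p_girl."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_girl for _ in range(births_per_hospital)) / births_per_hospital
        for _ in range(n_hospitals)
    ]

small = simulate_hospitals(births_per_hospital=15, n_hospitals=200)
large = simulate_hospitals(births_per_hospital=500, n_hospitals=200)
print("small hospitals: max share of girls", round(max(small), 3), "spread", round(pstdev(small), 3))
print("large hospitals: max share of girls", round(max(large), 3), "spread", round(pstdev(large), 3))
```

The spread of a proportion shrinks roughly like 1/√n. Both the highest and the lowest shares of girls therefore turn up among the small hospitals, even though no hospital has any real advantage.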


2. A. Find data displays in the mass media (not a blog) that illustrate at least two of the most common errors. You can find one display with multiple flaws, or two displays with one flaw apiece. Redo the displays correctly. Explain (i) where you found the displays, (ii) what you believe the point of the display was, (iii) what the flaws were, and (iv) what you did to fix them.

(e.g. see http://flowingdata.com/2009/11/26/fox-news-makes-the-best-pie-chart-ever/)

B. What were the key lessons in Arbuthnot’s (1710) paper? Compare the explanations for the change in the number of christenings in 1704 with those for the change in 1665-1666.

C.  Find one wonderful display in the mass media. Explain (i) where you found the display, (ii) what you believe the point of the display was, and (iii) why you think it is wonderful.

3.  A. With your knowledge of improved methods of multivariate display, develop a display of the following data set:

Minimum inhibitory concentration (µg/ml) by antibiotic
Bacteria / Penicillin / Streptomycin / Neomycin / Gram Staining
Aerobacter aerogenes / 870 / 1 / 1.6 / negative
Brucella abortus / 1 / 2 / 0.02 / negative
Bacillus anthracis / 0.001 / 0.01 / 0.007 / positive
Diplococcus pneumoniae / 0.005 / 11 / 10 / positive
Escherichia coli / 100 / 0.4 / 0.1 / negative
Klebsiella pneumoniae / 850 / 1.2 / 1 / negative
Mycobacterium tuberculosis / 800 / 5 / 2 / negative
Proteus vulgaris / 3 / 0.1 / 0.1 / negative
Pseudomonas aeruginosa / 850 / 2 / 0.4 / negative
Salmonella (Eberthella) typhosa / 1 / 0.4 / 0.008 / negative
Salmonella schottmuelleri / 10 / 0.8 / 0.09 / negative
Staphylococcus albus / 0.007 / 0.1 / 0.001 / positive
Staphylococcus aureus / 0.03 / 0.03 / 0.001 / positive
Streptococcus fecalis / 1 / 1 / 0.1 / positive
Streptococcus hemolyticus / 0.001 / 14 / 10 / positive
Streptococcus viridans / 0.005 / 10 / 40 / positive

The entries of the table are the minimum inhibitory concentration (MIC) in µg/ml, a measure of the effectiveness of the antibiotic. The MIC is the concentration of antibiotic required to prevent growth in vitro, so smaller values indicate a more effective antibiotic. The covariate “Gram staining” describes the reaction of the bacteria to Gram staining. Gram-positive bacteria are stained dark blue or violet; Gram-negative bacteria are not.
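A possible first step before designing the display is shown below. The MICs run from 0.001 to 870, about six orders of magnitude, so a log10 scale is a natural starting point. This sketch transcribes the table, takes logs, groups rows by Gram stain, and records the drug with the smallest MIC for each bacterium. It produces a text table only, not the finished display the exercise asks for. Names such as `DATA` and `log_mic_rows` are illustrative.

```python
import math

# Rows transcribed from the table above: penicillin, streptomycin and neomycin
# MIC in µg/ml, then the Gram-stain result.
DATA = {
    "Aerobacter aerogenes": (870, 1, 1.6, "negative"),
    "Brucella abortus": (1, 2, 0.02, "negative"),
    "Bacillus anthracis": (0.001, 0.01, 0.007, "positive"),
    "Diplococcus pneumoniae": (0.005, 11, 10, "positive"),
    "Escherichia coli": (100, 0.4, 0.1, "negative"),
    "Klebsiella pneumoniae": (850, 1.2, 1, "negative"),
    "Mycobacterium tuberculosis": (800, 5, 2, "negative"),
    "Proteus vulgaris": (3, 0.1, 0.1, "negative"),
    "Pseudomonas aeruginosa": (850, 2, 0.4, "negative"),
    "Salmonella (Eberthella) typhosa": (1, 0.4, 0.008, "negative"),
    "Salmonella schottmuelleri": (10, 0.8, 0.09, "negative"),
    "Staphylococcus albus": (0.007, 0.1, 0.001, "positive"),
    "Staphylococcus aureus": (0.03, 0.03, 0.001, "positive"),
    "Streptococcus fecalis": (1, 1, 0.1, "positive"),
    "Streptococcus hemolyticus": (0.001, 14, 10, "positive"),
    "Streptococcus viridans": (0.005, 10, 40, "positive"),
}
DRUGS = ("Penicillin", "Streptomycin", "Neomycin")

def log_mic_rows():
    """Return (gram, bacterium, [log10 MIC per drug], best drug), sorted by Gram
    stain. Lower MIC means more effective; ties go to the first drug listed."""
    rows = []
    for name, (*mics, gram) in DATA.items():
        logs = [math.log10(m) for m in mics]
        best = DRUGS[mics.index(min(mics))]
        rows.append((gram, name, logs, best))
    return sorted(rows)

all_logs = [v for _, _, logs, _ in log_mic_rows() for v in logs]
print(f"log10 MIC spans {min(all_logs):.1f} to {max(all_logs):.1f}")
for gram, name, logs, best in log_mic_rows():
    cells = " ".join(f"{v:6.2f}" for v in logs)
    print(f"{gram:8} {name:32} {cells}  best: {best}")
```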

B.  Smoothing problem: One might think that if life expectancy is high, the murder rate cannot also be high. But although murder does not take a huge toll on a population, perhaps it is an indicator of other life-threatening processes going on in society.

(a) Plot life expectancy as a function of murder rate, then

(b) smooth life expectancy by adding the 53h,twice smooth to the plot. What have you learned?

(c) Make a separate plot of residuals from the smooth vs. murder rate. What has this taught you?

(d) Add a straight-line fit to the plot. Does this help us to understand things better? Or does it hide things that the smooth has told us? Explain.

STATE NAME / LIFE EXPECT. (years) / MURDER (per 100,000) / HSGRAD (%) / INCOME (per capita, $) / ILLITERACY (%)
Alabama / 69.1 / 15.1 / 41.3 / 3624 / 2.1
Alaska / 69.3 / 11.3 / 66.7 / 6315 / 1.5
Arizona / 70.6 / 7.8 / 58.1 / 4530 / 1.8
Arkansas / 70.7 / 10.1 / 39.9 / 3378 / 1.9
California / 71.7 / 10.3 / 62.6 / 5114 / 1.1
Colorado / 72.1 / 6.8 / 63.9 / 4884 / 0.7
Connecticut / 72.5 / 3.1 / 56.0 / 5348 / 1.1
Delaware / 70.1 / 6.2 / 54.6 / 4809 / 0.9
Florida / 70.7 / 10.7 / 52.6 / 4815 / 1.3
Georgia / 68.5 / 13.9 / 40.6 / 4091 / 2.0
Hawaii / 73.6 / 6.2 / 61.9 / 4963 / 1.9
Idaho / 71.9 / 5.3 / 59.5 / 4119 / 0.6
Illinois / 70.1 / 10.3 / 52.6 / 5107 / 0.9
Indiana / 70.9 / 7.1 / 52.9 / 4458 / 0.7
Iowa / 72.6 / 2.3 / 59.0 / 4628 / 0.5
Kansas / 72.6 / 4.5 / 59.9 / 4669 / 0.6
Kentucky / 70.1 / 10.6 / 38.5 / 3712 / 1.6
Louisiana / 68.8 / 13.2 / 42.2 / 3545 / 2.8
Maine / 70.4 / 2.7 / 54.7 / 3694 / 0.7
Maryland / 70.2 / 8.5 / 52.3 / 5299 / 0.9
Massachusetts / 71.8 / 3.3 / 58.5 / 4755 / 1.1
Michigan / 70.6 / 11.1 / 52.8 / 4751 / 0.9
Minnesota / 73.0 / 2.3 / 57.6 / 4675 / 0.6
Mississippi / 68.1 / 12.5 / 41.0 / 3098 / 2.4
Missouri / 70.7 / 9.3 / 48.8 / 4254 / 0.8
Montana / 70.6 / 5.0 / 59.2 / 4347 / 0.6
Nebraska / 72.6 / 2.9 / 59.3 / 4508 / 0.6
Nevada / 69.0 / 11.5 / 65.2 / 5149 / 0.5
NewHampshire / 71.2 / 3.3 / 57.6 / 4281 / 0.7
NewJersey / 70.9 / 5.2 / 52.5 / 5237 / 1.1
NewMexico / 70.3 / 9.7 / 55.2 / 3601 / 2.2
NewYork / 70.6 / 10.9 / 52.7 / 4903 / 1.4
NorthCarolina / 69.2 / 11.1 / 38.5 / 3875 / 1.8
NorthDakota / 72.8 / 1.4 / 50.3 / 5087 / 0.8
Ohio / 70.8 / 7.4 / 53.2 / 4561 / 0.8
Oklahoma / 71.4 / 6.4 / 51.6 / 3983 / 1.1
Oregon / 72.1 / 4.2 / 60.0 / 4660 / 0.6
Pennsylvania / 70.4 / 6.1 / 50.2 / 4449 / 1.0
RhodeIsland / 71.9 / 2.4 / 46.4 / 4558 / 1.3
SouthCarolina / 68.0 / 11.6 / 37.8 / 3635 / 2.3
SouthDakota / 72.1 / 1.7 / 53.3 / 4167 / 0.5
Tennessee / 70.1 / 11.0 / 41.8 / 3821 / 1.7
Texas / 70.9 / 12.2 / 47.4 / 4188 / 2.2
Utah / 72.9 / 4.5 / 67.3 / 4022 / 0.6
Vermont / 71.6 / 5.5 / 57.1 / 3907 / 0.6
Virginia / 70.1 / 9.5 / 47.8 / 4701 / 1.4
Washington / 71.7 / 4.3 / 63.5 / 4864 / 0.6
WestVirginia / 69.5 / 6.7 / 41.6 / 3617 / 1.4
Wisconsin / 72.5 / 3.0 / 54.5 / 4468 / 0.7
Wyoming / 70.3 / 6.9 / 62.9 / 4566 / 0.6
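A minimal sketch of a 53h,twice smoother for part (b). It applies running medians of 5 and then 3, followed by hanning with weights 1/4, 1/2, 1/4. "Twice" means that the residuals are smoothed the same way and added back. The end-point handling is a simplifying assumption: windows shrink near the ends and hanning copies the end values, which is cruder than the published end-point rules. The function names are hypothetical.

```python
from statistics import median

def running_median(y, span):
    """Running median with odd window `span`; near the ends the window shrinks
    symmetrically (a simplification, not the standard end-point rule)."""
    n, half = len(y), span // 2
    out = []
    for i in range(n):
        k = min(half, i, n - 1 - i)  # widest symmetric half-window that fits
        out.append(median(y[i - k:i + k + 1]))
    return out

def hanning(y):
    """Weighted running mean with weights 1/4, 1/2, 1/4; end values copied."""
    if len(y) < 3:
        return list(y)
    middle = [0.25 * y[i - 1] + 0.5 * y[i] + 0.25 * y[i + 1] for i in range(1, len(y) - 1)]
    return [y[0]] + middle + [y[-1]]

def smooth_53h(y):
    """Running median of 5, then running median of 3, then hanning."""
    return hanning(running_median(running_median(y, 5), 3))

def smooth_53h_twice(y):
    """'Twice' (reroughing): smooth the residuals with 53h and add them back."""
    first = smooth_53h(y)
    rough = [a - b for a, b in zip(y, first)]
    return [f + r for f, r in zip(first, smooth_53h(rough))]

def smooth_by_x(xs, ys):
    """Sort (x, y) pairs by x, smooth y in that order, and return
    (x, y, smoothed y, residual) tuples for plotting parts (b) and (c)."""
    pairs = sorted(zip(xs, ys))
    sm = smooth_53h_twice([y for _, y in pairs])
    return [(x, y, s, y - s) for (x, y), s in zip(pairs, sm)]
```

To use it, call `smooth_by_x(murder, life_expect)` with the two columns typed in from the table. Ties in murder rate, such as Iowa and Minnesota at 2.3, are then ordered by life expectancy, which is an arbitrary choice. Because the smoother uses running medians, a single outlying state barely moves it. A least-squares line does not have this property.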

4.  A. Exact exponential growth: Fred and Alice were born in the same year, and each began life with $500. Fred added $100 each year but kept his treasure under his mattress, so he earned no interest. Alice added nothing, but earned interest at 7.5% annually. After 25 years, Fred and Alice are getting married. Who has more money? How much does each have? Alice’s cousin Charlie thinks that Fred is a paranoid loser and that Alice is cheap. He used a combined strategy: he added $100 a year and earned 7.5% interest. How much did he have after 25 years? All three continued with their strategies in the hope of using the money to fund retirement. How much did each have at age 65?

a.  Generate accumulations for each person for 65 years

b.  Plot all three series.

c.  Answer the questions.

d.  Fit a linear function to Fred’s accumulation

e.  Based on this experiment, which retirement savings strategy works better: adding money regularly or starting early?
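A sketch of parts a and d, with two assumptions stated up front. First, each year the balance earns interest and then the $100 deposit is added at year end; crediting interest on that year's deposit would give somewhat larger totals. Second, Charlie, like Fred and Alice, starts with $500. The helper names are hypothetical, and the lists can be handed to any plotting tool for part b.

```python
def accumulate(years, start=500.0, deposit=0.0, rate=0.0):
    """Balance at ages 0..years. Assumption: interest is credited first,
    then the deposit is added at the end of each year."""
    balances = [start]
    for _ in range(years):
        balances.append(balances[-1] * (1 + rate) + deposit)
    return balances

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

YEARS = 65
fred = accumulate(YEARS, deposit=100)                 # mattress: deposits, no interest
alice = accumulate(YEARS, rate=0.075)                 # interest, no deposits
charlie = accumulate(YEARS, deposit=100, rate=0.075)  # both (assumed $500 start)

for age in (25, 65):
    print(f"age {age}: Fred ${fred[age]:,.2f}  Alice ${alice[age]:,.2f}  Charlie ${charlie[age]:,.2f}")

slope, intercept = fit_line(list(range(YEARS + 1)), fred)
print(f"Fred's balance = {intercept:.2f} + {slope:.2f} x age")
```

Fred's series is exactly linear, so the fit in part d just recovers $500 plus $100 per year. Plotting the other two series on a log scale would show Alice's as a straight line.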

B.  Table 2 below contains a number of state statistics. Some are correct and some are made up.

a.  Using plots, correlations, and regression lines, discuss the relationship between the correct data and their imaginary counterparts.

b.  Compare the four NAEP scores and determine whether the mean NAEP score adequately represents all states.

c.  How would you characterize Gore and Bush states vis-à-vis their income and academic performance?

d.  Has this characterization changed for the 2004 election?

e.  And what about obesity (Table 3)? Include in your answer some discussion of fat blue states and thin red ones (i.e. states with large residuals).

Table 2. Correct state data on income and academic accomplishment
State / Median Income / NAEP Math-4 / NAEP Reading-4 / NAEP Math-8 / NAEP Reading-8 / Mean NAEP / 2000 Election / IQ / FakeIncome
Massachusetts / $50,587 / 242 / 228 / 287 / 273 / 257 / Gore / 111 / 24059
New Hampshire / $53,549 / 243 / 228 / 286 / 271 / 257 / Bush / 102 / 18834
Vermont / $41,929 / 242 / 226 / 286 / 271 / 256 / Gore / 102 / 20049
Minnesota / $54,931 / 242 / 223 / 291 / 268 / 256 / Gore / 113 / 26979
Connecticut / $53,325 / 241 / 228 / 284 / 267 / 255 / Gore / 99 / 18287
North Dakota / $36,717 / 238 / 222 / 287 / 270 / 254 / Bush / 111 / 26457
South Dakota / $38,755 / 237 / 222 / 285 / 270 / 254 / Bush / 100 / 18226
Montana / $33,900 / 236 / 223 / 286 / 270 / 254 / Bush / 100 / 18727
Wyoming / $40,499 / 241 / 222 / 284 / 267 / 253 / Bush / 102 / 20398
Iowa / $41,827 / 238 / 223 / 284 / 268 / 253 / Gore / 109 / 23534
New Jersey / $53,266 / 239 / 225 / 281 / 268 / 253 / Gore / 103 / 21451
Virginia / $49,974 / 239 / 223 / 282 / 268 / 253 / Bush / 99 / 18202
Kansas / $42,523 / 242 / 220 / 284 / 266 / 253 / Bush / 101 / 20253
Maine / $37,654 / 238 / 224 / 282 / 268 / 253 / Gore / 99 / 19508
Colorado / $49,617 / 235 / 224 / 283 / 268 / 252 / Bush / 104 / 21608
Wisconsin / $46,351 / 237 / 221 / 284 / 266 / 252 / Gore / 105 / 22974
Ohio / $43,332 / 238 / 222 / 282 / 267 / 252 / Bush / 107 / 20299
North Carolina / $38,432 / 242 / 221 / 281 / 262 / 252 / Bush / 106 / 21218
Nebraska / $43,566 / 236 / 221 / 282 / 266 / 251 / Bush / 101 / 21278
Washington / $44,252 / 238 / 221 / 281 / 264 / 251 / Gore / 92 / 15353
Indiana / $41,581 / 238 / 220 / 281 / 265 / 251 / Bush / 105 / 22934
Missouri / $43,955 / 235 / 222 / 279 / 267 / 251 / Bush / 92 / 16854
New York / $42,432 / 236 / 222 / 280 / 265 / 251 / Gore / 90 / 16558
Delaware / $50,878 / 236 / 224 / 277 / 265 / 250 / Gore / 90 / 16062
Utah / $48,537 / 235 / 219 / 281 / 264 / 250 / Bush / 89 / 17423
Oregon / $42,704 / 236 / 218 / 281 / 264 / 250 / Gore / 100 / 20629
Idaho / $38,613 / 235 / 218 / 280 / 264 / 249 / Bush / 96 / 19376
Pennsylvania / $43,577 / 236 / 219 / 279 / 264 / 249 / Gore / 99 / 20124
Michigan / $45,335 / 236 / 219 / 276 / 264 / 249 / Gore / 99 / 18624
Illinois / $45,906 / 233 / 216 / 277 / 266 / 248 / Gore / 93 / 17667
Maryland / $55,912 / 233 / 219 / 278 / 262 / 248 / Gore / 95 / 19084
Kentucky / $37,893 / 229 / 219 / 274 / 266 / 247 / Bush / 94 / 18043
Texas / $40,659 / 237 / 215 / 277 / 259 / 247 / Bush / 98 / 18835
South Carolina / $38,460 / 236 / 215 / 277 / 258 / 246 / Bush / 87 / 15325
Florida / $38,533 / 234 / 218 / 271 / 257 / 245 / Bush / 87 / 16067
West Virginia / $30,072 / 231 / 219 / 271 / 260 / 245 / Bush / 92 / 16534
Alaska / $55,412 / 233 / 212 / 279 / 256 / 245 / Bush / 92 / 17892
Rhode Island / $44,311 / 230 / 216 / 272 / 261 / 245 / Gore / 89 / 15989
Oklahoma / $35,500 / 229 / 214 / 272 / 262 / 244 / Bush / 98 / 19397
Georgia / $43,316 / 230 / 214 / 270 / 258 / 243 / Bush / 93 / 15065
Arkansas / $32,423 / 229 / 214 / 266 / 258 / 242 / Bush / 98 / 21603
Tennessee / $36,329 / 228 / 212 / 268 / 258 / 241 / Bush / 90 / 16198
Arizona / $41,554 / 229 / 209 / 271 / 255 / 241 / Bush / 92 / 18130
Nevada / $46,289 / 228 / 207 / 268 / 252 / 239 / Bush / 92 / 15439
Hawaii / $49,775 / 227 / 208 / 266 / 251 / 238 / Gore / 94 / 17341
California / $48,113 / 227 / 206 / 267 / 251 / 238 / Gore / 94 / 17119
Louisiana / $33,312 / 226 / 205 / 266 / 253 / 238 / Bush / 99 / 20266
Alabama / $36,771 / 223 / 207 / 262 / 253 / 236 / Bush / 90 / 15712
Mississippi / $32,447 / 223 / 205 / 261 / 255 / 236 / Bush / 90 / 16220
New Mexico / $35,251 / 223 / 203 / 263 / 252 / 235 / Gore / 85 / 14088
NAEP data were gathered in February 2003.


Table 3. Percent obese and 2004 presidential vote, by state

State / % Obese / Voted For / State / % Obese / Voted For
Hawaii / 17 / Kerry / Wisconsin / 22 / Kerry
Colorado / 17 / Bush / Nevada / 22 / Bush
Connecticut / 18 / Kerry / Alaska / 23 / Bush
Massachusetts / 18 / Kerry / Iowa / 23 / Bush
New Hampshire / 18 / Kerry / Kansas / 23 / Bush
Utah / 18 / Bush / Missouri / 23 / Bush
California / 19 / Kerry / Nebraska / 23 / Bush
Maryland / 19 / Kerry / North Dakota / 23 / Bush
New Jersey / 19 / Kerry / Ohio / 23 / Bush
Rhode Island / 19 / Kerry / Oklahoma / 23 / Bush
Vermont / 19 / Kerry / Pennsylvania / 24 / Kerry
Florida / 19 / Bush / Arkansas / 24 / Bush
Montana / 19 / Bush / Georgia / 24 / Bush
Oregon / 20 / Kerry / Indiana / 24 / Bush
Arizona / 20 / Bush / North Carolina / 24 / Bush
Idaho / 20 / Bush / Virginia / 24 / Bush
New Mexico / 20 / Bush / Michigan / 25 / Kerry
Wyoming / 20 / Bush / Kentucky / 25 / Bush
Maine / 21 / Kerry / Tennessee / 25 / Bush
New York / 21 / Kerry / Alabama / 26 / Bush
Washington / 21 / Kerry / Louisiana / 26 / Bush
D.C. / 21 / Kerry / South Carolina / 26 / Bush
South Dakota / 21 / Bush / Texas / 26 / Bush
Delaware / 22 / Kerry / Mississippi / 27 / Bush
Illinois / 22 / Kerry / West Virginia / 28 / Bush
Minnesota / 22 / Kerry
Source: obesity data from the New York Times, Feb. 1, 2004, page 12; Centers for Disease Control & Prevention.
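The following is a minimal standard-library sketch for parts a and e of problem 4B. `linear_summary` returns the Pearson correlation, the least-squares line, and the residuals for any two columns. `largest_residuals` lists the states farthest from that line, which is one way to find "fat blue" or "thin red" states. All names are hypothetical. The columns must be typed in from Table 2 or Table 3, and income entries such as `$50,587` need their `$` and commas removed first.

```python
import math

def parse_dollars(text):
    """Convert an entry like '$50,587' to the integer 50587."""
    return int(text.replace("$", "").replace(",", ""))

def linear_summary(xs, ys):
    """Pearson correlation r, least-squares line y = intercept + slope * x,
    and the residuals from that line."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length columns with at least 3 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return r, slope, intercept, residuals

def largest_residuals(labels, residuals, k=5):
    """The k (label, residual) pairs with the largest absolute residuals."""
    return sorted(zip(labels, residuals), key=lambda p: abs(p[1]), reverse=True)[:k]
```

For example, `linear_summary(income, fake_income)` addresses part a once both columns have been entered. For part e, the choice of the x variable to regress obesity on is up to the analyst.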

5.  What is the pricing structure of convertibles? How would you answer someone who asked, “How much does a convertible cost? Do the costs of convertibles fall into specific groups?” A suitable transformation is the most useful tool for revealing the underlying price structure. Include informative displays and a narrative explaining both what you did and what you found.
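The assignment supplies no price data, so what follows is only a sketch of one possible approach. It assumes a list of convertible prices in dollars. On a log scale, equal price ratios become equal distances. Any groups in the prices then tend to appear as clusters separated by gaps. The function name and the `min_gap` threshold are hypothetical choices, and the test values are not real prices.

```python
import math

def log_price_groups(prices, min_gap=0.15):
    """Sort prices and start a new group wherever consecutive log10 prices
    differ by more than `min_gap` (a hypothetical threshold: 0.15 in log10
    units is roughly a 41% jump in price)."""
    if not prices:
        return []
    ordered = sorted(prices)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if math.log10(cur) - math.log10(prev) > min_gap:
            groups.append([])
        groups[-1].append(cur)
    return groups
```

A gap rule is only a rough aid. A dot plot or stem-and-leaf display of the log prices remains the better way to judge whether the groups are real.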