Important

Submit your answers in a Microsoft Excel workbook, with each problem on a separate worksheet. Highlight the answers in yellow and provide an interpretation in a text box.

"Simple Linear Regression and Correlation":

1.  The term regression was originally used in 1885 by Sir Francis Galton in his analysis of the relationship between the heights of children and parents. He formulated the “law of universal regression,” which specifies that “each peculiarity in a man is shared by his kinsmen, but on average in a less degree.” In 1903, two statisticians, K. Pearson and A. Lee, took a random sample of 1,078 father-son pairs to examine Galton’s law (“On the Laws of Inheritance in Man, I. Inheritance of Physical Characteristics,” Biometrika 2:457-462). Their sample regression line was

Son’s height = 33.73 + 0.516 × Father’s height

a.  Interpret the coefficients.

b.  What does the regression line tell you about the heights of sons of tall fathers?

c.  What does the regression line tell you about the heights of sons of short fathers?
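
The regression line in Problem 1 can be evaluated directly for any father's height. The following is a minimal Python sketch, not part of the required Excel submission. The function names and the sample father heights are hypothetical, chosen only for illustration, and the problem does not state the units of height.

```python
# Coefficients copied from the sample regression line in Problem 1.
INTERCEPT = 33.73
SLOPE = 0.516


def predicted_son_height(father_height):
    """Predicted son's height, in the same units as the father's height."""
    return INTERCEPT + SLOPE * father_height


def break_even_height():
    """Father's height at which the line predicts a son of equal height."""
    # Solve h = INTERCEPT + SLOPE * h for h.
    return INTERCEPT / (1 - SLOPE)


# Hypothetical father heights, used only to illustrate the pattern.
for father in (62, 66, 70, 74, 78):
    son = predicted_son_height(father)
    print(f"father {father} -> predicted son {son:.2f} "
          f"(difference {son - father:+.2f})")
```

Comparing the sign of the printed difference on either side of the break-even height is one way to approach parts b and c.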

2.  Florida condominiums are popular winter retreats for many North Americans. In recent years, the prices have steadily increased. A real estate agent wanted to know why prices of similar-sized apartments in the same building vary. A possible answer lies in the floor. It may be that the higher the floor, the greater the sale price of the apartment. He recorded the price (in $1,000s) of 1,200 sq. ft. condominiums in several buildings in the same location that have sold recently and the floor number of the condominiums.

a.  Determine the regression line.

b.  What do the coefficients tell you about the relationship between the two variables?

Data

Floor, Price
22, 212
20, 225
16, 261
4, 184
18, 232
18, 222
21, 210
13, 201
14, 189
8, 200
4, 203
2, 196
16, 220
13, 245
12, 211
8, 216
21, 256
7, 173
8, 194
8, 196
23, 182
21, 230
10, 216
9, 188
15, 210
1, 216
27, 218
8, 169
27, 235
8, 227
10, 191
1, 203
27, 223
14, 206
28, 204
4, 190
28, 246
8, 168
10, 193
12, 186
7, 193
27, 224
21, 231
6, 183
21, 224
9, 212
12, 232
7, 193
18, 233
12, 249
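
For reference, the least squares line for data like the table above can be computed from the usual closed-form formulas, slope = Sxy / Sxx and intercept = ȳ − slope · x̄. The sketch below is an illustration only: the function name is hypothetical, and the demo uses just the first five rows of the table for brevity, so its output is not the answer for the full data set.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# First five (Floor, Price) rows only; extend the lists with the full table.
floors = [22, 20, 16, 4, 18]
prices = [212, 225, 261, 184, 232]

b0, b1 = least_squares_line(floors, prices)
print(f"Price = {b0:.3f} + {b1:.3f} * Floor")
```

In Excel, the SLOPE and INTERCEPT worksheet functions return the same two quantities.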

3.  In television’s early years, most commercials were 60 seconds long. Now, however, commercials can be any length. The objective of commercials remains the same: to have as many viewers as possible remember the product in a favorable way and eventually buy it. In an experiment to determine how the length of a commercial is related to people’s memory of it, 60 randomly selected people were asked to watch a 1-hour television program. In the middle of the show, a commercial advertising a brand of toothpaste appeared. Some viewers watched a commercial that lasted 20 seconds, others watched one that lasted 24 seconds, 28 seconds, …, 60 seconds. The essential content of the commercial was the same. After the show, each person was given a test to measure how much he or she remembered about the product. The commercial times and test scores (on a 30-point test) were recorded and are listed below.
A. Draw a scatter diagram of the data to determine whether a linear model appears to be appropriate.
B. Determine the least squares line.
C. Interpret the coefficients.
Test, Length, Type
24, 52, 1
20, 40, 2
16, 36, 2
11, 28, 1
10, 44, 3
4, 16, 1
24, 48, 1
18, 52, 2
16, 60, 3
15, 44, 2
14, 36, 1
15, 44, 2
24, 60, 3
10, 24, 3
1, 32, 1
8, 40, 3
9, 24, 1
0, 32, 3
17, 52, 2
9, 36, 3
26, 60, 2
28, 56, 1
15, 20, 1
8, 40, 3
2, 48, 1
0, 20, 2
11, 24, 3
8, 36, 3
24, 60, 1
10, 44, 1
15, 52, 2
7, 28, 3
26, 56, 1
11, 20, 3
18, 52, 2
16, 16, 2
8, 20, 3
12, 40, 1
10, 16, 3
14, 44, 1
19, 32, 2
8, 20, 3
11, 56, 3
24, 56, 1
15, 60, 2
9, 24, 3
18, 48, 2
14, 16, 2
14, 32, 1
11, 16, 3
15, 40, 2
11, 24, 3
27, 36, 2
5, 28, 3
17, 56, 2
8, 32, 1
15, 28, 1
8, 48, 1
24, 28, 2
21, 48, 2
Problem 3 (continued)
Refer to the data in Problem 3 above.
A. What is the standard error of estimate? Interpret its value.
B. Describe how well the memory test scores and length of television commercial are linearly related.
C. Are the memory test scores and length of commercial linearly related? Test using a 5% significance level.
D. Estimate the slope coefficient with 90% confidence.
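
The quantities asked for in Problem 3 (continued) follow from the standard simple regression formulas. The Python sketch below is one way to lay them out, under stated assumptions: the function name and dictionary keys are hypothetical, the critical value `t_crit` is supplied by the caller (from a t table, or Excel's T.INV.2T with n − 2 degrees of freedom), and the demo uses only the first six rows of the table, so its numbers are not the answer for all 60 observations.

```python
import math


def slope_inference(x, y, t_crit):
    """Standard error of estimate, R^2, t statistic and CI for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Sum of squared residuals about the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))      # standard error of estimate
    r2 = 1 - sse / s_yy                 # coefficient of determination
    s_b1 = s_e / math.sqrt(s_xx)        # standard error of the slope
    t_stat = b1 / s_b1                  # tests H0: slope = 0
    ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)
    return {"b0": b0, "b1": b1, "s_e": s_e, "r2": r2,
            "s_b1": s_b1, "t_stat": t_stat, "ci": ci}


# First six (Length, Test) pairs only; 2.132 is the table value of
# t(0.05, 4 df), matching a 90% interval for these six points.
lengths = [52, 40, 36, 28, 44, 16]
scores = [24, 20, 16, 11, 10, 4]
result = slope_inference(lengths, scores, t_crit=2.132)
for name, value in result.items():
    print(name, value)
```

For the full data set the degrees of freedom are 60 − 2 = 58, so a different critical value applies.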

4.  The president of a company that manufactures car seats has been concerned about the number and cost of machine breakdowns. The problem is that the machines are old and becoming quite unreliable. However, the cost of replacing them is quite high, and the president is not certain that the cost can be made up in today’s slow economy. To help make a decision about replacement, he gathered data about last month’s costs for repairs and the ages (in months) of the plant’s 20 welding machines.

a.  Find the sample regression line.

b.  Interpret the coefficients.

c.  Determine the coefficient of determination and discuss what this statistic tells you.

d.  Conduct a test to determine whether the age of a machine and its monthly cost of repair are linearly related.

e.  Is the fit of the simple linear model good enough to allow the president to predict the monthly repair cost of a welding machine that is 120 months old? If so, find a 95% prediction interval. If not, explain why not.

Age / Repairs
110 / 327.67
113 / 376.68
114 / 392.52
134 / 443.14
93 / 342.62
141 / 476.16
115 / 324.74
115 / 338.98
115 / 433.45
142 / 526.37
96 / 362.42
139 / 448.76
89 / 335.27
93 / 350.94
91 / 291.81
109 / 467.8
138 / 474.48
83 / 354.15
100 / 420.11
137 / 416.04
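
The prediction interval in part e has the form ŷ ± t · s_e · sqrt(1 + 1/n + (x_new − x̄)² / Sxx). The following Python sketch shows that calculation under stated assumptions: the function name is hypothetical, the critical value `t_crit` is supplied by the caller for n − 2 degrees of freedom, and the demo uses only the first five rows of the table, so the printed interval is not the answer for all 20 machines.

```python
import math


def prediction_interval(x, y, x_new, t_crit):
    """Prediction interval for one new y at x_new in simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_new
    # The leading 1 accounts for the variability of a single new observation.
    half_width = t_crit * s_e * math.sqrt(
        1 + 1 / n + (x_new - x_bar) ** 2 / s_xx)
    return y_hat - half_width, y_hat + half_width


# First five (Age, Repairs) rows only; 3.182 is the table value of
# t(0.025, 3 df), matching a 95% interval for these five points.
ages = [110, 113, 114, 134, 93]
repairs = [327.67, 376.68, 392.52, 443.14, 342.62]
low, high = prediction_interval(ages, repairs, x_new=120, t_crit=3.182)
print(f"95% prediction interval at 120 months: ({low:.2f}, {high:.2f})")
```

With all 20 machines the degrees of freedom are 18, so the critical value changes accordingly.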


"Multiple Regression": (use a 5% significance level for #5)

5.  Pat Stasdud, a student ranking near the bottom of the statistics class, decided that a certain amount of studying could actually improve final grades. However, too much studying would not be warranted because Pat’s ambition (if that’s what one could call it) was to ultimately graduate with the absolute minimum level of work. Pat was registered in a statistics course that had only 3 weeks to go before the final exam and for which the final grade was determined in the following way:

Total mark = 20% (Assignment) + 30% (Midterm test) + 50% (Final exam)

To determine how much work to do in the remaining 3 weeks, Pat needed to be able to predict the final exam mark on the basis of the assignment mark (worth 20 points) and the midterm mark (worth 30 points). Pat’s marks on these were 12/20 and 14/30, respectively. The final exam mark, assignment mark, and midterm test mark for 30 students who took the statistics course last year were collected.

Data

Final / Assignment / Midterm
23 / 15 / 11
49 / 15 / 28
34 / 13 / 19
43 / 20 / 26
43 / 20 / 22
29 / 18 / 13
31 / 20 / 10
30 / 10 / 11
36 / 13 / 16
33 / 16 / 15
39 / 19 / 16
33 / 20 / 16
24 / 10 / 12
36 / 12 / 22
29 / 10 / 13
43 / 15 / 23
50 / 12 / 28
40 / 14 / 20
42 / 11 / 20
35 / 13 / 15
31 / 15 / 10
48 / 19 / 30
42 / 13 / 16
37 / 13 / 24
40 / 18 / 20
30 / 10 / 16

a.  Determine the regression equation.

b.  What is the standard error of estimate? Briefly describe how you interpret this statistic.

c.  What is the coefficient of determination? What does this statistic tell you?

d.  Test the validity of the model.

e.  Interpret each of the coefficients.

f.  Can Pat infer that the assignment mark is linearly related to the final grade in this model?

g.  Can Pat infer that the midterm mark is linearly related to the final grade in this model?

h.  Predict Pat’s final exam mark with 95% confidence.

i.  Predict Pat’s final grade with 95% confidence.
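
A multiple regression such as the one in Problem 5 can be fitted by solving the normal equations (X'X)b = X'y. The Python sketch below does this with plain Gaussian elimination and then computes R², the standard error of estimate, and the F statistic for overall model validity. It is an illustration under stated assumptions, not a substitute for Excel's regression output: the function names are hypothetical, and the demo uses only the first six rows of the Problem 5 table, so its numbers are not the answer for the full data set.

```python
import math


def fit_multiple_regression(rows, y):
    """Least squares coefficients [b0, b1, ..., bk] via the normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    M = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    coef = [0.0] * p
    for i in range(p - 1, -1, -1):
        rest = sum(M[i][j] * coef[j] for j in range(i + 1, p))
        coef[i] = (M[i][p] - rest) / M[i][i]
    return coef


def fit_summary(rows, y, coef):
    """Return (R^2, standard error of estimate, F statistic)."""
    n, k = len(y), len(coef) - 1
    fitted = [coef[0] + sum(c * v for c, v in zip(coef[1:], row))
              for row in rows]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - sse / sst
    s_e = math.sqrt(sse / (n - k - 1))
    f_stat = ((sst - sse) / k) / (sse / (n - k - 1))
    return r2, s_e, f_stat


# First six rows only: (Assignment, Midterm) predictors and Final marks.
predictors = [(15, 11), (15, 28), (13, 19), (20, 26), (20, 22), (18, 13)]
final = [23, 49, 34, 43, 43, 29]

coef = fit_multiple_regression(predictors, final)
print("b0, b_assignment, b_midterm =", coef)
print("R^2, s_e, F =", fit_summary(predictors, final, coef))
```

The same two functions accept three predictors per row, so they would also apply to data laid out like the table in Problem 6.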

6.  When one company buys another, it is not unusual that some workers are terminated. The severance benefits offered to the laid-off workers are often the subject of dispute. Suppose that the Laurier Company recently bought the Western Company and subsequently terminated 20 of Western’s employees. As part of the buyout agreement, it was promised that the severance packages offered to the former Western employees would be equivalent to those offered to Laurier employees who had been terminated in the past year. Thirty-six-year-old Bill Smith, a Western employee for the past 10 years, earning $32,000 per year, was one of those let go. His severance package included an offer of 5 weeks of severance pay. Bill complained that this offer was less than that offered to Laurier’s employees when they were laid off, in contravention of the buyout agreement. A statistician was called in to settle the dispute. The statistician was told that severance is determined by three factors: age, length of service with the company, and pay. To determine how generous the severance packages had been, a random sample of 50 Laurier ex-employees was taken. For each, the following variables were recorded:

Number of weeks of severance pay

Age of employee

Number of years with the company

Annual Pay (in thousands of dollars)

a.  Determine the regression equation.

b.  Comment on how well the model fits the data.

c.  Do all the independent variables belong in the equation? Explain.

d.  Perform an analysis to determine whether Bill is correct in his assessment of the severance package.

Data

Weeks SP / Age / Years / Pay
13 / 37 / 16 / 46
13 / 53 / 19 / 48
11 / 36 / 8 / 35
14 / 44 / 16 / 33
3 / 28 / 4 / 40
10 / 43 / 9 / 31
4 / 29 / 3 / 33
7 / 31 / 2 / 43
12 / 45 / 15 / 40
7 / 44 / 15 / 32
8 / 42 / 13 / 42
11 / 41 / 10 / 38
9 / 32 / 5 / 25
10 / 45 / 13 / 36
18 / 48 / 19 / 40
17 / 52 / 20 / 34
13 / 42 / 11 / 33
14 / 42 / 19 / 38
5 / 27 / 2 / 25
11 / 50 / 15 / 36
10 / 46 / 14 / 36
8 / 28 / 6 / 22
15 / 44 / 16 / 32
7 / 40 / 6 / 27
9 / 37 / 8 / 37
11 / 44 / 12 / 35
10 / 33 / 13 / 32
8 / 41 / 14 / 42
5 / 33 / 7 / 37
6 / 27 / 4 / 35
14 / 39 / 12 / 36
12 / 50 / 17 / 30
10 / 43 / 11 / 29
14 / 49 / 14 / 29
12 / 48 / 17 / 36
12 / 41 / 17 / 37
8 / 39 / 8 / 36
12 / 49 / 16 / 28
10 / 37 / 10 / 35
11 / 37 / 13 / 37
15 / 44 / 19 / 33
5 / 31 / 6 / 37
8 / 42 / 9 / 36
11 / 40 / 11 / 32
15 / 35 / 15 / 30
11 / 46 / 13 / 40
6 / 25 / 5 / 33
6 / 40 / 7 / 33
13 / 40 / 14 / 48
9 / 38 / 10 / 37
