1. During the early part of the 1994 baseball season, many sports fans and baseball players noticed that the number of home runs being hit seemed to be unusually large. Below are the team-by-team statistics on home runs hit through Friday, June 3, 1994 by National and American League teams (from the Columbus Dispatch Sports Section, Sunday, June 5, 1994).
a. (10 points) Find the five number summaries for each League.
League / Min / Q1 / Median / Q3 / Max
American / 35 (1) / 49 (1) / 57.5 (1) / 68 (1) / 77 (1)
National / 29 (1) / 46 (1) / 50.5 (1) / 55 (1) / 67 (1)
b. (10 points) Describe the center, spread, and shape of the distribution of home runs in each League.
League / Center / Spread / Shape
American / (1) 57-58 / (2) 35 to 77 / (2) Symmetric to left-skewed
National / (1) 50-51 / (2) 29 to 67 / (2) Skewed left
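The spread column can also be summarized by the range and the interquartile range (IQR). The following is a minimal sketch that works only from the five-number summaries in part (a); the names `summaries` and `spread` are illustrative and not part of the original key.

```python
# Five-number summaries copied from part (a): (min, Q1, median, Q3, max)
summaries = {
    "American": (35, 49, 57.5, 68, 77),
    "National": (29, 46, 50.5, 55, 67),
}


def spread(summary):
    """Return (range, IQR) for a five-number summary."""
    lo, q1, _median, q3, hi = summary
    return hi - lo, q3 - q1


for league, summary in summaries.items():
    data_range, iqr = spread(summary)
    print(f"{league}: range = {data_range}, IQR = {iqr}")
```

On these summaries the American League has both the wider range (42 vs. 38) and the wider IQR (19 vs. 9).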
c. (10 points) Which league had more home runs?
The American League had more home runs across the board, from min to max. (10)
d. (10 points) Sketch a box plot of the data from each league (by hand).
(5) for each box-whisker plot drawn by hand.
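If the course draws modified box plots, it may help to check whether either whisker would stop short of the min or max. The sketch below assumes the common 1.5 x IQR rule for flagging outliers, which the original key does not mention; `fences` is a hypothetical helper name.

```python
def fences(q1, q3):
    """Return the (lower, upper) 1.5 x IQR outlier fences."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Quartiles, minimums, and maximums copied from part (a).
al_low, al_high = fences(49, 68)
nl_low, nl_high = fences(46, 55)

print("American fences:", al_low, al_high, "min/max:", 35, 77)
print("National fences:", nl_low, nl_high, "min/max:", 29, 67)
```

Under that assumed rule, the American League min and max both fall inside the fences, while the National League minimum of 29 falls below its lower fence of 32.5 and would be drawn as a separate point.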
2. The scatter plot maps the relation between violent crimes (in number per 100,000 in the population) and poverty rate (in number per 100,000 in the population).
a. (4 points) What is the response variable and which is the explanatory variable?
Response: Violent Crimes (2) ; Explanatory: Poverty (2)
b. (6 points) Estimate the violent crime rate for a state with no poverty, using the equation provided with the scatter plot.
(6) with x=0, y=29.684 violent crimes per 100,000 persons in the population
c. (10 points) Interpret the slope coefficient of the regression line (39.798).
For every additional 1 person per 100,000 in the population that lives in poverty, there are almost 40 additional violent crimes per 100,000 in the population. (10)
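Parts (b) and (c) both read values off the same fitted line. A minimal sketch, assuming the line has the form y = 29.684 + 39.798x as implied by the intercept and slope quoted in this question (the function name is illustrative):

```python
INTERCEPT = 29.684  # predicted violent crime rate at x = 0, from part (b)
SLOPE = 39.798      # slope coefficient quoted in part (c)


def predicted_violent_crime(poverty):
    """Predicted violent crime rate for a given poverty rate."""
    return INTERCEPT + SLOPE * poverty


# Part (b): the prediction at zero poverty is the intercept.
print(predicted_violent_crime(0))
# Part (c): a one-unit increase in poverty changes the prediction by the slope.
print(predicted_violent_crime(1) - predicted_violent_crime(0))
```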
d. (10 points) Explain why you might not have confidence in using this coefficient to project the violent crime rate in a state by calculating the correlation coefficient from the stated R^2 in the above problem.
The R^2 is less than 18% (0.1796), so the correlation coefficient is r = sqrt(0.1796), or about 0.42, taken as positive because the slope is positive. Poverty rate is therefore only loosely associated (5), albeit in a positive way (5), with the rate of violent crimes.
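The correlation coefficient the question asks for can be recovered from R^2 as its square root, carrying the sign of the slope. A short sketch using only the two numbers quoted in this question:

```python
import math

R_SQUARED = 0.1796  # stated with the scatter plot
SLOPE = 39.798      # positive slope, so r is positive

# r is the square root of R^2, with the same sign as the slope.
r = math.copysign(math.sqrt(R_SQUARED), SLOPE)
print(round(r, 3))
```

This gives r of roughly 0.42, a weak-to-moderate positive linear association, which is why projections from the line deserve little confidence.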
3. Consider the following scatter plot of two variables X and Y.
e. (10 points) What is the correlation coefficient for X and Y? Explain how you arrive at your answer without the need of computing the correlation or even without knowledge of the data for X and Y.
The correlation coefficient is r = 0 (5) because the data form a perfect hyperbola and are not linear at all. Thus no straight line will fit the scatter plot meaningfully. (5)
f. (10 points) Draw the least squares regression line in the graph above (sketch the trend line).
See the horizontal line drawn above. (10)
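The reasoning in parts (e) and (f) can be checked numerically. The sketch below uses hypothetical data, not the exam's actual plot: it assumes a curve that is symmetric about a vertical line (here y = x^2 on x-values centered at 0), which is the situation in which a perfectly curved pattern gives r = 0 and a horizontal least-squares line.

```python
# Hypothetical data: a curve symmetric about x = 0 (not the exam's plot).
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x ** 2 for x in xs]


def least_squares(xs, ys):
    """Return (slope, intercept, r) for simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


slope, intercept, r = least_squares(xs, ys)
print(slope, intercept, r)
```

For this symmetric example the slope and r are both zero and the fitted line is the horizontal line at the mean of y, even though y is completely determined by x.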
4. (10 points) A study of human development showed two types of movies to groups of children. Crackers were available in a bowl, and the investigators compared the number of crackers eaten by children watching the different kinds of movies. One kind of movie was shown at 8 AM (right after the children had breakfast) and another at 11 AM (right before the children had lunch). It was found that during the movie shown at 11 AM, more crackers were eaten than during the movie shown at 8 AM. The investigators concluded that the different types of movies had an effect on appetite. Explain why the results of this study cannot be trusted.
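Lurking variables were not accounted for (5): time of day is confounded with movie type, since children are likely hungrier right before lunch than right after breakfast. There is also no control group (5).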
5. A business has two types of employees: managers and workers. Managers earn either $100,000 or $200,000 per year. Workers earn either $10,000 or $20,000 per year. The number of male and female managers at each salary level and the number of male and female workers at each salary level appear in the columns of the following two-way tables:
Managers/ Gender
Male / Female
Salary / $100,000 / 80 / 20
$200,000 / 20 / 30
Workers / Gender
Male / Female
Salary / $10,000 / 30 / 20
$20,000 / 20 / 80
a. (20 points) Find the Conditional Percentage Distribution for Managers by Salary
Managers/ Gender
Male / Female / Total
Salary / $100,000 / 80% (5) / 20% (5) / 100%
$200,000 / 40% (5) / 60% (5) / 100%
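Each percentage in part (a) is a cell count divided by its row (salary level) total. A minimal sketch using the manager counts from the two-way table; `row_percentages` is an illustrative helper name.

```python
# Manager counts copied from the two-way table: salary -> gender -> count
managers = {
    "$100,000": {"Male": 80, "Female": 20},
    "$200,000": {"Male": 20, "Female": 30},
}


def row_percentages(row):
    """Convert one row of counts to percentages of the row total."""
    total = sum(row.values())
    return {gender: 100 * count / total for gender, count in row.items()}


for salary, row in managers.items():
    print(salary, row_percentages(row))
```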
b. (20 points) Who earns more on average at this company, Males or Females?
There are 150 men and 150 women, so it is enough to compare total payroll by gender across employee types and salary levels: men earned a total of 8M + 4M + 0.3M + 0.4M = 12.7M (10); women earned a total of 2M + 6M + 0.2M + 1.6M = 9.8M (10). Dividing each total by 150 gives about $84,667 per man and about $65,333 per woman. Thus men earn more on average.
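The payroll arithmetic can be recomputed directly from the two-way tables. The sketch below uses only the counts and salaries given in the question; the variable and function names are illustrative.

```python
# Counts copied from the two-way tables: salary -> (males, females)
counts = {
    100_000: (80, 20),  # managers
    200_000: (20, 30),  # managers
    10_000: (30, 20),   # workers
    20_000: (20, 80),   # workers
}


def average_salary(counts, column):
    """Return (payroll, headcount, mean salary) for column 0 (male) or 1 (female)."""
    headcount = sum(row[column] for row in counts.values())
    payroll = sum(salary * row[column] for salary, row in counts.items())
    return payroll, headcount, payroll / headcount


male_payroll, male_count, male_mean = average_salary(counts, 0)
female_payroll, female_count, female_mean = average_salary(counts, 1)
print("Male:", male_payroll, male_count, round(male_mean, 2))
print("Female:", female_payroll, female_count, round(female_mean, 2))
```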
6. (10 points) Select a simple random sample of three from the following employees of a small company. Use the list of random numbers below, and assign multiple two-digit identifiers to each employee in order to select the three. Start by identifying the first person with 00 (there is only one answer for this question)
(3)
1. Bechhofer 00 / 4. Kesten 03 / 7. Taylor 06
2. Brown 01 / 5. Kiefer 04 / 8. Wald 07
3. Ito 02 / 6. Spitzer 05 / 9. Weiss 08
Random digits:
11,79,32,04,95,05,90,71,13,84,44,98,22,07,51
Pick 1: _____Ito_____(3)
Pick 2: ____Wald_____(2)
Pick 3: ____Spitzer___(2)
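The key does not spell out how the multiple identifiers are assigned. One scheme that reproduces the three picks above is to number the employees 0 through 8 in list order and give each employee every two-digit number with that remainder on division by 9 (00, 09, 18, ... for the first person), discarding 99 so that everyone has the same number of labels. The sketch below implements that assumed scheme; it is a reconstruction, not the key's stated method.

```python
employees = [
    "Bechhofer", "Brown", "Ito", "Kesten", "Kiefer",
    "Spitzer", "Taylor", "Wald", "Weiss",
]
# Random two-digit numbers copied from the question.
random_digits = [11, 79, 32, 4, 95, 5, 90, 71, 13, 84, 44, 98, 22, 7, 51]


def select_sample(names, numbers, size):
    """Pick `size` distinct names using the assumed remainder labelling."""
    picks = []
    # Largest multiple of len(names) not exceeding 100 (99 for nine names),
    # so every name owns the same number of two-digit labels.
    usable = (100 // len(names)) * len(names)
    for number in numbers:
        if number >= usable:
            continue  # unassigned label, skip it
        name = names[number % len(names)]
        if name not in picks:  # skip repeats
            picks.append(name)
        if len(picks) == size:
            break
    return picks


print(select_sample(employees, random_digits, 3))
```

With the listed random numbers, 11, 79, and 32 map to Ito, Wald, and Spitzer, matching the three picks.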