Final Review KEY

Disclaimer: I tried to put important topics and concepts that I thought you should go over again before the final. Anything we covered in class, on homework, classwork, or that was in exams or checkpoint quizzes can still be fair game on the final.

For the exam: You may use a scientific calculator, no phones or computers allowed for calculations. No notes, no books, no computers allowed.

Mod 2: “Distributions for Quantitative Data”

Things to Remember from this Module:

✓  How to calculate and interpret the Mean, Median, IQR, Range, ADM, and Standard Deviation.

✓  Shapes of Histograms and Dot Plots

✓  Know the difference between categorical and quantitative variables

✓  Write an essay describing a data set by interpreting all of the sample statistics and graphs.

Practice questions

1.  Use the following set of data: 7, 9, 9, 10, 11, 11, 12, 12, 12, 12, 15, 22

a.  Find the Median: 11.5

b.  Find the quartiles Q1, Q2, and Q3. Now find the Interquartile Range (IQR).

Q1 = 9.5, Q2 = 11.5, Q3 = 12, IQR = 2.5

c. 3

d. 5
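To check the median, quartiles, and IQR for the data in question 1, here is a minimal Python sketch. It assumes the split-halves method (Q1 is the median of the lower half, Q3 is the median of the upper half), which matches the answers above. The helper name `quartiles` is made up for this example, and calculators or software may use a slightly different quartile rule.

```python
from statistics import median

data = [7, 9, 9, 10, 11, 11, 12, 12, 12, 12, 15, 22]

def quartiles(values):
    """Return (Q1, median, Q3) using the split-halves method (assumed here)."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]                                   # values below the middle
    upper = s[half:] if n % 2 == 0 else s[half + 1:]   # values above the middle
    return median(lower), median(s), median(upper)

q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 9.5 11.5 12.0 2.5
```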

2.  The SPCA collects the following data about the dogs they house. Label whether each variable is categorical or quantitative. Circle one.

a. breed categorical / quantitative

b. number of days housed categorical / quantitative

c. color categorical / quantitative

d. veterinary costs categorical / quantitative

e. weight categorical / quantitative

3.  The class of Math 075 students have their heights measured (in inches) for a statistical survey. The tallest student in the class is 71 inches tall. One of the students in the class has been assigned to record all the values, but made a mistake when recording one of the values. Instead of recording 71, this student accidentally typed 711.

How will the median and IQR be affected by this mistake and why? Neither is affected. The median and IQR depend only on the positions of the values in the ordered list, and 711 stays in the last position just as 71 did.

How will the mean and standard deviation be affected by this mistake and why? Both will increase. The outlier of 711 pulls the mean up, and the standard deviation increases since the data will be more spread out.
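The effect of the typo can be seen with a small experiment. The sketch below uses a hypothetical list of ten heights (these numbers are invented for illustration and are not from the class data) and compares the statistics with and without the mistake. The `iqr` helper is also made up for this example and uses the split-halves method.

```python
from statistics import mean, median, stdev

# Hypothetical heights in inches (invented for illustration); 71 is the tallest.
correct = [60, 62, 63, 64, 65, 66, 67, 68, 70, 71]
typo = correct[:-1] + [711]   # 71 mistakenly recorded as 711

def iqr(values):
    """IQR using the split-halves method (an assumption for this sketch)."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return median(upper) - median(lower)

for label, heights in (("correct", correct), ("with typo", typo)):
    print(label,
          "median:", median(heights),
          "IQR:", iqr(heights),
          "mean:", round(mean(heights), 1),
          "st. dev.:", round(stdev(heights), 1))
```

The median and IQR print the same values for both lists, while the mean and standard deviation jump for the list with the typo.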

4.  Consider the two quantitative data sets below:

Set A: The times (in minutes) of all competitors in the 1,500 meter running track-and-field event at the most recent Olympic Games.

Set B: The times (in minutes) of all competitors in the 1,500 meter running track-and-field event at all high schools meets in the United States last year.

Which set has the smaller standard deviation and why? Set A. Olympic competitors are all elite runners with similar times, so their times have a smaller spread than the times of the high school runners in Set B, whose abilities vary widely.

5.  A student conducts a study for her statistics project. She investigates the relationship between the number of text messages (sent and received on average each day) and GPA (grade point average). She collects data from a random sample of 50 students. The correlation is 0.015. What can you conclude about the relationship between the two quantitative variables? Since r = 0.015 is very close to 0, there is almost no linear correlation between the two variables. Thus, it cannot be concluded that the two variables are linearly related.

6.  Consider the following histograms:

Answer the following questions with a letter I, II, III, or IV.

Explain your choice for each question using complete sentences.

a)  Which of the histograms could represent a distribution of salaries paid at a financial company with many low level analysts, few top-ranking employees, and one CEO? II, The bulk of the data would fall to the left with few values on the right side.

b)  Which of the histograms could represent a distribution of weights of babies for a large random sample of male newborns at a local hospital? III, Weights are approximately normally distributed, so the histogram should be symmetric and bell-shaped.

c)  Assume that the histograms are drawn on the same scale. Which of the histograms has the largest IQR? IV, The middle 50% of the data is spread out the most.

7.  If you are asked to describe a data set in an essay from graphs and sample statistics, what key things should you discuss? (See Mod2 exam and review.)

Shape, outliers, best measure of center (mean vs. median), best measure of spread (standard deviation vs. IQR)

Mod 3: “Linear Regression”

Things to Remember from this Module:

✓  How to calculate slope, describe slope as a rate of change with units

✓  How to find the equation of a line with two points

✓  What does correlation tell us? Facts about correlation, strength

✓  How to graph lines

✓  Word problems with slope/lines

✓  Explanatory/response Variables, know how to determine which is which

✓  Interpret r ADL

✓  Lurking variables, be able to identify possible lurking variables

✓  Come up with the regression line using formulas

✓  Do not extrapolate using your linear regression model

Practice Questions:

10.  In 1998, Ann bought a house for $184,000. In 2009, the house is worth $305,000. Find the average annual rate of change in dollars per year of the value of the house. Round your answer to the nearest cent. Let x represent number of years after 1990.

a.  Write the data provided into two ordered pairs (8, 184000),(19, 305000)

b. Find the slope, include the appropriate label $11,000 per year

c. Find the equation of the line representing Ann’s house value y = 96000 + 11000x

d. Estimate Ann’s house value in 2004 (x=14) $250,000

e. Interpret the meaning of the slope in context of this problem. As each year passes, the value of her house increases by $11,000.

f. Interpret the meaning of the y-intercept in context of this problem. In 1990 (x=0), the line puts the value of the house at $96,000.
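The arithmetic in question 10 can be checked with a short Python sketch. It finds the slope and y-intercept of the line through the two ordered pairs and then evaluates the line at x = 14. The function name `line_through` is made up for this example.

```python
def line_through(p1, p2):
    """Return (slope, y_intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)      # rise over run
    intercept = y1 - slope * x1        # solve y1 = slope * x1 + b for b
    return slope, intercept

# x = years after 1990, y = house value in dollars
slope, intercept = line_through((8, 184000), (19, 305000))
value_2004 = intercept + slope * 14    # 2004 corresponds to x = 14

print(slope, intercept, value_2004)    # 11000.0 96000.0 250000.0
```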

11.  Given the following scatterplots:

a. Write positive, negative, or no correlation next to each scatterplot

#1 positive, #2 slightly positive (or no linear corr), #3 no corr, #4 negative

b. Which graph has the strongest linear correlation? #4

c. Which graph has the strongest non-linear correlation? #2

d. The four correlation coefficients for the scatterplots shown are -0.1169 (#3), 0.7699 (#1), -0.9396 (#4), and 0.1632 (#2). Match the correlations to the plots.

12.  Does a strong positive correlation PROVE that one variable causes the other variable to respond? Does no correlation always imply there is no relationship? Give an example of two variables that confirms your answers.

No, association does not imply causation. Lurking variables can affect outcomes as well. Ex. Recall: Polio cases rose along with ice cream consumption, but ice cream did not cause polio. Both increased in the summer, so the season was a lurking variable.

No, there could be a non-linear relationship. Ex. The length of a bear as it ages is best described with a logarithmic curve.
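To see that r near zero does not rule out a relationship, here is a small Python sketch with made-up data (not from class). The y-values follow y = x^2 exactly, so x and y are perfectly related, yet the linear correlation comes out to 0. The function `pearson_r` is written out by hand for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r computed from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data: a perfect quadratic pattern, symmetric about x = 0
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # 0.0 even though y is completely determined by x
```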

13.  For an essay question on a relationship between two variables, what key things should you talk about? (See exam on Mod3) direction, form, strength, outliers

14.  A stop-smoking pamphlet states, “Children of mothers who smoked during pregnancy scored nine points lower on intelligence tests at ages three and four than children of nonsmokers.” Does this statement provide good evidence that mothers’ smoking during pregnancy causes lower scores in their children? If not, state why using complete sentences. No, association does not imply causation. Lurking variables could also explain the lower scores.

15.  What does an outlier do to an r-value? How will the absolute value of the r-value change: increase or decrease? An outlier that falls away from the overall pattern moves r closer to zero. The absolute value of r decreases, making the correlation weaker.

16.  We looked at 40 randomly selected men to analyze the relationship between the weight of a man and his BMI (Body Mass Index). Minitab found the following graphs and statistics.

a.  What does the scatterplot tell us about the relationship between the weight of a man and his Body Mass Index? There is a positive linear relationship. As the weight increases, the BMI increases as well.

b. What is the explanatory variable? weight

c. What is the response variable? BMI

d. Interpret the slope and y-intercept in the context of the problem. Slope: For every increase of one pound in weight, the predicted BMI increases by 0.104 units. Y-intercept: When a man weighs 0 pounds, the predicted BMI is 8.02 units (which does not have meaning in this context).

e. A trainer said that if a man is heavy, it will cause him to have a large BMI. Is this a valid statement? No, association does not imply causation!

f. What is the r value and what does it tell us? r = 0.800. There is a strong positive linear correlation.

g. Are there any other lurking variables that might influence BMI besides weight?

Yes, examples: genetics, medications, etc.

h. The ADL is 2.1. Interpret the meaning in context to this problem. (Note: there are two meanings.) 1) Each data point is on average a distance of 2.1 BMI units from the regression line.

2) On average, predictions will have an error of 2.1 BMI units.

i. Use the regression line to predict the BMI of a man who weighs 220 pounds. How accurate do you think this prediction is? BMI = 30.9 units. Since the average prediction error (ADL) is 2.1, one can estimate that the actual BMI reading is between 30.9 - 2.1 = 28.8 units and 30.9 + 2.1 = 33 units.

j. Can we use the regression line to predict the BMI of a man that is 100 pounds? Why or why not? No, it is not within the scope of the data. We are not allowed to extrapolate since we do not know if the pattern would continue beyond the scope of the data provided.
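The prediction in part (i) of question 16 can be reproduced with a short Python sketch. It uses the slope (0.104), y-intercept (8.02), and ADL (2.1) quoted in the answers above. The function name `predict_bmi` is made up for this example, and treating "prediction plus or minus ADL" as the likely range is the rough rule used in the key, not a formal interval.

```python
SLOPE = 0.104       # BMI units per pound, from part (d)
INTERCEPT = 8.02    # BMI units, from part (d)
ADL = 2.1           # average distance from the line, from part (h)

def predict_bmi(weight_lb):
    """Predicted BMI from the regression line: BMI = 8.02 + 0.104 * weight."""
    return INTERCEPT + SLOPE * weight_lb

predicted = predict_bmi(220)
low, high = predicted - ADL, predicted + ADL

print(round(predicted, 1), round(low, 1), round(high, 1))  # 30.9 28.8 33.0
```

The sketch will happily return a number for any weight, including 100 pounds, so it is up to you to check that the weight is inside the range of the data before trusting the prediction.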

Match each description of a set of measurements to a scatterplot.

17.  x = average outdoor temperature and y = heating costs for a residence for 10 winter days scatterplot 3

18.  x = height (inches) and y = shoe size for 10 adults scatterplot 1

19.  x = height (inches) and y = score on an intelligence test for 10 teenagers

scatterplot 2

Identify the explanatory and response variables in the following:

20. How the price of a package of meat is related to the weight of the package?

a.  Explanatory:______weight______

b.  Response:______price______

21.  Is the distance you walk related to the calories you burn?

c.  Explanatory:______distance walked______

d.  Response:______calories burned______

Mod 4: “Non Linear Modeling”

Things to Remember from this Module:

✓  How to graph exponential, logarithmic, and quadratic functions, and give the Domain of each.

✓  Determine a quadratic function with two points, one being the vertex

✓  Find the equation of an exponential function with two points, one being the y intercept

✓  Change from log form to exponential form and graph.

✓  Use the change of base formula to change a Log to LN form and graph.

✓  Use exponential, quadratic, and logarithmic models to make predictions

✓  Be able to interpret the meaning of ADC’s and use ADC’s to determine which curve is the best fit.

Practice Problems

22. Graph y = -(x - 3)^2 + 1

Has an x^2 term, so it will be a parabola. (Domain is all reals.)

x / y
0 / -8
1 / -3
2 / 0
3 / 1
4 / 0
5 / -3
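The table for question 22 can be checked with a short Python sketch. It assumes the function is y = -(x - 3)^2 + 1, which is the quadratic that matches every row of the table (vertex at (3, 1), opening downward). Treat that formula as an assumption if your copy of the problem shows a different form.

```python
def f(x):
    # Assumed function: the quadratic that fits the table values above.
    return -(x - 3) ** 2 + 1

table = [(x, f(x)) for x in range(0, 6)]
for x, y in table:
    print(x, "/", y)
```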

23. Graph y = 2^(x - 1) - 4

Has x in the exponent, so it will be an exponential graph. (Domain is all reals.) Choose some negative and positive values for x. Horizontal asymptote: y = -4

x / y
-1 / -3.75
0 / -3.5
1 / -3
2 / -2
3 / 0
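Here is a similar check for question 23. It assumes the function is y = 2^(x - 1) - 4, which fits the table values and the horizontal asymptote y = -4. That formula is inferred from the table, so treat it as an assumption. Under this assumption the y-value -3.75 occurs at x = -1.

```python
def g(x):
    # Assumed function: an exponential that fits the table and asymptote y = -4.
    return 2 ** (x - 1) - 4

xs = [-1, 0, 1, 2, 3]
table = [(x, g(x)) for x in xs]
for x, y in table:
    print(x, "/", y)

# Far to the left the graph hugs the asymptote y = -4 without reaching it.
print(g(-20))
```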

24. Convert y = log(x - 1) to exponential form and graph

(Domain is x>1 since you can’t choose numbers less than or equal to 1.) Vertical Asymptote x = 1

x / y
1.5 / -0.3
2 / 0
3 / 0.3
4 / 0.5
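For question 24, the sketch below assumes the function is y = log(x - 1) with base 10, which fits the table and the vertical asymptote x = 1. Under that assumption the exponential form is 10^y = x - 1, so x = 10^y + 1. The function names are made up for this example.

```python
from math import log10

def h(x):
    # Assumed log form: y = log10(x - 1), defined only for x > 1.
    return log10(x - 1)

def x_from_y(y):
    # Equivalent exponential form: 10^y = x - 1, so x = 10^y + 1.
    return 10 ** y + 1

rounded = [round(h(x), 1) for x in (1.5, 2, 3, 4)]
print(rounded)  # [-0.3, 0.0, 0.3, 0.5]

# Going from y back to x with the exponential form recovers the input.
print(x_from_y(h(3)))
```

Using the exponential form to build the table (pick easy y-values such as -1, 0, 1 and compute x) is often simpler than picking x-values by hand.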

25. Graph y = log(x) - 3 and find domain

(Domain is x>0 since you can’t use numbers less than or equal to 0.) Vertical asymptote: x=0

x / y
0.5 / -3.3
1 / -3
2 / -2.7
3 / -2.5
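For question 25, the sketch below assumes the function is y = log(x) - 3 with base 10, which fits the table and the asymptote x = 0. It also shows the change of base formula from this module, log(x) = ln(x) / ln(10), so the same values can be found with only the LN key. The function name `k` is made up for this example.

```python
from math import log  # math.log with one argument is the natural log (ln)

def k(x):
    # Assumed function: y = log10(x) - 3, written with change of base.
    return log(x) / log(10) - 3

rounded = [round(k(x), 1) for x in (0.5, 1, 2, 3)]
print(rounded)  # [-3.3, -3.0, -2.7, -2.5]
```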