MDM4U Name:______

Date:

Unit 1 Review

1.A survey was taken to examine the types of movies preferred by people. The results are:

Movie Type / Male / Female
Action / 42 / 20
Comedy / 27 / 36
Drama / 15 / 31
Thriller / 26 / 16
Sci Fi / 14 / 2

Create a split-bar graph to display these data and state any conclusions that can be made.

2.The enrollment at a high school is divided between grades as shown.

Grade / Enrollment
9 / 160
10 / 132
11 / 170
12 / 138

a)State the sector angles that would be used to create a circle graph to represent these data.

b)The school has $40 000 to spend on text books. How much should be spent on each grade if the money is to be distributed based on enrollment?

3.A botanist is testing a new fertilizer for apple trees. She measures the mass of apples from trees without the new fertilizer and the mass of apples from trees with the new fertilizer. Her results are shown in the following tables.

Apple Mass (g)
Without New Fertilizer / Without New Fertilizer / With New Fertilizer / With New Fertilizer
127 / 133 / 191 / 178
148 / 145 / 154 / 191
109 / 151 / 102 / 201
168 / 138 / 176 / 149
129 / 115 / 143 / 145
135 / 197 / 105 / 169
131 / 169 / 195 / 182
148 / 108 / 204 / 222
145 / 142 / 196 / 139
111 / 148 / 167 / 159

a)Create a histogram for each set of data.

b)What conclusions can be made about the effectiveness of the fertilizer?

4.The following chart shows the exchange rate of the American dollar (the cost in Canadian dollars to purchase one American dollar).

Year / Exchange Rate
1990 / 1.16
1991 / 1.153
1992 / 1.1508
1993 / 1.2817
1994 / 1.3217
1995 / 1.406
1996 / 1.356
1997 / 1.3761
1998 / 1.4252
1999 / 1.527
2000 / 1.4433
2001 / 1.4935
2002 / 1.5974

a)Create a broken-line graph to represent these data.

b)Describe the trend shown.

c)Draw a line of best fit and use this line to predict the exchange rate in the year 2010.

5.Identify when each type of graph would be used:

Bar Graph / Circle Graph / Box-and-Whisker Plot / Histogram / Broken-Line Graph

6.The following table shows the commuting distances to work for the employees at a company.

Commuting Distance (km)
17.8 / 4.2 / 12.6 / 10.1
5.0 / 6.3 / 7.2 / 15.0
27.4 / 11.4 / 25.4 / 9.1
13.0 / 23.9 / 2.3 / 13.3
13.2 / 3.4 / 20.0 / 9.4

Create a box-and-whisker plot to represent these data.

7.Archeologists often rely on relationships between body measurements when they analyze ancient human fossils. The following table shows the measurements of the femur (thigh bone) and overall height of 50 adults.

Relationship of Human Femur Length to Human Height (cm)
Femur / Height / Femur / Height
38 / 149 / 50 / 184
46 / 169 / 41 / 171
38 / 152 / 47 / 187
38 / 147 / 42 / 162
44 / 190 / 42 / 168
48 / 191 / 39 / 151
46 / 190 / 43 / 169
39 / 152 / 38 / 143
39 / 152 / 29 / 104

a)Create a scatter plot to display these data.

b)How strong would you describe the correlation to be?

c)According to a line of best fit for these data, how tall would a person be with a 38-cm long femur?

8.The following table shows the average annual cost of gasoline (in cents/L) in Ontario.

Year / Cost (cents/L)
1989 / 50.0
1990 / 56.2
1991 / 54.8
1992 / 53.8
1993 / 52.5
1994 / 51.4
1995 / 54.0
1996 / 57.1
1997 / 57.2
1998 / 52.5
1999 / 58.6
2000 / 71.2
2001 / 66.6
2002 / 61.7

a)Create a scatter plot for the data.

b)Draw the median-median line, and use it to estimate the cost of gasoline in 2010.

c)Describe how to change the vertical scale so that the increase in gasoline costs is exaggerated.

9.Explain the terms coefficient of correlation and coefficient of determination. How are these values related, and what information is given by each of these coefficients?

10.From the following list of variables, choose pairs that would have each of the indicated correlations. Give reasons for your choices.

Arm length / Distance from home to school / Time spent on homework / Math mark / A sister’s history mark
Shoe Size / Number of minutes to walk to school / Time spent watching television / Number of missed classes / Colour of sister’s eyes

a)strong positive correlation

b)strong negative correlation

c)weak positive correlation

d)weak negative correlation

e)no correlation

11.A supermarket keeps track of the amount of bananas sold each week. The following table shows the price of bananas for various weeks and the amount sold.

Cost ($/kg) / Amount Sold (kg)
0.65 / 408
0.72 / 363
0.95 / 321
1.10 / 242
1.18 / 223
1.30 / 236
1.42 / 166
1.60 / 95

The produce manager knows that due to a tropical storm, the price of bananas is being raised to $1.80/kg. How many kilograms of bananas should she order for the next week?

12.The following graph illustrates the increase in Internet users. Explain why the graphic is misleading.

Unit 1 Review Suggested Solutions

1.

Action and comedy movies are the most popular types of movies overall among those surveyed. Among females, comedies are the most popular and Sci Fi movies are the least popular. Among males, action movies are the most popular and Sci Fi movies are the least popular (14 responses), followed closely by dramas (15 responses).

2.a)

b)Each grade’s funding should be proportional to the enrollment.

Grade 9: (160/600) × 40 000 ≈ $10 667

Grade 10: (132/600) × 40 000 = $8800

Grade 11: (170/600) × 40 000 ≈ $11 333

Grade 12: (138/600) × 40 000 = $9200
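The sector angles in part a) and the amounts in part b) use the same proportions: each grade's enrollment divided by the total enrollment of 600. A minimal Python sketch of that arithmetic follows; the variable names are illustrative and not part of the worksheet.

```python
# Enrollment by grade, from the table in question 2.
enrollment = {9: 160, 10: 132, 11: 170, 12: 138}
budget = 40_000  # dollars available for textbooks

total = sum(enrollment.values())  # 600 students

# Each grade's share of the full circle (360 degrees) and of the budget.
sector_angles = {g: n / total * 360 for g, n in enrollment.items()}
funding = {g: n / total * budget for g, n in enrollment.items()}

for g in enrollment:
    print(f"Grade {g}: {sector_angles[g]:.1f} degrees, ${funding[g]:,.0f}")
```

The four angles should add to 360 degrees and the four amounts to $40 000, which is a quick check on the arithmetic.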

3.a)

b)It can be concluded that the apples with the new fertilizer have higher masses than those without the new fertilizer.

4.a)

b)The exchange rate has increased fairly steadily since 1990.

c)If the current trend continues, the exchange rate will be 1.815 in the year 2010.

5.Bar Graph: Used to display the frequency of data in discrete categories

Circle Graph: Used to display the relative frequency of various categories in a sample

Box-and-Whisker Plot: Used to display the spread of data, including minimum and maximum values, median, and medians of the lower and upper halves of the data

Histogram: Used to display the frequency of data in continuous class intervals

Broken-Line Graph: Used to show the trend of data over time

6.a) Minimum = 0.4 km Maximum = 27.4 km

Median (average of 20th and 21st data values) = 12.85

Median of lower half (average of 10th and 11th data values) = 7.55

Median of upper half (average of 30th and 31st data values) = 19.45

7.a)

b)

This indicates a strong positive correlation.

c)A person with a femur length of 38 cm would have a height of 154.5 cm.

8.a)

b)The estimated cost of gasoline in 2010 is 72.1 cents/L.

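c)The increase can be made to look more dramatic by starting the vertical scale just below the lowest cost (for example, at 50 cents/L instead of 0) and stretching the scale so that each cent takes up more space on the axis.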

9.The coefficient of correlation, denoted r, gives the strength and direction of the linear relationship between two variables. An r value close to –1 indicates a strong negative correlation. A value close to +1 indicates a strong positive correlation. An r value close to 0 indicates little or no linear correlation between the variables.

The coefficient of determination is calculated as r². It gives the relative strength of the relationship between two variables. Specifically, it gives the proportion of the variation in the dependent variable that is explained by variation in the independent variable.
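As an illustration of how the two coefficients are related, the sketch below computes r and then squares it, using the banana price and sales data from question 11. The choice of data set and the function name `correlation` are assumptions made for this example only.

```python
from math import sqrt

# Price ($/kg) and amount sold (kg), from the table in question 11.
price = [0.65, 0.72, 0.95, 1.10, 1.18, 1.30, 1.42, 1.60]
sold = [408, 363, 321, 242, 223, 236, 166, 95]


def correlation(xs, ys):
    """Coefficient of correlation r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = correlation(price, sold)
r_squared = r ** 2  # coefficient of determination for a linear model

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

For these data r comes out close to –1, a strong negative correlation. The sign of r carries the direction of the relationship, while r² only reports how much of the variation in sales is accounted for by price.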

10.a)There is a strong positive correlation between the distance from home to school and the number of minutes to walk to school. This is true because most people walk at about the same speed and so the relationship is quite predictable.

b)There is a strong negative correlation between time spent on homework and time spent watching television. Spending a lot of time at one of these activities means spending little time at the other.

c)There is a weak positive correlation between arm length and shoe size. In general, taller people have longer arms and larger feet, but there are exceptions.

d)There is a weak negative correlation between a math mark and the number of missed classes. In general, people who miss more classes have lower marks and vice versa.

e)There is no correlation between a sister’s history mark and the colour of her eyes.

11.Using a line of best fit, she should expect to sell approximately 47 kg of bananas.
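The worksheet does not say how the line of best fit was obtained. Assuming a least-squares line, the prediction can be sketched as follows; the function name `least_squares_line` is hypothetical.

```python
# Price ($/kg) and amount sold (kg), from the table in question 11.
price = [0.65, 0.72, 0.95, 1.10, 1.18, 1.30, 1.42, 1.60]
sold = [408, 363, 321, 242, 223, 236, 166, 95]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(price, sold)
predicted = slope * 1.80 + intercept  # expected sales at $1.80/kg

print(f"Predicted sales at $1.80/kg: {predicted:.0f} kg")
```

Because $1.80/kg is higher than any price in the table, this is an extrapolation and should be treated as a rough guide rather than a firm figure.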

12.The number of users in 2002 is 6 times the number of users in 1998. However, the picture of the computer in 2002 is more than 6 times larger than the computer in 1998. Both the width and the height are 6 times longer, so the area of the picture is actually 36 times larger. Therefore, the size of each picture is not representative of the number of Internet users.

Unit 2 Review

1.In 1995, the American estimate for the percentage of properly vaccinated American children was raised from 67% to 75%. The new, higher estimate was based on 25 000 telephone interviews which were followed up by immunization record checks. The earlier, lower estimate was based on data collected from a more general national health survey. Describe what type of bias resulted in the lower estimate and explain how it could have occurred.

2.Large polling organizations typically interview a random sample of 1200 to 1500 Canadians when conducting political polls. Explain why the number of people interviewed is often doubled just before a federal election.

3.A biologist wants to compare the growth rates of rats on two different diets. She has 20 rats labelled 01 to 20. Rats 01 to 10 are male and rats 11 to 20 are female. Ten of the rats will be randomly assigned diet 1 and the remaining 10 will be assigned diet 2. A random starting point is selected in a random number table. It is decided to start at that point and move horizontally across the rows, ignoring spaces. An excerpt from that random number table starting at the chosen starting point is:

24005 / 52114 / 26224 / 39078 / 80798 / 15220 / 43186 / 00976
85063 / 55810 / 10470 / 08029 / 30025 / 29734 / 61181 / 72090
11532 / 73186 / 92541 / 06915 / 72954 / 10167 / 12142 / 26492
59618 / 03914 / 05208 / 84088 / 20426 / 39004 / 84582 / 87317

The first 10 rats chosen will receive diet 1 and the remaining rats will receive diet 2. What are the labels of the rats who will receive diet 1? How many are male? What type of sampling method is being used? Suppose males grow faster than do females. Should the sampling method be adjusted? If so, how?

4.The administration of a university believes that living in residence during first year is beneficial to students' academic performance. To see if this is true, they will consider the living arrangements and academic achievements of 200 first-year students. Describe how a sampling study should be carried out to evaluate the administration's claim. Indicate the population and the sampling procedure you would use to do this. Then identify at least one categorical and one quantitative variable that should be measured.

5.A member of parliament wants to know what constituents think about proposed immigration legislation. His assistant reports that he has received 115 letters from constituents and that 85 or 74% of the letter-writers oppose the new legislation. Identify the population of interest in this situation, being as specific as possible. The 115 letters come from a sample of the population. Explain why it is biased and whether the true proportion of the population that opposes the legislation is probably greater than or less than 74%.

6.Explain how cluster sampling and stratified sampling both use simple random samples as part of the sampling process.

7.The word population is a common word in the English language. Explain how it is used differently in casual conversation than in statistics.

8.A secondary school has 800 students, 200 in each of grades 9, 10, 11, and 12. How many distinct random numbers are necessary to select a sample of size 40 in each of the following scenarios?

a)a simple random sample of 40 students is to be selected

b)students are ordered by student number and a systematic sample of 40 is to be selected

c)a stratified sample of 10 students from each grade level is to be selected

d)a multi-stage sampling procedure is used: two grade levels are randomly chosen and then 20 students are randomly selected from each of the chosen grades

9.A student enrolled in MDM 4U would like to conduct a survey of a random sample of grade 9 students at his school. However, the school administration will not give him a list of all grade 9 students. He does have a list of all grade 9 home room classes. Explain why it is impossible for him to obtain a simple random sample of grade 9 students for his survey.

10.A large study was conducted to determine if taking aspirin reduces the risk of having a heart attack. Approximately 10 000 participants took an aspirin every other day, while 10 000 participants took a pill that looked and tasted like aspirin but had no active ingredients. Neither the participants nor the attending doctors knew which type of pill was being taken by each participant. For this experiment, identify the treatment group, the control group, the placebo, if any, and whether or not the experiment is double-blind.

11.A soft drink manufacturer is interested in surveying customers' opinions on the tastes of 5 new drink flavours. The flavours are A, B, C, D, and E. What information could the manufacturer derive from a ranking question that it could not derive from a checklist question?

12.A recent American study on underage drinking claimed that people under the age of 21 drink 25% of the alcohol consumed in the U.S. Note that the age where drinking alcohol becomes legal in most states is 21. The study was based on data from an annual survey of 25 500 Americans, approximately 10 000 of whom were between 12 and 20 years old. Underage drinkers accounted for about 25% of all people in the survey who drink alcohol. However people between 12 and 20 actually constitute less than 20% of the American population. Explain how the 25% estimate is biased. Then state the type of bias and whether it is an over- or under-estimate of the true quantity of alcohol consumed by underage drinkers.

13.Illustrate how the amount of time a family used the internet in the past month can be measured in three ways: using a qualitative variable, using a discrete quantitative variable, and using a continuous quantitative variable.

14.A university has 20 000 students, distributed as follows: 3000 in graduate school, 2000 in professional schools, and the remainder in undergraduate studies. Of the undergraduates, half are in an arts program and half are in a science program. Describe how a stratified sample of 400 students should be chosen to represent the students at this university.

15.Surveys on smoking can be used to estimate the number of cigarettes sold in Canada. People are asked how many cigarettes they smoke on average per day. However, past surveys have resulted in estimates of the number of cigarettes sold that are significantly lower than counts of cigarettes sold that have been collected from cigarette sales data. Explain how this could happen.

Unit 2 Review Suggested Solutions

1.The earlier, lower estimate resulted from response bias. The interviews that produced the new estimate were followed up with record checks, so we can assume that they give a reasonably accurate estimate of the percentage of properly vaccinated children. Because the previous estimate relied on data from a more general survey (and was presumably not verified), respondents likely did not accurately remember when or if their children had been vaccinated.

2.The larger the sample size, the more precise are the estimates resulting from the survey. Just before an election, the polling organizations want to be able to give the estimates of the percent of Canadians who will vote for each party with greater precision.

3.Rats 02, 06, 09, 10, 11, 14, 15, 17, 18, 20 will receive diet 1; 4 of these 10 rats are male. Simple random sampling was used. If male rats do grow faster than female rats, it may be desirable to have an equal number of males and females in each diet group. In that case, stratified random sampling should be used where the strata are male and female and 5 rats are chosen from each group for diet 1.
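The selection can be reproduced by reading the table two digits at a time, which is an assumption based on the two-digit rat labels. The sketch below keeps only labels 01 to 20 and skips repeats; the variable names are illustrative.

```python
# Rows of the random number table excerpt from question 3.
rows = [
    "24005 52114 26224 39078 80798 15220 43186 00976",
    "85063 55810 10470 08029 30025 29734 61181 72090",
    "11532 73186 92541 06915 72954 10167 12142 26492",
    "59618 03914 05208 84088 20426 39004 84582 87317",
]

# Read across the rows, ignoring spaces, two digits at a time.
digits = "".join(rows).replace(" ", "")
pairs = [int(digits[i:i + 2]) for i in range(0, len(digits) - 1, 2)]

diet_1 = []
for label in pairs:
    # Keep labels 01 to 20 only, and skip any rat already chosen.
    if 1 <= label <= 20 and label not in diet_1:
        diet_1.append(label)
    if len(diet_1) == 10:
        break

males = [label for label in diet_1 if label <= 10]  # rats 01 to 10 are male
print(sorted(diet_1), len(males))
```

The first pairs read are 24, 00, 55, 21, 14, and so on, so rat 14 is the first one chosen. The tenth rat is found in the third row of the table.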

4.The population is all first-year students at the university. The administration should take a stratified sample, randomly choosing 100 first-year students who live in residence and 100 first-year students who do not. A categorical variable that should be measured for each participating student is whether or not the student lives in residence (another possibility: major). A quantitative variable that should be measured is each participating student's academic average.

5.The population of interest is all people of voting age who live in the member of parliament's riding. The sample is biased because it is not a random sample. It is self-selected by the people who wrote the letters. They probably feel more strongly about the legislation and are more likely to be against it than the overall population. Thus, the true proportion of the population that opposes the legislation is probably less than 74%.

6.In both cluster sampling and stratified sampling, the population is divided into groups—clusters or strata. In cluster sampling, a simple random sample of the clusters is taken. Then every member of the population in the chosen clusters forms the sample. In stratified sampling, every stratum is used. Then from each stratum, a simple random sample of members of that stratum is chosen to participate.
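The difference in where the simple random sample is taken can be sketched in Python. The population below is hypothetical (four groups of 200 student IDs), and the same groups serve as strata and as clusters only to compare the mechanics; in practice clusters are usually smaller natural groupings such as home room classes.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 4 groups of 200 student IDs each.
groups = {
    grade: [f"G{grade}-{i:03d}" for i in range(1, 201)]
    for grade in (9, 10, 11, 12)
}

# Stratified sampling: every group (stratum) is used, and a simple
# random sample of members is drawn from each one.
stratified = [
    student
    for members in groups.values()
    for student in random.sample(members, 10)
]

# Cluster sampling: a simple random sample of the groups (clusters)
# is drawn, and every member of each chosen cluster is included.
chosen_clusters = random.sample(list(groups), 2)
cluster = [student for grade in chosen_clusters for student in groups[grade]]

print(len(stratified), len(cluster), sorted(chosen_clusters))
```

In the stratified sample `random.sample` is applied to the members within each group, while in the cluster sample it is applied to the list of groups.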

7.In casual conversation, population is used for the people living in a location, such as the city or country being discussed. In statistics, population is used for the entire group of individual people or objects that a sample represents. Also in statistics, it may only represent a subset of all the people living somewhere. For example, in a study about women's health, the population would only include women.

8.a)The simple random sample requires 40 distinct random numbers.

b)The systematic sample requires one random number to pick the starting point.

c)The stratified random sample requires 10 random numbers for each grade or 40 in total.

d)The multi-stage sampling procedure requires two random numbers to pick the grade levels, and 20 for each chosen grade to pick the individual students, for a total of 42.

9.In order to obtain a simple random sample of grade 9 students, he would need a list of all grade 9 students. The first step in obtaining a simple random sample is always to identify and label every member of the population.

10.The treatment group is the group taking aspirin. The control group is the group taking the pill with no active ingredients. The placebo is the pill with no active ingredients. The experiment is double-blind because the participants and the attending doctors do not know which pill is being taken by which participants.

11.In a checklist question, the customers will only be able to check off the answer to a simple yes/no question such as whether or not they like the flavour. The designer of the survey could use a ranking question to find out which flavour customers liked best, second-best, et cetera. This may be helpful if the company only wants to manufacture some of the flavours.

12.This is an example of household bias. Almost 40% of the people surveyed (about 10 000 of 25 500) were between 12 and 20 years old, although this age group makes up less than 20% of the population. This over-representation will lead to an over-estimate of the percentage of alcohol that is consumed by underage drinkers.

13.When a qualitative variable is used, the family is classified as a heavy, medium, or light user. When a discrete quantitative variable is used, the number of hours or partial hours in the past month that a family member used the internet is counted. When a continuous quantitative variable is used, the total time that the internet was used is measured as precisely as required.