Sample Practice Exercises to Prepare Students for the AP Exam
On the morning of January 8, 2003, US Airways Express Flight 5481 crashed on take-off from Charlotte Douglas Airport, killing all 21 people on board. The National Transportation Safety Board (NTSB) conducted a thorough investigation of both Flight 5481 and Flight 5492, another flight that serves the same route but flew the day before the crash, on the evening of January 7, 2003. The accompanying dataset contains the data collected about the passengers of both flights. After analyzing these data, the NTSB identified two factors that may have caused the crash: the extent to which the plane was weighted down by passengers and cargo, and maintenance issues. The pilots knew at the time of take-off that the plane was within 100 pounds (according to their best estimates) of the flying limits for the Beech 1900 aircraft.
In response to the crash, the NTSB required airlines that operate planes with a capacity of 10 to 19 passengers to weigh both passengers and baggage over a three-day period in February 2003. The NTSB included 400 flights in their sample. The purposes of the study were (a) to investigate whether the current weight estimates for both bags and passengers were still accurate and (b) to determine whether and how often planes in this class are operating close to their safe weight limits. Prior to the study, the NTSB used the following estimates for passenger weight:
Winter Mean: 165 pounds
Summer Mean: 160 pounds
After the study the weight estimates were changed to:
Winter Mean: 170 pounds
Summer Mean: 165 pounds
The NTSB estimate for mean checked bag weight remained the same at 23.5 pounds. Passengers are allowed to check two bags and carry one onto the plane. Passengers must pay an extra charge for any bags weighing over 70 pounds.
For each of the following problem sets, the full dataset from the 400 flights would naturally be preferable for many of the problems if you were to answer the questions conclusively for the NTSB. However, you are to use the sample data provided from the two flights simply to replicate the analyses that the NTSB conducted prior to the large national study. Of course, the smaller dataset is also more manageable for the purpose of this review exam. The NTSB did eventually conclude that these two flights were indeed representative of the population for the period immediately prior to the NTSB change in policy.
Problem Set #1
1. In the boxplot below, “chckwt” can be interpreted as the weight in pounds for all of the baggage checked by each passenger. “Mnchckwt” can be interpreted as the average weight per bag checked for all passengers. The numbers next to specific points on the graph indicate the passenger number. Using only the boxplot below, what can you conclude about passengers number 5 and 7? Be sure to mention all that you can determine from the graphs.
2. Compare and contrast the two boxplots below. Explain why the two distributions look so similar in the 0 to 20 pound range.
3. What do these graphs tell you about the NTSB decision to keep the estimate for mean checked bag weight at 23.5 pounds?
4. Using the graph below, compare and contrast the distributions of weight per bag for carry on and checked baggage. What does the graph tell you about the weights of most carry on bags compared to the weights of checked bags? How does the carry on bag for passenger number 27 compare to the weights of checked bags for this sample of passengers?
5. Here is the five number summary for passenger weights on Flight 5481: 113, 149, 172, 182, 349. Based on this information, what conclusions can you draw about the weights of the passengers on this flight relative to the existing NTSB estimates for passenger weight that were in effect at the time of the flight?
6. Using the boxplot below and the five number summaries provided, would you advise the NTSB to create separate weight estimates for men and women? Justify your answer.
7. Compare the existing NTSB estimates for passenger weight that were in effect at the time of the flight to the information provided below. Would you qualify your response to problem 5 based on this information?
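For problem 5, one common convention (an assumption here, since the exercise does not name a rule) flags a value as a potential outlier when it lies more than 1.5 times the interquartile range beyond a quartile. The sketch below applies that convention to the five number summary given for Flight 5481. The function name `outlier_fences` is illustrative only.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences under the k x IQR convention (k = 1.5 assumed)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Five number summary for passenger weights on Flight 5481 (from problem 5).
minimum, q1, median, q3, maximum = 113, 149, 172, 182, 349

lower, upper = outlier_fences(q1, q3)

# A value is flagged only if it falls strictly outside the fences.
min_is_outlier = minimum < lower
max_is_outlier = maximum > upper

print(f"IQR = {q3 - q1}, fences = ({lower}, {upper})")
print(f"minimum flagged: {min_is_outlier}, maximum flagged: {max_is_outlier}")
```

Under this convention the maximum of 349 pounds lies above the upper fence, which is worth keeping in mind when comparing the median of 172 pounds to the winter estimate of 165 pounds.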
Problem Set #2
1. The NTSB issued a report prior to conducting the national study that included preliminary findings from their analyses of the data from the two flights in January. In the report they stated that the 400 flights they planned to study would constitute a sufficient sample size to ensure that they could be 95% confident that their estimate of the percentage of flights that took off within 500 pounds of the total weight limit for the Beech 1900 would be within 5 percentage points of the true population percentage. Is the NTSB justified in making this statement? Explain your answer using statistical evidence.
2. What factors would the NTSB need to consider in designing the sampling plan for the national study so that the sample would be representative of the population?
3. Explain how you would design a sampling plan, using stratified random sampling, such that at least one of the issues you addressed in problem 2 is incorporated into the plan. Include details regarding how the sample would be selected.
4. The NTSB found the following results across the 400 sampled flights.
Mean number of passengers flying per flight: 17.56
Mean total weight per passenger (sum of passenger weight, checked weight, and carry on weight): 206.83 pounds, SD = 55.79 pounds.
Create a 95% confidence interval around the NTSB estimate for total weight per passenger. Interpret this interval in the context of the problem.
5. Create a 95% confidence interval around the NTSB estimate for the total weight accounted for by all passengers and their baggage on a flight of this type. Interpret this interval in the context of the problem.
6. The table below lists the mean, standard deviation, and variance for passenger weight, weight of checked baggage, and weight of carry on baggage for the 38 passengers on the two flights. What information do these statistics provide, if any, about whether passenger weight, weight of checked baggage, and weight of carry on baggage are correlated?
7. If the NTSB decides to use simple random sampling to select the 400 flights, and the population percentage of flights of this type that are evening flights is 27%, what is the probability that the NTSB will select 3 evening flights in a row during the sampling process?
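The arithmetic behind problems 1 and 7 of this set can be sketched as follows. This is a rough check, not the NTSB's own method. It assumes the conservative value p = 0.5 for the unknown population proportion in problem 1, the usual normal approximation with z = 1.96, and, for problem 7, a population of flights large enough that successive selections can be treated as independent. The function names are hypothetical.

```python
import math

Z_95 = 1.96  # standard normal critical value for 95% confidence


def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error for a sample proportion (p = 0.5 is the worst case)."""
    return z * math.sqrt(p * (1 - p) / n)


def required_n(margin, p=0.5, z=Z_95):
    """Smallest whole sample size whose margin of error is at most `margin`."""
    return math.ceil((z / margin) ** 2 * p * (1 - p))


# Problem 1: is n = 400 enough for a margin of 5 percentage points?
moe_400 = margin_of_error(400)
n_needed = required_n(0.05)

# Problem 7: three evening flights in a row, assuming independent selections.
p_three_evening = 0.27 ** 3

print(f"margin of error at n = 400: {moe_400:.3f}")
print(f"sample size needed for a 0.05 margin: {n_needed}")
print(f"P(three evening flights in a row): {p_three_evening:.6f}")
```

Because simple random sampling draws without replacement, the independence assumption in the last calculation is only an approximation.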
Problem Set #3
1. The tables below contain the results of conducting a regression analysis using the number of bags checked as the predictor variable and the total weight of checked baggage per passenger as the dependent variable. The NTSB conducted these analyses as an exploratory test of whether people who check more bags also check heavier bags. Write out the regression equation that could be used to predict the total checked weight per person given knowledge of the number of bags checked. Interpret the intercept and slope in the context of the problem.
Model Summary
Model / R / R Square / Adjusted R Square / Std. Error of the Estimate
1 / .802(a) / .643 / .633 / 12.77491
a Predictors: (Constant), checked
2. Interpret the scatterplot and residual plot below. Comment specifically on whether there are any features of these plots that would cause you to qualify your interpretations of the regression results.
3. Suggest an alternative method for determining whether passengers that check more bags also check heavier bags. Conduct a statistical test following this strategy and interpret the results in the context of the problem.
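Before interpreting the Model Summary in problem 1, it can help to confirm that its entries agree with one another. The sketch below recomputes R Square and Adjusted R Square from the reported R of .802. It assumes a single predictor and n = 38 cases, taken from the 38 passengers mentioned in Problem Set #2. The regression may have used a different number of cases, so treat n as an assumption. The function name is illustrative.

```python
def adjusted_r_squared(r_squared, n, predictors=1):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - predictors - 1)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - predictors - 1)


r = 0.802          # R from the Model Summary
n_passengers = 38  # assumption: every passenger on both flights was included

r_squared = r ** 2
adj_r_squared = adjusted_r_squared(r_squared, n_passengers)

print(f"R Square = {r_squared:.3f}")
print(f"Adjusted R Square = {adj_r_squared:.3f}")
```

Both values round to the figures shown in the table, so the summary is internally consistent under these assumptions.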
Problem Set #4
1. Is there statistical evidence that passengers on flights such as flights 5481 and 5492 weigh more on average than they did when the NTSB set the winter population mean at 165?
2. Now consider only those passengers checking bags. Is there statistical evidence that passengers on flights such as 5481 and 5492 check bags that weigh more on average than bags weighed at the time the NTSB set the average at 23.5?
3. Based on the results you obtained for problems 1 and 2, make a recommendation to the NTSB about whether they should raise the mean weight estimates for winter passengers and checked baggage.
4. If you were to conduct the same analyses you conducted for problems 1 and 2 with the full sample size of 400 flights from the full study, and the means and standard deviations for the weight estimates you found in problems 1 and 2 remain consistent, would you expect the recommendations you made in problem 3 to change?
5. If the population mean and standard deviation for passenger weight are actually very similar to what you found in problem 1, what implications does this have for the conclusion you made in problem 1? What do we call this type of decision in statistics? Explain the consequences of this type of decision in the context of the problem.
6. Is there statistical evidence that day flights and evening flights are different with respect to the average passenger weight and the average weight of checked baggage?
7. Is there statistical evidence that male and female passengers on flights such as these are different with respect to the average passenger weight and the average weight of checked baggage?
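Problems 1 and 2 of this set each call for a one-sample test of a mean. A minimal sketch of the test statistic is shown below, using only the Python standard library. The function name `one_sample_t` is hypothetical, and the values passed to it are placeholders rather than the flight data. Substitute the passenger weights (with a null mean of 165) or the checked bag weights (with a null mean of 23.5) from the accompanying dataset.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean = mu0."""
    n = len(values)
    sample_mean = statistics.mean(values)
    sample_sd = statistics.stdev(values)      # sample SD (divides by n - 1)
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error, n - 1


# Placeholder values only, NOT the flight data.
placeholder_values = [1, 2, 3, 4, 5]
t_stat, df = one_sample_t(placeholder_values, mu0=2)

print(f"t = {t_stat:.4f} with {df} degrees of freedom")
```

The resulting statistic would then be compared with a t distribution on n - 1 degrees of freedom, one-sided for problems 1 and 2 since both ask whether the mean has increased. The standard error shrinks as n grows, which bears on problem 4.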
Problem Set #5
1. The NTSB is interested in knowing whether there is an association between gender (men vs. women) and how likely a passenger is to carry a bag onto the plane for flights like these. Using the data provided, use a statistical procedure to help the NTSB answer this question. Interpret your results.
2. What are your best estimates of the percentages of men and of women who carry a bag onto the plane? Place a 95% confidence interval around each estimate. Write a brief interpretation of the meaning of these intervals in the context of the problem.
3. Interpret the meaning of the confidence level you used in problem 2 in the context of the problem.
4. Using the intervals you created in problem 2, would you draw the same conclusion as you did in problem 1? Explain.
5. Now conduct a two sample test for proportions to answer the same question. Interpret your results. Compare your conclusions to those from problem 1. Explain any differences.
6. Which of your conclusions, problem 1 or problem 4, would you trust more, and why? What would your final recommendation be to the NTSB regarding the answer to their question?
7. Prior to the crash, the NTSB assumed that 25% of passengers on these types of flights did not check any bags, 50% checked one bag, and the remaining passengers checked two bags. These assumptions were based on a large study they conducted about ten years prior to the incident in question. The NTSB is interested in determining if Americans have changed their habits on these types of flights related to checked baggage. Use a statistical procedure to determine if these assumptions are still valid. Interpret your results.
8. What other analyses using tests of proportions could be conducted using these data? For each suggested analysis, include the research questions and hypotheses that would be addressed.
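One procedure that fits problem 7 is a chi-square goodness-of-fit test against the assumed 25% / 50% / 25% split. The sketch below computes the expected counts and the test statistic. It assumes all 38 passengers from the two flights are used, and the observed counts shown are hypothetical placeholders, not the flight data. The function names are illustrative.

```python
# Shares assumed by the NTSB prior to the crash (from problem 7).
ASSUMED_SHARES = {"0 bags": 0.25, "1 bag": 0.50, "2 bags": 0.25}

# Chi-square critical value for alpha = 0.05 with 2 degrees of freedom.
CRITICAL_05_DF2 = 5.991


def expected_counts(n, shares):
    """Expected count in each category if the assumed shares hold."""
    return [n * share for share in shares.values()]


def chi_square_statistic(observed, expected):
    """Sum over categories of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


n_passengers = 38  # assumption: every passenger on both flights is included
expected = expected_counts(n_passengers, ASSUMED_SHARES)

# Hypothetical observed counts, NOT the flight data; replace with real tallies.
hypothetical_observed = [10, 18, 10]
stat = chi_square_statistic(hypothetical_observed, expected)

print(f"expected counts: {expected}")
print(f"chi-square = {stat:.4f}, reject at 0.05: {stat > CRITICAL_05_DF2}")
```

With 38 passengers every expected count is at least 5, so the usual condition for the chi-square approximation is met.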