Projects on Estimation and Hypothesis Testing

General Guidelines

This assignment is due on Wednesday, December 6th at the beginning of class. Papers that are late or not stapled will not be accepted. Leave time for unforeseen emergencies. You are to choose one of the following three assignments for your project.

Your work on this project must be done in Excel. Except where specified in the instructions, you may not use my Excel templates for your work. Generate the appropriate Excel formulas for your problem without reference to my templates.

Standard Operating Procedures (SOP) for Excel Printouts

  1. Turn gridlines on and turn labels on. Your resulting printout should then be gridded, and each row and column should appear with a numerical or letter label. You can find these options on the File/Page Setup menu by clicking on the Sheet tab that appears.
  1. Adjust the column widths of your work so that the columns are wide enough to contain the data, yet fit well on the page. When printing, look at both landscape and portrait orientation for your printout, and choose the one that fits best. Your problem must fit on a single page. You can make your data fit on one page by checking the “fit to 1 page wide by 1 page high” box. If you do this, make sure that you haven’t made so many columns (or rows) that the printing is too tiny to read!
  1. Print two copies of your sheet: one with the numeric values, and one with the formulae that generated them. You can switch between these views by using the control-~ key combination. Note that you’ll probably need to resize your column widths when you display formulas. Failure to do so might cut some information off.
  1. Clearly label the meanings of the data that you enter. If the population mean for the number of rabbits caught is 35 per week, indicate this (e.g., “population mean”, “mean number caught (pop)”, etc.). Make sure that it’s clear where your answer is. Although my templates are more detailed than you will need for this work, they can give you some ideas for organization.
  1. Any graphs in your work should be properly identified and labeled. This includes title, axis labels, and (if more than one series is in the graph) a sensible legend.

Project #1: The Name Game

Go to the website Scroll to the bottom of the page and select the top 1000 baby names for people born in the same decade that you were born. Find your own first name on this list. (If your name is not on the list, you cannot do this particular project.) Read the information given above the data, as it is relevant to your work. Treat the information at the website as applying to a sample from a much larger population.

Your tasks are as follows:

  1. We’ll define your “cohort” as the set of all Americans of your gender who were born in the same decade as you were. (My cohort, for example, would be American males born in the 1950s.) Build the 98% confidence interval for the fraction of people in your cohort who share your first name.
  1. At the 2% level of significance, determine if there is evidence that your name is less popular (in terms of the fraction of children given that name) in the 2000s cohort than it was in your own cohort. You may use my Excel spreadsheet for this, but must clearly explain your answer.
  1. The website includes the top 10 names for all years from 1880 until 1997. Assume that this list is correct for the entire US population for those years. Generate a simple random sample of 10 of those years and note the number of letters in the most popular girl’s name for each of those 10 years. Test the null hypothesis that the average number of letters in the most popular girl’s name during the period 1880 to 1997 is 4.5 letters long. Use a 10% level of significance.

The report should be professional in appearance, done with a word processor and Excel, and should include:

  • A cover page including your name, your section, and the name of the project . Also include a signed honor pledge: “On my honor, I have neither given nor received unauthorized assistance on this project.”
  • The text of each of the tasks above, along with your answer for that task. This writeup of your answer to the task should include
  • all relevant data raw data (such as the names selected for task 3).
  • a check, where appropriate, of the assumptions needed to apply a statistical method. (You should conduct the statistical analysis even if the assumptions are not met, but should comment on this in your conclusions.)
  • your calculations, shown on an Excel spreadsheet, with correct notation used. (You may write “x-bar” for sample mean, “sigma-sub-p” for the standard error of the proportion, and so on.) You may not use my spreadsheets for this work. Except as noted in task 2, you may not used my Excel spreadsheets in doing this work. Note, in particular, that you will be giving me two copies of your spreadsheet: the numerical view and the formula view.
  • your final conclusions, interpreting the results of your spreadsheet. Your conclusions should refer to the particular task at hand. Don’t just tell me, “We reject the null hypothesis.” What was the null, and what does it mean to reject it?
  • A clear explanation of how you generated your simple random sample. Use the Excel “randbetween” function in generating your sample.

It’s up to you whether you print your spreadsheets on a separate sheet or imbed them in your Word document, but it’s probably easier for you to print them on separate sheets.

Project #2: Tornado!

Visit the website Use the site to gather the data on all tornadoes of intensity F2 or higher that occurred in Oklahoma during the 25 year period from January 1, 1969 to December 31, 2003. (The site should return information on all 548 such tornados, including the number of people killed and injured in each of these storms.) You can move the data from this site to Excel by highlighting all of the data and doing a cut and paste command.

Your tasks are as follows:

  1. Take a simple random sample of 50 storms selected from this population. Use the data from these 50 storms to build a 95% confidence interval for the average number of casualties per storm. (A person is a casualty if he or she is either injured or killed by the storm.)
  1. Use all of the population data (all 548 storms) to determine the actual average number of casualties per storm from this population. Compare the result of this calculation with the confidence interval you constructed, and discuss your findings. (It’s quite possible that your confidence interval “won’t work”. If it doesn’t think about possible reasons why.)
  1. Using the same sample of 50 storms obtained above, conduct the appropriate hypothesis test to see if there is evidence that someone was hurt or killed in more than40% of Oklahoma tornados of intensity F2 or higher during the period from 1969 to 2003. Use a 5% level of significance. Identify two changes that have taken place over the period of time from 1969 to 2003 that suggest that it may not be appropriate to use these results in predicting tornado casualties in 2007 from this data.

The report should be professional in appearance, done with a word processor and Excel, and should include:

  • A cover page including your name, your section, and the name of the project . Also include a signed honor pledge: “On my honor, I have neither given nor received unauthorized assistance on this project.”
  • The text of each of the tasks above, along with your answer for that task. This writeup of your answer to the task should include
  • all relevant data raw data (such as the information on the 50 storms that constitute your sample). You need not include the population data on all 548 storms.
  • a check, where appropriate, of the assumptions needed to apply a statistical method. (You should conduct the statistical analysis even if the assumptions are not met, but should comment on this in your conclusions.)
  • your calculations, shown on an Excel spreadsheet, with correct notation used. (You may write “x-bar” for sample mean, “sigma-sub-p” for the standard error of the proportion, and so on.) You may not use my spreadsheets for this work. Note, in particular, that you will be giving me two copies of your spreadsheet: the numerical view and the formula view.
  • your final conclusions, interpreting the results of your spreadsheet. Your conclusions should refer to the particular task at hand. Don’t just tell me, “We reject the null hypothesis.” What was the null, and what does it mean to reject it?
  • A clear explanation of how you generated your simple random sample. Use the Excel “randbetween” function in generating your sample.

It’s up to you whether you print your spreadsheets on a separate sheet or imbed them in your Word document, but it’s probably easier for you to print them on separate sheets.

Project #3: The Bible, Chapter and Verse

Get a copy of the Holy Bible, either the Revised Standard Version or the King James Version.

Your tasks are as follows:

  1. Obtain a simple random sample of 50 Bible chapters, and note the number of verses in each of these chapters.
  1. Use this sample to estimate the average length (in verses) of chapters in the Bible. Your margin of error for this confidence interval should be +3 verses. What is the confidence level for the interval that you have constructed?
  1. At the 10% level of significance, do you reject the null hypothesis that the average length of a Bible chapter is 21 verses? Base your conclusion on the 50 chapters in your simple random sample.

The report should be professional in appearance, done with a word processor and Excel, and should include:

  • A cover page including your name, your section, and the name of the project . Also include a signed honor pledge: “On my honor, I have neither given nor received unauthorized assistance on this project.”
  • The text of each of the tasks above, along with your answer for that task. This writeup of your answer to the task should include
  • all relevant data raw data (such as the information on the 50 Chapters that constitute your sample).
  • a check, where appropriate, of the assumptions needed to apply a statistical method. (You should conduct the statistical analysis even if the assumptions are not met, but should comment on this in your conclusions.)
  • your calculations, shown on an Excel spreadsheet, with correct notation used. (You may write “x-bar” for sample mean, “sigma-sub-p” for the standard error of the proportion, and so on.) You may not use my spreadsheets for this work. Note, in particular, that you will be giving me two copies of your spreadsheet: the numerical view and the formula view.
  • your final conclusions, interpreting the results of your spreadsheet. Your conclusions should refer to the particular task at hand. Don’t just tell me, “We reject the null hypothesis.” What was the null, and what does it mean to reject it?
  • A clear explanation of how you generated your simple random sample. This is crucial for this project. Use the Excel “randbetween” function in generating your sample.

It’s up to you whether you print your spreadsheets on a separate sheet or imbed them in your Word document, but it’s probably easier for you to print them on separate sheets.