Stat 100 Project 3

Purpose: To explore density histograms for the Uniform and Standard Normal sample distributions using random data, and to explore the properties of the Binomial distributions as a function of p and n.

In this project we wish to explore how histograms work. Normally we have only one or a few data sets. We apply the histogram tool and we get what we get. How can we know how faithful the histogram of the sample is to the true histogram of the population? This project will help us get a feel for this.

We will create random data with known distributions and display the sample histograms. We can see how quickly histograms converge by doing this for different sample sizes and for different samplings.

We will also look at the population histogram for the binomial distribution to see how it behaves as n and p are changed.

Reading: Text, 5.6 on the Binomial Distribution, 6.3 on the Standard Normal Distribution, and page 139 on the Uniform Probability Distribution.

Turn in: A total of seven histograms (three uniform distributions, three standard normal distributions, and one binomial distribution) and the answers to the questions at the end of this assignment. A one- or two-page submission is all that is required.

Instructions: It is important that you have worked through the first two projects; the instructions in this project assume that you have had at least this much experience with your spreadsheet program.

Click on “All” in the projects table and open the spreadsheet for Project #3. Click on the “Uniform” tab if this is not already the active worksheet. The first column of this worksheet contains 10,000 random numbers drawn from a population that is uniform on the interval [0, 1). The next two columns contain a list of bins and the count of the random numbers that fell into each bin. The following column converts each count into a probability, and the last column holds the bin center for the plots.
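For readers who want to see the worksheet columns outside the spreadsheet, the following is a minimal Python sketch of the same bookkeeping: random data, bins, counts, probabilities, and bin centers. The choice of 10 bins of width 0.1, the fixed seed, and the function name `histogram_table` are assumptions made for illustration; the worksheet may use different bins.

```python
import random

def histogram_table(data, lo, hi, n_bins):
    """Return one row per bin: (bin_start, count, probability, bin_center)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        if lo <= x < hi:
            k = int((x - lo) / width)
            k = min(k, n_bins - 1)  # guard against floating-point rounding
            counts[k] += 1
    total = len(data)
    return [
        (lo + k * width, counts[k], counts[k] / total, lo + (k + 0.5) * width)
        for k in range(n_bins)
    ]

random.seed(0)  # assumed seed, only so the run is repeatable
sample = [random.random() for _ in range(10_000)]  # uniform on [0, 1)
table = histogram_table(sample, 0.0, 1.0, 10)      # 10 bins is an assumption

for start, count, prob, center in table:
    print(f"bin starts at {start:.1f}  count={count:5d}  "
          f"probability={prob:.4f}  center={center:.2f}")
```

Removing the `random.seed(0)` line and rerunning plays roughly the role of a fresh recalculation: each run draws a new set of 10,000 numbers.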

There are three plots. (If you have a small screen, you may need to click on “View” “zoom” and set the display scale so you can see them all.) Each plot depicts the sample histogram of the random data. Note the differences amongst the graphs.

The F9 key tells the spreadsheet program to recalculate everything, including creating another 10,000 random numbers. Hit the F9 key a few times and watch the graphs change as each new set of samples is generated.

Next we will repeat the process for the standard normal. Click on the “Standard Normal” tab and you should see a worksheet that is similar to the previous one. However, the data in column 1 comes from a standard normal population. Notice also that the bins have been changed to be appropriate for the standard normal distribution.

Press the F9 key a few times and watch how the shapes of the sample distributions change for each new random sampling.
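The effect of sample size on the standard normal histogram can also be tried in a short Python sketch. It bins random standard normal draws and reports the largest gap between a sample bin proportion and the true bin probability. The bin edges (-4 to 4 in steps of 0.5), the sample sizes, the seed, and the helper names `normal_cdf` and `max_bin_error` are assumptions for illustration and are not taken from the worksheet.

```python
import math
import random

def normal_cdf(z):
    """Cumulative probability of the standard normal at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def max_bin_error(n_samples, edges, rng):
    """Largest |sample proportion - true probability| over all bins."""
    counts = [0] * (len(edges) - 1)
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)
        for k in range(len(counts)):
            if edges[k] <= z < edges[k + 1]:
                counts[k] += 1
                break  # draws outside the edges are simply not counted
    worst = 0.0
    for k, c in enumerate(counts):
        true_p = normal_cdf(edges[k + 1]) - normal_cdf(edges[k])
        worst = max(worst, abs(c / n_samples - true_p))
    return worst

rng = random.Random(1)                        # assumed seed for repeatability
edges = [-4.0 + 0.5 * k for k in range(17)]   # assumed bins from -4 to 4

for n in (100, 1_000, 10_000):
    print(n, round(max_bin_error(n, edges, rng), 4))
```

Running the loop several times with different seeds gives a feel for how much the sample histogram moves from one sampling to the next at each sample size.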

Lastly, we will look at the binomial distribution. Click on the “Binomial Distribution” tab. You will see a population histogram for a binomial distribution with n = 30 and p = 15.

Change the probability in cell d2 and watch the graph change. Try 0.01, 0.05, 0.1, 0.2, and so on up to 0.90, 0.95, and 0.99, until you get a feel for what the binomial distribution looks like as a function of the probability. Note how the height of the bars and the position of the peak bar change with the probability.
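The same experiment can be sketched in Python using the standard binomial formula P(k) = C(n, k) p^k (1 - p)^(n - k). The sketch below prints the position and height of the peak bar for a range of probabilities with n = 30. The function name `binomial_pmf` and the exact list of probabilities are illustrative choices, not part of the spreadsheet.

```python
import math

def binomial_pmf(n, p):
    """Probabilities of k = 0..n successes in n trials with success chance p."""
    return [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

n = 30
for p in (0.01, 0.05, 0.1, 0.2, 0.5, 0.8, 0.9, 0.95, 0.99):
    pmf = binomial_pmf(n, p)
    peak = max(range(n + 1), key=lambda k: pmf[k])  # index of the tallest bar
    print(f"p={p:<5} peak at k={peak:>2}, height={pmf[peak]:.3f}")
```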

Next type 15 into cell d3 and “Expected Value” into cell e3. Type “=d3/d1” into cell d2, so that the expected value of the distribution is 15. (Remember that the expected value E of a binomial distribution is np, so p = E/n.) Now change the expected value and watch how the histogram changes. Try 0 through 9, 15, 25, 29, and 30. Now change n to 60 and repeat, to get a feel for the shape of the distribution.
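As a check on the relation p = E/n, here is a small Python sketch that sets p from a chosen expected value, mirroring the “=d3/d1” formula, and confirms that the mean of the resulting distribution comes back as E. The function name `pmf_from_expected_value` and the particular values tried are assumptions for illustration. The standard deviation printed uses the standard binomial result sqrt(np(1 - p)), which is not part of the spreadsheet.

```python
import math

def pmf_from_expected_value(n, expected):
    """Set p = E/n (as in "=d3/d1") and return (p, list of probabilities)."""
    p = expected / n
    if not 0.0 <= p <= 1.0:
        raise ValueError("expected value must lie between 0 and n")
    pmf = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    return p, pmf

for n in (30, 60):
    for e in (0, 5, 15, 25, 30):
        p, pmf = pmf_from_expected_value(n, e)
        mean = sum(k * q for k, q in enumerate(pmf))  # should equal e
        sd = math.sqrt(n * p * (1 - p))               # standard binomial result
        print(f"n={n:>2} E={e:>2} p={p:.3f} mean={mean:.3f} sd={sd:.3f}")
```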

Printout: Select the “Output” spreadsheet. Edit the title of the first graph to change “Student Name” to yours. Preview the page. Adjust the “Page Setup” if necessary to see everything on one page. Print this one page and save your work. (Use care here: it is possible to send over 400 pages to the printer if you are not careful.)

Questions:

1. One hundred samples might provide a reasonable estimate of the mean and standard deviation of a distribution. How many samples do you think are required to get a reasonable estimate of the distribution shape itself? Why do you think it takes so many more?

2. Does it appear that more or fewer samples are required to get a reasonable estimate of the standard normal distribution than of a uniform distribution?

3. The total area under a density histogram is always unity. Yet looking at the uniform and standard normal graphs, you might not see this. Explain why the areas under the uniform and standard normal histograms are indeed the same. (Hint: check the y-axis.)

4. What is the difference between a sample histogram and a population histogram?

5. The binomial distribution sometimes looks like it might be well-approximated by the normal distribution. Sometimes though, it does not. When is that? Can you support the statement in the green box on Page 245? What about the footnote on this page?