Lab 1 the Sampling Distribution Using Excel

Econ 488 – Applied Managerial Econometrics

Cameron Kaplan, Fall 2010

9/3/10

Lab 1 – The Sampling Distribution Using Excel

(Adapted from Hunt and Mashhoundy, 1997)

Directions: Complete this assignment using MS Excel. Your write-up should be typed and be no longer than 3 pages (including graphs). Due in class next week.

Part 1: Sampling from a normal population

Suppose we are concerned about the speeds of cars on a local highway. It is reasonable to assume that the speeds follow a normal distribution with mean 50 mph and standard deviation 16 mph. These can be simulated using the NORMINV function in Excel. We shall take 100 samples, in rows 16 to 115 of the worksheet.

Part 1 Procedure:

1.In B15 enter the value 1 to indicate the first car in the sample.

2.Suppose we only observe the speed of one car on the highway, drawn at random. In B16 enter the formula =NORMINV(RAND( ), 50, 16). This generates a random normal variable with a mean of 50 and a standard deviation of 16.

3.In Column A we are going to calculate the sample means so in A15 enter the label "Means"

4.In A16 enter the formula =AVERAGE(B16:Z16), allowing for sample sizes of up to 25. This relies on the spreadsheet ignoring empty cells when computing the average. For now, our sample size is just 1, so the mean will have the same value as the observation.

Now we want to repeat this 100 times, each time drawing a sample of size 1:

5.Select A16:B16 and copy down to B115. Format cells to 1 decimal place.

6.Select A16:A115 and name it Means (using Insert Name Define). Initially the sample size is 1, so the means equal the sample values and give a picture of the population itself.

The spreadsheet should now look like Figure 1.

7.Enter the label "Limits" in A1 and the limits 10, 20, ... , 100 in A2:A11.

8.Select A2:A11 and name it Limits.

9.Enter the midpoints 5, 15, ..., 95 in B2:B11. These are required because Excel labels the midpoints of bars on a column chart, rather than the boundaries.

10.To set up the frequency tally, highlight C2:C11 and enter the array formula=FREQUENCY(Means, Limits) by pressing CTRL and SHIFT and ENTER.

11.Use the Chart Wizard to construct the histogram of B1:C11.

12.The formulae for the summary statistics in Row 13 are =MIN(Means), =MAX(Means), =AVERAGE(Means), =STDEV(Means).

The completed spreadsheet is shown in Figure 2 where, to facilitate increasing the sample size, rows 19 to 113 have been hidden (using Format Rows Hide). To increase the sample size, simply select B15:B115 and use the drag-handle (or Copy and Paste) to copy the formulae and heading across to column C and beyond. To reduce the sample size, simply drag the selection back again (or Delete).

Figure 1: Initial Spreadsheet

A / B / C
14
15 / Means / 1
16 / 53.1 / 53.1
17 / 62.6 / 62.6
18 / 55.2 / 55.2
19 / 47.1 / 47.1
20 / 63.3 / 63.3

Figure 2: Competed Spreadsheet.

Part 1 Write-up:

Create a table showing the mean and standard deviation (or standard error, as it is commonly referred to in this context) of the distribution of sample means for several different sample sizes. Include at least 4 different sample sizes. Note that if you want to create a sample size larger than 25, you will have to revise the code from step 4.
What happens to the mean of the sample means as the sample size is increased? What happens to the standard error of the sample means as the sample size is increased? Can you find the implied standard deviation of the sample? (Hint: see Studenmund pg. 555) Put this into your chart.
Based on what you just found, why might a larger sample size be better than a small sample size?

Part 2: Sampling from a Uniform Population

Now suppose that each motorist surveyed is asked to estimate what percentage of their journey distance has been completed. It is reasonable to assume that this is uniformly distributed between 0% and 100%. To simulate data from this population requires the simple formula =100*RAND( ).

Procedure:

Clear out the sample values in columns C to Z.
Change the formula in B16 to =100*RAND( ) and copy down to B115.

Write-Up Part 2:

Create a table similar to the table in part 1, showing the mean and standard error of the distribution of sample means for several different sample sizes. Also include the implied standard deviation of the sample. Include at least 4 different sample sizes.
What happens to the sampling distribution as the sample size increases (Hint: look at your histogram)? How does this relate to the Central Limit Theorem?

Part 3: Sampling from a Skewed Population

Suppose we now turn our attention to the time-gap between each car passing a certain point. These inter-arrival times could follow an Exponential distribution, which is skewed towards smaller values. We shall assume that one car passes every 30 seconds on average, so that the population mean and s.d. are both 30. These times can be simulated using a transformation of RAND( ).

Part 3 Procedure:

Clear out the sample values in columns C to Z.
Change the formula in B16 to = - 30*LN( 1 - RAND( ) ) and copydown to B115.

Note that one or two means may exceed 100 and are therefore missing from the histogram.

Part 3 Write-Up:

Create another table, showing the mean and standard error of the distribution of sample means for several different sample sizes. Also include the implied standard deviation of the sample. Include at least 4 different sample sizes.
What happens to the sampling distribution as the sample size increases? Approximately how large must the sample be for the sampling distribution to look normal?

Part 4: Sampling from a Binary Population

Finally, consider the percentage of drivers who are female. Let 100 represent a female and 0 represent a male.Assuming an equal number of male and female drivers, we are thus sampling from a population consisting of an equal number of 0’s and 100’s, with mean 50 and s.d. 50.

Part 4 Procedure:

Clear out the sample values in columns C to Z.
Change the formula in B16 to =IF(RAND( )<0.5, 0, 100) and copydown to B115.

Part 4 Write-Up:

Create another table, showing the mean and standard error of the distribution of sample means for several different sample sizes. Also include the implied standard deviation of the sample. Include at least 4 different sample sizes.
Copy the histogram generated from a sample size of 1 into a word document, and label it. Copy the histogram from a sample size of 16 into the word document. Be sure to label both graphs. Comment on the differences between the two graphs and what this demonstrates.