Lab 1: Using Excel

Excel is a spreadsheet program. This first lab will focus on inputting data, creating charts, creating a histograms, and calculating descriptive statistics. You should open the Excel document that in your email (or downloaded from my website) and save it as Lab1M244S09ABAB.xls where ABAB is replaced with your initials and your partners initials.

Activity 1

Collect a bag of Skittles from the front of the room. This is your data set. (Do not eat your data set until you’ve recorded its specifics.) You will now create a categorical histogram of the color distribution of the Skittles. Sort the Skittles by color. Open the Excel worksheet I have sent you. Make sure the Tab “Your Skittles Package” at the bottom of the screen is selected at the bottom of the screen. Excel uses columns to organize the input. So, using your sample of Skittles, fill in the following chart in Excel:

A / B
1 /

Colors

/

Frequency

2 / Green
3 / Yellow
4 / Purple
5 / Red
6 / Orange

At this point, copy your frequencies on the note card you have and come and give them to me in the front of the room.

Now create a categorical histogram.

Step 1:In cell A7 type “Sum”. Every formula in Excel must begin with an equal sign, =. So, in order to have Excel compute the total number of Skittles in your bag, in cell B7 type “=sum(B2:B6)”. (Instead of typing “B2:B6”, you can highlight those cells and Excel will complete your formula.)

Step 2:To create the histogram, you will need to create a column that contains the relative frequency of each color. The cell C1 is labeled “Relative Frequency”. In cell C2 you want to have Excel compute the ratio of green Skittles to the total number of Skittles. You can do this by typing “=B2/B7”. You can repeat this process to compute the relative frequency of yellow Skittles by typing “=B3/B7” in cell C3. However, there is a shortcut! So, type “=B2/B$7” in cell C2. The dollar sign, $, tells Excel to fix the character that follows it. In this case, Excel will fix the eighth row. Highlight the cell, C2. Move your cursor to the bottom right corner of the cell. You should notice that your cursor changes. It will change from an outline of a plus sign to a plus sign. Once your cursor is a plus sign drag your cursor down to C6. What is the sum C2 to C6? What should it be? Check it by typing “=sum(C2:C6)” in cell C7.

Step 3:Now create a histogram or bar chart. First highlight the relative frequencies in cells C2:C6. On the Insert menu select Chart. The chart wizard should pop-up. So, select “column” as your Chart type. Five different bar charts will now appear at the top of the screen. Click on the first of the five charts, which is called a “Clustered Column”. This will produce the bars with the correct heights, but the columns will not be labeled with the colors. To add the category data for the horizontal axis move the cursor over the chart, hold down the Controlkey, and click the mouse. Select Select Data… You will now see a window that looks like this:

Add in the Category (x) axis labels by clicking in this box and then selecting the cells with the colors in them. Alternatively you could type in 'Your Skittles Package'!$A$2:$A$6 .

Step 4: Now customize your chart. Click on the Toolbox button at the top of your window. Click anywhere on your chart. You should now see the Formatting Palette. Customize your own chart. For instance if you click on one bar so that it (and not all the bars) are highlighted, then under the Colors, Weights, and Fills menu of the formatting palette you can make the colors of the columns correspond to the colors of the Skittles! You can also under the Chart Options window place labels on top of each bar with the corresponding height. Save this excel document. When you are done with this part your screen might look something like this:

Activity 2

Now we are going to let Excel create a class distribution histogram for us. In the file that is provided there is a sheet titled Eruption Data containing two columns of data. This data is taken from several days in August 1978 and August 1979 one month of data for Old Faithful, a geyser in Yellowstone National Park and was obtained fromApplied Linear Regression, 2nd Edition, by Sanford Weisberg, pp. 231 and 234 Linear Regression, 2nd Edition, by Sanford Weisberg, pp. 231 and 234. The first column is the duration of eruptions in minutes and the second column is the time since the previous eruption in minutes. The data is data is sorted so that the eruption durations are ordered from smallest to largest but the intervals are placed next to the corresponding duration time and are not necessarily in order. We will not be using the second column for this part of the assignment but we may in a future Lab, however in your summary you might comment on any related observations between the two measurements.

First Create a Histogram for the Duration

Step 1:You need to decide on the number and size of the class intervals. (Excel calls these bins.) Since there are 272 observations, you should have about 16 intervals (why?). Decide on the range you are going to use. Make sure both the largest and smallest values are included in this range. Divide difference of the endpoints of your range by 16 and that will be the width of each bin. Now create a column of numbers in column C, that contains the endpoints of each class interval. For instance if you were going to divide the interval from 2 to 10 into 16 bins, each bin would be (10-2)/16 =.5 long. So the endpoints would be (2, 2.5, 3, 3.5, ….,10). If you choose your range to be width 4, these numbers will look relatively nice.

Step 2:You now need to count how many items go into each bin. DO NOT DO THIS BY HAND (I will take off points if you do). Instead you need to use the Frequency function. In cell D1 type =Frequency(A2:A273,C2:C17) and hit enter. A zero should appear in this cell. Next, select the range D1:D17 by highlighting those cells, press CONTROL+U, and then press ⌘+Z+RETURN. The frequencies should now appear in column D. The default for values on the border of the class intervals is the opposite of what we discussed in class, but don’t worry about this. Before you proceed stare at this column and see if it agrees with what you would expect by looking carefully at the data.

Step 3:In Column E calculate the Relative Frequencies as you did in Activity 1.

Step 4:In Column F calculate the midpoints of each of the class intervals. For example you should type =(C2+C3)/2 in cell F2. You can then drag this column down to calculate the midpoints of the other class intervals. This will be the column you will use for the labels of your histogram.

Step 5:Now create a bar graph as in Activity 1 using the relative frequency column as the y values and the Interval midpoint data as the x labels. You should also make it so your bars have no gaps in them by pressing CONTROL while clicking inside your graph and then selecting Format Data Series. A window will appear. Select Options and change the Gap Width to 0.

Step 6:Compute the mean duration of eruptions from the sample in two ways. Do these calculations in cells H2 and I2

Method 1:Find the sum of all eruption durations in the sample and divide by the number of measurements in the sample. You will need to use the commands “sum” and “count”.

Method 2:Use the command “average” to find the mean of the sample.

These two methods should result in the same number.

Step 7:Compute median, the lower fourth and the upper fourth. You should look up and use the built in median command to do this. Put these in cells J2, K2, and L2. (Since you ordered your data earlier this should be relatively easy)

Step 8:Now compute the standard deviation of the data set in cell M2 using the built in function stdev.

Step 9:Compute a Trimmed Mean of your choice and indicate what percentage you used.

Activity 3 (You may have to do this part on your own after class)

I will send you an Excel File using the data you recorded on the board. Copy this data into the sheet Class Summary in the workbook you have saved. Create a histogram just like in Activity 1, but using the totals for the entire class. Also compute the mean number of skittles in a bag and the standard deviation of skittles in a bag in the cells indicated.

What to turn in:

By email you must submit the Excel workbook file containing three sheets, one for each of the Activities. In addition you must submit a written report. In this report you should write a paragraph or two for each of the activities summarizing what you did and your results. In these summaries you should include a description of the behavior of the data. Also, you should comment on how your data from the Skittles project compared to the data of the entire class.

Lab by Maria Robinson and David Neel, revised by Jeff DiFranco 2009