LAB 4: Exploring Data by Box Plot & Probability Simulation

LAB 4: Exploring Data by Box Plot & Probability Simulation

Before you begin this lab, make sure that the Add-In component DDXL is available.

·  With your home PCs you may add the software from the CD inside the pocket of your textbook, following the instructions of the ‘ReadMe.txt’ file found in the Directory of D:\Software\DDXL(Excel Add-In)\Windows.

·  In the lab classroom, click the Start Button in the lower left-hand corner of the Desktop and choose the menu item Classes & Workshops. Then after seeing the list of menu items, you would click DDXL option. This would launch Microsoft Excel with DDXL loaded already along with the Analysis Tool Package, one of the Microsoft add-in components. (If you do not see the DDXL option, notify your instructor.)

Purpose: In this lab, we will use Excel to help us construct boxplots. We will also learn how to use Excel to simulate an event in order to find probabilities.

Creating a Boxplot

We will learn how to create a boxplot directly from a data set. We will be using the data from Data Set 4 in Appendix B entitled COTININE_4.XLS.

  1. Double click on the Connect to the Internet icon on the desktop.
  2. Click on Authenticate for Network Access.
  3. Fill in your username and password.
  4. Click Submit and then click on The University Portal, WINGS.
  5. Click on the website address and type in http://www.wright.edu/cosm/departments/math/resources/student/stt160lab.html.
  6. Click on COTININE_4.xls to open the dataset. In the File Download dialog box click on Save.
  7. When the Save As dialog box opens, click Save. Close the Download dialog box. (The default save location will be the lab computer’s hard drive. Your file will be automatically deleted once the lab computer is restarted. To keep your work, we recommend that you save your file to a flash drive or Skydrive through your RaiderMail.)
  8. To open Excel, click on the Start Menu, Productivity Apps, Office 2010 and Excel 2010.
  9. Open the data set COTININE_4.XLS which was saved on the lab computer’s hard drive by going to the File Tab and selecting Open. In the Open dialog box, find the COTININE_4.XLS file and double click on the COTININE_4.XLS file.
  10. Select the data you will be working with. There are several columns in this data set, but we will be working with the measured levels of serum cotinine for Smokers. The data we want is summarized in column A. Highlight column “A” in your worksheet, including the word title.
  11. Click on Add-Ins on the menu bar, and then click on the DDXL command and choose Charts and Plots. Choose the Function Type Boxplot. In the boxplot dialog box, click on the pencil located at the bottom of the Quantitative Variables section. This will put the cursor in the box immediately to the left of the pencil icon. In the box, type in the cells where the data you want to use is stored A1:A41. Make sure that the box for First row is variables name is checked. Then click on OK. Your boxplot and Summary Statistics box will appear.
  12. Once you have your boxplot, you can make modifications.

a)  Move the Summary Statistics box down to the bottom of the screen by clicking on the box and dragging to the bottom of the screen.

b)  Make the Summary Statistics box larger (to see all the information) by pulling on the diamond shape in the lower right hand corner of the box.

c)  The boxplot that is created is hard to read due to how small it is. To create a larger picture, click and hold on the diamond shape in the lower right hand corner of the Boxplot screen and drag this corner out to create a larger graph.

  1. Save your boxplot.
  2. Click on the boxplot and then go to File then Print to print out a copy of your boxplot.
  3. Click on the Summary Statistics box and then go to File then Print to print out a copy of your summary statistics.

Use your boxplot to answer the following questions.

  1. On the printout of your boxplot, label the 5 Number Summary.
  2. Explain how to use a boxplot show you the measures of center, variation, and distribution of a data set?
  3. What would an outlier look like on a boxplot?

Probabilities Through Simulation

The goal of many statistical studies is to collect data and to use that data to make decisions. Often collecting data or repeating a trial a large number of times can be impractical. With the use of computers we can often simulate an experiment.

The random number generator can be used to simulate a variety of statistical problems.

Suppose we toss two die 50 times, and we want to find the probability that the sum of the two numbers is 10.

To simulate tossing two die 50 times, use the following steps.

  1. Open a new worksheet through the File Tab.
  2. Type in “First Die” in cell A1, “Second Die” in cell B1, and “Sum” in cell C1.
  3. Click on the blank cell A2.
  4. To generate a random set of numbers, on the menu bar click on Formulas, then click on Math & Trig, and then select RANDBETWEEN. In the dialog box, type in 1 for the bottom and 6 for the top. This is because the smallest number when you toss a die is 1 and the largest is 6. Click OK.
  5. A random number from 1 to 6 will appear in A2. To get 50 random numbers, right click on cell A2 and click Copy, then highlight A3 to A51 and right click and click Paste. You can then press Escape to eliminate the box around the copied item. (Note: A2 may change to another random number.)
  6. Click on B2.
  7. Generate another column of 50 random numbers from 1 to 6 using the same method as above.
  8. Click on C2.
  9. Type in the formula =SUM(A2+B2) and press Enter.
  10. Copy and paste the formula from cell C2 down to C51.
  11. Save your worksheet.
  12. Click on a blank cell of the worksheet so that the entire sheet is active. Then select Page Layout on the ribbon and in the Scale To Fit section, change width and height to 1 page. Click on the File Tab and choose Print. Make sure your data looks correct. Then print out a copy of your data.

Use your simulation results to answer the following questions.

  1. Find the exact probability that you get a sum of 10 when two die are tossed (Hint: there are 36 possible outcomes when you toss two die.)
  2. Estimate the probability you get a sum of 10 using your simulation results: Count the number of times you get a sum of 10 and divide by the number of times you tossed the two die.
  3. What happens if you were to repeat this simulation? Will you get the same results? Explain why or why not.
  4. Suppose that instead of two die you wanted to flip two coins. Explain what you would enter into the RANDBETWEEN dialog box as Bottom and Top to make this simulation work?

Be sure to include the following in your lab report:

A.  Cover sheet with a summary of the lab you did. It should be one to two paragraphs.

B.  Answer all the questions stated above on the answer sheets provided to get full credit.

C.  Attach the boxplot, Summary statistics information and worksheet that you were asked to print.