Using Fathom to Organize Data

Chapter 1: Part A and Part B

Chapter 2: Part C

Part A: Smoking and Cancer

Finding the Data

1. Use data from the Data and Story Library (DASL), http://lib.stat.cmu.edu/DASL/. Follow the links: POWER SEARCH > type “Smoking and Cancer”. Choose the file “Smoking and Cancer” with Document Size 3451. Read what the file and the variables are about.

2. Using your mouse, start at Number of Cases: 44 and highlight to the end of the explanation of the fifth variable (Leuk). Copy this by right clicking and selecting Copy, or by selecting Copy from the Edit menu. Do not close the Internet window.

Using Fathom

3. Open Fathom. From the tool bar, click on the “Text” icon. Now click in your main window and a text box appears. Right click and paste text. The box can be made larger by dragging the sides of the box.

4. Return to the Smoking and Cancer site and highlight the data, starting at the beginning of “STATE” to the end of the data table. Right click and copy (Method 1 for retrieving data).

5. Return to Fathom and click the “Collection” icon from the toolbar. Now click on the main window. Right click on the box and select Paste Cases. Objects that look like gold balls will now appear in the box. If you wish to rename the collection, double click on the title, “Collection 1”, beneath the box.

6. Double click on the Collection Box to open its contents. Under the “Comments” tab, record the URL and any links you used to access the Internet site.

7. Click once on your Collection box to highlight it (blue outline). From the toolbar, select the “Table” icon. Now click on the main window. The data from your collection should be in the table of values.

8. Make the table of values box wider so you can see all the headings. Notice that the last two headings are not properly imported into Fathom. Always verify that Fathom has properly imported your data.

Cleaning Up the Data

9. Double click on KIDLEUK and rename it KIDNEY. Double click on Attr6 and rename it LEUK.

10. Verify that the rest of the data is imported correctly. (You could make both Fathom and the Internet window smaller so that you can view both at the same time.) In Fathom, you can use the scroll bar to see all the data in the table of values.


Graphing the Data

11. Maximize space on the Fathom desktop by making your table of values smaller so that you can only see the attribute names and the first three cases.

12. Select the “Graph” icon and drag it onto the main screen. It will be empty. Now, grab and drag the attribute CIG (from your table) over to the graph. Drop it on the x-axis (You’ve made a Dot Plot!). Now grab the attribute BLAD and drop it onto the y-axis (You’ve now made a Scatter Plot!). The graph should say scatter plot in the top right corner.

13. Select the graph icon again. Place another graph on your Fathom desktop. Compare CIG with LUNG. Continue to compare CIG with the remaining attributes. (There should be four scatter plots in total).

14. Select the A tool and create another textbox. Explain the conclusions you can make from the graphs.

Linear Regressions using Fathom

15. If there is a trend, it is useful to do a linear regression. In Fathom, right click on the graph and select the Least Squares line. An r² value will appear below the graph. To determine the correlation coefficient r, take the square root of this value and give it the same sign as the slope of the line (negative if the line slopes downward).
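
The relationship between the least-squares line, r², and the correlation coefficient can be sketched outside Fathom. The Python below is only an illustration: the function names are hypothetical, and the CIG and BLAD values are made up rather than taken from the DASL file.

```python
import math


def least_squares(xs, ys):
    """Return slope, intercept, and r-squared of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def correlation_from_r_squared(r_squared, slope):
    """Square root of r-squared, carrying the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)


# Made-up illustrative values, NOT the DASL data.
cig = [18.0, 25.0, 21.0, 29.0, 23.0]
blad = [2.9, 4.1, 3.5, 4.6, 4.0]
slope, intercept, r2 = least_squares(cig, blad)
r = correlation_from_r_squared(r2, slope)
```

Because r² is never negative, the square root alone cannot show whether the trend is rising or falling; the slope supplies that sign.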

Changing Axis Scales

16. The basic way to work with axes in Fathom is to drag on the numbers of the axis. Dragging in the middle translates the axis, moving the range without changing the scale. Dragging closer to the ends expands or contracts the range, keeping the opposite end of the axis constant. Think of this action as zooming in or zooming out. (NOTE: when you move the cursor over the axis, a little white hand icon appears.)

17.  You can double click on the graph and manually adjust the xmin, xmax, ymin, and ymax. To return to original scale, select Rescale Graph Axes from the Graph menu.

Tips for Maximizing your Fathom Desktop

·  Hide items that you are not using (e.g., the box with the gold balls). Select an item (graph, table, etc.) and then click on “Object” > “Hide…”. If you need to see it again, return to “Object”, then select “Show…”.

·  When finished with a table of values and only the graphs are important, delete the table of values and the graphs remain. To retrieve the table of values, select the collection box (with gold balls), select the table of values from the toolbar, and bring it into the main window.

In Your Notebooks: Go to www.agius.com/hew/resource/assoc.htm. List the 11 criteria for determining causation.

Temporality, Reversibility, Strength of Association, Exposure-response, Consistency, Biologic plausibility, Analogy, Specificity, Chance, Bias, Confounding

Discuss each criterion in reference to the smoking and cancer data from DASL. Discuss the research that is needed to be sure that there is a cause-and-effect relationship between smoking and cancer. Return to the website where the data is from to get additional information. Also discuss the role of the correlation coefficient and cause-and-effect relationships. Write your report within the Fathom document.


Part B: Sport Injuries in Football

1.  Access http://nccsir.unc.edu/reports/ > “Football fatalities”. Click on “2010”.

2. Copy the table Fatalities Directly Due to Football (starts on page 18).

3. Open Fathom. Paste cases into a Collection Box (Method 1).

4. Create a Case Table and observe the data. We need to do a little clean up. Create a scatter plot for Year and High School Fatalities.

5. The first four cases cause problems when graphing. Holding shift, select Case 1, Case 2, Case 3 and Case 4; right click and select Delete Cases.

6. Delete the last five attributes since there is no data in the columns. Holding shift, select these empty columns; right click and select Delete Attributes.

7. Relabel the attributes to the appropriate headings. Minimize the Fathom screen so that you can see both Fathom and the Internet page.

8. Notice how the last two attributes/variables have imported incorrectly. Create a new attribute (click on the <new> box), and call it TOTAL.

Click outside the table, and then click on the attribute to highlight. Then right click and select “Edit Formula”.

The formula should calculate the total fatalities from the various football groupings. You can either type in titles or double-click them from the list. When you are finished, you can click ok.

8. Relabel the collection Football Fatalities by double clicking on the name Collection 1.

9. Double click on the Collection Box and record the URL in the comments section and the name of the site. Do this for all data that you use.

10. Create a scatter plot for Year and High School Fatalities.

11. Depending on your version of Fathom, you may notice it does not produce a scatter plot because there are a few cases that have “text” instead of data. Delete these cases and create a new scatter plot. Create the least squares line.
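
The clean-up steps above (deleting cases that hold text, then adding a TOTAL attribute) can be sketched in Python. This is a minimal illustration, not Fathom's own formula language: the attribute names are hypothetical stand-ins for the table headings, and every value is made up.

```python
def is_number(value):
    """True when the value can be read as a number (text labels cannot)."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def delete_text_cases(cases, attributes):
    """Keep only the cases whose listed attributes are all numeric."""
    return [c for c in cases if all(is_number(c[a]) for a in attributes)]


def add_total(cases, groupings):
    """Add a TOTAL attribute: the sum of the grouping attributes in each case."""
    for case in cases:
        case["TOTAL"] = sum(float(case[g]) for g in groupings)
    return cases


# Hypothetical attribute names and made-up values, for illustration only.
groupings = ["Sandlot", "Pro", "High_School", "College"]
cases = [
    # A pasted heading row: text in every attribute, so it gets deleted.
    {"Year": "Year", "Sandlot": "Sandlot", "Pro": "Pro",
     "High_School": "High School", "College": "College"},
    {"Year": "1", "Sandlot": "3", "Pro": "0", "High_School": "10", "College": "2"},
    {"Year": "2", "Sandlot": "1", "Pro": "1", "High_School": "8", "College": "0"},
]
clean = add_total(delete_text_cases(cases, ["Year"] + groupings), groupings)
```

A single case containing text is enough to stop an attribute from being treated as numeric, which is why the text cases are removed before the total is calculated.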

In Your Notebook: Discuss the possible reasons for this trend and discuss the correlation coefficient.

Importing a Second Table of Data from a Different URL

12. Return to the web browser and press Back. Select “Catastrophic Sport Injury Research 27th Annual Report 2008. pg21”

13. Copy the table Cervical Cord Injuries. (You may need to copy the table into Microsoft Word first before adding to the Collection in Fathom)

14. On the same desktop as Football Fatalities, select a new Collection Box and paste cases.

15.  Create a Case Table and clean the table up. Relabel this collection Cervical Cord Injuries in Football. Create a scatter plot of Year and High School.

In Your Notebook: Discuss the possible reasons for this trend and compare it to the football fatalities. Notice the scale of each graph.

Merging Case Tables

16. Create a new Case Table. Create the attributes labelled x and y.

17. Notice a new Collection Box is automatically created.

18. Compare football fatalities with cervical cord injuries for each year.

19. Relabel the Collection Football Fatalities compared to Cervical Cord Injuries.

20. Click on Year from the Football Fatalities Case Table. The column should be highlighted. From the Edit Menu, select Copy Attribute. Click on x. From the Edit Menu, select Paste Attribute. Copy the high school Attribute and paste it in y. Relabel this attribute Secondary School Fatalities (use an underscore for a space).

21. Since Cervical Cord Injuries only begin in 1977, delete all cases prior to 1977 in the Football Fatalities compared to Cervical Cord Injuries Case Table.

22. Copy the High School attribute from the Cervical Cord Injuries Case Table and paste it into a new attribute in the merged Case Table. Relabel it.

23. Create a scatter plot comparing Football Fatalities and Football Cervical Cord Injuries.
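
The merging steps above pair the two high school series year by year, keeping only the years from 1977 onward. A minimal Python sketch of that idea follows. The function name and the injuries attribute name are hypothetical, and all counts are made up rather than taken from the reports.

```python
def merge_by_year(fatalities, injuries, first_year):
    """Pair the two series for every year from first_year on that appears in both."""
    merged = []
    for year in sorted(fatalities):
        if year >= first_year and year in injuries:
            merged.append({
                "x": year,
                "Secondary_School_Fatalities": fatalities[year],
                "Secondary_School_Injuries": injuries[year],
            })
    return merged


# Made-up counts keyed by year, for illustration only.
fatalities = {1975: 12, 1976: 15, 1977: 8, 1978: 9}
injuries = {1977: 10, 1978: 13}
merged = merge_by_year(fatalities, injuries, 1977)
```

Matching on the year, rather than on row position, keeps each fatality count beside the injury count for the same season even though the two tables start in different years.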

In Your Notebook: Discuss the criteria for determining causation with reference to this data for sports injuries. What additional information is needed for determining if this is a cause-and-effect relationship? What does the correlation coefficient tell us about this data?

Part C: Investigating Wealth

1. Use data from DASL: http://lib.stat.cmu.edu/DASL/ Data Subjects > Economics > Billionaires 92 Datafile. (If it won’t return this search file, just do a Power Search for “92”)

Import Data into Fathom (Method 2).

2. Right click on URL and Copy it. In Fathom, from File Menu, select Import from URL. Right click on address box and select paste. Select OK. A case box should appear with Gold Balls in it. Select table of values and check to see if data was imported. If not, use Method 1 from the cancer activity.

3. Create a table of values on the desktop. Graph age versus wealth. Depending on your version of Fathom, you may notice that the graph says Dot Plot at the top right corner and does not have a scale for the x-axis. To make a scatter plot, there must only be numbers in the table of values. If you scroll down to case 105, there may be an asterisk in the cell. This is another type of data clean up that you will have to check for. Delete all asterisks and delete the graph.

Analysing One-Variable Data

4. Select the graph icon and place a graph on the desktop. Drag age onto the x-axis. Notice the shape of the dot plot. It looks roughly like a normal distribution. As a first check, calculate the mean and median of the data. If they are approximately equal, the distribution is roughly symmetric, which is consistent with a normal distribution but does not prove it.

5. Right click on the graph. Select Plot Value. Type mean(age) and press OK. Repeat this process with median(age) to plot the median. Notice that the two values are approximately the same and that they are in the middle of the dot plot.

6. Change this graph to a histogram by clicking on Dot Plot and selecting Histogram. It is easy to see the mode if you double click on the x-scale and change bin width to 1. The mode is 68; this data is not exactly normal. To return to the original graph, select Rescale Graph from Graph Menu.
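
The three measures used in these steps can also be checked with Python's standard statistics module. This is only a sketch: the ages below are made up and are not the Billionaires 92 data, and the helper name is hypothetical.

```python
import statistics


def summarize(values):
    """Return the mean, median, and mode of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


# Made-up ages, for illustration only.
ages = [60, 64, 68, 68, 72]
summary = summarize(ages)
```

A mean close to the median suggests a roughly symmetric shape; the mode shows where the tallest histogram bar sits when the bin width is 1.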

Using Sliders to Analyse Data

7. Create another graph on the desktop and put age on the x-axis and wealth on the y-axis.

8. Bring down 4 sliders from the toolbar (the icon beside the A). Label them a, b, k, and d.

9. Right click on the graph. Select Plot Function. Click on + sign beside Function. Click on + sign beside distribution. Click on + sign beside Normal. Double click on normalDensity (the description of this tool is at the bottom of the Expression for function screen).

10. Type in the letters a, k, b, and d. Select OK.

11. Using the sliders, try to fit a curve of best fit. Drag each slider along its scale to change the values of a, b, k, and d.
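
To see what the four sliders might control, here is a minimal Python sketch of a normal density curve with four adjustable values. The roles given to the letters are an assumption made for illustration (a as the centre, b as the spread, k as a vertical stretch, d as a vertical shift). They are not taken from Fathom's documentation, so check the description at the bottom of the Expression for function screen for the actual order of the arguments.

```python
import math


def normal_density(x, mean, sd):
    """Height of the normal curve with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def slider_curve(x, a, b, k, d):
    """Assumed roles: a = centre, b = spread, k = vertical stretch, d = vertical shift."""
    return k * normal_density(x, a, b) + d


# Made-up slider settings, for illustration only.
a, b, k, d = 65.0, 10.0, 100.0, 1.0
peak = slider_curve(a, a, b, k, d)
one_sd_away = slider_curve(a + b, a, b, k, d)
```

Under these assumptions, changing a slides the curve left or right, changing b makes it wider or narrower, changing k makes it taller or shorter, and changing d raises or lowers the whole curve.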

Changing a Slider’s Scale: Use the same rules as Changing Axis Scale
