Biostatistics Case Study 1:
Exploratory Data Analysis Techniques
Student version / July 28, 2009

BIOSTAT Case Study 1: Exploratory Data Analysis Techniques

Time to Complete Exercise: 30 minutes

LEARNING OBJECTIVES

At the completion of this Case Study, participants should be able to:

Access TB surveillance data from the CDC Web site

Generate box-and-whiskers plots, stem and leaf diagrams, and histograms

Generate percentile values and measures of central tendency and dispersion for skewed distributions

Describe the magnitude of the TB incidence (new case) rates in the United States

Describe the differences in TB incidence rates by sex/gender and state across the United States

ASPH A. BIOSTATISTICS COMPETENCIES ADDRESSED IN THIS CASE STUDY

A.5. Apply descriptive techniques commonly used to summarize public health data

A.8. Apply basic informatics techniques with vital statistics and public health records in the description of public health characteristics and in public health research and evaluation

ASPH INTERDISCIPLINARY/CROSS-CUTTING COMPETENCIES ADDRESSED IN THIS CASE STUDY: F. COMMUNICATION AND INFORMATICS

F.8. Use information technology to access, evaluate, and interpret public health data

Introduction

Control of tuberculosis (TB) in the United States is an important public health responsibility. Effective TB control requires a complex system that merges elements of laboratory science, investigative work, public health, surveillance, and clinical care.

The Tuberculosis Information Management System (TIMS) is one example of a public health surveillance system. TIMS is one of the main sources of descriptive data regarding TB in the United States. TIMS includes information on all cases of TB that have been reported to the Division of TB Elimination (DTBE) at the Centers for Disease Control and Prevention (CDC). This information is reported to CDC by the 50 states, the District of Columbia, the city of New York, Puerto Rico, and other jurisdictions in the Pacific and Caribbean.

Data on person, place, and time relating to TB in the United States are gathered using TIMS. These data are analyzed and published by the CDC annually and may be accessed through the CDC Web site in the form of TB Surveillance Reports or through the Online Tuberculosis Information System (OTIS). If you were to access OTIS and request TB case reports by sex and state for the period 2001-2005, you would obtain the data below. These data are the new TB case rates per 100,000 population for males and females (person), in the 50 states and the District of Columbia (DC) (place), during the years 2001 to 2005 (time).

TB Case Rates per 100,000 Population

Place                 Females   Males

Alabama                   3.4     7.2
Alaska                    6.6     9.4
Arizona                   3.5     6.6
Arkansas                  3.4     6.5
California                7.1    10.6
Colorado                  2.1     3.0
Connecticut               2.5     3.6
Delaware                  2.7     4.6
DC                        8.2    19.0
Florida                   4.3     8.5
Georgia                   4.5     7.8
Hawaii                    7.9    12.6
Idaho                     0.9     1.1
Illinois                  4.1     6.0
Indiana                   1.6     2.7
Iowa                      1.2     1.8
Kansas                    2.1     3.2
Kentucky                  2.1     4.7
Louisiana                 3.7     7.9
Maine                     1.2     2.0
Maryland                  4.5     6.0
Massachusetts             3.5     5.0
Michigan                  2.4     3.2
Minnesota                 3.9     4.7
Mississippi               2.9     6.1
Missouri                  1.5     3.1
Montana                   0.7     2.1
Nebraska                  1.5     2.5
Nevada                    3.6     5.1
New Hampshire             1.2     1.3
New Jersey                5.0     6.8
New Mexico                2.3     2.8
New York                  5.6     9.5
North Carolina            3.2     5.9
North Dakota              0.8     1.0
Ohio                      1.6     2.9
Oklahoma                  3.5     6.4
Oregon                    2.3     3.9
Pennsylvania              2.2     3.3
Rhode Island              4.0     5.5
South Carolina            4.2     8.2
South Dakota              1.7     2.1
Tennessee                 3.4     6.9
Texas                     4.9     9.5
Utah                      1.2     1.7
Vermont                   1.5     0.9
Virginia                  3.9     4.9
Washington                3.3     5.0
West Virginia             1.0     2.0
Wisconsin                 1.2     1.7
Wyoming                   0.5     0.7

Exploratory data analysis techniques are often used to organize, summarize, and describe clinical and epidemiologic data. These techniques include stem-and-leaf plots and box plots. To make the plots easier to construct, the case rates appear below sorted from lowest to highest, separately for females and males.

Female TB Case Rates per 100,000 Population

   1. Wyoming                0.5
   2. Montana                0.7
   3. North Dakota           0.8
   4. Idaho                  0.9
   5. West Virginia          1.0
   6. Iowa                   1.2
   7. Maine                  1.2
   8. New Hampshire          1.2
   9. Utah                   1.2
  10. Wisconsin              1.2
  11. Missouri               1.5
  12. Nebraska               1.5
  13. Vermont                1.5
  14. Indiana                1.6
  15. Ohio                   1.6
  16. South Dakota           1.7
  17. Colorado               2.1
  18. Kansas                 2.1
  19. Kentucky               2.1
  20. Pennsylvania           2.2
  21. New Mexico             2.3
  22. Oregon                 2.3
  23. Michigan               2.4
  24. Connecticut            2.5
  25. Delaware               2.7
  26. Mississippi            2.9
  27. North Carolina         3.2
  28. Washington             3.3
  29. Alabama                3.4
  30. Arkansas               3.4
  31. Tennessee              3.4
  32. Arizona                3.5
  33. Massachusetts          3.5
  34. Oklahoma               3.5
  35. Nevada                 3.6
  36. Louisiana              3.7
  37. Minnesota              3.9
  38. Virginia               3.9
  39. Rhode Island           4.0
  40. Illinois               4.1
  41. South Carolina         4.2
  42. Florida                4.3
  43. Georgia                4.5
  44. Maryland               4.5
  45. Texas                  4.9
  46. New Jersey             5.0
  47. New York               5.6
  48. Alaska                 6.6
  49. California             7.1
  50. Hawaii                 7.9
  51. District of Columbia   8.2

Male TB Case Rates per 100,000 Population

   1. Wyoming                0.7
   2. Vermont                0.9
   3. North Dakota           1.0
   4. Idaho                  1.1
   5. New Hampshire          1.3
   6. Utah                   1.7
   7. Wisconsin              1.7
   8. Iowa                   1.8
   9. Maine                  2.0
  10. West Virginia          2.0
  11. Montana                2.1
  12. South Dakota           2.1
  13. Nebraska               2.5
  14. Indiana                2.7
  15. New Mexico             2.8
  16. Ohio                   2.9
  17. Colorado               3.0
  18. Missouri               3.1
  19. Kansas                 3.2
  20. Michigan               3.2
  21. Pennsylvania           3.3
  22. Connecticut            3.6
  23. Oregon                 3.9
  24. Delaware               4.6
  25. Kentucky               4.7
  26. Minnesota              4.7
  27. Virginia               4.9
  28. Massachusetts          5.0
  29. Washington             5.0
  30. Nevada                 5.1
  31. Rhode Island           5.5
  32. North Carolina         5.9
  33. Illinois               6.0
  34. Maryland               6.0
  35. Mississippi            6.1
  36. Oklahoma               6.4
  37. Arkansas               6.5
  38. Arizona                6.6
  39. New Jersey             6.8
  40. Tennessee              6.9
  41. Alabama                7.2
  42. Georgia                7.8
  43. Louisiana              7.9
  44. South Carolina         8.2
  45. Florida                8.5
  46. Alaska                 9.4
  47. New York               9.5
  48. Texas                  9.5
  49. California            10.6
  50. Hawaii                12.6
  51. District of Columbia  19.0

Biostatistics Case Study 1
Exploratory Data Analysis Techniques
Instructor’s version / June 17, 2009

Question 1

Generate separate stem-and-leaf diagrams of these case rates for males and females and describe the distribution of these data. (Hint: use the decimal as the leaf.)

Female TB Case Rates per 100,000        Male TB Case Rates per 100,000

19 |                                    19 |
18 |                                    18 |
17 |                                    17 |
16 |                                    16 |
15 |                                    15 |
14 |                                    14 |
13 |                                    13 |
12 |                                    12 |
11 |                                    11 |
10 |                                    10 |
 9 |                                     9 |
 8 |                                     8 |
 7 |                                     7 |
 6 |                                     6 |
 5 |                                     5 |
 4 |                                     4 |
 3 |                                     3 |
 2 |                                     2 |
 1 |                                     1 |
 0 |                                     0 |
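
For participants who prefer to check a hand-drawn diagram by computer, the construction in Question 1 can be sketched in Python. This is a minimal sketch, assuming the whole-number part of each rate is the stem and the tenths digit is the leaf, as the hint suggests. The names FEMALE, MALE, stem_and_leaf, and print_plot are illustrative and not part of the case study; the values are copied from the sorted lists above.

```python
from collections import defaultdict

# Rates per 100,000, copied from the sorted lists above (51 places each).
FEMALE = [0.5, 0.7, 0.8, 0.9, 1.0, 1.2, 1.2, 1.2, 1.2, 1.2, 1.5, 1.5, 1.5,
          1.6, 1.6, 1.7, 2.1, 2.1, 2.1, 2.2, 2.3, 2.3, 2.4, 2.5, 2.7, 2.9,
          3.2, 3.3, 3.4, 3.4, 3.4, 3.5, 3.5, 3.5, 3.6, 3.7, 3.9, 3.9, 4.0,
          4.1, 4.2, 4.3, 4.5, 4.5, 4.9, 5.0, 5.6, 6.6, 7.1, 7.9, 8.2]
MALE = [0.7, 0.9, 1.0, 1.1, 1.3, 1.7, 1.7, 1.8, 2.0, 2.0, 2.1, 2.1, 2.5,
        2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.2, 3.3, 3.6, 3.9, 4.6, 4.7, 4.7,
        4.9, 5.0, 5.0, 5.1, 5.5, 5.9, 6.0, 6.0, 6.1, 6.4, 6.5, 6.6, 6.8,
        6.9, 7.2, 7.8, 7.9, 8.2, 8.5, 9.4, 9.5, 9.5, 10.6, 12.6, 19.0]


def stem_and_leaf(values):
    """Map each integer stem (highest first) to a string of its sorted leaves."""
    leaves = defaultdict(list)
    for value in sorted(values):
        tenths = round(value * 10)        # 3.4 -> 34; rounding avoids float error
        stem, leaf = divmod(tenths, 10)   # 34 -> stem 3, leaf 4
        leaves[stem].append(str(leaf))
    top = max(leaves)
    # Include empty stems so gaps in the distribution stay visible.
    return {stem: "".join(leaves.get(stem, [])) for stem in range(top, -1, -1)}


def print_plot(title, plot):
    """Print one 'stem | leaves' row per stem."""
    print(title)
    for stem, leaf_digits in plot.items():
        print(f"{stem:>2} | {leaf_digits}")


female_plot = stem_and_leaf(FEMALE)
male_plot = stem_and_leaf(MALE)
print_plot("Female TB case rates per 100,000", female_plot)
print_plot("Male TB case rates per 100,000", male_plot)
```

Keeping the empty stems in the output mirrors the blank template above, so a long run of empty rows below an isolated high value is easy to spot.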

Question 2

Describe the distributions. Are they normally distributed or skewed to the right or skewed to the left?

Question 3

What is the median TB case rate among females and among males? What are the 75th and 25th percentile values? The interquartile (IQ) range? The range?
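
The quantities in Question 3 can also be computed with Python's standard statistics module. This is a sketch under one assumption worth stating loudly: quartile conventions differ between textbooks and software, and the "inclusive" method used here is only one of them, so a hand calculation may differ slightly in the quartiles and the IQ range. The names FEMALE, MALE, and summarize are illustrative; the values are copied from the sorted lists above.

```python
from statistics import quantiles

# Rates per 100,000, copied from the sorted lists above (51 places each).
FEMALE = [0.5, 0.7, 0.8, 0.9, 1.0, 1.2, 1.2, 1.2, 1.2, 1.2, 1.5, 1.5, 1.5,
          1.6, 1.6, 1.7, 2.1, 2.1, 2.1, 2.2, 2.3, 2.3, 2.4, 2.5, 2.7, 2.9,
          3.2, 3.3, 3.4, 3.4, 3.4, 3.5, 3.5, 3.5, 3.6, 3.7, 3.9, 3.9, 4.0,
          4.1, 4.2, 4.3, 4.5, 4.5, 4.9, 5.0, 5.6, 6.6, 7.1, 7.9, 8.2]
MALE = [0.7, 0.9, 1.0, 1.1, 1.3, 1.7, 1.7, 1.8, 2.0, 2.0, 2.1, 2.1, 2.5,
        2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.2, 3.3, 3.6, 3.9, 4.6, 4.7, 4.7,
        4.9, 5.0, 5.0, 5.1, 5.5, 5.9, 6.0, 6.0, 6.1, 6.4, 6.5, 6.6, 6.8,
        6.9, 7.2, 7.8, 7.9, 8.2, 8.5, 9.4, 9.5, 9.5, 10.6, 12.6, 19.0]


def summarize(values):
    """Return median, quartiles, IQ range, and range for a list of rates."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles (linear interpolation between
    # order statistics); other conventions give slightly different cut points.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "p25": q1,
        "median": med,
        "p75": q3,
        "iqr": q3 - q1,
        "range": data[-1] - data[0],
    }


female_summary = summarize(FEMALE)
male_summary = summarize(MALE)
for label, summary in (("Females", female_summary), ("Males", male_summary)):
    print(label, {name: round(value, 2) for name, value in summary.items()})
```

With 51 observations the median is simply the 26th sorted value, which is a quick way to confirm the program against the numbered lists.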

Question 4

Draw/generate a histogram and a box-and-whiskers plot describing the rates for males and females. Which states/locations have unusually high or low (outlier) rates?
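
One way to approach Question 4 by computer is sketched below in Python. Two assumptions are made that the case study itself does not state: outliers are flagged with the common box-plot rule of 1.5 times the IQ range beyond the quartiles, and the histogram uses bins 2 units wide. The histogram is drawn as text so that only the standard library is needed. The names RATES, tukey_outliers, and histogram_counts are illustrative; the values are copied from the table of case rates above.

```python
from statistics import quantiles

# (female, male) TB case rates per 100,000, copied from the table above.
RATES = {
    "Alabama": (3.4, 7.2), "Alaska": (6.6, 9.4), "Arizona": (3.5, 6.6),
    "Arkansas": (3.4, 6.5), "California": (7.1, 10.6), "Colorado": (2.1, 3.0),
    "Connecticut": (2.5, 3.6), "Delaware": (2.7, 4.6), "DC": (8.2, 19.0),
    "Florida": (4.3, 8.5), "Georgia": (4.5, 7.8), "Hawaii": (7.9, 12.6),
    "Idaho": (0.9, 1.1), "Illinois": (4.1, 6.0), "Indiana": (1.6, 2.7),
    "Iowa": (1.2, 1.8), "Kansas": (2.1, 3.2), "Kentucky": (2.1, 4.7),
    "Louisiana": (3.7, 7.9), "Maine": (1.2, 2.0), "Maryland": (4.5, 6.0),
    "Massachusetts": (3.5, 5.0), "Michigan": (2.4, 3.2), "Minnesota": (3.9, 4.7),
    "Mississippi": (2.9, 6.1), "Missouri": (1.5, 3.1), "Montana": (0.7, 2.1),
    "Nebraska": (1.5, 2.5), "Nevada": (3.6, 5.1), "New Hampshire": (1.2, 1.3),
    "New Jersey": (5.0, 6.8), "New Mexico": (2.3, 2.8), "New York": (5.6, 9.5),
    "North Carolina": (3.2, 5.9), "North Dakota": (0.8, 1.0), "Ohio": (1.6, 2.9),
    "Oklahoma": (3.5, 6.4), "Oregon": (2.3, 3.9), "Pennsylvania": (2.2, 3.3),
    "Rhode Island": (4.0, 5.5), "South Carolina": (4.2, 8.2),
    "South Dakota": (1.7, 2.1), "Tennessee": (3.4, 6.9), "Texas": (4.9, 9.5),
    "Utah": (1.2, 1.7), "Vermont": (1.5, 0.9), "Virginia": (3.9, 4.9),
    "Washington": (3.3, 5.0), "West Virginia": (1.0, 2.0),
    "Wisconsin": (1.2, 1.7), "Wyoming": (0.5, 0.7),
}
FEMALE, MALE = 0, 1  # column positions inside each RATES tuple


def tukey_outliers(column):
    """List places whose rate lies more than 1.5 * IQR beyond the quartiles."""
    values = sorted(rates[column] for rates in RATES.values())
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return sorted(p for p, rates in RATES.items()
                  if not low <= rates[column] <= high)


def histogram_counts(column, width=2):
    """Count places per bin [0, width), [width, 2 * width), and so on."""
    counts = {}
    for rates in RATES.values():
        start = int(rates[column] // width) * width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


female_outliers = tukey_outliers(FEMALE)
male_outliers = tukey_outliers(MALE)
print("Female outliers:", female_outliers)
print("Male outliers:", male_outliers)
for start, count in histogram_counts(FEMALE).items():
    print(f"{start:>2} to <{start + 2:<2} {'*' * count}")
```

A different bin width or outlier rule is equally defensible; changing the width argument or the 1.5 multiplier shows how sensitive the picture is to those choices.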

Question 5

Describe the differences in the TB case rates for males and females.
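
As a starting point for Question 5, the male and female rates can be compared place by place. The Python sketch below computes the male-minus-female difference for each location; it is one simple way to frame the comparison, not the only one, and the names RATES, difference, female_higher, and largest_gap are illustrative. The values are copied from the table of case rates above.

```python
# (female, male) TB case rates per 100,000, copied from the table above.
RATES = {
    "Alabama": (3.4, 7.2), "Alaska": (6.6, 9.4), "Arizona": (3.5, 6.6),
    "Arkansas": (3.4, 6.5), "California": (7.1, 10.6), "Colorado": (2.1, 3.0),
    "Connecticut": (2.5, 3.6), "Delaware": (2.7, 4.6), "DC": (8.2, 19.0),
    "Florida": (4.3, 8.5), "Georgia": (4.5, 7.8), "Hawaii": (7.9, 12.6),
    "Idaho": (0.9, 1.1), "Illinois": (4.1, 6.0), "Indiana": (1.6, 2.7),
    "Iowa": (1.2, 1.8), "Kansas": (2.1, 3.2), "Kentucky": (2.1, 4.7),
    "Louisiana": (3.7, 7.9), "Maine": (1.2, 2.0), "Maryland": (4.5, 6.0),
    "Massachusetts": (3.5, 5.0), "Michigan": (2.4, 3.2), "Minnesota": (3.9, 4.7),
    "Mississippi": (2.9, 6.1), "Missouri": (1.5, 3.1), "Montana": (0.7, 2.1),
    "Nebraska": (1.5, 2.5), "Nevada": (3.6, 5.1), "New Hampshire": (1.2, 1.3),
    "New Jersey": (5.0, 6.8), "New Mexico": (2.3, 2.8), "New York": (5.6, 9.5),
    "North Carolina": (3.2, 5.9), "North Dakota": (0.8, 1.0), "Ohio": (1.6, 2.9),
    "Oklahoma": (3.5, 6.4), "Oregon": (2.3, 3.9), "Pennsylvania": (2.2, 3.3),
    "Rhode Island": (4.0, 5.5), "South Carolina": (4.2, 8.2),
    "South Dakota": (1.7, 2.1), "Tennessee": (3.4, 6.9), "Texas": (4.9, 9.5),
    "Utah": (1.2, 1.7), "Vermont": (1.5, 0.9), "Virginia": (3.9, 4.9),
    "Washington": (3.3, 5.0), "West Virginia": (1.0, 2.0),
    "Wisconsin": (1.2, 1.7), "Wyoming": (0.5, 0.7),
}

# Male rate minus female rate for each place, rounded to one decimal
# to match the precision of the published rates.
difference = {place: round(male - female, 1)
              for place, (female, male) in RATES.items()}

male_higher_count = sum(1 for d in difference.values() if d > 0)
female_higher = sorted(place for place, d in difference.items() if d < 0)
largest_gap = max(difference, key=difference.get)

print("Places where the male rate is higher:", male_higher_count)
print("Places where the female rate is higher:", female_higher)
print("Largest male-female gap:", largest_gap, difference[largest_gap])
```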
