Data Sets:
HEADS
CHOLESTEROL
BLOOD
BOOKS
CARS
CEREAL
EVERGLADES
HEIGHTS
IRIS
MARATHON
MISC
MOVIES
HOMES
M&M
OLDFAITHFUL
QWERTY
STOWAWAYS
SURVEY
SysBP
YEAST
HEADS: Head circumferences (cm) of 50 two-month-old baby girls and boys. The data are from the U.S. Department of Health and Human Services, National Center for Health Statistics, Third National Health and Nutrition Examination Survey.
BLOOD: The data are blood measurements from 50 subjects (from the U.S. Department of Health and Human Services, National Center for Health Statistics, Third National Health and Nutrition Examination Survey). The gender, age, white blood cell count, red blood cell count, hemoglobin level, and platelet count are given for each subject. The blood cell counts are measured in cells per microliter; hemoglobin is measured in g/dL; platelet count is number per mm³.
BOOKS: For 12 pages randomly selected from each of three books, the data set lists the mean number of words per sentence, the mean number of characters per word, the Flesch Reading Ease score, and the Flesch-Kincaid Grade Level score. The books are The Bear and the Dragon by Tom Clancy, Harry Potter and the Sorcerer’s Stone by J. K. Rowling, and War and Peace by Leo Tolstoy.
CARS: A sample of 20 cars, including measurements of fuel consumption (city mi/gal and highway mi/gal), weight (pounds), number of cylinders, engine displacement (in liters), amount of greenhouse gases emitted (in tons/year), and amount of tailpipe emissions of NOx (in lb/yr).
CEREAL: Data from 16 brands of cereal, including cost per 100 grams; calorie, fat, sugar, cholesterol, sodium, and protein content; and the shelf on which the cereal was placed.
EVERGLADES: Temperatures (in degrees Celsius), conductivity measurements, and rainfall amounts (in inches) are given for the Garfield Bight hydrology outpost in the Florida Everglades. The data are from Kevin Kotun and the National Park Service.
HEIGHTS: The data are heights of 20 boys and 20 girls along with the heights of both parents. All heights are in inches. The data are from the U.S. Department of Health and Human Services, National Center for Health Statistics, Third National Health and Nutrition Examination Survey.
IRIS: The data are 50 measurements (sepal length, sepal width, petal length, petal width) from each of three classes (setosa, versicolor, virginica) of irises. The data are from “The Use of Multiple Measurements in Taxonomic Problems” by R.A. Fisher, Annals of Eugenics, Vol. 7. All measurements are in mm.
MARATHON: The data are results for 150 randomly selected runners who finished the New York City Marathon in a recent year. For each subject, the order, age, gender, and time (in seconds) are given.
MISC: The data are for the years 1980-2000: Dow Jones Industrial Average high level, U.S. car sales (in thousands), U.S. motor vehicle deaths, U.S. murders and non-negligent homicides, sunspot number, and total points scored in the Super Bowl.
MOVIES: A sample of 36 movies including the budget amounts (in millions of dollars), the amounts grossed (in millions of dollars), the lengths of the movies (in minutes), and the viewer ratings.
HOMES: Data Set 18 in the 10th edition of Elementary Statistics includes recent data from home sales, but this data set includes data from homes sold in 1999. Selling prices and list prices are in thousands of dollars. Living areas are in hundreds of square feet. Taxes are in dollars.
M&M: The 10th edition of Elementary Statistics includes recent weights from a sample of M&M plain candies, but this data set includes weights from a sample collected in 1993.
OLDFAITHFUL: The 10th edition of Elementary Statistics includes Old Faithful measurements that are very recent, but this data set includes measurements (duration, interval, height) from eruptions of the Old Faithful geyser in 1995.
QWERTY: Ratings of difficulty of typing each of the 52 words in the Preamble to the Constitution using the QWERTY configuration of keys (found on typical keyboards in use today) and the Dvorak configuration designed to make typing easier. Higher values correspond to words that are more difficult to type.
STOWAWAYS: Ages of stowaways on the Queen Mary, categorized by westbound crossings and eastbound crossings. The data are from the Cunard Steamship Co., Ltd.
SURVEY: Survey results from 100 statistics students. Results include gender, age, height, value of coins in possession, number of keys, number of credit cards, pulse rate (beats per minute), whether the subject exercises, whether the subject smokes, whether the subject is color blind, and handedness (right, left, ambidextrous).
SYSTOLIC: The blood pressure measurements (mm Hg) are taken before and after a period consisting of 25 minutes of aerobic bicycle exercise. During the pre- and post-exercise periods, subjects were measured during a time of no stress, a time of stress caused by an arithmetic test, and a time of stress caused by a speech test. Data are from “Sympathoadrenergic Mechanisms in Reduced Hemodynamic Stress Responses after Exercise” by Kim Brownley et al., Medicine and Science in Sports and Exercise, Vol. 35, No. 6.
YEAST: The data are counts of yeast cells using a haemacytometer, and each value is the count over 1 mm² divided into 400 squares. The data were collected by William S. Gosset, who developed the Student t distribution. Gosset was an employee of the Guinness Brewery, and his contributions to statistics have origins in the brewing process, which requires yeast used for fermentation. Counts of yeast cells were important because the addition of too little yeast would result in incomplete fermentation, but too much yeast would result in beer with a bitter taste. The data are from Student’s Collected Papers, edited by E.S. Pearson and John Wishart, Cambridge University Press, London, 1958. (Gosset published with the pseudonym of “A Student.”)
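Several of the data sets above store values in scaled units: MARATHON times are in seconds, and HOMES prices and living areas are in thousands of dollars and hundreds of square feet. The following is a minimal Python sketch of converting such values to more familiar units before analysis. The function names are hypothetical, and the numbers in the usage lines are illustrative only; they are not taken from the data sets.

```python
def seconds_to_hms(total_seconds):
    """Format a MARATHON finishing time (given in seconds) as h:mm:ss."""
    hours, remainder = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def price_per_square_foot(price_thousands, area_hundreds_sqft):
    """Dollars per square foot for a HOMES record.

    price_thousands: selling or list price in thousands of dollars
    area_hundreds_sqft: living area in hundreds of square feet
    """
    dollars = price_thousands * 1000
    square_feet = area_hundreds_sqft * 100
    return dollars / square_feet


# Illustrative values only (not from the data sets).
print(seconds_to_hms(9000))            # a time of 9000 seconds
print(price_per_square_foot(150, 20))  # $150,000 home with 2,000 sq ft
```

Keeping the conversions in small functions makes the unit assumptions explicit, so a mismatch (for example, a price entered in dollars rather than thousands of dollars) is easier to spot.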