• How do we use graphs to make sense of data that we have collected?

There are three parts of statistics: data production, data analysis, and statistical inference. In your reading from section 1.1, you learned some of the different ways data is produced. But once you have the data, how can you tell whether it actually shows you something useful? In this course, we will learn to make the following graphs: bar graphs, pie charts, histograms, stemplots, dot plots, boxplots, time plots, ogives, and frequency polygons. To make any of these graphs useful, however, you must make sure that you are using an appropriate graph for the type of data you have. There are two types of data:

Categorical data: data that fits into some category. Examples are gender, grade, birth month, and favorite color. To graph categorical variables, you must use a bar graph or pie chart.

Quantitative data: data that can be measured and is typically numerical in nature. Categorical variables can also be recorded as numbers (zip codes or jersey numbers, for example), so for a variable to be quantitative, it must make sense to average the numbers. To graph quantitative variables, you may use any of the graphs mentioned above EXCEPT bar graphs and pie charts.

To help you learn how to make and interpret histograms, I would like you to read section 2.3 in your book, pages 53-70. After reading this section, you should be able to define the following terms:

  • Range
  • Classes
  • Class width
  • Lower and upper limits of classes
  • Cumulative frequency
  • Relative frequency
  • Shape of the distribution (the shape the data makes), e.g., roughly symmetric, skewed left, skewed right, uniform
  • Unimodal and bimodal

After reading section 2.3, I want you to work through the hand-out entitled “Histogram Class Examples.” The first example on this hand-out uses the same scenario and data as the example beginning on page 55 in your book and listed in table 2-9. However, the book's example uses 10 classes to make the histogram, and I would prefer you not use that many. So work through this one with me using my notes below.

My notes on making histograms!

A histogram is a graph used to analyze the distribution of a quantitative variable. Some people confuse histograms with bar graphs, but they are not the same. In a bar graph, the width of the bars does not affect what the graph says, but in a histogram, the width of the “bars” (classes) must always be the same so that the shape is not distorted.

So in making a histogram, how do we determine how many classes (what you used to call “bars”) to use? This all depends on the range of the data. The range is the difference between the maximum and minimum values.

Histograms should always have between 5 and 15 classes, but the rule of thumb is to use more classes for a small range and fewer classes for a large range. Think about this example: Suppose you had 25 data values rounded to the tenths place that were all between 5 and 6. This would not produce a large range, and thus the fewer classes you used, the less you could actually see about the pattern the graph was making.

Look at Example 1 on the Histogram Class Examples hand-out. Go to STAT, EDIT on your calculator. Enter the data into L1 on your calculator. When finished, you should have n=60 values in the list. Now sort the data from least to greatest (ascending). To do this, go to STAT, #2: SORTA(L1). When you go back to STAT, EDIT to view the list, the values should now be listed from least to greatest. Now find the range.

  1. Range = 47 – 1 = 46. This is a large range, so we can use a smaller number of classes. We will use 6 for our example. Your book used 10 classes, so you can follow its work as well to check your comprehension.

After finding the range and choosing the appropriate number of classes, find the class width, which determines how many numbers should fall into each of the classes we will graph.

  2. Class width = 46/6 ≈ 7.7

The class width can NEVER be a decimal, so you should always round the class width up to the next whole number. I know it sounds crazy, but even if your class width were 25/5, which is exactly 5, you would still need to move up to 6. SO OUR CLASS WIDTH FOR THIS HISTOGRAM WILL BE 8!
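If you would like to check the class-width rule on a computer, here is a minimal Python sketch. Python is not part of the book or the hand-out, and the function name class_width is just a name made up for illustration. It assumes the rule exactly as stated above: divide the range by the number of classes, then move up to the next whole number, even when the division comes out even.

```python
import math

def class_width(data_range, num_classes):
    """Class width under the rule in these notes (hypothetical helper)."""
    # floor(...) + 1 always moves UP to the next whole number,
    # even when the division is already a whole number.
    return math.floor(data_range / num_classes) + 1

print(class_width(46, 6))  # 46/6 is about 7.7, so this prints 8
print(class_width(25, 5))  # 25/5 is exactly 5, so this prints 6
```

Notice that ordinary rounding up would leave 25/5 at 5, which is why the sketch uses "floor plus 1" instead.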

  3. Now make a frequency chart. It should always look like the following chart:

Lower Limit   Upper Limit   Frequency   Cumulative Frequency   Relative Frequency
1             8             14          14                     14/60
9             16            21          35 (14 + 21)           21/60
17            24            11          46 (35 + 11)           11/60
25            32            6           52                     6/60
33            40            4           56                     4/60
41            48            4           60                     4/60

*The lower limit of the first class should ALWAYS be the minimum value in the data set. In this case, that would be 1.

**To find the upper limit of each class, you cannot simply add the class width to the lower limit.

1 + 8 = 9, but how many numbers are there from 1 to 9? There are 9, which would make the class width 9. Each class in this problem should include 8 values, which means the first class should run from 1 to 8. HINT: To find the upper limits, always add 1 less than the class width to the lower limits.

***To find the frequencies quickly, use the sorted list in your calculator instead of the data printed on the paper.

****Cumulative frequency helps you make sure you have included every value in the data set in one of the classes. If the last cumulative frequency is not equal to n, then you need to recheck the data.
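The four notes above can be put together in a short Python sketch that rebuilds the whole frequency chart. This is only an illustration, not something from your book: the variable names are made up, and the frequencies are copied from the chart rather than counted from the raw data (the 60 data values are on the hand-out, not in these notes).

```python
minimum = 1      # lower limit of the first class = minimum data value
width = 8        # class width found earlier
n = 60           # number of data values
frequencies = [14, 21, 11, 6, 4, 4]  # copied from the frequency chart

rows = []
running_total = 0
for i, freq in enumerate(frequencies):
    lower = minimum + i * width      # each class starts one width after the last
    upper = lower + width - 1        # add 1 LESS than the class width
    running_total += freq            # cumulative frequency so far
    rows.append((lower, upper, freq, running_total, freq / n))

for lower, upper, freq, cum, rel in rows:
    print(f"{lower:>2} to {upper:<2}  freq={freq:>2}  cum={cum:>2}  rel={freq}/{n} = {rel:.3f}")

# The last cumulative frequency must equal n, or a value was missed.
assert running_total == n
```

The final line plays the same role as the cumulative frequency check: if it fails, go back and recount.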

  4. Now make the graph. Draw only the first quadrant. Make 6 equally spaced marks on the x-axis to represent the 6 classes. YOU MUST MAKE SURE THAT THE CLASSES HAVE EQUAL WIDTH!
  5. After making any quantitative graph in statistics, you must describe three things about the graph: shape, center, and spread.

SHAPE: typically the shape will be roughly symmetric, skewed left, or skewed right. Your book shows good examples of these on page 65. We would say that this graph is skewed right, because the majority of the data is concentrated on the left and there are a few extreme values that pull the “tail” of the graph to the right.

CENTER: the center of the graph is the value where you think 50% of the data lies to the right and 50% to the left. Unfortunately, you can never find the exact center (median) of a histogram by just looking. You must estimate where the center of the distribution is, which for this graph we might say is approximately 16 miles. The center is interpreted by saying, “Approximately 50% of downtown Dallas workers drive 16 miles or less one way to work.”
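One way to back up an eyeball estimate of the center is to find which class the median must fall in, using the cumulative frequencies from the chart. Here is a rough Python sketch of that idea; the function name median_class is made up for illustration, and the sketch only tells you the class, not the exact median.

```python
n = 60
classes = [(1, 8), (9, 16), (17, 24), (25, 32), (33, 40), (41, 48)]
cumulative = [14, 35, 46, 52, 56, 60]  # copied from the frequency chart

def median_class(classes, cumulative, n):
    """Return the first class whose cumulative frequency passes half of n."""
    # If a cumulative count equals exactly half of n, the median falls
    # between two classes; this sketch returns the later one.
    for limits, cum in zip(classes, cumulative):
        if cum > n / 2:
            return limits
    return None

print(median_class(classes, cumulative, n))  # (9, 16)
```

Only 14 of the 60 values are 8 or less, while 35 of the 60 (about 58%) are 16 or less, so the center has to be somewhere in the 9 to 16 class. An estimate of 16 sits at the upper edge of that class.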

SPREAD: the spread is how much variation you see in the data. In assessing the spread, you should ask yourself, “Do I see gaps in the data (classes with a frequency of zero)? Is the graph really spread out, or is it mostly concentrated in one area?” For now, the only measure of spread you have learned is range, but it is not the best measure to summarize spread. For example, what if a data set had 60 values, and 59 of them were 1 and the other value was 42? The range would be 41, but every value except one is 1! A range of 41 might lead you to believe the data is pretty inconsistent, when in fact there is almost no variation in the data at all. Eventually, we will learn to use other measures of spread that are more informative, but for now, we will say there are no gaps and the range is 46.
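The made-up data set in the range example is easy to build in Python if you want to see the numbers for yourself. This is just a quick sketch of that one example, with variable names chosen for illustration.

```python
data = [1] * 59 + [42]   # 59 values equal to 1, plus one value of 42

data_range = max(data) - min(data)
share_equal_to_one = data.count(1) / len(data)

print(len(data))                      # 60
print(data_range)                     # 41
print(f"{share_equal_to_one:.1%}")    # 98.3%
```

A single unusual value is enough to produce a range of 41, even though more than 98% of the values are identical.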

ASSIGNMENT: Finish the Histogram Class Examples hand-out. Then use what you’ve learned to complete the problems on pages 73-77 (10-14 all). Again, keys to these will be posted, but use them wisely!

ASSIGNMENT: Complete the Journal for Histograms

Before moving on to the next item, could you make a frequency polygon or ogive? Be sure to pay close attention to those graphs on pages 66-70.