2.2 Discuss Graphical Techniques
2.2.1 Define and Describe the Shape of a Distribution
The distribution of data offers important information to statisticians.
To help you discuss and better understand data distribution, it is important to know some basic terms and definitions.
Clusters: A cluster is a group that is closely packed together. In the frequency distribution to the left (below) you can see that there are two clusters. That is, two areas seem to have much larger frequencies than the others.
In the histogram above right, it appears that teachers’ salaries are clustered in the $36000 to $54000 range.
Symmetrical or Asymmetrical? Distributions are symmetrical if the left side and right side are mirror images of each other. In the case of symmetrical data with a single peak, the mean, median and mode (to be discussed later) are all in the middle and are equal. When the distribution is not a mirror image, it is asymmetrical. Asymmetrical distributions may be skewed.
Symmetrical Distribution / Asymmetrical Distribution
Notice the left and right sides are the same. / Notice the left and right sides are different.
Skew: A distribution is skewed if one of its tails is longer than the other. The first distribution shown has a positive skew. This means that it has a long tail in the positive direction. The distribution below it has a negative skew since it has a long tail in the negative direction. Finally, the third distribution is symmetric and has no skew.
Positive Skew / Negative Skew / Symmetrical
Notice the long “tail” going towards the right. / Notice the long “tail” going towards the left. / No skew. The tails are the same on the right and left.
Distributions with a positive skew are more common than distributions with negative skews. One example is the distribution of income. Most people make under $40,000 a year, but some make quite a bit more with a small number making many millions of dollars per year. The positive tail therefore extends out quite a long way whereas the negative tail stops at zero.
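The effect of a long positive tail can be checked numerically. The sketch below uses a small set of incomes invented purely for illustration and applies a common rule of thumb (not a formal definition of skew): when a few very large values pull the mean above the median, the data are likely positively skewed. The function name `skew_direction` is a hypothetical helper.

```python
from statistics import mean, median

# Hypothetical annual incomes in dollars, invented for illustration only.
incomes = [18_000, 22_000, 25_000, 28_000, 30_000,
           33_000, 36_000, 39_000, 95_000, 2_500_000]

def skew_direction(values):
    """Rough rule of thumb: compare the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "positive"   # long tail to the right pulls the mean up
    if m < med:
        return "negative"   # long tail to the left pulls the mean down
    return "none"

print("mean:", mean(incomes))
print("median:", median(incomes))
print("skew:", skew_direction(incomes))
```

A single very large income is enough to move the mean far above the median, which is why the median is often preferred when describing income data.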
What is a Normal Curve? A graph representing the density function of the normal probability distribution (to be discussed in Unit 5) is also known as a Normal Curve or Bell Curve. The graph below has a mean of zero and a standard deviation of 1, so it is known as the Standard Normal Distribution.
The Normal probability distribution curve:
Is bell-shaped and has a single peak at the center.
Is symmetrical about the mean.
Is asymptotic; that is, the curve gets closer and closer to the X-axis but never actually touches it.
Has the mean, μ, to determine its location and its standard deviation, σ, to determine its dispersion.
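The properties listed above can be illustrated with a short sketch that evaluates the normal density function directly. The function name `normal_pdf` is a hypothetical helper, and the default values (mean 0, standard deviation 1) correspond to the Standard Normal Distribution.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Heights at points placed symmetrically around the mean.
for x in (-3, -1, 0, 1, 3):
    print(x, round(normal_pdf(x), 4))
```

The printed heights are equal for x and -x (symmetry), largest at the mean (single peak), and small but still positive far from the mean (asymptotic).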
2.2.2 Construct and analyze a histogram, a frequency polygon, a bar chart, a pie chart, and an ogive.
What is a Histogram? A histogram is a graph in which the “classes” are marked on the horizontal axis and the “class frequencies” are marked on the vertical axis. The class frequencies are represented by the heights of the bars and the bars are drawn adjacent to each other. The histogram below shows the varied salaries of employees.
What is a Frequency Polygon? A frequency polygon is similar to a histogram. A frequency polygon consists of line segments connecting the points formed by the “class midpoints” and the class frequency.
What is a Bar Chart? A bar chart is a diagram that uses bars of the same width but of different heights (or lengths) to represent the statistics or data being compared. Bar charts can be vertical or horizontal, and they can be used to depict any of the four levels of data.
What is a Pie Chart? A pie chart is a circular chart that provides a visual concept of the whole (100% = 360 degrees). The pie is divided into slices, each corresponding to a category of the data represented. The size of the slices is proportional to the percentage of the corresponding category. It is useful in portraying relative frequencies.
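Because the whole pie is 360 degrees, the angle of each slice is its share of the total multiplied by 360. The sketch below works this out for a set of category counts invented for illustration; `slice_angles` is a hypothetical helper name.

```python
def slice_angles(counts):
    """Return the angle in degrees of each pie slice (whole pie = 360)."""
    total = sum(counts.values())
    return {name: 360 * value / total for name, value in counts.items()}

# Hypothetical sales counts by category, invented for illustration only.
sales = {"Rock": 13, "Jazz": 27, "Pop": 60}

for name, angle in slice_angles(sales).items():
    share = 100 * sales[name] / sum(sales.values())
    print(f"{name}: {share:.0f}% of the pie, {angle:.1f} degrees")
```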
What is an Ogive? An ogive is a cumulative frequency polygon. It shows the number of values or observations that are less than each of the “class limits” rather than how many are in each of the “classes”.
The ogive is useful whenever you want to determine what percentage of your data lies below a certain value. The percentages are listed on the right side.
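The values plotted on an ogive are running totals of the class frequencies. As a minimal sketch, the code below accumulates the score frequencies from the histogram construction example in this section (classes 0 up to 20 through 80 up to 100) and converts each running total to a percentage. The variable names are illustrative.

```python
from itertools import accumulate

# Upper class limits and frequencies from the scores table in this section.
upper_limits = [20, 40, 60, 80, 100]
frequencies = [1, 2, 6, 12, 4]

# Running totals: the number of observations below each upper limit.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]
percent_below = [100 * c / total for c in cumulative]

for limit, c, p in zip(upper_limits, cumulative, percent_below):
    print(f"less than {limit}: {c} scores ({p:.0f}%)")
```

Plotting each upper limit against its cumulative frequency (left axis) or cumulative percentage (right axis) and joining the points gives the ogive.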
Constructions
How do we construct a Histogram?
The table below is a frequency distribution showing the scores of students in Mr. Manuel’s MA1670 class.
Classes / Frequency
0 up to 20 / 1
20 up to 40 / 2
40 up to 60 / 6
60 up to 80 / 12
80 up to 100 / 4
Step 1: Decide on a scale and draw the horizontal and vertical axes.
Notes:
a) The frequencies go on the Y-axis and the classes go on the X-axis.
b) Make the scale reasonable for the data given.
c) This is interval data, so the bars should be all the same width. There should be no spaces between the bars.
Step 2: Add the headings and title.
Step 3: Draw the heights of the bars to match the frequencies in the table.
Notes:
a) This is interval data, so the bars should be all the same width. There should be no spaces between the bars.
Step 4: Simply draw the remainder of the bars to complete the histogram.
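The steps above can be roughed out in code before drawing by hand. The sketch below prints a simple text version of the histogram for the scores table, turned on its side so that each bar runs horizontally. It is only an illustration of "bar length matches frequency"; `text_histogram` is a hypothetical helper name.

```python
# Classes and frequencies from the scores table above.
classes = ["0 up to 20", "20 up to 40", "40 up to 60", "60 up to 80", "80 up to 100"]
frequencies = [1, 2, 6, 12, 4]

def text_histogram(labels, freqs, symbol="#"):
    """Return one line per class; the bar length equals the class frequency."""
    width = max(len(label) for label in labels)
    return [f"{label:<{width}} | {symbol * f} ({f})"
            for label, f in zip(labels, freqs)]

lines = text_histogram(classes, frequencies)
print("Scores in Mr. Manuel's MA1670 class")
for line in lines:
    print(line)
```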
How do we construct a Frequency Polygon?
The table below is a frequency distribution showing the ages of students in Mr. Manuel’s MA1670 class.
Classes / Frequency
15 up to 20 / 3
20 up to 25 / 12
25 up to 30 / 6
30 up to 35 / 3
35 up to 40 / 1
Step 1: Calculate the midpoints of the classes.
Classes / Midpoints / Frequency
10 up to 15 / 12.5 / 0
15 up to 20 / 17.5 / 3
20 up to 25 / 22.5 / 12
25 up to 30 / 27.5 / 6
30 up to 35 / 32.5 / 3
35 up to 40 / 37.5 / 1
40 up to 45 / 42.5 / 0
Notes:
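a) The midpoint of a class is the average of its lower and upper limits. For the first class in the table (10 up to 15), the midpoint is (10 + 15) / 2 = 12.5.
b) Notice that 2 extra classes (one below and one above) have been added. These are called anchors and should have a frequency of 0.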
Step 2: Decide on a scale and draw the horizontal and vertical axes.
Notes:
a) The frequencies go on the Y-axis and the class midpoints go on the X-axis.
b) Make the scale reasonable for the data given.
c) Evenly space the midpoint values.
d) Notice how the gap between 0 and 12.5 is marked.
Step 3: Add the headings and title.
Step 4: Mark a dot at the intersection of the ages and frequencies.
Step 5: Complete for each class and join the dots with line segments.
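The points to be plotted can be worked out in a few lines. The sketch below computes the class midpoints for the ages table and adds the two zero-frequency anchor classes, one class width below the first midpoint and one above the last. The function name `polygon_points` is a hypothetical helper.

```python
def polygon_points(lower_limits, width, freqs):
    """Return (midpoint, frequency) pairs, including the two anchors."""
    midpoints = [lower + width / 2 for lower in lower_limits]
    points = [(midpoints[0] - width, 0)]        # anchor below the first class
    points += list(zip(midpoints, freqs))       # one dot per class
    points.append((midpoints[-1] + width, 0))   # anchor above the last class
    return points

# Lower class limits, class width and frequencies from the ages table above.
points = polygon_points([15, 20, 25, 30, 35], 5, [3, 12, 6, 3, 1])
for midpoint, frequency in points:
    print(midpoint, frequency)
```

Joining these points in order with line segments gives the frequency polygon.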
Analysis of Charts & Graphs
Histograms: The histogram below shows the varied prices of 32 hotels in four different cities in the US. Let’s look at the information and answer some questions.
Question 1: What is the class interval? Answer: 100 – 50 = 50
Question 2: How many hotels are in the $100 to $150 price range? Answer: 12
Question 3: Most of the prices are clustered (see Section 2.2.1) in what price range?
Answer: $100 up to $200
Question 4: How many of the 32 hotels are priced above $150?
Answer: 11 + 4 + 1 = 16
Frequency Polygon: The polygon below shows the prices of used vehicles sold at Gulf Used Cars.
Question 1: How many of the vehicles sold were in the $15000 to $18000 range?
Answer: 23
Question 2: What per cent of the vehicles sold were between $15000 and $18000?
Answer:
Pie Chart (Example 1): The pie chart below shows the sales at “Doha Music Store”.
Question 1: What percentage of the sales is “Rock and roll”? Answer: 13%
Pie Chart (Example 2): A simple 3D version of the pie chart representing sales figures by region. In addition to the absolute dollar value, a percentage figure is also calculated for each pie segment.
Question 1: What is missing from the chart?
Answer: The legend is missing so we are unable to determine which region is which.
Bar Chart (Example 1) – Vertical Bar Graph: The vertical bar graph below shows the number of police officers in Crimeville from 1993 to 1996.
Question 1: What trend do you see in the number of police officers in the city?
Answer: You can see that the number of police officers decreased from 1993 to 1996, but started increasing again in 1996.
Question 2: How many more officers were there in 2001 than in 1998?
Answer: 9
Bar Chart (Example 2) – Horizontal Bar Graph: The graph below is an example of a double horizontal bar graph. Hillary sampled an equal number of boys and girls at her high school and asked them to pick the one snack food they liked the most.
Question 1: What snack food is least preferred by girls? Answer: Vegetables
Question 2: What snack food was preferred by substantially more girls than boys?
Answer: Fruit
Ogive (Cumulative Frequency Polygon): The ogive below shows the price range of vehicles sold at “Qatar Quality Vehicles”.
Question 1: 25 vehicles sold for less than what amount?
Answer: $17500
To find the price below which 25 of the vehicles sold, we locate the value of 25 on the Y-axis. Next, we draw a horizontal line from the value of 25 to the point where it meets the graph, then draw a vertical line down to the X-axis and read the price. We can see that 25 vehicles sold for less than approximately $17,500.
Question 2: What percent of the vehicles sold for less than $25,500?
Answer: 87.5%
In this graph the percentages are listed on the right-hand side. To estimate the percent of vehicles that sold for less than $25,500, we draw a vertical line from the X-axis to the point where it intersects the graph, then draw a horizontal line to the Y-axis (on the right) and see that approximately 87.5% of the vehicles sold for less than $25,500.
2.2.3 Explain how graphs can be misused
Partial Picture: Not all the relevant information is given.
Example 1: The pie chart below is a report of drivers injured while drinking alcohol. Can we conclude that it's safer to drive while under the influence?
Answer: No. Drunk drivers have a fatality risk 7.66 times the norm, while non-drunk drivers have a risk only about 0.6 of the norm. Only a very small percentage of drivers in New York City drive while under the influence, but they account for a disproportionate number of accidents.
Example 2: The bar graph below compares fatal driving accidents to age. The chart seems to suggest that 16-year-olds and octogenarians (people in their eighties) are safer drivers than people in their twenties. Is this true?
Answer: No. As the following graph shows, the reason 16-year-olds and octogenarians appear to be safe drivers is that they don't drive nearly as much as people in other age groups. If fewer of them are driving, there will be fewer accidents.
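The point about exposure can be made concrete with a small calculation. The figures below are invented purely for illustration and are not taken from the graphs in this section; the sketch only shows how a group with fewer accidents in total can still have a higher accident rate once the amount of driving is taken into account. `rate_per_million_miles` is a hypothetical helper name.

```python
def rate_per_million_miles(accidents, miles_driven):
    """Accidents per one million miles driven."""
    return accidents * 1_000_000 / miles_driven

# Hypothetical figures, invented for illustration only:
# group name -> (number of accidents, total miles driven)
groups = {
    "16-year-olds": (20, 2_000_000),
    "25-year-olds": (100, 50_000_000),
}

for name, (accidents, miles) in groups.items():
    rate = rate_per_million_miles(accidents, miles)
    print(f"{name}: {accidents} accidents, {rate:.1f} per million miles")
```

In this made-up example the younger group has one fifth as many accidents but five times the accident rate, which a chart of raw counts alone would hide.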
Always ask: “Is there any other relevant information?”
Misleading Graphs: We should analyze the numerical information given in the graph instead of being misled by its general shape or its visual appearance.
Example 1: (Bar Graph) Median Weekly Earnings of full-time Professional Workers
The graphs above show the same numbers, but the presentation makes the gap between the men and women appear greater in the graph on the left.
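One common way a gap is made to look larger is by starting the vertical axis above zero; whether that is what was done in the graph on the left is an assumption here. The sketch below uses earnings figures invented for illustration and compares the ratio of the drawn bar heights with and without a raised baseline. `apparent_ratio` is a hypothetical helper name.

```python
def apparent_ratio(a, b, baseline=0):
    """Ratio of the drawn bar heights when the vertical axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)

# Hypothetical median weekly earnings in dollars, invented for illustration only.
men, women = 1000, 800

print("axis starts at 0:  ", apparent_ratio(men, women))                # 1.25
print("axis starts at 700:", apparent_ratio(men, women, baseline=700))  # 3.0
```

The numbers are identical in both cases, but with the raised baseline one bar is drawn three times as tall as the other instead of 1.25 times as tall.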
Always ask: “What do the numbers say?”
General Guidelines for Graphing
The ability to create effective charts, whether for oral presentations or printed text, is an important skill for anyone involved with projecting numerical information. Research has shown that the appearance of images communicates even faster than words or lists of numbers, and knowledge of effective charting methods allows one to present numerical information in a visually-appealing way. Essentially, a chart's effectiveness depends on its ability to generate an immediate sense of orientation and access to information for the viewer.
General Hints:
Generally, effective charts use the simple (and easily learned) techniques of good design. Any book discussing page layout will assist you in this area. However, some design techniques should be specifically mentioned, as they relate directly to charting. These techniques include:
- Choosing the correct chart format
Chart formats are designed to portray certain types of information; you must therefore choose the correct chart format for the information you wish to project.
- Maintaining simplicity
Clarifying information is the main goal of creating a chart in the first place, and complicated charts only serve to make the information they present less clear.
- Maintaining consistency
When creating several charts, use a design grid. This grid will help you maintain a consistent chart format, eliminating distractions for your audience.
- Using labels
Effective use of labels, created using legible typefaces, will assist your audience in understanding a chart's information.
Each of these design factors is important, but the choice of chart format comes first. For this reason, the following sections discuss the design considerations for three of the most common chart formats.
Pie Charts:
Pie charts are best used to compare parts of a whole; in other words, they help divide a group into the components that make it up. Some factors to keep in mind when creating pie charts include:
Source: UWEC Campus Profile, 1999
- Limiting the number of wedges
Keep the number of wedges to a minimum by combining smaller categories into one. Too many wedges will hinder interpretation by making your pie chart appear complicated and cramped; it will also create difficulties for labeling.
- Using labels for wedges
Try to place labels within wedges whenever possible; using this technique will help you create pies that are both clear and readable.
- Focusing attention
If necessary, draw your audience's attention to the particular wedge you are discussing, perhaps by "exploding" it to make it appear separate from the pie or by selecting a dominant color or pattern. Refer to Using Pie Chart Options for more information on enhancing pie chart effects.
- Enhancing the pie
Consider enhancing the appearance of the pie chart, perhaps by adding perspective. Keep in mind, however, that three-dimensional pies can sometimes make certain wedges appear larger than they really are.
Bar Graphs:
Bar graphs work best to emphasize the contrast between quantities. Two types of bar graphs can be used: vertical and horizontal. Vertical bar graphs work well for comparing quantities at different times, while horizontal bar graphs compare different quantities when time is not an important consideration. For example, a graph showing student enrollment by year would probably work best in the vertical format, while a graph showing current participation in faculty organizations would be most effective in the horizontal format. Some design considerations to keep in mind when creating either type of bar graph include:
Source: UWEC Campus Profile, 1999
- Limiting the scale
Make sure that your bar graph is kept within a reasonable scale; in other words, try to avoid showing three quantities of similar size and one quantity that is drastically larger or smaller.
- Enhancing the graph
Consider adding perspective or a drop shadow to your chart for visual appeal, but again be aware that (as with pie charts) the third dimension can make it difficult for the audience to determine exact measurements. Numerical values placed above or within the bars themselves may help solve this problem.
Line Graphs (Polygons):
Polygons best indicate the relationship of one variable to another, and they can be created using either straight or curved lines. Which type of line graph you use depends on the type of information you wish to convey: straight line graphs show specific observation points, while curved line graphs show general trends. Some design considerations to keep in mind when creating either type of line graph include:
Source: UWEC Institutional Planning, 1999
- Using contrast
Make sure to use lines with sufficient contrast; in other words, create a line that is bold enough to clearly appear to your audience, but thin enough to still convey specific information.
- Limiting multiple lines
When using multiple lines to compare trends, keep the number of lines to a minimum, probably no more than two. Comparing more than two trends on the same line graph can create confusion for your audience.
Summary
Following the guidelines in this document will help you create pie charts, bar graphs, and line graphs that present information clearly. Of course, these three chart formats are not the only ways to convey numerical data; other formats can effectively portray information as well. The following table provides a quick guide to most of your choices:
Chart Format / Description
Pie / Compares parts of a whole.
Bar / Shows contrast between quantities.
Line / Indicates the relationship of one variable to another.
Area / Indicates the volume relationship of one variable to another.
Scatter Plot / Correlates two factors by marking the points where particular events occurred.
Each chart format has its own design considerations, but there is plenty of room for experimentation; use your computer software to try new techniques. As long as you choose the correct format and keep in mind the general concepts of simplicity, consistency, and labeling, you will be well on your way towards creating an effective chart that is understandable at a glance.