Chapter 18

Visual Displays of Data and Statistics:

Reporting Values, Groups, and Comparisons in Figures

“Graphical excellence is that which gives the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.”[1]

Introduction

There are hundreds of ways to present statistics in figures (see, for example, the extraordinary book by Harris[2]). Here, we focus on charts, which usually present categorical data, and graphs, which usually present continuous data. In particular, we are concerned with bar, column, and dot charts and standard Cartesian graphs, in which continuous data are plotted on X and Y axes. These charts and graphs, hereafter referred to simply as “figures,” are the most commonly used in biomedical research. The guidelines here often apply to both charts and graphs; guidelines for each type of figure are provided when necessary.

As in the previous chapter, we again refer to 3 levels of emphasis: individual values, groups of values, and comparisons between groups. In creating figures, these 3 levels consist of the following components:

• Values: a single datum or value, indicated by the length of a line in a chart, by a single point plotted on a graph, or by a single symbol.

• Groups: sets of related values, indicated by such devices as a line connecting a series of values, a cluster of points on a graph, or a family of related columns, bars, or symbols.

• Comparisons: relationships between groups, either direct comparisons, which show 2 or more groups on the same figure or in a series of figures, or indirect comparisons, which show the results of a mathematical comparison, such as a graph of the differences between 2 groups.

Creating effective charts and graphs requires as much art as it does critical thinking. Many references contradict each other on several important points. As a result, the evidence, reasoning, conventions, and expert opinion that support the guidelines below are open to interpretation.

Much more can be said about how to construct good figures than is possible here. We have had to limit this chapter to those guidelines that, if followed, should provide the most help in a few pages. We have drawn heavily on the work of William Cleveland [3], Howard Wainer [4], and Helen Briscoe [5] and highly recommend their books to those seeking practical information about creating charts and graphs. We also recommend the work of Edward Tufte [1,6,7], for inspiration as well as for insight.

Functions of Figures

Figures can:

• Reveal underlying patterns of data and deviations from these patterns in ways that are not possible with text or tables.[8]

• Organize and display data, especially patterns of data and group comparisons, more clearly and concisely than can be done in text or tables.[5,9,10]

• Condense or summarize large amounts of data more effectively or efficiently than can be done in text or tables.

• Improve the ease and speed with which specific information can be located and understood.[10]

Components and Types of Figures

In scientific publications, most figures presenting quantitative information (charts and graphs) have at least the first 7 of the following 9 components (Figure 18.1):

1. Figure number (except in publication styles that number figures only when there are 2 or more)

2. Figure caption (or legend), generally appearing below the figure

3. Data field, a rectangular space in which the data are presented, usually bordered on the left and bottom by the X and Y axes and sometimes enclosed by a rectangle drawn with a thin line.

4. Vertical scale: on a Cartesian graph, the “ordinate” or Y axis with its labeled divisions and unlabeled “tick marks”; on a chart, either a scale or category labels.

5. Horizontal scale: on a Cartesian graph, the “abscissa” or X axis with its labeled divisions and unlabeled “tick marks”; on a chart, either a scale or category labels.

6. Labels for each scale, identifying the variable graphed and the units of measurement represented on the scale

7. Data (plotting symbols, lines, shaded bars, and so on)

8. Reference lines in the data field to help orient readers

9. Keys or legends in the data field or caption that identify data

We distinguish among publication graphics, which are figures printed in scientific journals or technical reports, usually in black and white; presentation graphics, which are figures appropriate for overheads, posters, or slides and are usually seen in color; and electronic figures, which are designed to be viewed on a computer screen and may have color, motion, links to related data, and so on. We focus here on publication graphics and have added notes for presentation graphics when appropriate. We do not address the unique advantages and limitations of electronic figures.

A word of caution about publication, presentation, and electronic graphics: the default figures that are automatically compiled and formatted by software programs are rarely suitable for communicating the results of a study in any medium, let alone publication.[5] Spreadsheet, database, and statistical software designers are generally not trained in communicating data visually. Although these programs may be mathematically accurate, their output is often not visually effective or esthetically pleasing, the 2 qualities that make a good figure.

As a general rule, figures need to be redrawn if they are to be viewed in a different medium. For example, the detail possible on a printed graph is easily lost when shown as a slide or online, and the simplicity of a chart on an overhead transparency may not show enough information to make it a good publication graphic. Thus, for best results, charts and graphs must be created for a specific medium, usually by a person trained in graphic arts or in technical writing and editing.

Principles of Figure Construction

1. Figures should have a purpose; they should contribute to and be integrated with the rest of the text.[10-12]

As is the case with tables, data should not be reported in figures just for the sake of displaying them. Figures should be used only when they can communicate information more efficiently or effectively than can be done in text or tables.

2. Figures should be designed to assist readers in finding, seeing, understanding, and remembering the information.[9,11]

When designing a figure, emphasize its purpose. Is the purpose to show the variability or the stability of data? To emphasize similarities or differences between groups? To show trends over time? To show linear or nonlinear relationships?

3. Figures should contain only those elements that are necessary to fulfill their purpose.[1,10,11]

Conciseness is a value in figures, as well as in scientific writing in general. Make sure that all lines, symbols, numbers, and words in the figure are necessary and sufficient to allow readers to interpret it.[12] In particular, avoid 3-dimensional figures unless the data are actually 3-dimensional (Figure 18.2).[10,13]

4. The data should be emphasized over other elements in the figure.[3,5,10]

The advantage of figures is that they focus attention immediately on visual patterns of data. Thus, anything that distracts from this focus reduces the utility of the figure. For example, data points and lines should be larger or heavier than other graphic elements, such as the scales, the borders of the data field, or reference lines (Figure 18.3).

5. Figures should be consistent with the principles of perceptual psychology. [12]

Abstracting and interpreting data from a figure is a process of visual perception. Visual perception, in turn, is influenced by several principles identified by Gestalt perceptual psychology. Following these principles when designing figures should improve the utility of the figures.

• Primacy: the larger arrangement (“the Gestalt”) is seen before its components.

The overall visual impression of the figure should be consistent with the actual meaning of the data.[12] This principle can be used to manipulate readers’ perceptions: the most common examples are the “suppressed zero” (Figure 18.4), the “elastic scale” (Figure 18.5), and the “double-scale” problem (Figure 18.6).

• Proximity: objects near each other tend to be seen as a group.

To borrow an example from Kosslyn [12], the character string ••• ••• is seen as 2 groups, whereas •• •• •• is seen as 3. Thus, put data to be compared close to each other and separate data that are not to be compared. This principle is especially important for placing labels with respect to the data they identify.

• Similarity: similar objects tend to be seen as a group.

Again borrowing from Kosslyn [12], the character string || – – is seen as 2 groups, rather than as 4 lines. Thus, display data from the same group in an obviously and uniquely consistent way, and display data from different groups in obviously and uniquely divergent ways. This principle is essential when graphing 3 or more variables on the same graph. Plotting marks and data lines from the same group should look alike. They should also differ enough between groups that the groups are not confused with one another.

• Continuation: data arranged in an obvious pattern tend to be seen as a group.

Once again borrowing from Kosslyn [12], the character string – – – – is seen as a single group, whereas – – _ _ is seen as 2. So, when possible, indicate data from the same group by providing an obvious pattern, and disrupt any patterns that are coincidentally composed of dissimilar data.

• Closure: a break in a pattern is automatically “filled in” to complete the pattern.

For example, in the sequence _ – – _, readers usually imagine the missing symbol that would either complete the pyramid (_ –  – _) or repeat the sequence (_ – _ – _). So, emphasize any breaks that represent actual discontinuity in a pattern, and make the pattern clear when the data actually form a pattern (so readers do not have to “fill in” to complete the pattern).

Statistician William Cleveland has ranked the graphical perception tasks from the most to the least accurately interpreted [3]:

1. Comparisons of positions along a common scale; for example, comparing 2 values on the X axis in a single figure.

2. Comparisons of positions along identical, but nonaligned scales, such as comparing 2 values on the X axis in 2 separate figures.

3. Comparisons of length (without baselines or scales for reference)

4. Comparisons of angle or slope

5. Comparisons of area

6. Comparisons of volume

7. Comparisons of color hue, saturation, and density

Of these 7 perceptual tasks, readers do well only with the first 2, unless the differences to be detected are large. Most readers can also distinguish between a few, greatly contrasting colors, although their accuracy declines as shades of the same color are used. Pie charts, for example, rely on judgments of angle and area when comparing 2 or more slices. Because readers are poor at making such judgments, pie charts have limited uses in scientific publications.

6. Data presented in figures should not be duplicated in the text.

As is the case with tables, do not describe in the text data that are also presented in a figure. Rather, identify in the text the important aspects of the figure to help readers interpret the data.

Guidelines for Writing Captions

Guideline 18.1 The caption should identify the data in the data field.[14]

The most important part of the figure is its data. Therefore, at a minimum, the caption should accurately identify the data being presented. As directed in Guideline 18.2, the caption may also identify other aspects of the research: the nature and number of subjects from whom the data were collected, the conditions under which the data were collected, details of the measurement technique, and so on.

Guideline 18.2 The caption should allow the figure to be understood without reference to the text.[3,11]

Figures are often separated from their explanatory text, either by pages of text within an article or when they are abstracted from an article and shown independently. For this reason, a caption that explains the figure as completely as possible improves the speed and accuracy with which the figure is understood.

Guidelines for Constructing the Data Field

Guideline 18.3 Whenever possible, size the figure (and hence the data field) to the dimensions of the intended viewing medium.

In most scientific publications, the width of a figure is determined by the column width of the text. Thus, a “2-column figure” covers the width of 2 columns, however wide, and a “3-column figure” covers the width of 3 columns. In a 3-column page layout, the published figure might be sized to cover 1, 2, or all 3 columns, so the final width of the figure can be determined well before submission. The maximum height of a figure may also be limited to the length of a column of text.

Slide formats are either landscape, in which width is greater than the height, or portrait, in which the height is greater than the width. The ratio of height to width is about 1:1.5 for 35 mm or PowerPoint slides (23 mm x 34 mm).[10]

When constructing the data field, keep in mind the final dimensions of the figure as a whole and remember to allow space for the associated scale labels and units, which must be added to the width of the data field.

Figures can be enlarged or reduced, of course, but planning ahead can prevent reductions from making details too small to read.
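The effect of a planned reduction on lettering can be estimated before the figure is drawn, because type shrinks in proportion to the figure's width. The following is a minimal sketch only: the function name and the example dimensions are hypothetical and are not taken from this chapter or from any journal's requirements.

```python
def reduced_type_size(original_pt, original_width_mm, final_width_mm):
    """Return the type size (in points) after a figure is scaled to a new width.

    Assumes the figure is scaled uniformly, so lettering shrinks or grows
    by the same factor as the width.
    """
    if original_width_mm <= 0 or final_width_mm <= 0:
        raise ValueError("widths must be positive")
    scale = final_width_mm / original_width_mm
    return original_pt * scale


# Hypothetical example: a figure drawn 170 mm wide with 10-point labels
# is reduced to fit an 85 mm column, so the labels are halved.
print(reduced_type_size(10, 170, 85))  # 5.0
```

If the estimated size falls below what the target publication considers legible, the lettering can be enlarged in the original drawing before the figure is submitted.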

Guideline 18.4 Outline the data field with a rectangle, bounded by the horizontal and vertical scales.[3]

Showing the borders of the data field not only helps focus attention on the field and its contents but also allows scales to be duplicated on the top and right borders, which can help readers abstract data from the figure (Figure 18.3).[11,12]

Guideline 18.5 Keep reference lines unobtrusive.[Cleveland]

Common uses of reference lines in figures include indicating the location of zero on one or both scales, the timing of events in a time series, and previous or target values (Figure 18.9).

Guideline 18.6 Minimize the number of non-data elements in the data field.[Cleveland]

Ideally, the data field would contain only the data. However, sometimes labels, “error” bars, confidence bands, and other textual or graphic elements are best placed in the data field. In particular:

• Place labels close to the element they label; try to avoid keys and legends. (A key or legend lists and defines values, groups, or comparisons that are indicated with symbols in the data field.) If you must use a key or legend, consider placing the information in the caption rather than in the data field (Figure 18.10).

• Avoid creating optical effects that obscure, detract from, or misrepresent the relationships shown in the image (Figure 18.11).

• Avoid using unnecessary words, lines, or symbols that hinder interpretation of the image. Such “chart junk” [Tufte] often consists of “decorative” elements added to enhance the aesthetic appeal of the figure, but it can also include extraneous details and simple cluttering of important ones (Figure 18.2).

Guideline 18.7 Identify all the elements in the data field.

Although the figure caption and the scale labels identify the data and groups in the data field, the data field often contains other elements, which must also be identified. These elements include “error bars,” confidence intervals or confidence bands, threshold lines, and so on.

“Error bars” should always be labeled in graphs but often are not. Because error bars may represent variability in the data (that is, standard deviations or interpercentile ranges), “errors” in estimation (that is, standard errors of the mean), or precision (95% confidence intervals around an estimated value), labeling them is essential to correct interpretation of the data.[Cleveland] Also, “error bars” should be shown in both directions. Variability and error are not always symmetrical about the value they accompany, so showing both directions is desirable. Charts often incorrectly show “error bars” only above the measured or estimated values.
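To see why an unlabeled error bar is ambiguous, it may help to compute the three quantities for one small sample. The sketch below is illustrative only: the data are invented, the function name is hypothetical, and the 95% interval uses the normal-approximation multiplier of 1.96, which is an assumption (for a sample this small, a t-distribution critical value would give a wider interval).

```python
import math
import statistics


def error_bar_half_widths(values):
    """Return the half-width of three common kinds of error bar for a sample."""
    n = len(values)
    if n < 2:
        raise ValueError("at least 2 values are needed")
    sd = statistics.stdev(values)   # sample standard deviation (variability)
    sem = sd / math.sqrt(n)         # standard error of the mean (estimation error)
    ci95 = 1.96 * sem               # assumed normal approximation (precision)
    return {"SD": sd, "SEM": sem, "95% CI": ci95}


# Invented sample, for illustration only.
sample = [4, 8, 6, 5, 7]
for name, half_width in error_bar_half_widths(sample).items():
    print(f"mean {statistics.mean(sample)} +/- {half_width:.2f} ({name})")
```

For this sample the standard deviation bar is more than twice as long as the standard error bar, so the same plotted bar could lead to quite different conclusions depending on which quantity readers assume it shows.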

Ideally, data elements will be labeled directly, in the data field. When direct labeling is not possible, a key or legend can be included in the caption or, if convenient, in an empty area of the data field.

Space permitting, labels should be horizontal for easy reading.

Guidelines for Constructing Scales

Guideline 18.8 Label each scale clearly with the name of the variable, the units in which the variable is graphed, and any multipliers associated with the units.[10,13]

The variable label should indicate what has been measured; that is, what the data points represent.

The units of measurement should also be given in the scale label. Most scientific communities use SI units (from the French Système International d'Unités), which are based on the metric system.[SI Booklet] As a result of tradition, particularly in the US, the biomedical community still uses older units for some measurements. Blood pressure, for example, is reported in millimeters of mercury (mm Hg) rather than in pascals (Pa), the SI unit for pressure (1 pascal equals 1 newton per square meter). The general belief is that health care workers are so used to seeing blood pressure readings in millimeters of mercury that changing to a newer unit might compromise patient care.[Young DS]

Finally, multipliers in scale labels are useful because they eliminate the need to report many zeros on the scale. For example, a scale labeled “Number of Immunizations (× 1,000)” might have scale divisions of, say, 35, 85, and 105, whereas a scale labeled “Number of Immunizations” would have scale divisions of 35,000, 85,000, and 105,000.
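The relationship between a multiplier and the scale divisions can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed method: the function name and the label format are hypothetical, and the numbers are those of the immunization example.

```python
def scaled_axis(label, tick_values, multiplier):
    """Return a scale label carrying the multiplier, plus the shortened divisions."""
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    scaled = [value / multiplier for value in tick_values]
    # Show whole numbers without a trailing ".0" so the divisions stay short.
    scaled = [int(v) if v == int(v) else v for v in scaled]
    return f"{label} (x {multiplier:,})", scaled


axis_label, divisions = scaled_axis(
    "Number of Immunizations", [35000, 85000, 105000], 1000
)
print(axis_label)  # Number of Immunizations (x 1,000)
print(divisions)   # [35, 85, 105]
```

The multiplier belongs in the scale label itself; if it appears only in the caption or the text, readers who abstract values from the scale may misread them by a factor of 1,000.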

Guideline 18.9 Indicate the zero-zero point of the graph, especially if one or both scales do not begin at zero.[Briscoe, Kosslyn, Smart]

Most readers assume that all graphs begin at the zero-zero point of origin. However, sometimes one or both scales begin at a value other than zero. In such cases, it is often useful, if not necessary, to “break” the scale to indicate visually the discontinuity in the scale (Figure 18.12).

Whether the zero point needs to be shown for all scales in scientific publications is debated.[Cleveland EGD] The argument is that beginning the scale at a point that maximizes the range of the plotted data is more efficient, and that doing so avoids taking valuable space to indicate a broken scale. In addition, the argument goes, readers of scientific texts will read the scale labels accurately and not be misled by the missing zeros.

On the other hand, the visual impression of a figure is usually recalled more clearly than the actual data (the Gestalt principle of primacy), so it may be important that the visual impression of the figure matches the information imparted by the data. In the “suppressed zero” problem (Figure 18.4), the lack of a clear zero-zero point alters the visual impression of the comparisons between the columns and can thus mislead readers.
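The size of the distortion is a matter of simple arithmetic: a column's drawn height is its value minus the value at which the scale begins. The sketch below uses invented values and a hypothetical function name to compare the ratio of two columns as drawn with and without a zero baseline.

```python
def apparent_ratio(value_a, value_b, baseline=0.0):
    """Return the ratio of the drawn heights of two columns (b relative to a).

    Each column is drawn from the baseline up to its value, so its visible
    height is the value minus the baseline.
    """
    if baseline >= min(value_a, value_b):
        raise ValueError("baseline must lie below both values")
    return (value_b - baseline) / (value_a - baseline)


# Invented values: two groups with results of 50 and 60.
print(apparent_ratio(50, 60))               # 1.2 with the scale starting at zero
print(apparent_ratio(50, 60, baseline=40))  # 2.0 with the scale starting at 40
```

In this example a 20% difference is drawn as a column twice as tall, which is the visual impression that readers are likely to remember.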