Boxplot Basics
A boxplot splits the data set into quartiles. The body of the boxplot consists of a "box" (hence the name), which goes from the first quartile (Q1) to the third quartile (Q3).
Within the box, a vertical line is drawn at Q2, the median of the data set. Two horizontal lines, called whiskers, extend from the front and back of the box. The front whisker goes from Q1 to the smallest non-outlier in the data set, and the back whisker goes from Q3 to the largest non-outlier.
[Figure: horizontal boxplot with the smallest non-outlier, Q1, Q2, Q3, and the largest non-outlier labeled; axis runs from -600 to 1600 in steps of 200]
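The construction described above can be sketched in a few lines of Python. This is only an illustration: the text does not say how outliers are identified, so the sketch assumes the common convention of flagging values more than 1.5 times the box width beyond Q1 or Q3. The function name `boxplot_summary`, the parameter `k`, and the sample data are hypothetical.

```python
import statistics

def boxplot_summary(data, k=1.5):
    """Return the values a boxplot draws.

    Assumption: a value is an outlier if it lies more than k * (Q3 - Q1)
    below Q1 or above Q3. The article itself does not state this rule.
    """
    # Quartile conventions differ between tools; "inclusive" is one choice.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    box_width = q3 - q1
    low_fence = q1 - k * box_width
    high_fence = q3 + k * box_width

    non_outliers = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)

    return {
        "whisker_low": min(non_outliers),   # end of the front whisker
        "q1": q1,                           # front edge of the box
        "median": q2,                       # line inside the box
        "q3": q3,                           # back edge of the box
        "whisker_high": max(non_outliers),  # end of the back whisker
        "outliers": outliers,               # plotted individually
    }

# Hypothetical sample with one unusually large value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
summary = boxplot_summary(sample)
print(summary)
```

In this sample the value 100 falls outside the assumed cutoff, so the back whisker stops at 9 and 100 is reported separately as an outlier.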
How to Interpret a Boxplot
Here is how to read a boxplot. The median is indicated by the vertical line that runs down the center of the box. In the boxplot above, the median is about 400.
Additionally, boxplots display two common measures of the variability or spread in a data set.
- Range. If you are interested in the spread of all the data, it is represented on a boxplot by the horizontal distance between the smallest value and the largest value, including any outliers. In the boxplot above, data values range from about -700 (the smallest outlier) to 1700 (the largest outlier), so the range is about 2400. If you ignore outliers, the range is illustrated by the distance between the opposite ends of the whiskers, about 1000 in the boxplot above.
- Interquartile range (IQR). The middle half of a data set falls within the interquartile range. In a boxplot, the interquartile range is represented by the width of the box (Q3 minus Q1). In the chart above, the interquartile range is about 600 minus 300, or about 300.
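The two measures of spread can be checked numerically. The sketch below uses a small hypothetical data set chosen so that its summary values match the approximate readings quoted above (Q1 near 300, Q3 near 600, outliers near -700 and 1700); it is not the data behind the chart. The 1.5 times IQR outlier cutoff and the name `spread_measures` are assumptions made for illustration.

```python
import statistics

def spread_measures(data, k=1.5):
    """Compute the range (with and without outliers) and the IQR.

    Assumption: outliers are values more than k * IQR outside the box.
    """
    ordered = sorted(data)
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1

    # Values that the whiskers would cover under the assumed rule.
    non_outliers = [x for x in ordered if q1 - k * iqr <= x <= q3 + k * iqr]

    return {
        "range": ordered[-1] - ordered[0],
        "range_without_outliers": non_outliers[-1] - non_outliers[0],
        "iqr": iqr,
    }

# Hypothetical values that mimic the approximate readings in the text.
values = [-700, 0, 300, 350, 400, 450, 600, 1000, 1700]
spread = spread_measures(values)
print(spread)  # range 2400, range without outliers 1000, IQR 300.0
```

The full range is dominated by the two extreme values, while the IQR describes only the middle half of the data and is unaffected by them.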
And finally, boxplots often provide information about the shape of a data set. The examples below show some common patterns.
[Figure: three boxplots, each on an axis from 2 to 16, labeled from left to right: Skewed right, Symmetric, Skewed left]
Each of the above boxplots illustrates a different skewness pattern. If most of the observations are concentrated on the low end of the scale, the distribution is skewed right; if most are concentrated on the high end, it is skewed left. If a distribution is symmetric, the observations are spread evenly on either side of the median, as shown above in the middle figure.
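One rough way to put a number on the skewness patterns described above is to compare how far Q1 and Q3 sit from the median. The sketch below uses the quartile skewness coefficient, (Q3 + Q1 - 2 Q2) / (Q3 - Q1), which is not mentioned in the text and is offered only as an illustration. The function names, the `tolerance` threshold, and the three sample data sets are hypothetical.

```python
import statistics

def quartile_skewness(data):
    """Quartile-based skewness: positive means the upper half of the box is wider."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    if q3 == q1:
        return 0.0  # box has zero width; treat as no measurable skew
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)

def describe_shape(data, tolerance=0.1):
    """Label a data set using an arbitrary, illustrative tolerance."""
    skew = quartile_skewness(data)
    if skew > tolerance:
        return "skewed right"
    if skew < -tolerance:
        return "skewed left"
    return "roughly symmetric"

# Hypothetical data sets on a 2-to-16 scale.
low_heavy = [2, 3, 3, 4, 4, 5, 6, 9, 14]         # most values at the low end
balanced = [2, 5, 7, 8, 9, 10, 11, 13, 16]       # mirrored around the median
high_heavy = [4, 9, 12, 13, 14, 14, 15, 15, 16]  # most values at the high end

for name, data in [("low_heavy", low_heavy),
                   ("balanced", balanced),
                   ("high_heavy", high_heavy)]:
    print(name, describe_shape(data))
```

This looks only at the box, not the whiskers, so it is a coarse check; a boxplot read by eye also takes whisker lengths into account.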
Some general observations about box plots
- The box plot is comparatively short: see example (2). This suggests that overall there is not a great deal of variability within the data.
- The box plot is comparatively tall: see examples (1) and (3). This suggests the data have a large amount of variability.
- One box plot is much higher or lower than another: compare (3) and (4). This could suggest a difference between groups. The overlap occurs only between the top 50% of example (4) and the 3rd quartile. With 50% of the data in example (3) lying outside the range of example (4), there may be a difference between these two sets of data.
- Obvious differences between box plots: see examples (1) and (2), (1) and (3), or (2) and (4). Any obvious difference between box plots for comparative groups is worthy of further investigation.
- The 4 sections of the box plot are uneven in size: see example (1). This shows that variability is uneven within that data set. Possibly something unknown is affecting the spread of the data. The long upper whisker in the example means that the data are varied within the top quartile (top 25%). The short lower whisker means there is not much variability within the lowest 25% of the data.
- Same median, different distribution: see examples (1), (2), and (3). The medians (which will be close to the average when a distribution is roughly symmetric) are all at the same level. However, the box plots in these examples show very different distributions of views.
It is always important to consider the pattern of the whole distribution of responses in a box plot.
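The comparisons in the list above (box size, box position, and equal medians with different spreads) can be sketched as a side-by-side summary of two groups. The groups below are hypothetical and do not correspond to the numbered examples; the names `box` and `compare_groups` are illustrative only.

```python
import statistics

def box(data):
    """Return (Q1, median, Q3) for one group."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, q2, q3

def compare_groups(a, b):
    """Summarize how the boxes of two groups relate to each other."""
    a_q1, a_med, a_q3 = box(a)
    b_q1, b_med, b_q3 = box(b)
    return {
        "median_difference": b_med - a_med,  # is one box higher or lower?
        "iqr_a": a_q3 - a_q1,                # size of the first box
        "iqr_b": b_q3 - b_q1,                # size of the second box
        # True when the Q1-to-Q3 intervals share at least one point.
        "boxes_overlap": a_q1 <= b_q3 and b_q1 <= a_q3,
    }

# Hypothetical response scores for three groups.
group_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
group_b = [10, 11, 12, 13, 14, 15, 16, 17, 18]
group_c = [3, 4, 4, 5, 5, 5, 6, 6, 7]

shifted = compare_groups(group_a, group_b)      # same box size, much higher box
same_median = compare_groups(group_a, group_c)  # same median, smaller box
print(shifted)
print(same_median)
```

A summary like this only flags differences worth a closer look; whether a difference between groups is meaningful still depends on the whole distribution.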
Helpful Video Links
Boxplot Basics
Comparing Boxplots
How to make boxplots in Microsoft Excel