Appendix A – Instructional text control condition (translated to English)
Histograms and box plots
The distribution of a set of measurements of one variable can be presented and summarized in various ways. One could use descriptive statistics, but graphical representations are also possible. In this document we will discuss histograms and box plots.
Histograms
The vertical axis in a histogram shows the frequencies, while the horizontal axis shows the various values the represented variable can take. Each bar represents the frequency with which a certain value or the values in a certain interval were observed. The histogram below, for instance, shows the distribution of exam results of a group of students:
In this histogram we can see that only one student scored 2 out of 20, while two students scored 19 out of 20. A histogram thus shows the spread of the data distribution by showing its range. The mode of this distribution is 12. Also, we can see that this distribution is skewed to the left because the “tail” on the left side of the histogram is longer than the tail on the right side. The median is not directly visible in the histogram, but in this case it can be calculated from the frequencies of the individual values. In histograms that show the frequencies of classes of values instead of the frequencies of individual values, the exact median cannot be calculated.
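As a minimal sketch of how the median and the mode can be calculated from a histogram of individual values, the Python snippet below walks through a frequency table. The table is hypothetical: it was made up to agree with the values mentioned in the text (one score of 2, two scores of 19, mode 12) and is not the data behind the figure. The sketch assumes an odd number of observations and a single mode.

```python
# Hypothetical frequency table (score -> number of students).
# Illustrative only; NOT the data behind the histogram in the text.
frequencies = {2: 1, 5: 1, 7: 1, 8: 1, 9: 2, 10: 1, 11: 2,
               12: 4, 13: 2, 14: 2, 15: 1, 16: 1, 19: 2}

n = sum(frequencies.values())   # total number of observations
middle = (n + 1) // 2           # rank of the median (assumes n is odd)

# Add up the bar heights from left to right until the middle rank is reached.
cumulative = 0
median_score = None
for score in sorted(frequencies):
    cumulative += frequencies[score]
    if cumulative >= middle:
        median_score = score
        break

# The mode is the value with the tallest bar (assumes a single mode).
mode_score = max(frequencies, key=frequencies.get)

print("n =", n, "median =", median_score, "mode =", mode_score)
```

When the bars represent classes of values, the same walk only identifies the class that contains the median, not the median itself.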
The histograms below both have the same range and are both symmetrical, yet they differ in spread. While the left histogram has one mode, namely 7, the right histogram has two modes, namely 4 and 10. Because the observations in the right histogram are on average further away from the mean than in the left histogram, the standard deviation is larger in the right histogram.
Left histogram:
Mean: 7
Median: 7
Mode: 7
Minimum: 3
Maximum: 11
Range: 8
Standard deviation: 2.04
Right histogram:
Mean: 7
Median: 7
Mode: 4 and 10
Minimum: 3
Maximum: 11
Range: 8
Standard deviation: 2.86
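The comparison of the two histograms can be sketched in Python as follows. Both frequency tables are hypothetical and were chosen only to be symmetrical around 7 with a range of 8. They are not the data behind the figures, so the standard deviations they produce differ from the values listed above. The sketch uses the population standard deviation (`pstdev`), which is also an assumption.

```python
from statistics import mean, multimode, pstdev

# Hypothetical frequency tables (value -> count); NOT the data behind the figures.
left_freq = {3: 1, 4: 1, 5: 2, 6: 3, 7: 4, 8: 3, 9: 2, 10: 1, 11: 1}
right_freq = {3: 1, 4: 4, 5: 2, 6: 1, 7: 2, 8: 1, 9: 2, 10: 4, 11: 1}


def expand(freq):
    """Turn a frequency table into a sorted list of individual observations."""
    return [value for value, count in sorted(freq.items()) for _ in range(count)]


left, right = expand(left_freq), expand(right_freq)

for name, data in (("left", left), ("right", right)):
    print(name,
          "mean:", mean(data),
          "range:", max(data) - min(data),
          "mode(s):", multimode(data),
          "standard deviation:", round(pstdev(data), 2))
```

Both lists have the same mean and the same range, but the two-peaked list has the larger standard deviation because its observations lie further from the mean on average.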
Box plots
Another way to represent a data distribution is the 5-number summary, which can be graphically presented using a box plot. The 5-number summary encompasses the lowest observed value, the first quartile, the median, the third quartile, and the highest observed value. This means that the dataset is divided into four groups with (approximately) the same number of observations. Depending on the number of observations and the specific shape of the distribution, the groups will not always have the exact same size.
The box plot below shows the exam results with the different elements labeled. The horizontal lines that connect the minimum to the first quartile and the third quartile to the maximum are called whiskers. The rectangle connecting Q1 and Q3 is called the box.
In this box plot we can see that the minimum score is 2, the maximum score is 19 and the median is 12. We can also see that the distribution is skewed to the left, as the left whisker and the left part of the box are longer than the right part of the box and the right whisker. The box plot does not show where the mode is situated. What we can see is that approximately 25% of the students scored higher than 14 out of 20. We can also read off the interquartile range, which in this case is 14 – 9 = 5.
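The 5-number summary and the interquartile range can be computed as in the following sketch. The list of scores is hypothetical and was constructed to reproduce the values quoted above (2, 9, 12, 14, 19). It is not the original exam data. Quartiles can be defined in several slightly different ways, and the "inclusive" method of Python's `statistics.quantiles` is only one of them.

```python
from statistics import quantiles

# Hypothetical exam scores, constructed to match the summary values in the text.
scores = [2, 5, 7, 8, 9, 9, 10, 11, 11, 12, 12,
          12, 12, 13, 13, 14, 14, 15, 16, 19, 19]

# The three cut points that split the sorted data into four groups.
q1, med, q3 = quantiles(scores, n=4, method="inclusive")

five_number_summary = (min(scores), q1, med, q3, max(scores))
iqr = q3 - q1  # interquartile range: the length of the box

print("5-number summary:", five_number_summary)
print("interquartile range:", iqr)
```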
The box plots below both have the same range, but a different interquartile range. The box in the left box plot is smaller than the box in the right box plot. In the left box plot the observations are hence more concentrated around the median than in the right box plot.
Left box plot:
Mean: 7
Median: 7
Mode: unknown
Minimum: 3
Maximum: 11
Range: 8
Standard deviation: unknown
Right box plot:
Mean: 7
Median: 7
Mode: unknown
Minimum: 3
Maximum: 11
Range: 8
Standard deviation: unknown
Appendix B – Instructional text MERs condition (translated to English)
Histograms and box plots
The distribution of a set of measurements of one variable can be presented and summarized in various ways. One could use descriptive statistics, but graphical representations are also possible. In this document we will discuss histograms and box plots.
Histograms
The vertical axis in a histogram shows the frequencies, while the horizontal axis shows the various values the represented variable can take. Each bar represents the frequency with which a certain value or the values in a certain interval were observed. The histogram below, for instance, shows the distribution of exam results of a group of students:
In this histogram we can see that only one student scored 2 out of 20, while two students scored 19 out of 20. A histogram thus shows the spread of the data distribution by showing its range. The mode of this distribution is 12. Also, we can see that this distribution is skewed to the left because the “tail” on the left side of the histogram is longer than the tail on the right side. The median is not directly visible in the histogram, but in this case it can be calculated from the frequencies of the individual values. In histograms that show the frequencies of classes of values instead of the frequencies of individual values, the exact median cannot be calculated.
The histograms below both have the same range and are both symmetrical, yet they differ in spread. While the left histogram has one mode, namely 7, the right histogram has two modes, namely 4 and 10. Because the observations in the right histogram are on average further away from the mean than in the left histogram, the standard deviation is larger in the right histogram.
Left histogram:
Mean: 7
Median: 7
Mode: 7
Minimum: 3
Maximum: 11
Range: 8
Standard deviation: 2.04
Right histogram:
Mean: 7
Median: 7
Mode: 4 and 10
Minimum: 3
Maximum: 11
Range: 8
Standard deviation: 2.86
Box plots
Another way to represent a data distribution is the 5-number summary, which can be graphically presented using a box plot. The 5-number summary encompasses the lowest observed value, the first quartile, the median, the third quartile, and the highest observed value. This means that the dataset is divided into four groups with (approximately) the same number of observations. Depending on the number of observations and the specific shape of the distribution, the groups will not always have the exact same size. Starting from the histogram presented earlier, we can construct the 5-number summary. We begin by drawing a line at the median, which is 12 in this case.
This line divides the dataset into two groups of approximately the same size. By drawing lines at the values of Q1 and Q3, we divide the data distribution into four groups of approximately the same size:
From these four lines we are just a small step away from drawing a box plot. In order to complete our box plot, we draw some horizontal lines from the minimum to Q1 and from Q3 to the maximum, and we connect the vertical lines:
Without the histogram, the box plot of these exam results looks like this. The various elements of the box plot have been named. The horizontal lines that connect the minimum to the first quartile and the third quartile to the maximum are called whiskers. The rectangle that connects Q1 and Q3 is called the box.
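The construction steps described above can be imitated in a few lines of Python. This is only a rough text sketch under stated assumptions: the function `ascii_box_plot` is a hypothetical helper written for this illustration, it only handles whole-number values on a scale from 0 to 20, and the five values passed to it are the ones quoted in this text for the exam results.

```python
def ascii_box_plot(minimum, q1, med, q3, maximum, scale_max=20):
    """Draw a one-line text box plot on the whole-number scale 0..scale_max."""
    cells = [" "] * (scale_max + 1)
    for x in range(minimum, q1):          # left whisker: minimum up to Q1
        cells[x] = "-"
    for x in range(q1, q3 + 1):           # box: Q1 up to Q3
        cells[x] = "="
    for x in range(q3 + 1, maximum + 1):  # right whisker: Q3 up to maximum
        cells[x] = "-"
    for x in (minimum, q1, med, q3, maximum):
        cells[x] = "|"                    # the five vertical lines
    return "".join(cells)


plot = ascii_box_plot(2, 9, 12, 14, 19)
print(plot)
```

In the printed line, the runs of `-` are the whiskers and the run of `=` between the second and fourth `|` is the box.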
In this box plot we can see that the minimum score is 2, the maximum score is 19 and the median is 12. We can also see that the distribution is skewed to the left, as the left whisker and the left part of the box are longer than the right part of the box and the right whisker. The box plot does not show where the mode is situated. What we can see is that approximately 25% of the students scored higher than 14 out of 20. We can also read off the interquartile range, which in this case is 14 – 9 = 5.
The box plots below both have the same range, but a different interquartile range. The box in the left box plot is smaller than the box in the right box plot. In the left box plot the observations are hence more concentrated around the median than in the right box plot.
Left box plot:
Mean: 7
Median: 7
Mode: unknown
Minimum: 3
Maximum: 11
Range: 8
Standard deviation: unknown
Right box plot:
Mean: 7
Median: 7
Mode: unknown
Minimum: 3
Maximum: 11
Range: 8
Standard deviation: unknown
The figures below make the difference between the two distributions clearer. They show the box plots from above, now together with the accompanying histograms.
Appendix C – Instructional text refutational text condition (translated to English)
Histograms and box plots
The distribution of a set of measurements of one variable can be presented and summarized in various ways. One could use descriptive statistics, but graphical representations are also possible. In this document we will discuss histograms and box plots.
WARNING!!! The box plot is a graphical representation that is difficult to interpret for many students.
Histograms
The vertical axis in a histogram shows the frequencies, while the horizontal axis shows the various values the represented variable can take. Each bar represents the frequency with which a certain value or the values in a certain interval were observed. The histogram below, for instance, shows the distribution of exam results of a group of students:
In this histogram we can see that only one student scored 2 out of 20, while two students scored 19 out of 20. A histogram thus shows the spread of the data distribution by showing its range. The mode of this distribution is 12. Also, we can see that this distribution is skewed to the left because the “tail” on the left side of the histogram is longer than the tail on the right side. The median is not directly visible in the histogram, but in this case it can be calculated from the frequencies of the individual values. In histograms that show the frequencies of classes of values instead of the frequencies of individual values, the exact median cannot be calculated.
The histograms below both have the same range and are both symmetrical, yet they differ in spread. While the left histogram has one mode, namely 7, the right histogram has two modes, namely 4 and 10. Because the observations in the right histogram are on average further away from the mean than in the left histogram, the standard deviation is larger in the right histogram.
Left histogram:
Mean: 7
Median: 7
Mode: 7
Minimum: 3
Maximum: 11
Range: 8
Standard deviation: 2.04
Right histogram:
Mean: 7
Median: 7
Mode: 4 and 10
Minimum: 3
Maximum: 11
Range: 8
Standard deviation: 2.86
Box plots
Another way to represent a data distribution is the 5-number summary, which can be graphically presented using a box plot. The 5-number summary encompasses the lowest observed value, the first quartile, the median, the third quartile, and the highest observed value. This means that the dataset is divided into four groups with (approximately) the same number of observations. Depending on the number of observations and the specific shape of the distribution, the groups will not always have the exact same size.
The box plot below shows the exam results with the different elements labeled. The horizontal lines that connect the minimum to the first quartile and the third quartile to the maximum are called whiskers. The rectangle connecting Q1 and Q3 is called the box.
In this box plot we can see that the minimum score is 2, the maximum score is 19 and the median is 12. We can also see that the distribution is skewed to the left, as the left whisker and the left part of the box are longer than the right part of the box and the right whisker. The box plot does not show where the mode is situated. What we can see is that approximately 25% of the students scored higher than 14 out of 20. We can also read off the interquartile range, which in this case is 14 – 9 = 5.
WARNING!! When you look at a box plot you might think that a larger part of the box represents more results than a smaller part of the box. This is incorrect! Each of the four parts of a box plot represents (approximately) the same number of observations. The size of a part of the box hence gives no indication of the number of observations it represents.
A larger part of the box, as in our example between 9 and 12, means that the observations there are more spread out. Between 12 and 14 the observations are hence less spread out. Each of the four parts of this box plot represents approximately 25% of the exam results.
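The difference between the size of a part and the number of observations it represents can be checked with a small Python sketch. The scores below are hypothetical: they were constructed so that the quartiles fall at 9, 12 and 14 as in the example and so that no score coincides with a quartile. They are not the original exam data, and the "inclusive" quartile method is an assumption.

```python
from statistics import quantiles

# Hypothetical scores (20 observations); NOT the original exam data.
scores = [2, 3, 4, 5, 6, 10, 10, 11, 11, 11,
          13, 13, 13, 13, 13, 17, 17, 18, 18, 19]

q1, med, q3 = quantiles(scores, n=4, method="inclusive")
edges = [min(scores), q1, med, q3, max(scores)]

# Length of each of the four parts: left whisker, two box parts, right whisker.
widths = [high - low for low, high in zip(edges, edges[1:])]

# Number of observations in each part; the last part also includes the maximum.
counts = []
for i, (low, high) in enumerate(zip(edges, edges[1:])):
    is_last = i == len(edges) - 2
    counts.append(sum(low <= x < high or (is_last and x == high) for x in scores))

print("widths:", widths)
print("counts:", counts)
```

The four parts differ considerably in length, yet each contains the same number of observations.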
The box plots below both have the same range, but a different interquartile range. The box in the left box plot is smaller than the box in the right box plot. In the left box plot the observations are hence more concentrated around the median than in the right box plot.
Left box plot:
Mean: 7
Median: 7
Mode: unknown
Minimum: 3
Maximum: 11
Range: 8
Standard deviation: unknown
Right box plot:
Mean: 7
Median: 7
Mode: unknown
Minimum: 3
Maximum: 11
Range: 8
Standard deviation: unknown
WARNING!! You might think that (the box of) the left box plot represents fewer observations than (the box of) the right box plot, as the right box plot has a larger box. As explained earlier, this is not the case! You cannot read off the number of observations from a box plot. The size of the box only tells you something about the spread of the observations: a larger (part of the) box shows that the observations are more spread out.
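To make this concrete, the sketch below compares two hypothetical datasets that share the minimum (3), the median (7) and the maximum (11) of the box plots above. The data are invented for this illustration and are not the data behind the figures. The dataset with the smaller box is deliberately the one with more observations. The "inclusive" quartile method is again an assumption.

```python
from statistics import quantiles

# Hypothetical data; NOT the data behind the box plots in the text.
narrow = [3, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 9, 11]  # 17 observations
wide = [3, 4, 4, 5, 7, 9, 10, 10, 11]                          # 9 observations


def summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return (min(data), q1, med, q3, max(data))


narrow_summary = summary(narrow)
wide_summary = summary(wide)

narrow_iqr = narrow_summary[3] - narrow_summary[1]  # size of the box
wide_iqr = wide_summary[3] - wide_summary[1]

print("narrow:", narrow_summary, "box size:", narrow_iqr, "n =", len(narrow))
print("wide:  ", wide_summary, "box size:", wide_iqr, "n =", len(wide))
```

Here the smaller box belongs to the larger dataset, so the size of the box says nothing about how many observations were collected.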
Appendix D – Instructional text combination condition (translated to English)
Histograms and box plots
The distribution of a set of measurements of one variable can be presented and summarized in various ways. One could use descriptive statistics, but graphical representations are also possible. In this document we will discuss histograms and box plots.
WARNING!!! The box plot is a graphical representation that is difficult to interpret for many students.