Activity: Standard Deviation of Participant Heights
Step 1. Write down our sample data points. These are the heights (in inches) of five fictional participants in an ITEP training course.
60
72
62
64
74
Step 2. Calculate the mean of our sample data points.
Sum of the heights = 332
Divide by the number of heights we measured (5)
The mean = 66.4 inches
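As a quick check of Step 2, the same arithmetic can be sketched in Python. The variable names (heights, total, mean) are chosen here for illustration and are not part of the activity.

```python
# Heights in inches from Step 1.
heights = [60, 72, 62, 64, 74]

# Step 2: add the heights, then divide by how many there are.
total = sum(heights)
mean = total / len(heights)

print("Sum of the heights:", total)
print("Mean:", mean)
```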
Step 3. Write down the amount that each sample data point deviates from the mean.
60 - 66.4 = -6.4
72 - 66.4 = 5.6
62 - 66.4 = -4.4
64 - 66.4 = -2.4
74 - 66.4 = 7.6
Sum = 0.0
Because the positive and negative deviations cancel each other out, the sum of the deviations is zero. If we try to average these deviations, we will get an average of zero. This does not tell us much about how our data as a whole deviate from the mean.
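The cancellation in Step 3 can be seen in a short Python sketch. The names used are illustrative, and the comparison with zero uses a small tolerance because computers store decimals such as 66.4 only approximately.

```python
import math

heights = [60, 72, 62, 64, 74]
mean = sum(heights) / len(heights)

# Step 3: how far each height is from the mean (negative means below the mean).
deviations = [h - mean for h in heights]

for h, d in zip(heights, deviations):
    print(f"{h} - {mean} = {d:+.1f}")

# The deviations cancel, so their sum is zero (up to floating-point rounding).
sums_to_zero = math.isclose(sum(deviations), 0.0, abs_tol=1e-9)
print("Deviations sum to zero:", sums_to_zero)
```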
Step 4. So, let’s square the deviations to get rid of the positive and negative signs. All the values will now be in square inches.
(-6.4)² = 40.96
(5.6)² = 31.36
(-4.4)² = 19.36
(-2.4)² = 5.76
(7.6)² = 57.76
Step 5. Now sum the squared deviations.
40.96
31.36
19.36
5.76
+ 57.76
155.20
The sum of the squared deviations = 155.20 square inches.
This value is called Sum of Squares and is much-used in statistical work.
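Steps 4 and 5 can be sketched together in Python: square each deviation, then add the squares. The names squared_deviations and sum_of_squares are illustrative.

```python
heights = [60, 72, 62, 64, 74]
mean = sum(heights) / len(heights)

# Step 4: squaring removes the sign of each deviation.
squared_deviations = [(h - mean) ** 2 for h in heights]

# Step 5: the Sum of Squares, in square inches.
sum_of_squares = sum(squared_deviations)

print("Squared deviations:", [round(s, 2) for s in squared_deviations])
print("Sum of Squares:", round(sum_of_squares, 2))
```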
Step 6. Divide the sum of squares by the number of data points (5).
The result is 31.04 square inches.
This is the mean of the squared deviations.
Other names for this number are Mean Square or Variance.
Variance is much-used in statistical work.
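A minimal Python sketch of Step 6 follows. It divides by the number of data points, as the activity does, and compares the result with pvariance from Python's standard statistics module, which uses the same divisor. That module also has a function named variance, which divides by one less than the number of data points and so gives a different value; it is shown only for contrast.

```python
import statistics

heights = [60, 72, 62, 64, 74]
mean = sum(heights) / len(heights)
sum_of_squares = sum((h - mean) ** 2 for h in heights)

# Step 6: divide the Sum of Squares by the number of data points (5).
variance_by_hand = sum_of_squares / len(heights)

# Same divisor (n) as the activity.
variance_n = statistics.pvariance(heights)

# Different divisor (n - 1); not the value used in this activity.
variance_n_minus_1 = statistics.variance(heights)

print("Variance, by hand:", round(variance_by_hand, 2))
print("statistics.pvariance:", round(variance_n, 2))
print("statistics.variance:", round(variance_n_minus_1, 2))
```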
Step 7. Since variance is still a squared value, we need to “un-square” it in order to return to the units (inches) of our original data set.
The square root of the variance = 5.57 inches. We can round this to 5.6 inches.
This is the Standard Deviation for our data set. Standard deviation is a statistic that tells you how tightly all the data are clustered around the mean.
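Step 7 can be checked with a short Python sketch. It takes the square root of the variance from Step 6 and compares it with pstdev from the standard statistics module, which also divides by the number of data points. The variable names are illustrative.

```python
import math
import statistics

heights = [60, 72, 62, 64, 74]

# Variance from Step 6, in square inches.
variance = 31.04

# Step 7: the square root returns us to inches.
sd_from_variance = math.sqrt(variance)

# The same statistic computed directly from the heights.
sd_direct = statistics.pstdev(heights)

print("Standard deviation:", round(sd_from_variance, 2), "inches")
print("Rounded to one decimal place:", round(sd_from_variance, 1), "inches")
```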
The calculation we have just performed can be symbolized in the following way:
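One way to write the calculation, consistent with Steps 2 through 7, is shown here. The symbols are notation assumed for this sketch: σ is the standard deviation, x_i is each data point, x̄ is the mean, and n is the number of data points.

```latex
\sigma = \sqrt{\frac{\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}{n}}
```

For our data, this is the square root of 155.20 / 5 = 31.04, or about 5.57 inches.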
In words, we can say: the standard deviation is the square root of the average of squared deviations from the mean.
Standard Deviation and Normal Curves
Mean
For a data set, the mean is the sum of all the data values divided by the number of values.
The mean is useful for understanding our data; however, there are additional statistics that can increase our understanding of the data.
Range
The range tells us the entire spread of our data from lowest value to highest value. However, the range can be unduly influenced by a single outlier.
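To see how one outlier affects the range, here is a small Python sketch using the five heights from the activity. The extra 90-inch value is hypothetical and is added only for illustration.

```python
heights = [60, 72, 62, 64, 74]

# Range = highest value minus lowest value.
range_original = max(heights) - min(heights)

# Add one hypothetical, unusually tall participant.
heights_with_outlier = heights + [90]
range_with_outlier = max(heights_with_outlier) - min(heights_with_outlier)

print("Range of the original heights:", range_original, "inches")
print("Range with one outlier added:", range_with_outlier, "inches")
```

A single added value more than doubles the range, even though the other five heights have not changed.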
Standard Deviation
A better measure of the dispersion of the data around the mean will take into account all the data values, but those values that show up more frequently will have more weight. Standard Deviation is such a measure.
When the data are bunched together and the bell-shaped curve is steep, the standard deviation is small. When the data are spread apart and the bell curve is flattened, the standard deviation is large.
Diagram 1
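The difference between bunched and spread data can be sketched in Python. The "bunched" heights below are hypothetical values chosen to have the same mean as the activity's heights; only the spread differs.

```python
import statistics

# Heights from the activity (more spread out).
spread_heights = [60, 72, 62, 64, 74]

# Hypothetical heights with the same mean but bunched closer together.
bunched_heights = [65, 66, 66, 67, 68]

mean_spread = statistics.mean(spread_heights)
mean_bunched = statistics.mean(bunched_heights)

# pstdev divides by the number of data points, as in the activity.
sd_spread = statistics.pstdev(spread_heights)
sd_bunched = statistics.pstdev(bunched_heights)

print("Means:", mean_spread, mean_bunched)
print("Standard deviation, spread data:", round(sd_spread, 2))
print("Standard deviation, bunched data:", round(sd_bunched, 2))
```

Both sets have a mean of 66.4 inches, but the bunched set has a much smaller standard deviation.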
A common assumption of statistics-users is that data are "normally" distributed. If the variation in the data comes from many small random influences, the data will tend to fit a normal (bell-shaped) curve. Of course, many data sets do not vary in this way, and analyzing them requires more complicated statistics. For the purpose of this introduction, we will assume the data are normally distributed.
For data that are "normally distributed" we expect that about 68.3% of the data will be within 1 standard deviation from the mean. In general there is a relationship between the percentage of the included data and the number of standard deviations from the mean. This relationship is illustrated in Table 1 and Diagram 2.
Table 1
Percent of the data     Number of Standard Deviations from the Mean
50.0                    0.674
68.3                    1.000
90.0                    1.645
95.0                    1.960
95.4                    2.000
99.0                    2.576
99.7                    3.000
Diagram 2
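The values in Table 1 can be reproduced, under the assumption of a perfectly normal distribution, with NormalDist from Python's standard statistics module (available in Python 3.8 and later). The helper name percent_within is made up for this sketch.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def percent_within(k):
    """Percent of normally distributed data within k standard deviations of the mean."""
    return 100 * (standard_normal.cdf(k) - standard_normal.cdf(-k))

# Numbers of standard deviations listed in Table 1.
table_k_values = [0.674, 1.000, 1.645, 1.960, 2.000, 2.576, 3.000]

for k in table_k_values:
    print(f"{k:.3f} standard deviations: {percent_within(k):.1f}% of the data")
```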
Revised 12/3/14 pe