Data Analysis
Statistics Test Scores
Data Definition
The data we chose to study for this project were student test scores from a second-year statistics class held at HuangHuai University. The data consist of a population of N = 296, from which we drew a sample of n = 20. The data were taken from historical records found in the textbook Elementary Statistics, 8th Edition, by Mario Triola. The data set has four variables: age, sex, test scores, and quiz scores. We chose to study test scores for our data analysis.
Sampling Method
To perform our data analysis we were asked to select a sample of at least n = 20 observations. To gather the sample we used the Systematic Sampling method: we selected every kth student from the population, in order, until we had a sample of n = 20.
For our selection we took every 12th test score from the population until we had 20 observations. Our selection began with the 12th observation and continued with the 24th, 36th, 48th, etc. We used these data for our Data Analysis and statistics calculations.
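The selection procedure described above can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample_positions` and its parameters are assumptions made for this example, and it returns observation positions only, since the full population list is not reproduced in this report.

```python
def systematic_sample_positions(population_size, step, sample_size):
    """Return the 1-based positions step, 2*step, ... of a systematic sample."""
    positions = list(range(step, population_size + 1, step))[:sample_size]
    if len(positions) < sample_size:
        raise ValueError("population too small for this step and sample size")
    return positions


# N = 296, every 12th observation, n = 20 (values taken from the report)
positions = systematic_sample_positions(296, 12, 20)
print(positions[:4])   # [12, 24, 36, 48]
print(positions[-1])   # 240
```

With a step of 12, the 20th selected observation is the 240th, so the whole sample fits inside the population of 296.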
Selected Sample Data
Test Scores
49 72 56 81 72 69 67 74 85 76
80 59 72 73 66 52 79 61 77 60
Stem-Leaf Distribution
                     x
                     x
                     x
               x     x
               x     x
         x     x     x     x
         x     x     x     x
   x     x     x     x     x
40    50    60    70    80    90
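For comparison, a stem-and-leaf display that lists the individual digits can be generated from the 20 sample scores. The sketch below is one possible way to do it, not the method used to draw the diagram above. The helper name `stem_and_leaf` is an assumption for this example, with the tens digit as the stem and the ones digit as the leaf.

```python
from collections import defaultdict

# The 20 sampled test scores listed under "Selected Sample Data"
scores = [49, 72, 56, 81, 72, 69, 67, 74, 85, 76,
          80, 59, 72, 73, 66, 52, 79, 61, 77, 60]


def stem_and_leaf(values):
    """Group values by tens digit (stem) and list the ones digits (leaves)."""
    groups = defaultdict(list)
    for value in sorted(values):
        groups[value // 10].append(value % 10)
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(groups.items())
    ]


lines = stem_and_leaf(scores)
print("\n".join(lines))
# 4 | 9
# 5 | 2 6 9
# 6 | 0 1 6 7 9
# 7 | 2 2 2 3 4 6 7 9
# 8 | 0 1 5
```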
Data Analysis
Mean: 69
Median: 72
Mode: 72
Standard Deviation: 10.05
Range: High = 85, Low = 49, Range = 36
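These summary figures can be re-derived from the 20 sample scores with Python's standard `statistics` module. This is a sketch for checking the arithmetic. It assumes the reported standard deviation is the sample standard deviation (n - 1 divisor), which is the choice that reproduces 10.05.

```python
import statistics

# The 20 sampled test scores listed under "Selected Sample Data"
scores = [49, 72, 56, 81, 72, 69, 67, 74, 85, 76,
          80, 59, 72, 73, 66, 52, 79, 61, 77, 60]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
sd = statistics.stdev(scores)            # sample standard deviation (n - 1)
data_range = max(scores) - min(scores)   # high minus low

print(f"Mean: {mean}, Median: {median}, Mode: {mode}")
print(f"Standard Deviation: {sd:.2f}")
print(f"Range: {data_range}")
```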
Rule of Thumb:
___-2 SD______-1 SD______Mean______+1 SD______+2 SD___
   48.9                   69                   89.1
Decision Rule:
If a selected test score is between 48.9 and 89.1, it is considered Ordinary.
If a selected test score is less than 48.9 or more than 89.1, it is considered Unusual.
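The decision rule can be written as a small function. The sketch below assumes the reported mean (69) and standard deviation (10.05). The name `classify` is hypothetical, and a score exactly on a boundary is treated as ordinary, which is an assumption because the rule does not address that case.

```python
MEAN = 69.0   # sample mean reported in the Data Analysis section
SD = 10.05    # sample standard deviation reported in the Data Analysis section


def classify(score, mean=MEAN, sd=SD):
    """Label a score 'ordinary' if it lies within mean +/- 2 SD, else 'unusual'."""
    low, high = mean - 2 * sd, mean + 2 * sd   # about 48.9 and 89.1
    return "ordinary" if low <= score <= high else "unusual"


print(classify(62))   # ordinary
print(classify(95))   # unusual
```

The value 62 is the check score used in the discussion of results. The value 95 is an invented example that shows the other branch.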
Discussion of Data Results
From our sample data analysis we can see that the selected data have a near-normal distribution. The Median and Mode are identical (72), and the Mean is close to them (69). The Stem-Leaf diagram shows the roughly bell-shaped distribution more clearly. With repeated samples or a larger sample, we may find that the distribution of the population is also near normal.
To check our sample, we chose one additional observation from the population data set to compare against our Rule of Thumb analysis. We selected the 254th observation to be consistent with our random sampling discussed earlier. The test score selected was 62. This score was consistent with our Rule of Thumb, as it fell in our “ordinary” range. Under the Rule of Thumb, about 95% of scores in a near-normal distribution are expected to fall within two standard deviations of the mean, so one ordinary score agrees with our range but does not by itself confirm it.
The sample data are near normal, but we cannot definitively conclude that the sample represents the population accurately. It is only a sample of n = 20 out of a population of N = 296 and may not be representative. To confirm that the data follow a normal distribution, we would need to do more sampling. Our mean is not equal to the median and mode; because the mean is below the median, there is a small negative skew. Therefore, we can only conclude that the data are near normal, and more sampling would be needed to see whether the mean moves closer to the median and mode.
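One way to put a number on the direction of the skew is Pearson's second (median) skewness coefficient, 3 × (mean − median) / standard deviation. This measure is not part of our original analysis. It is shown here only as a sketch, and the function name `pearson_median_skewness` is an assumption for the example.

```python
import statistics

# The 20 sampled test scores listed under "Selected Sample Data"
scores = [49, 72, 56, 81, 72, 69, 67, 74, 85, 76,
          80, 59, 72, 73, 66, 52, 79, 61, 77, 60]


def pearson_median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sample SD."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1)
    return 3 * (mean - median) / sd


skew = pearson_median_skewness(scores)
print(f"{skew:.2f}")   # negative, because the mean is below the median
```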