Chapter 7 Sampling and Sampling Distribution
Example:
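There are 2500 managers at Electronics Associates. The average annual salary of the 2500 managers is 51,800 dollars. That is,
\[ \mu = \frac{1}{2500} \sum_{i=1}^{2500} x_i = 51800, \]
where \(x_i\) is the i'th manager's annual salary. Also, assume the population variance of the salary data is
\[ \sigma^2 = \frac{1}{2500} \sum_{i=1}^{2500} (x_i - \mu)^2 = 4000^2, \]
so that the population standard deviation is \(\sigma = 4000\).
Assume that 1500 of the 2500 managers have completed the training program. Then, the population proportion is
\[ p = \frac{1500}{2500} = 0.60. \]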
Objective: because using all the data is too costly and time-consuming, we use only part of the data (a sample) to obtain accurate estimates of the population mean, variance, and proportion.
7.1 Simple Random Sampling
Simple random sampling from a finite population:
A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.
Note: the total number of possible samples is \(\binom{N}{n} = \frac{N!}{n!\,(N-n)!}\). Thus, the probability of a specific sample being selected is \(1 / \binom{N}{n}\).
Note: the above sampling is also called sampling without replacement, since a selected element is not placed back into the population. That is, an element cannot be selected twice.
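The counting argument and the selection step above can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the manager IDs 1..2500 and the fixed seed are assumptions made only so the sketch is concrete and repeatable.

```python
import math
import random

N, n = 2500, 30  # population size and sample size from the chapter's example

# Number of distinct samples of size n: the binomial coefficient C(N, n).
num_samples = math.comb(N, n)
# Under simple random sampling every sample is equally likely.
prob_specific_sample = 1 / num_samples

# Hypothetical manager IDs 1..N stand in for the population list.
population_ids = range(1, N + 1)
rng = random.Random(7)  # fixed seed, an assumption for repeatability only
sample_ids = rng.sample(population_ids, n)  # draws without replacement

print(f"C({N}, {n}) has {len(str(num_samples))} digits")
print("selected IDs:", sorted(sample_ids))
```

Because `random.sample` draws without replacement, no ID can appear twice in `sample_ids`.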
Simple random sampling from an infinite population:
• Each element selected comes from the same population.
• Each element is selected independently.
7.2 Point Estimate
Example:
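Suppose we randomly select 30 managers as a sample. Let \(x_1, x_2, \ldots, x_{30}\) be the annual salaries of the 30 managers (a sample). Suppose 19 of them have completed the training program. Thus,
\[ \bar{x} = \frac{1}{30} \sum_{i=1}^{30} x_i, \qquad s^2 = \frac{1}{29} \sum_{i=1}^{30} (x_i - \bar{x})^2, \]
and
\[ \bar{p} = \frac{19}{30} \approx 0.63. \]
Note: \(\bar{x}\), \(s^2\), and \(\bar{p}\) are sensible estimates of \(\mu\), \(\sigma^2\), and \(p\).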
Point estimate:
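A point estimate is a statistic computed from a sample of size n (not necessarily a simple random sample) from a finite population of size N. Suppose
\[ x_1, x_2, \ldots, x_n \]
is the sample. Then, the sample mean and the sample variance,
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \]
and the sample proportion \(\bar{p}\) are point estimates of the population mean \(\mu\), the population variance \(\sigma^2\), and the population proportion p, respectively.
Note: \(\bar{x}\), \(s^2\), and \(\bar{p}\) are not the only estimates. They are just some sensible estimates.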
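The three point estimates can be computed directly from a sample. The sketch below uses a made-up sample of five managers (the salaries and training flags are hypothetical values chosen only for illustration, not the chapter's data).

```python
import statistics

# Hypothetical sample: (annual salary, completed the training program?) pairs.
sample = [
    (49100.0, True),
    (53260.0, True),
    (49640.0, False),
    (51770.0, True),
    (55920.0, False),
]

salaries = [salary for salary, _ in sample]

x_bar = statistics.mean(salaries)    # sample mean
s2 = statistics.variance(salaries)   # sample variance, divisor n - 1
p_bar = sum(done for _, done in sample) / len(sample)  # sample proportion

print(x_bar, s2, p_bar)
```

Note that `statistics.variance` uses the n - 1 divisor, matching the sample variance formula, whereas `statistics.pvariance` would use the divisor n.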
7.3 Sampling Distributions
Example:
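Suppose we sample 500 times and let
\[ (x_{1,1}, \ldots, x_{1,30}),\ (x_{2,1}, \ldots, x_{2,30}),\ \ldots,\ (x_{500,1}, \ldots, x_{500,30}) \]
be 500 simple random samples of 30 managers. Let \(\bar{X}\) be the random variable representing the average salary of a random sample of 30 managers. Then, the 500 sample means \(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_{500}\) are 500 possible values of \(\bar{X}\). Note there are \(\binom{2500}{30}\) possible samples, each giving one possible value of the random variable \(\bar{X}\).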
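The repeated-sampling idea can be imitated by simulation. The sketch below is illustrative only: the real salary data are not available here, so it assumes a synthetic population of 2500 normally distributed salaries with mean 51800 and standard deviation 4000, and a fixed seed for repeatability.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed, an assumption for repeatability only
N, n, reps = 2500, 30, 500

# Synthetic population (assumed normal; stands in for the real salaries).
population = [rng.gauss(51800, 4000) for _ in range(N)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population standard deviation

# Draw 500 simple random samples of 30 and record each sample mean.
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = sigma / n ** 0.5  # sigma / sqrt(n), ignoring the finite correction

print(f"population mean {mu:.1f}, mean of sample means {mean_of_means:.1f}")
print(f"sd of sample means {sd_of_means:.1f}, sigma/sqrt(n) {theoretical_se:.1f}")
```

The 500 sample means should cluster around the population mean, with a spread close to \(\sigma / \sqrt{n}\).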
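Properties of \(\bar{X}\):
Let \(\bar{X}\) be the random variable representing the average of a random sample. Then,
1. \(E(\bar{X}) = \mu\).
2. For a finite population of size N and a sample of size n, the standard deviation of \(\bar{X}\) is
\[ \sigma_{\bar{X}} = \sqrt{\frac{N-n}{N-1}} \cdot \frac{\sigma}{\sqrt{n}}. \]
For an infinite population,
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}. \]
Note: when the sample is small relative to the population (a common rule of thumb is \(n/N \le 0.05\)), \(\sqrt{(N-n)/(N-1)} \approx 1\). Thus, \(\sigma_{\bar{X}} \approx \sigma / \sqrt{n}\). That is, the standard deviation of \(\bar{X}\) for the finite population is approximately equal to the one for the infinite population.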
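The two standard deviation formulas can be compared numerically. The following is a minimal sketch; the function name `standard_error_mean` is a hypothetical helper, and the inputs (\(\sigma = 4000\), n = 30, N = 2500) are the values of the managers example.

```python
import math

def standard_error_mean(sigma, n, N=None):
    """Standard deviation of the sample mean.

    With N given, applies the finite population correction
    sqrt((N - n) / (N - 1)); with N omitted, treats the population as infinite.
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

se_infinite = standard_error_mean(4000, 30)        # roughly 730.3
se_finite = standard_error_mean(4000, 30, N=2500)  # roughly 726.0

print(se_infinite, se_finite)
```

Here n/N = 0.012, so the correction factor is close to 1 and the two values differ by less than 1%.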
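Sampling Distribution of \(\bar{X}\):
When the sample size is large (a common rule of thumb is \(n \ge 30\)) or the population is normally distributed, \(\bar{X}\) is (approximately) normally distributed:
\[ \bar{X} \sim N(\mu, \sigma_{\bar{X}}^2), \]
where \(\sigma_{\bar{X}}\) is the standard deviation of \(\bar{X}\). Thus, for some constants \(a \le b\),
\[ P(a \le \bar{X} \le b) = P\left( \frac{a - \mu}{\sigma_{\bar{X}}} \le Z \le \frac{b - \mu}{\sigma_{\bar{X}}} \right), \]
where Z is the standard normal random variable.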
Example:
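What is the probability that the difference between the sample mean and the population mean will be less than or equal to 500 when the sample size is \(n = 30\)?
[solution]
With \(\sigma = 4000\), \(\sigma_{\bar{X}} = \sigma / \sqrt{n} = 4000 / \sqrt{30} \approx 730.30\). Thus,
\[ P(|\bar{X} - \mu| \le 500) = P\left( -\frac{500}{730.30} \le Z \le \frac{500}{730.30} \right) \approx P(-0.68 \le Z \le 0.68) \approx 0.5036. \]
There is a 50.36% chance that the difference between the sample mean and the population mean is not more than 500.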
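Sample Size and Sampling Distribution:
Increasing the sample size decreases the standard error \(\sigma_{\bar{X}} = \sigma / \sqrt{n}\). Thus, for any fixed \(c > 0\), the larger the sample size is, the larger \(P(|\bar{X} - \mu| \le c)\) is (since the standardized interval
\[ \left[ -\frac{c}{\sigma_{\bar{X}}},\ \frac{c}{\sigma_{\bar{X}}} \right] \]
is wider than the one with a smaller sample size).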
Example:
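What is the probability that the difference between the sample mean and the population mean will be less than or equal to 500 when the sample size is \(n = 100\)?
[solution]
With \(\sigma = 4000\), \(\sigma_{\bar{X}} = \sigma / \sqrt{n} = 4000 / \sqrt{100} = 400\). Thus,
\[ P(|\bar{X} - \mu| \le 500) = P\left( -\frac{500}{400} \le Z \le \frac{500}{400} \right) = P(-1.25 \le Z \le 1.25) \approx 0.7888. \]
As the sample size increases to 100, there is a 78.88% chance that the difference between the sample mean and the population mean is not more than 500. That is, a larger sample size provides a higher probability that the value of the sample mean will be within a specific distance of the population mean.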
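The two probabilities can be checked without a normal table. This is a minimal sketch using the standard library's `statistics.NormalDist`; the helper name `prob_within` is hypothetical, and the infinite-population standard error \(\sigma / \sqrt{n}\) with \(\sigma = 4000\) is assumed.

```python
from math import sqrt
from statistics import NormalDist

def prob_within(c, sigma, n):
    """P(|sample mean - population mean| <= c) under the normal approximation."""
    se = sigma / sqrt(n)  # standard error of the sample mean
    z = c / se            # half-width of the interval in standard units
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(z) - std_normal.cdf(-z)

p30 = prob_within(500, 4000, 30)
p100 = prob_within(500, 4000, 100)

print(f"n = 30:  {p30:.4f}")
print(f"n = 100: {p100:.4f}")
```

The computed values are close to, but not identical with, the table-based figures 0.5036 and 0.7888, because the table lookup rounds z to two decimals (0.68 instead of about 0.6847 for n = 30).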
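Properties of \(\bar{P}\):
Let \(\bar{P}\) be the random variable representing the proportion of a random sample (the sample proportion \(\bar{p}\) of a particular random sample is one possible value of \(\bar{P}\)). Then,
1. \(E(\bar{P}) = p\).
2. For a finite population of size N and a sample of size n, the standard deviation of \(\bar{P}\) is
\[ \sigma_{\bar{P}} = \sqrt{\frac{N-n}{N-1}} \cdot \sqrt{\frac{p(1-p)}{n}}. \]
For an infinite population,
\[ \sigma_{\bar{P}} = \sqrt{\frac{p(1-p)}{n}}. \]
Note:
When the sample is small relative to the population (a common rule of thumb is \(n/N \le 0.05\)), \(\sqrt{(N-n)/(N-1)} \approx 1\). Thus, \(\sigma_{\bar{P}} \approx \sqrt{p(1-p)/n}\). That is, the standard deviation of \(\bar{P}\) for the finite population is approximately equal to the one for the infinite population.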
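Sampling Distribution of \(\bar{P}\):
When the sample size is large (a common rule of thumb is \(np \ge 5\) and \(n(1-p) \ge 5\)), \(\bar{P}\) is approximately normally distributed:
\[ \bar{P} \sim N(p, \sigma_{\bar{P}}^2), \]
where \(\sigma_{\bar{P}}\) is the standard deviation of \(\bar{P}\). Thus, for some constants \(a \le b\),
\[ P(a \le \bar{P} \le b) = P\left( \frac{a - p}{\sigma_{\bar{P}}} \le Z \le \frac{b - p}{\sigma_{\bar{P}}} \right), \]
where Z is the standard normal random variable.
Note: increasing the sample size decreases the standard error \(\sigma_{\bar{P}}\). Thus, for any fixed \(c > 0\), the larger the sample size is, the larger \(P(|\bar{P} - p| \le c)\) is (since the standardized interval \([-c/\sigma_{\bar{P}},\ c/\sigma_{\bar{P}}]\) is wider than the one with a smaller sample size).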
Example:
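What is the probability that the difference between the sample proportion and the population proportion will be less than or equal to 0.05 when the sample size is \(n = 30\)? What is the probability if we increase the sample size to 100?
[solution]
With \(p = 0.60\) and \(n = 30\), \(\sigma_{\bar{P}} = \sqrt{0.6 \times 0.4 / 30} \approx 0.0894\). Thus,
\[ P(|\bar{P} - p| \le 0.05) = P\left( -\frac{0.05}{0.0894} \le Z \le \frac{0.05}{0.0894} \right) \approx P(-0.56 \le Z \le 0.56) \approx 0.4246. \]
There is a 42.46% chance that the difference between the sample proportion and the population proportion is not more than 0.05 when \(n = 30\).
As the sample size is increased to 100, \(\sigma_{\bar{P}} = \sqrt{0.6 \times 0.4 / 100} \approx 0.0490\). Thus,
\[ P(|\bar{P} - p| \le 0.05) = P\left( -\frac{0.05}{0.0490} \le Z \le \frac{0.05}{0.0490} \right) \approx P(-1.02 \le Z \le 1.02) \approx 0.6922. \]
There is a 69.22% chance that the difference between the sample proportion and the population proportion is not more than 0.05. That is, a larger sample size provides a higher probability that the value of the sample proportion will be within a specific distance of the population proportion.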
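The same check can be made for the sample proportion. This is a minimal sketch under the normal approximation; the helper name `prob_proportion_within` is hypothetical, and the infinite-population standard error with \(p = 0.60\) is assumed.

```python
from math import sqrt
from statistics import NormalDist

def prob_proportion_within(c, p, n):
    """P(|sample proportion - p| <= c) under the normal approximation."""
    se = sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    z = c / se
    std_normal = NormalDist()
    return std_normal.cdf(z) - std_normal.cdf(-z)

q30 = prob_proportion_within(0.05, 0.60, 30)
q100 = prob_proportion_within(0.05, 0.60, 100)

print(f"n = 30:  {q30:.4f}")
print(f"n = 100: {q100:.4f}")
```

As with the sample mean, the results differ slightly from the table-based 0.4246 and 0.6922 because the table lookup rounds z to two decimals.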
7.4 Other Sampling Methods
1. Stratified Random Sampling:
• The population is divided into groups of elements called strata according to some "characteristic" of the data.
• A simple random sample is taken from each stratum.
How to determine the sample size in each stratum?
• According to the size of the stratum.
• According to the variance of each stratum.
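Stratified sampling with sample sizes chosen according to stratum size (proportional allocation) can be sketched as follows. This is an illustration only: the function name, the two strata ("trained" and "untrained" managers), and the element labels are assumptions, and the variance-based allocation rule is not covered here.

```python
import random

def stratified_sample(strata, n, rng):
    """Proportional allocation: stratum h contributes about n * N_h / N elements.

    strata maps a stratum name to the list of its elements. Rounding can make
    the total differ slightly from n when the shares are not whole numbers.
    """
    N = sum(len(elements) for elements in strata.values())
    sample = {}
    for name, elements in strata.items():
        n_h = round(n * len(elements) / N)
        sample[name] = rng.sample(elements, n_h)  # simple random sample per stratum
    return sample

# Hypothetical strata: 1500 trained and 1000 untrained managers.
strata = {
    "trained": [f"T{i}" for i in range(1500)],
    "untrained": [f"U{i}" for i in range(1000)],
}
result = stratified_sample(strata, 30, random.Random(1))

print({name: len(chosen) for name, chosen in result.items()})
```

With these stratum sizes a total sample of 30 splits into 18 trained and 12 untrained managers.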
2. Cluster Sampling:
• The population is divided into several separate groups of elements called clusters.
• A simple random sample of the clusters is taken. All elements within these sampled or "selected" clusters are in the sample.
Note: one of the primary applications of cluster sampling is area sampling, where clusters are city blocks or other well-defined areas.
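Cluster sampling can be sketched in the same style. The city blocks and household labels below are hypothetical, as is the function name; the point is only that whole clusters are sampled and every element of a chosen cluster enters the sample.

```python
import random

def cluster_sample(clusters, k, rng):
    """Take a simple random sample of k clusters and keep all of their elements."""
    chosen = rng.sample(list(clusters), k)  # sample cluster names, not elements
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical area sample: 20 city blocks with 8 households each.
blocks = {
    f"block_{b}": [f"household_{b}_{h}" for h in range(8)]
    for b in range(20)
}
selected = cluster_sample(blocks, 4, random.Random(2))

print(sorted(selected))
print(sum(len(households) for households in selected.values()), "households sampled")
```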
3. Systematic Sampling:
• Select randomly one of the first \(N/n\) elements, where n and N are the sample size and the population size, respectively.
• Starting from the first selected element, select every \((N/n)\)'th element after the first element.
Example:
Suppose \(x_1, x_2, \ldots, x_N\) are the elements in the population. Since \(N/n = 100\), by using systematic sampling, we should first select randomly one of the first 100 elements. Suppose the third element is selected, i.e. \(x_3\). Then, select every 100'th element after \(x_3\); thus \(x_{103}, x_{203}, x_{303}, \ldots\) are selected.
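The systematic sampling rule can be sketched as follows. The population of 1000 numbered elements and the sample size of 10 are hypothetical values chosen so that the interval N/n equals 100; the function name is also an assumption.

```python
import random

def systematic_sample(population, n, rng):
    """Pick a random start among the first N/n elements, then every (N/n)'th one.

    Assumes N is a multiple of n so the interval k = N // n is exact.
    """
    N = len(population)
    k = N // n                 # sampling interval
    start = rng.randrange(k)   # 0-based index within the first k elements
    return [population[start + j * k] for j in range(n)]

population = list(range(1, 1001))  # hypothetical elements labelled 1..1000
picked = systematic_sample(population, 10, random.Random(3))

print(picked)
```

Every pair of consecutive selected labels differs by exactly the interval, 100.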