|PART ONE -- Essentials| -- Inference (Estimation and Hypothesis Testing) From Small Samples

Note: This part of the course deals with estimation and hypothesis testing, which were introduced in Stats 1, Part 5. Review the outline for that section if necessary. The material in Stats 1, Part 5 is to be considered part of Stats 2, Part 1.

Interval estimation and hypothesis testing

Two Types of Problems

Means--one-group; two-group

t-distribution

Symmetrical with center concentration, but not as concentrated as the normal distribution

Lower in the center and higher in the tails than the normal distribution

Degrees of freedom--expresses the sample size

One-group problems, (n-1); two-group problems, (n1+n2-2) or [(n1-1) + (n2-1)]
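
The shape comparison and the degrees-of-freedom rules above can be checked numerically. The sketch below is illustrative only: the function names are assumptions introduced here, and the density used is the standard Student's t density, which this outline does not itself state.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def df_one_group(n):
    """Degrees of freedom for a one-group problem."""
    return n - 1

def df_two_group(n1, n2):
    """Degrees of freedom for a two-group (unpaired) problem."""
    return (n1 - 1) + (n2 - 1)

# Height at the center (x = 0) and in the tail (x = 3) for several df values.
for df in (df_one_group(5), df_two_group(8, 9), 100):
    print(f"t, df={df}: center={t_pdf(0, df):.4f}, tail={t_pdf(3, df):.4f}")
print(f"normal:    center={normal_pdf(0):.4f}, tail={normal_pdf(3):.4f}")
```

For small df the t curve should come out lower than the normal curve at the center and higher in the tail, with the gap shrinking as df grows.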

On the 4-Column formula sheet, columns 1 and 2 may be used, with the substitution of "t" for "z". Logic is identical to chapters 6, 7 and 8 large-sample sections.

Unpaired design for two-group problems

Sample items for each group selected randomly

A difference between the means of groups might be due to experimental "treatment" or might simply be due to the fact that the members of the two groups were different.

Treatment--intentional difference between groups being tested, e.g., in a pharmaceutical test, drug group vs. non-drug group

Confounding variable--uncontrolled factor that might be causing an observed difference between groups

Paired-difference design for two-group problems

Purpose--to eliminate "confounding" variables and isolate the variable of interest

Ideal--keep everything constant except the variable under investigation.

Same subjects are tested twice--before and after the experimental treatment.

Difference therefore cannot be due to the members of the groups being different.

Four assumptions

Samples

Random

Independent (in two-group unpaired experiments)

Populations

Normally distributed

Equal variances (in two-group unpaired experiments)

Moderate departures from the assumptions will not seriously affect validity.

A test with this characteristic is called "robust."

If the assumptions are seriously violated, two approaches may be taken

Increase sample to a "large" size (then, population assumptions need not be met).

Use nonparametric tests (which have no population assumptions).

Inferences regarding variances

One-group inference regarding the variance--uses "chi-square" (χ²) distribution

Estimation and hypothesis testing are possible regarding the variance of one group.

In hypothesis testing, the Ho is that σ² is equal to some specified value.

Two-group inference regarding the variances--uses "F" distribution.

Variances are compared by division (ratio), rather than by subtraction (difference)

Estimation and hypothesis testing are possible regarding the variances of two groups.

In hypothesis testing, the Ho is that σ₁² is equal to σ₂².

As a ratio, this would mean σ₁² / σ₂² = 1.
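
As a rough illustration of the two variance inferences above, the sketch below computes the usual textbook test statistics. The formulas (chi-square = (n-1)s²/σ², F = s₁²/s₂²) are standard but are not stated in this outline, and the data and function names are hypothetical.

```python
from statistics import variance

def chi_square_statistic(sample, sigma_sq_ho):
    """One-group test statistic: (n - 1) * s^2 / (variance specified by Ho)."""
    n = len(sample)
    return (n - 1) * variance(sample) / sigma_sq_ho

def f_ratio(sample1, sample2):
    """Two-group test statistic: ratio of the two sample variances."""
    return variance(sample1) / variance(sample2)

# Hypothetical measurements for two groups.
group1 = [4.1, 3.9, 4.4, 4.0, 4.6]
group2 = [4.2, 4.8, 3.5, 4.9, 3.6]

print("chi-square statistic:", round(chi_square_statistic(group1, 0.05), 3))
print("F ratio:", round(f_ratio(group1, group2), 3))
```

An F ratio near 1 is what Ho (equal variances) would lead one to expect; each statistic is then compared with a critical value from the corresponding table.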

Terminology--explain each of the following:

inferential statistics, sample mean, population mean, estimator, estimate, unbiased estimator, point estimate, interval estimate, confidence interval, degree of confidence, confidence level, error factor, required sample size, upper confidence limit, lower confidence limit, hypothesis test, null hypothesis, alternate hypothesis, type I error, α, type II error, β, calculated-t (test statistic), critical region, table-t (critical value of t), rejection of the null hypothesis, non-rejection of the null hypothesis, p-value, hypothesis-test conclusion, independent samples, standard error of the difference, paired difference design, confounding variable, chi-square distribution (purpose), F distribution (purpose)

Skills and Procedures

given appropriate data, conduct estimation and hypothesis testing on the population mean of one group, involving these steps:

make a point estimate of a population mean

compute the sampling standard deviation (standard error) of the sample means

compute and interpret the error factor for the interval estimate for the 90%, 95% and 99% confidence levels, using the t distribution

state the null and alternate hypotheses regarding the population mean

determine the table-t (critical value of t) for alpha levels of 0.10, 0.05 and 0.01

compute the calculated-t (test statistic)

draw the appropriate hypothesis-test conclusion based on the given level of α, the table-t (critical value) and the calculated-t (test statistic)

interpret the conclusion

determine and interpret the p-value
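
The one-group steps listed above can be sketched as follows, assuming a two-sided test. The data are hypothetical, the function names are introduced here for illustration, the table-t values are the two-sided entries for df = 9 from a standard t table, and the p-value is approximated by numerically integrating the Student's t density rather than read from a table.

```python
import math
from statistics import mean, stdev

# Two-sided table-t values for df = 9, copied from a standard t table.
TABLE_T_DF9 = {0.10: 1.833, 0.05: 2.262, 0.01: 3.250}

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(calc_t, df, steps=2000):
    """Area in both tails beyond |calc_t|, by Simpson's rule (steps must be even)."""
    upper = abs(calc_t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and +|t|
    return 1 - central_area

def one_group_t(sample, mu_ho):
    """Return the point estimate, standard error, calculated-t and df."""
    n = len(sample)
    xbar = mean(sample)                 # point estimate of the population mean
    se = stdev(sample) / math.sqrt(n)   # standard error of the sample mean
    calc_t = (xbar - mu_ho) / se        # test statistic
    return xbar, se, calc_t, n - 1

sample = [52, 48, 55, 51, 49, 53, 50, 54, 47, 51]  # hypothetical data, n = 10
xbar, se, calc_t, df = one_group_t(sample, mu_ho=50)
assert df == 9, "TABLE_T_DF9 only applies when df = 9"
p_value = two_sided_p_value(calc_t, df)

for alpha, table_t in TABLE_T_DF9.items():
    error_factor = table_t * se  # half-width of the interval estimate
    decision = "Ho rejected" if abs(calc_t) > table_t else "Ho not rejected"
    print(f"alpha={alpha}: interval {xbar - error_factor:.2f} to "
          f"{xbar + error_factor:.2f}, {decision}")
print(f"calculated-t = {calc_t:.3f}, p-value = {p_value:.3f}")
```

The decision rule and the p-value should agree: Ho is rejected at a given α exactly when the p-value is smaller than α.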

given appropriate data, conduct estimation and hypothesis testing on the population means of two groups, involving these steps:

make a point estimate of the difference between population means

compute the sampling standard deviation (standard error) of the difference between sample means

compute and interpret the error factor for the interval estimate for the 90%, 95% and 99% confidence levels

state the null and alternate hypotheses regarding the difference between population means

determine the table-t (critical value of t) for alpha levels of 0.10, 0.05 and 0.01

compute the calculated-t (test statistic)

draw the appropriate hypothesis-test conclusion based on the given level of α, the table-t and the calculated-t

interpret the conclusion

determine and interpret the p-value
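
A minimal sketch of the two-group (unpaired) steps above, assuming a two-sided test and the equal-variance assumption listed earlier, so that a pooled variance is used. The pooled-variance formula is the standard textbook one and is not stated in this outline; the data and names are hypothetical, and the table-t values are the two-sided entries for df = 10.

```python
import math
from statistics import mean, variance

# Two-sided table-t values for df = 10, copied from a standard t table.
TABLE_T_DF10 = {0.10: 1.812, 0.05: 2.228, 0.01: 3.169}

def pooled_two_group_t(group1, group2):
    """Return the difference in means, its standard error, calculated-t and df."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean(group1) - mean(group2)  # point estimate of the difference
    return diff, se_diff, diff / se_diff, df

drug = [12, 15, 11, 14, 13, 13]    # hypothetical treatment group
placebo = [10, 9, 12, 11, 8, 10]   # hypothetical non-treatment group

diff, se_diff, calc_t, df = pooled_two_group_t(drug, placebo)
assert df == 10, "TABLE_T_DF10 only applies when df = 10"

decisions = {}
for alpha, table_t in TABLE_T_DF10.items():
    error_factor = table_t * se_diff
    decisions[alpha] = abs(calc_t) > table_t  # True means Ho rejected
    print(f"alpha={alpha}: interval {diff - error_factor:.2f} to "
          f"{diff + error_factor:.2f}, Ho rejected: {decisions[alpha]}")
print(f"calculated-t = {calc_t:.3f}")
```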

given appropriate data, conduct estimation and hypothesis testing on the population means of two groups in a paired-difference design, involving these steps:

make a point estimate of the difference between population means by computing the average of the differences

compute the sampling standard deviation (standard error) of the average of the differences

compute and interpret the error factor for the interval estimate for the 90%, 95% and 99% confidence levels

state the null and alternate hypotheses regarding the difference between population means

determine the table-t (critical value of t) for alpha levels of 0.10, 0.05 and 0.01

compute the calculated-t (test statistic)

draw the appropriate hypothesis-test conclusion based on the given level of α, the table-t and the calculated-t

interpret the conclusion

determine and interpret the p-value
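
For the paired-difference steps above, a sketch under the same caveats: the before/after readings are hypothetical, the names are introduced here, the test is two-sided, and the table-t values are the two-sided entries for df = 4. The point is that the analysis reduces to a one-group problem on the differences.

```python
import math
from statistics import mean, stdev

# Two-sided table-t values for df = 4, copied from a standard t table.
TABLE_T_DF4 = {0.10: 2.132, 0.05: 2.776, 0.01: 4.604}

def paired_t(before, after):
    """Return the mean difference, its standard error, calculated-t and df."""
    if len(before) != len(after):
        raise ValueError("paired data need the same number of readings")
    diffs = [b - a for b, a in zip(before, after)]  # one difference per subject
    n = len(diffs)
    d_bar = mean(diffs)                  # point estimate of the difference
    se = stdev(diffs) / math.sqrt(n)     # standard error of the mean difference
    return d_bar, se, d_bar / se, n - 1  # Ho: mean difference is zero

before = [140, 152, 138, 147, 160]  # hypothetical readings, same five subjects
after = [135, 150, 136, 140, 154]

d_bar, se, calc_t, df = paired_t(before, after)
assert df == 4, "TABLE_T_DF4 only applies when df = 4"

decisions = {}
for alpha, table_t in TABLE_T_DF4.items():
    error_factor = table_t * se
    decisions[alpha] = abs(calc_t) > table_t  # True means Ho rejected
    print(f"alpha={alpha}: interval {d_bar - error_factor:.2f} to "
          f"{d_bar + error_factor:.2f}, Ho rejected: {decisions[alpha]}")
print(f"calculated-t = {calc_t:.3f}")
```

Note that df is the number of pairs minus one, not n1+n2-2, because only the differences are analyzed.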

Concepts--

explain why a confidence interval becomes larger as the confidence level increases

explain why a confidence interval becomes smaller as the sample size increases

describe the nature of the trade-off between precision and cost

identify the type of error that is made if the null hypothesis is "the defendant is innocent," and an innocent defendant is erroneously convicted

identify the type of error that is made if the null hypothesis is "the defendant is innocent," and a guilty defendant is erroneously acquitted

explain why a researcher seeking to reject a null hypothesis may tend to prefer a one-sided alternative hypothesis

explain how the paired-difference design eliminates a confounding variable

explain what happens to the t distribution as the sample size becomes smaller

explain what happens to the t distribution as the sample size becomes larger

describe how the t distribution is similar to the normal distribution

describe how the t distribution differs from the normal distribution
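
The first two concepts can be explored numerically. In the sketch below the sample standard deviation is a hypothetical fixed value, the function name is an assumption, and the table-t values are two-sided entries from a standard t table for df = 9 and df = 24.

```python
import math

# Two-sided table-t values by degrees of freedom and confidence level.
TABLE_T = {
    9: {0.90: 1.833, 0.95: 2.262, 0.99: 3.250},
    24: {0.90: 1.711, 0.95: 2.064, 0.99: 2.797},
}

def error_factor(s, n, confidence):
    """Half-width of the interval estimate: table-t times the standard error."""
    table_t = TABLE_T[n - 1][confidence]
    return table_t * s / math.sqrt(n)

s = 5.0  # hypothetical sample standard deviation, held fixed for comparison
for n in (10, 25):
    for confidence in (0.90, 0.95, 0.99):
        print(f"n={n}, confidence={confidence}: "
              f"error factor = {error_factor(s, n, confidence):.2f}")
```

Two effects shrink the interval as n grows: the standard error s/√n falls, and the table-t itself falls because df is larger. Raising the confidence level works in the opposite direction, since a larger table-t is needed to enclose more of the distribution.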

What to say and how to say it:

INSERT A NUMBER WHEREVER THERE ARE PARENTHESES ( ).

Column 1--mean, one group

Ho rejected:

The difference between the sample mean, (xbar), and the null hypothesis, (μHo), is statistically significant at the (α) level. The population mean is probably not (μHo).

Ho not rejected:

The difference between the sample mean, (xbar), and the null hypothesis, (μHo), is not statistically significant at the (α) level. The population mean could be (μHo).

Column 2--means, 2 groups (or paired-difference design)

Ho rejected:

The difference between the sample means, (xbar1-xbar2), is statistically significant at the (α) level. The population means are probably not equal.

Ho not rejected:

The difference between the sample means, (xbar1-xbar2), is not statistically significant at the (α) level. The population means could be equal.
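
As a small aid for practicing the wording, the templates above can be filled in mechanically. This is only a sketch: the function names and the example numbers are hypothetical, and the sentences simply reuse the Column 1 and Column 2 wording.

```python
def conclusion_one_group(xbar, mu_ho, alpha, rejected):
    """Fill in the Column 1 wording (mean, one group)."""
    if rejected:
        return (
            f"The difference between the sample mean, {xbar}, and the null hypothesis, "
            f"{mu_ho}, is statistically significant at the {alpha} level. "
            f"The population mean is probably not {mu_ho}."
        )
    return (
        f"The difference between the sample mean, {xbar}, and the null hypothesis, "
        f"{mu_ho}, is not statistically significant at the {alpha} level. "
        f"The population mean could be {mu_ho}."
    )

def conclusion_two_group(diff, alpha, rejected):
    """Fill in the Column 2 wording (means, 2 groups or paired-difference design)."""
    if rejected:
        return (
            f"The difference between the sample means, {diff}, is statistically "
            f"significant at the {alpha} level. "
            "The population means are probably not equal."
        )
    return (
        f"The difference between the sample means, {diff}, is not statistically "
        f"significant at the {alpha} level. The population means could be equal."
    )

# Hypothetical results from a one-group and a two-group test.
print(conclusion_one_group(xbar=51, mu_ho=50, alpha=0.05, rejected=False))
print(conclusion_two_group(diff=3, alpha=0.01, rejected=True))
```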

THE t-DISTRIBUTION

CENTRAL LIMIT THEOREM--SAMPLING DISTRIBUTIONS OF:

MEANS

DIFFERENCES BETWEEN MEANS

PROPORTIONS

DIFFERENCES BETWEEN PROPORTIONS

ARE ESSENTIALLY NORMAL REGARDLESS OF THE SHAPE OF THE POPULATION DISTRIBUTION, WHEN SAMPLE SIZES ARE LARGE (n ≥ 30).

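WHEN SAMPLE SIZES ARE SMALL (n < 30) AND THE POPULATION STANDARD DEVIATION MUST BE ESTIMATED FROM THE SAMPLE, THE STANDARDIZED SAMPLING DISTRIBUTIONS OF MEANS AND OF DIFFERENCES BETWEEN MEANS ARE NO LONGER NORMAL.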

THEY FOLLOW t-DISTRIBUTIONS:

SYMMETRICAL,

LOWER AND WIDER THAN THE NORMAL DISTRIBUTION,

LESS CONCENTRATED IN THE CENTER.

t-DISTRIBUTION SHAPE VARIES AS n CHANGES.

THE SMALLER THE n, THE LESS CONCENTRATED IN THE CENTER.

SAMPLE SIZE IS EXPRESSED BY DEGREES OF FREEDOM

df = (n - 1).

t-DISTRIBUTION TABLE

COLUMN HEADINGS--ONE-SIDED AND TWO-SIDED TAIL AREAS

BODY OF TABLE CONTAINS t-VALUES (ANALOGOUS TO z-VALUES)--THE NUMBER OF STANDARD DEVIATIONS FROM THE MEAN

t-VALUES APPROACH z-VALUES AS n INCREASES.

BOTTOM ROW OF THE t-TABLE CONTAINS z-VALUES.

AS n DECREASES, t-VALUES INCREASE.

DUE TO THE LESSER DEGREE OF CENTER CONCENTRATION, AS THE SAMPLE SIZE DECREASES, ONE MUST MOVE FARTHER FROM THE MEAN IN ORDER TO ENCLOSE A GIVEN PORTION OF THE DISTRIBUTION.
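
The behavior of the table described above can be seen by lining up one column of it against the matching z-value. In this sketch the t-values are the two-sided 0.05 (one-sided 0.025) entries from a standard t table for a few selected df, and the z-value is computed with Python's statistics.NormalDist; the variable names are assumptions.

```python
from statistics import NormalDist

# Two-sided 0.05 table-t values for selected degrees of freedom.
TABLE_T_05 = {
    1: 12.706, 2: 4.303, 5: 2.571, 10: 2.228,
    20: 2.086, 30: 2.042, 60: 2.000, 120: 1.980,
}

# z-value leaving 0.025 in each tail of the standard normal distribution.
z_value = NormalDist().inv_cdf(0.975)

for df, table_t in TABLE_T_05.items():
    print(f"df={df:>3}: t={table_t:.3f}, excess over z={table_t - z_value:.3f}")
print(f"bottom row (z): {z_value:.3f}")
```

Reading down the output, the t-value falls toward the z-value as df rises, and reading up, it grows sharply as the sample gets very small.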