1). Can the formal hypothesis testing approach be used for nonparametric tests? How are parametric and nonparametric statistics different? How are parametric and nonparametric statistics similar?

Yes, both kinds of tests can be used for formal hypothesis testing, including the five-step approach. The primary difference between them is that parametric tests assume the data follow a particular distribution, such as the normal or t distribution, while nonparametric tests make no such assumption. They are similar in that both use information from a sample to draw inferences about a population.

2). Under what circumstances must a nonparametric test be used? Explain. What are the strengths and weaknesses of nonparametric tests? Can the outcomes of nonparametric tests be generalized to populations?

Nonparametric tests must be used when the data do not satisfy the distributional (usually normality) assumption of the corresponding parametric test. They are also used for tests about medians and for contingency tables of categorical counts, where no particular distribution is assumed for the data.

Nonparametric tests are useful because they do not assume any particular distribution. I can use them when my data are non-normal, for example in place of a test for means.

Nonparametric tests can also be used when the expected distribution has to be derived from the data themselves. For example, in the chi-square test for independence, I cannot assume a particular distribution for the data in advance, because the expected cell counts are calculated from the observed marginal totals and differ from one table to the next.

Another advantage of a nonparametric test is that it can be used for measures such as the median that have no equivalent parametric test. Unlike the sample mean, the sample median cannot simply be assumed to follow a normal distribution, in part because the median does not use all of the available information in the sample.

The main weakness of a nonparametric test is that it may have less power than the equivalent parametric test when the parametric assumptions actually hold.

3). Why do you use the chi-square statistic? What type of data is used with chi-square analysis?

The chi-square statistic can be used in two different but related contexts. The first is to find out whether two categorical variables are independent of one another. It relies on the fact that for two independent events X and Y, P(X ∩ Y) = P(X) × P(Y). From the marginal totals of a contingency table of observed counts we can calculate the expected count for each cell. The chi-square statistic, χ² = Σ (O − E)² / E, sums the squared differences between observed and expected counts (scaled by the expected counts) to give the probability that differences this large would occur by chance if the two variables were independent.
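As a rough sketch of this calculation in Python with SciPy (the contingency table counts below are made up purely for illustration):

```python
# Sketch of a chi-square test of independence (illustrative counts only).
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table of observed counts:
# rows = group A / group B, columns = outcome present / absent (made-up numbers).
observed = np.array([[30, 70],
                     [20, 80]])

# chi2_contingency computes the expected counts from the row/column totals,
# then sums (observed - expected)^2 / expected over all cells.
chi2, p_value, dof, expected = chi2_contingency(observed)

print("Expected counts:\n", expected)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p_value:.3f}")
```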

The second context is a goodness of fit test. The expected values in this case are drawn not from the observed data, but from a previously claimed probability distribution. The variable is tested to see whether its divergence from the expected distribution is likely to have occurred by chance in the sample, if it were drawn from a population with that distribution.
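A minimal sketch of a goodness-of-fit test with SciPy, assuming made-up counts for 60 rolls of a supposedly fair die:

```python
# Sketch of a chi-square goodness-of-fit test (illustrative counts only).
from scipy.stats import chisquare

# Hypothetical observed counts for 60 rolls of a die.
observed = [8, 12, 9, 11, 6, 14]

# Under the claimed distribution (a fair die), each face is expected 60/6 = 10 times.
expected = [10, 10, 10, 10, 10, 10]

chi2, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```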

The data used with the chi-square statistic must be categorical. Numerical data can be binned into categories if necessary. For example, I could take a continuous variable like “weight” and bin it into groups such as “Under 1 pound”, “1 to 5.9 pounds”, “6 to 9.9 pounds”, and “10 pounds or more”.
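One way to do that binning, sketched with pandas (assuming it is available; the cut points mirror the groups above and the weights themselves are made up):

```python
# Sketch of binning a continuous variable into categories with pandas.
import pandas as pd

weights = pd.Series([0.5, 3.2, 7.8, 12.0, 5.9, 9.9, 0.9, 15.4])  # made-up weights in pounds

bins = [0, 1, 6, 10, float("inf")]
labels = ["Under 1 pound", "1 to 5.9 pounds", "6 to 9.9 pounds", "10 pounds or more"]

# right=False makes each bin include its left edge and exclude its right edge,
# so 6.0 falls in "6 to 9.9 pounds" rather than "1 to 5.9 pounds".
weight_groups = pd.cut(weights, bins=bins, labels=labels, right=False)
print(weight_groups.value_counts())
```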

4). What are the nonparametric tests that correspond to each type of parametric test? (A short SciPy sketch of these counterparts follows the list.)

Pearson correlation > Spearman rank correlation (Spearman’s rho)

Linear regression > Nonparametric regression

Dependent (paired) t-test > Wilcoxon matched-pairs signed-rank test

Independent samples t-test > Mann-Whitney U test

ANOVA > Kruskal-Wallis or Chi-square
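As a rough sketch, the SciPy functions for most of these nonparametric counterparts could be called like this (all sample data are made up; nonparametric regression has no direct scipy.stats equivalent and is omitted):

```python
# Sketch of the nonparametric counterparts in scipy.stats (made-up sample data).
from scipy.stats import spearmanr, wilcoxon, mannwhitneyu, kruskal

x      = [1, 2, 3, 4, 5, 6]
y      = [2, 1, 4, 3, 6, 5]
before = [10, 12, 9, 14, 11, 13]
after  = [11, 14, 8, 15, 12, 16]
g1, g2, g3 = [3, 5, 4, 6], [7, 8, 6, 9], [2, 3, 1, 4]

print(spearmanr(x, y))            # Spearman rank correlation (vs. Pearson correlation)
print(wilcoxon(before, after))    # Wilcoxon matched-pairs signed-rank test (vs. dependent t-test)
print(mannwhitneyu(g1, g2))       # Mann-Whitney U test (vs. independent-samples t-test)
print(kruskal(g1, g2, g3))        # Kruskal-Wallis H test (vs. one-way ANOVA)
```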

The Wilcoxon matched-pairs test calculates the difference between each pair of data points and uses the results to determine the likelihood that the median difference would be that far from zero if the null hypothesis were correct.

First you compute the difference for each pair. Then you rank the absolute values of the differences. Then you sum the ranks of the positive differences and the ranks of the negative differences separately and check how similar the two sums are. If they are essentially the same, the median difference is close to 0 and the p-value is high. If they are very different, the median difference is far from 0 and the p-value is low, so the null hypothesis may be rejected.
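A minimal sketch of those steps in Python (the paired data are made up; there are no zero differences, and tied ranks are handled by averaging), with scipy.stats.wilcoxon as a cross-check:

```python
# Sketch of the Wilcoxon signed-rank steps described above (made-up paired data).
import numpy as np
from scipy.stats import rankdata, wilcoxon

before = np.array([12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 10.0, 17.0])
after  = np.array([14.0, 14.0, 13.0, 17.0, 12.0, 19.0, 13.0, 16.0])

# Step 1: compute the difference for each pair.
diffs = after - before

# Step 2: rank the absolute values of the differences.
ranks = rankdata(np.abs(diffs))

# Step 3: sum the ranks of the positive and negative differences separately.
w_plus  = ranks[diffs > 0].sum()
w_minus = ranks[diffs < 0].sum()
print(f"W+ = {w_plus}, W- = {w_minus}")

# If W+ and W- are similar, the median difference is near 0 (high p-value);
# if they are very different, the p-value is small.
print(wilcoxon(after, before))
```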