Chicks and 2 independent-sample tests (t-test and Mann-Whitney)

StephenLCowen

2017

The assignment:

Walk through all of the examples below in R - generating all figures.

Answer ALL questions that begin with Q: and are in bold. You can save your answers in an R file for this week's homework: HW8.R (I think we're up to 8 now). Make sure that this file has all of the data required to run your code. For example, be sure to do something like this at the top of your .R file...

library('tidyverse')
library('ggthemes')
data('ChickWeight')

The case study:

You are a farmer and are testing 2 diets to determine if one makes your chicks fatter than the other. Fat chicks = healthy chicks. Note the amazing ASCII art below.

# __//
# /.__.\
# \ \/ /
# '__/ \
# \- )
# \_____/
# __|_|____
# " "

Topics covered:

•Tests for normality and plots that help assess normality.

•How to make error bar plots using different CIs (the most common plot ever)

•Fisher's z transform for correlations

•Interpreting t-tests and the output (this was covered more in the Sleep case study).

•Effect sizes for between-subject t-tests

•Non-parametric tests.

•How to write up the results.

Let's Go!

First, call some required libraries...

library('tidyverse')
library('ggthemes')
theme_set(theme_gdocs(base_size = 14))
rm(list = ls()) # this clears variables in case any are already loaded.

The 2-sample t-test

As you may recall, the one-sample t-test determines if your sample is significantly different from some single fixed value such as zero or an IQ of 100. Also recall that the paired t-test is the same thing as a one-sample t-test when the one-sample test is performed on the difference between each sample in groups 1 and 2 (not on the original sample data themselves). In contrast to the one-sample test, the 2-sample t-test compares 2 samples to determine if their means are different. How does this work?
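Before we dive into the chick data, here is a quick sanity check of that paired/one-sample equivalence (a minimal sketch on simulated data; the numbers are made up for illustration):

set.seed(42)
before <- rnorm(10, mean = 100, sd = 15) # 10 subjects measured twice
after <- before + rnorm(10, mean = 5, sd = 5)
t.test(after, before, paired = TRUE) # the paired t-test...
t.test(after - before, mu = 0)       # ...is a one-sample t-test on the differences

Both calls return identical t, df, and p values.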

CASE: Chicks!

You are a farmer and are testing 2 diets to determine if one makes your chicks fatter than the other.

data('ChickWeight')

Let's look at all of the data - a good place to start. We will then simplify this dataset into something more manageable.

ChickWeight = as_tibble(ChickWeight)

Plot of growth curves...

ggplot(ChickWeight, aes(x = Time, y = weight, color = Chick)) +
  geom_point() + geom_smooth() + facet_wrap(~Diet)

ChickWeightCorrs <- ChickWeight %>% group_by(Chick, Diet) %>% summarise() # one row per Chick-Diet pair

I am mystified regarding why lines don't connect the data for the first diet in the upper left, but I will pretend that this did not happen.

As you can see, there are 4 diets in this set but we will just test 2 by taking a subset of the data frame. If we wanted to test all at once, we would use a one-way ANOVA.

Let's simplify to 2 categories - since we're doing t-tests

ChickWeight1 = subset(ChickWeight, Diet == "1" | Diet == "2")
ChickWeight1$Diet = droplevels(ChickWeight1$Diet) # This gets rid of unused factor levels.

Let's check our sample sizes

table(ChickWeight1$Diet)

##
## 1 2
## 220 120

or

summary(ChickWeight1)

## weight Time Chick Diet
## Min. : 35.0 Min. : 0.00 13 : 12 1:220
## 1st Qu.: 59.0 1st Qu.: 4.00 9 : 12 2:120
## Median : 91.5 Median :10.00 20 : 12
## Mean :109.7 Mean :10.64 10 : 12
## 3rd Qu.:147.2 3rd Qu.:16.00 17 : 12
## Max. :331.0 Max. :21.00 19 : 12
## (Other):268

Plot some data

ggplot(ChickWeight1, aes(x = Diet, y = weight, fill = Diet)) +
  geom_boxplot() + geom_jitter(width = .1)

Compute the mean weight and measure of growth rate per chick.

We will compute 3 values that should indicate the quality of the food:

•the mean weight of the chick (averaged over all time)

•the correlation between the time and the weight. The assumption is that a stronger positive correlation will mean a better diet.

•the slope of the time x weight curve. Like correlation, we predict that a steeper slope will indicate a better diet.

In theory, we could do the slope and mean simultaneously by using a mixed-effects model. That's right, but it's also more complicated and we can save that for later. For now, let's keep things simple and easy to interpret. Know that, in these data, time is a within-subject random effect such that time x weight could have a slope and intercept and correlation. The between-subject factor is diet.

To show you how to compute the slope (time x weight) for each chick we must first create a function that spits out the slope computed by fitting a regression line (least-squares line) to the time x weight data. I'll call it beta.

beta = function(x, y){
  # Return the slope - the beta coefficient - of the fit model.
  df = data.frame(y, x)
  mod = lm(y ~ x, data = df) # lm = linear model. Regression.
  return(mod$coefficients[2]) # just return the beta, not the intercept.
}

Here is how we compute the mean, correlation coefficient, and beta values for all chicks in one elegant line of code. Again, be in awe at the power of summarise. Try this in Excel. Just kidding, please don't.

ChickWeight2 <- ChickWeight1 %>%
  group_by(Chick, Diet) %>%
  summarise(mn_weight = mean(weight), r = cor(Time, weight), beta = beta(Time, weight))

That was awesome! I get really excited about simple, elegant code.

ChickWeight2

## # A tibble: 30 x 5
## # Groups: Chick [?]
## Chick Diet mn_weight r beta
## <ord> <fctr> <dbl> <dbl> <dbl>
## 1 18 1 37.00000 -1.0000000 -2.000000
## 2 16 1 49.71429 0.8466671 1.053571
## 3 15 1 60.12500 0.8957741 1.898810
## 4 13 1 67.83333 0.9708592 2.239601
## 5 9 1 81.16667 0.9145355 2.663137
## 6 20 1 78.41667 0.9951571 3.732718
## 7 10 1 83.08333 0.9976746 4.066102
## 8 8 1 92.00000 0.9816516 4.827273
## 9 17 1 92.50000 0.9975335 4.531538
## 10 19 1 86.75000 0.9600475 5.087430
## # ... with 20 more rows

Let's check our sample sizes.

table(ChickWeight2$Diet)

##
## 1 2
## 20 10

Plot the data using boxplots.

ggplot(ChickWeight2, aes(x = Diet, y = mn_weight, fill = Diet)) + geom_boxplot() + geom_jitter(width = .1)

Plot slopes

ggplot(ChickWeight2, aes(x = Diet, y = beta, fill = Diet)) + geom_boxplot() + geom_jitter(width = .1)

Plot r values

ggplot(ChickWeight2, aes(x = Diet, y = r, fill = Diet)) + geom_boxplot() + geom_jitter(width = .1)

WTHeck!! There are some major outliers here in the r values: a -1 correlation. Those chicks did not respond well to that diet. They must have shrunk (see Dr. Shrinker, for those of you who know the 70's Krofft Supershow). "I chase the Shrinkies. I catch the Shrinkies. The Shrinkies escape. It's a vicious cycle and it's driving me mad!" -Dr. Shrinker.

Deep.

I was 7.

Let's assume that those -1 correlations indicate bad datasets. Let's get rid of them.

How to delete rows in your data frame.

ChickWeight2 = ChickWeight2[ChickWeight2$r > -.9, ] # Do you understand this?
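If the bracket syntax feels cryptic, here is an equivalent dplyr version (a sketch; filter() keeps the rows where the condition is TRUE, and re-running it after the line above is harmless):

ChickWeight2 <- ChickWeight2 %>% filter(r > -.9) # keep chicks whose r is above -0.9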

Let's regenerate the plot - now without the wacko data.

ggplot(ChickWeight2, aes(x = Diet, y = r, fill = Diet)) + geom_boxplot() + geom_jitter(width = .1)

A little better.

Generate an old-school SEM error-bar plot...

Error bars (SEM) and let's play with themes...

ggplot(ChickWeight2, aes(x = Diet, y = mn_weight, fill = Diet)) +
  stat_summary(fun.y = mean, geom = "bar", size = 2, width = 0.2) +
  stat_summary(fun.data = mean_se, geom = "errorbar", size = 2, width = 0.2) +
  ggtitle("Effect of Diet on Fatness") + theme_economist(base_size = 15) +
  ylab("Weight (grams)") + xlab("Diet") + theme(legend.position = "none")

Do the same with more modern bootstrap CIs (95%)

ggplot(ChickWeight2, aes(x = Diet, y = mn_weight, fill = Diet)) +
  stat_summary(fun.y = mean, geom = "bar", size = 2, width = 0.2) +
  stat_summary(fun.data = mean_cl_boot, geom = "errorbar", size = 2, width = 0.2) +
  ggtitle("Effect of Diet on Fatness") + theme_fivethirtyeight(base_size = 15) +
  ylab("Weight (grams)") + xlab("Diet") + theme(legend.position = "none")

Test for normality.

shapiro.test(ChickWeight2$mn_weight[ChickWeight2$Diet == "1"])

##
## Shapiro-Wilk normality test
##
## data: ChickWeight2$mn_weight[ChickWeight2$Diet == "1"]
## W = 0.97633, p-value = 0.8918

shapiro.test(ChickWeight2$mn_weight[ChickWeight2$Diet == "2"])

##
## Shapiro-Wilk normality test
##
## data: ChickWeight2$mn_weight[ChickWeight2$Diet == "2"]
## W = 0.95539, p-value = 0.7323

qqnorm(ChickWeight2$mn_weight[ChickWeight2$Diet == "1"])
qqline(ChickWeight2$mn_weight[ChickWeight2$Diet == "1"], col = "red")

qqnorm(ChickWeight2$mn_weight[ChickWeight2$Diet == "2"])
qqline(ChickWeight2$mn_weight[ChickWeight2$Diet == "2"], col = "red")

weights = ChickWeight2$mn_weight[ChickWeight2$Diet == "2"]

Let's plot a histogram and overlay the normal distribution.

H = hist(weights) # hist spits out all kinds of useful things. You can change the bin size here if you wish.

print(H)

## $breaks
## [1] 60 80 100 120 140 160 180 200
##
## $counts
## [1] 1 0 4 2 2 0 1
##
## $density
## [1] 0.005 0.000 0.020 0.010 0.010 0.000 0.005
##
## $mids
## [1] 70 90 110 130 150 170 190
##
## $xname
## [1] "weights"
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"

The most useful things are mids, counts, and density.

x = seq(min(weights), max(weights), .1)
m <- mean(weights)
std <- sd(weights)
qplot(H$mids, H$density, geom = "line") +
  geom_line(data = data.frame(x = x, y = dnorm(x, m, std)), aes(x = x, y = y), color = "red")

Q: Now you do the same for the r value.

Q: What are the upper and lower limits of any r value? How does that almost guarantee that shapiro will reject the null?

As you might have noticed, the r values are not normally distributed. The standard way of dealing with this is the Fisher z transform. This is the inverse hyperbolic tangent (atanh) - a function that spreads out the r values near -1 and 1 so that they are more normally distributed. The exact equation is: z = .5 * (log(1+r) - log(1-r))
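As a quick sanity check (a sketch; fisherz from the psych package, used below, computes the same quantity):

r <- 0.5
.5 * (log(1 + r) - log(1 - r)) # by hand: 0.5493061
atanh(r)                       # built-in inverse hyperbolic tangent: 0.5493061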

x = seq(-1, 1, .01)
library(psych) # fisherz() lives in the psych package
qplot(x, fisherz(x), geom = 'line') + xlab('r') + ylab('fisher z') + ggtitle("Fisher z transform of r")

Create a new variable in ChickWeight2 that has the transformed data... We should probably use this instead of r.

ChickWeight2$fisher_z = fisherz(ChickWeight2$r)

Compare the 2 histograms for kicks. The blue one is r; the red one is the Fisher z.

ggplot(ChickWeight2, aes(x = r)) + geom_histogram(fill = 'blue')

Fisher z:

ggplot(ChickWeight2, aes(x = fisher_z)) + geom_histogram(fill = 'red')

Note how it's more spread out. More normal perhaps.

Q: Write a function that uses the equation above to calculate the fisher transform of r = .82

Here it is again: z = .5 * (log(1+r) - log(1-r))

Double-check this with the fisherz() function.

Hypothesis testing

Let's get down to business and test our original hypothesis - that one food is better than another.

Do a 2-independent-sample t-test for the mean weight, the Fisher z, and the growth-rate (beta) measures.

t.test(mn_weight~Diet, data = ChickWeight2)

##
## Welch Two Sample t-test
##
## data: mn_weight by Diet
## t = -1.7975, df = 16.798, p-value = 0.09025
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -46.429406 3.731787
## sample estimates:
## mean in group 1 mean in group 2
## 101.2679 122.6167

t.test(fisher_z~Diet, data = ChickWeight2)

##
## Welch Two Sample t-test
##
## data: fisher_z by Diet
## t = -1.0777, df = 18.756, p-value = 0.2948
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.7140261 0.2289187
## sample estimates:
## mean in group 1 mean in group 2
## 2.310966 2.553520

t.test(beta~Diet, data = ChickWeight2)

##
## Welch Two Sample t-test
##
## data: beta by Diet
## t = -1.5662, df = 15.851, p-value = 0.137
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5.5291623 0.8326404
## sample estimates:
## mean in group 1 mean in group 2
## 6.260875 8.609136

The p value tells us that we are close but no cigar (p > 0.05 in all 3 cases).

Wait! All of these tests say Welch? Welch's t-test is an adaptation of Student's t-test that is more reliable when the two samples have unequal variances and unequal sample sizes. Equal variance is called homoscedasticity in statistico-speak.

You can force a classic t-test (not Welch) as follows (var.equal = TRUE), but only if you think that the variances are near equal.
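Before forcing var.equal = TRUE, it is worth eyeballing the spread in each group (a quick sketch; var.test is a formal F test of the equal-variance null):

ChickWeight2 %>% group_by(Diet) %>% summarise(sd = sd(mn_weight), n = n())
var.test(mn_weight ~ Diet, data = ChickWeight2) # F test of equal variances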

t.test(mn_weight ~ Diet, data = ChickWeight2, var.equal = TRUE)

##
## Two Sample t-test
##
## data: mn_weight by Diet
## t = -1.8588, df = 27, p-value = 0.07398
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -44.914014 2.216395
## sample estimates:
## mean in group 1 mean in group 2
## 101.2679 122.6167

degfree = t.test(mn_weight ~ Diet, data = ChickWeight2, var.equal = TRUE)$parameter
tval = t.test(mn_weight ~ Diet, data = ChickWeight2, var.equal = TRUE)$statistic

NOTE: results are almost the same. p is slightly lower here.

In the end, you probably should trust R on whether or not to use Welch. If ituses Welch, probably stick to Welch.

Q: Do the t test for mn_weight~Diet, but this time you have the strong hypothesis that diet 1 is better than diet 2. This is the only hypothesis you wish to test. How do you run the t-test in this case?

Q: Use pt to determine the p value for mn_weight~Diet if the t score was actually -2

Understanding the t-statistic for a 2 sample t-test

IN WORDS: t is the difference between the 2 means divided by the standard error.

The slightly tricky part is calculating the SE, as each sample will have a different SE.

•Recall that standard error is the estimate of the spread of the sampling distribution - you need both std and n.

t = (mean1 - mean2) / SE, where SE = sqrt(s1^2/n1 + s2^2/n2)

Think about the equation: if you have a bigger sample size, it will reduce the s^2/n term for that sample, which, because it sits in the denominator, increases the value of t, increasing the chance of rejecting the null.
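To make the equation concrete, here is the Welch t computed by hand for the chick data (a sketch; compare with the t.test output above):

g1 <- ChickWeight2$mn_weight[ChickWeight2$Diet == "1"]
g2 <- ChickWeight2$mn_weight[ChickWeight2$Diet == "2"]
se <- sqrt(var(g1)/length(g1) + var(g2)/length(g2)) # SE of the difference in means
(mean(g1) - mean(g2)) / se # should reproduce t = -1.7975 from above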

Q: What is the second way, looking at the equation, to increase the chance of rejecting the NULL?

Plot the t distribution for the chick data.

x = seq(-5, 5, .1)
qplot(x, dt(x, degfree), geom = 'line') + geom_vline(xintercept = tval) + ggtitle('t distribution')

pt(tval,degfree)

## t
## 0.03699066

Effect sizes: Cohen's D

Because the result was not significant it is not necessary to calculate theeffect size. For practice, let's do it anyway...

library(effsize)
cohen.d(mn_weight~Diet, data = ChickWeight2)

##
## Cohen's d
##
## d estimate: -0.7262165 (medium)
## 95 percent confidence interval:
## inf sup
## -1.55135992 0.09892683

cohen.d(fisher_z~Diet, data = ChickWeight2)

##
## Cohen's d
##
## d estimate: -0.4182831 (small)
## 95 percent confidence interval:
## inf sup
## -1.2277768 0.3912106
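To see what cohen.d is doing under the hood, here is a by-hand sketch using the pooled standard deviation (effsize pools by default, so this should land near the -0.726 reported above):

g1 <- ChickWeight2$mn_weight[ChickWeight2$Diet == "1"]
g2 <- ChickWeight2$mn_weight[ChickWeight2$Diet == "2"]
n1 <- length(g1); n2 <- length(g2)
sp <- sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2)) # pooled SD
(mean(g1) - mean(g2)) / sp # the difference in means, in units of pooled SD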

Conclusion and reporting the result:

No effect of diet was observed on chick weight (two-sample t-test, t(27) =-1.9, p = 0.07).

Mann-Whitney U (Wilcoxon rank sum test) non-parametric test

Use this test if the data was NOT normal or purely ordinal (like rankings in a race). You could also use permutation tests (more on this later).

•Because Mann-Whitney works on RANKS and not the raw data itself, it is not subject to the effects of outliers (but crappy data is still crappy data, so you should get rid of it if it was because of something like a bad instrument or a flaky subject).

Don't confuse the Wilcoxon rank-sum test with the Wilcoxon signed-rank test (the signed-rank test is for paired data, like a paired t-test). I confuse these all the time.

Navarro's description of the 2-sample Mann-Whitney test: compare every observation in A with EVERY observation in B and count the number of pairs in which A is bigger than B (ignore A <= B). This count is your test statistic, and the null distribution is the 'W' distribution. Under the NULL there would be equal numbers of A > B and A < B. The threshold becomes like a binomial test (not really the same, but the same idea).
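To make that counting idea concrete, here is a by-hand sketch of the W statistic (ties count as half; compare with the wilcox.test output below):

A <- ChickWeight2$mn_weight[ChickWeight2$Diet == "1"]
B <- ChickWeight2$mn_weight[ChickWeight2$Diet == "2"]
sum(outer(A, B, ">")) + .5 * sum(outer(A, B, "==")) # should reproduce W = 61.5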

Here it is:

wilcox.test(mn_weight ~ Diet, data = ChickWeight2)

##
## Wilcoxon rank sum test with continuity correction
##
## data: mn_weight by Diet
## W = 61.5, p-value = 0.1299
## alternative hypothesis: true location shift is not equal to 0

When reporting this, report the W statistic and p value.

Cliff's Delta: Effect size for Mann-Whitney U

Basically, it measures how often values in one sample are larger than values in the second sample.

delta = (# of A > B pairs - # of A < B pairs) / (nA * nB)

Because the result was not significant it is not necessary to calculate theeffect size. For practice, let's do it anyway...

library(effsize)
cliff.delta(mn_weight~Diet, data = ChickWeight2)

##
## Cliff's Delta
##
## delta estimate: -0.3526316 (medium)
## 95 percent confidence interval:
## inf sup
## -0.6850200 0.1012717
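To see where that number comes from, here is Cliff's delta by hand (a sketch): the proportion of A > B pairs minus the proportion of A < B pairs, which is why it is bounded by -1 and 1.

A <- ChickWeight2$mn_weight[ChickWeight2$Diet == "1"]
B <- ChickWeight2$mn_weight[ChickWeight2$Diet == "2"]
(sum(outer(A, B, ">")) - sum(outer(A, B, "<"))) / (length(A) * length(B)) # should match the -0.353 above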

So it says we have a medium effect size even though we know the difference was not significant. Confusing. NOTE: the scales of effect size are different between cliff.delta and Cohen's d. Cohen's d is in units of standard deviations, but Cliff's delta is constrained to be between -1 and 1.

Permutation test:

Let's say that you really were not happy about the distributions - that they were not normal enough for you. As with almost every statistical test, there is a permutation-test alternative. The coin library has a few, so let's choose a popular one - the oneway test.

Recall the core idea of a permutation test: create your own null distribution by ignoring the category labels and recalculating the difference between means 1000s of times. Then, determine where the actual difference lies on this distribution.

The B = 10000 indicates how many permutations you will perform. 10000 should be more than enough.

library(coin)
perm_out = oneway_test(mn_weight ~ Diet, data = ChickWeight2, distribution = approximate(B = 10000))
pvalue(perm_out)

## [1] 0.0724
## 99 percent confidence interval:
## 0.06587976 0.07933113
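To demystify what coin just did, here is a minimal by-hand permutation test (a sketch; coin uses a different internal test statistic, but the logic is the same):

set.seed(1)
obs_diff <- diff(tapply(ChickWeight2$mn_weight, ChickWeight2$Diet, mean)) # actual difference in means
perm_diffs <- replicate(10000, {
  shuffled <- sample(ChickWeight2$Diet) # break the label-weight pairing
  diff(tapply(ChickWeight2$mn_weight, shuffled, mean))
})
mean(abs(perm_diffs) >= abs(obs_diff)) # two-sided p; should land near 0.07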

Q: Do the above permutation test with the fisher_z variable.

perm_out = oneway_test(fisher_z ~ Diet, data = ChickWeight2, distribution = approximate(B = 10000))

Conclusions and one more micro assignment.

Shucks, all of that work and we never found any difference between diets 1 and 2. It's possible that if we repeated these comparisons for diets 3 and 4 we would find a difference. BUT what if we could test all 4 diets simultaneously? Well, that's next week's lecture. That's ANOVA (and Kruskal-Wallis and GLMs and MLMs).

Q: As one last hurrah, perform a t-test comparing diets 3 and 4 using both the fisher_z and mn_weight factors.

HINT: You will start by using the subset command to pull out diets 3 and 4. The rest is just following the tutorial.

For those who are super geniuses and need more of a challenge... (this really has nothing to do with the t-test, but it sucks to not feel challenged)

1. Using the iris dataset, use kmeans to cluster the data into 2, 3, 4, and 5 groups. How will you determine which number of clusters is the best?
2. Now do this with hierarchical clustering - like random forest. How will you select the number of clusters?
3. Use principal components analysis to determine how many components describe 80% of the variance. Create a scree plot.
4. Use a mixed-effects model to model the within-subject variance.
