2-way MPA Tutorial
Felipe Munoz-Rubke
3/29/2017
rm(list =ls()) # remove elements from global environment
cat("\014") # clean console
setwd("~/Dropbox/R - Research/R_Tutorial_MP_paper/") # set working directory
set.seed(1000)
analogR2 =function(dataset, MP_results){
aR2 =1 -(sum(abs(MP_results$res))) /(sum(abs(dataset -MP_results$overall)))
return(aR2)
}
boot_2D_ds_COL =function(dataset){
dimensions =dim(dataset)
new_matrix =as.data.frame(matrix(0, nrow = dimensions[1], ncol = dimensions[2]))
for (i in 1:dimensions[2]){
new_matrix[,i] =sample(dataset[,i], replace = T, size = dimensions[1])}
return(new_matrix)}
boot_2D_ds_ROW =function(dataset){
dimensions =dim(dataset)
new_matrix =as.data.frame(matrix(0, nrow = dimensions[1], ncol = dimensions[2]))
for (i in 1:dimensions[1]){
new_matrix[i,] =sample(dataset[i,], replace = T, size = dimensions[2])}
return(new_matrix)}
boot_2wayMP =function(dataset, numberboot){
dimensions =dim(dataset)
boot_overall =matrix(0, numberboot, 1)
boot_row =matrix(0, numberboot, dimensions[1])
boot_col =matrix(0, numberboot, dimensions[2])
for (i in 1:numberboot){
new_matrix =boot_2D_ds_ROW(ds)
MP_new_matrix =medpolish(new_matrix)
boot_overall[i,1] =MP_new_matrix$overall
boot_row[i,] =MP_new_matrix$row +MP_new_matrix$overall}
for (i in 1:numberboot){
new_matrix =boot_2D_ds_COL(ds)
MP_new_matrix =medpolish(new_matrix)
boot_col[i,] =MP_new_matrix$col +MP_new_matrix$overall}
overall_boot_results =quantile(boot_overall, probs =c(.025, .5, .975))
row_boot_results =apply(X =boot_row, MARGIN =2, quantile,
probs =c(.025, .5,.975), na.rm =TRUE)
col_boot_results =apply(X =boot_col, MARGIN =2, quantile,
probs =c(.025, .5,.975), na.rm =TRUE)
list(overall =overall_boot_results, row =row_boot_results, col =col_boot_results)}
Introduction
The data set included with this tutorial was NOT used in the paper.They can be downloaded from
The data set included here has dimensions 80x200 (80 participants x 200 words)
Words 1 to 50 belong to Category 1
Words 51 to 100 belong to Category 2
Words 101 to 150 belong to Category 3
Words 151 to 200 belong to Category 4
Read Data (saved as .csv files)
The first thing we need to do is to load the data set.
ds =read.csv("2wayMPA_data.csv")
categories =as.factor(c(rep("Cat1", 50), rep("Cat2", 50), rep("Cat3", 50), rep("Cat4", 50)))
2-way Median Polish Analysis (MPA)
To run a 2-way MPA, we use the R base function medpolish.
MP_2way =medpolish(ds)
## 1: 12769
## 2: 11759.5
## Final: 11754.88
We then look at the overall value for the entire 2-way model. This value is also known as mu. In this case, the value of mu = 7. This means that the median value across both dimensions (grandmedian) = 7.
MP_2way$overall
## [1] 7
We now focus on the row effects, which in this case represent the Participant Effects (PEs). The PEs are represented in a vector with 80 entries, each one representing one participant.
PE =MP_2way$row
To increase interpretability, we now add mu to each one of the PEs. This should put the PEs in a scale more familiar to us and therefore help us interprete the results. Following this, we plot the PEs.
The plot shows that the majority of participants used a similar approach when rating the words (bulk of participants concentrated in the upper part of the plot).
However, we also see that some of them gave lower values to the words. For instance, 5 participants showed PEs close to 5.
Another participant obtained a PE of 1, which probably means that this participants did not follow the instructions described by the rating scale.
Because the breakdown value of MPA is min (I/2, J/2) (I = # of rows, J = # of columns) and in this case I/2 = 40 and J/2 = 100, we now that this single outlier did not affect the results of MPA. Therefore, there is no need for us to remove this participant from the data set, or to remove any other participant for that matter, as MPA is accounting for these effects when calculating the Word Effects (WEs).
PE_rescaled =MP_2way$row +MP_2way$overall
plot(PE_rescaled, xlab ="Participants", ylab ="Participant Effects", ylim =c(1,7))
We use a similar procedure with respect to the WEs. The plot shows the 200 words and their respective WEs. The words have been color-coded according to the category to which they belong.
Results indicate that the black category was predominantly rated with 3s, but also that there is some variability within this category.
In contrast, words within the red, green, and blue categories were rated with 7s with no variability.
Importantly, this example shows that MPA can assign both more continuous and more discrete values to words.
WE =MP_2way$col
WE_rescaled =MP_2way$col +MP_2way$overall
plot(WE_rescaled, xlab ="Words", ylab ="Word Effects",
col = categories, main ="MPA results", ylim =c(1,7))
In contrast, analyses based on the sample mean find continua irrespective of the scales or stimuli used.
Here, the sample mean seems to differentiate between the red and green words and between the red and blue words, distinction that is not made by the 2-way MPA model.
Because the breakdown value of the sample mean is 0, and we previously identified one outlier while looking at the PEs, we know that the results of this sample mean analysis are mostly unreliable.
We could attempt to remove this outlier or other potential outliers by hand (like the participants with PEs close to 5), but this would bias our results.
In this situation, a better option is to trust the results of the MPA model and rest assured that is protecting us against the presence of at least 40 potential gross outliers.
plot(apply(ds, MARGIN =2, FUN = mean), xlab ="Words",
ylab ="Word Effects", col = categories, main ="Sample Mean Analysis", ylim =c(1,7))
Following this, we evaluate the MPA model fit by calculating its analogR2 value. In this case, the analog R2 = 0.59. This is a strong effect size and therefore implies that our MPA model is a good representation of the data.
analogR2(dataset = ds, MP_results = MP_2way)
## [1] 0.5852489
The next step is to create 95% boostrap CIs for each word. To do this we use the function boot_2wayMP, included at the beginning of this tutorial. This step can take several minutes depending on the number of simulations we decide to run. You can try with different values, but in this tutorial I am simulating only 200 data sets.
bCIs_2wayMP =boot_2wayMP(ds, 200)
The final step is to plot the data. To do this, we use the ggplot2 package.
library(ggplot2)
# bCIs_2wayMP$overall # CI for the overall value
# bCIs_2wayMP$row # CIs for the Participant Effects
# bCIs_2wayMP$col # CIs for the Word Effects
ds_for_plot =as.data.frame(cbind(1:dim(ds)[2], categories, t(bCIs_2wayMP$col)))
colnames(ds_for_plot) =c("Words", "Categories", "LowerBound", "Median", "UpperBound")
ds_for_plot$Categories =as.factor(ds_for_plot$Categories)
CIs_plot =ggplot(ds_for_plot, aes(x = Words, y = Median, color = Categories)) +
geom_point(size =3) +
geom_errorbar(aes(ymin =LowerBound, ymax =UpperBound)) +
theme_bw() +
scale_y_continuous(name ="Word Effects", limits =c(1, 7), breaks =seq(1,7,1)) +
ggtitle("Bootstrap CIs")
CIs_plot
This is only an example of how MPA can be used. Of course, it is possible to use MPA to analyze different data sets and answer innumerable questions.