Chapter 3: Descriptive Statistics: Bivariate Correlation
Bivariate Correlation: Relationship between two variables
Concerned with two things:
- Is there a relationship?
- If there is a relationship, how strong or weak is it?
Can be one of three types:
- High-high, low-low
- High-low, low-high
- Little systematic tendency
Correlation Coefficient:
- A measure of the correlation between two variables
- Ranges from -1 to +1
- Toward -1 means high-low (inverse relationship; negative correlation)
- Toward +1 means high-high (direct relationship; positive correlation)
- Near 0 means little systematic tendency
- Depending on its absolute value, the relationship is described with words like weak, moderate, or strong
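To make the range and sign of the coefficient concrete, here is a minimal sketch that computes Pearson's r from its definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and all the data values are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for this example
hours = [1, 2, 3, 4, 5]        # study hours
score = [52, 60, 71, 80, 88]   # exam scores: high-high, low-low
errors = [9, 8, 6, 3, 1]       # error counts: high-low, low-high

r_direct = pearson_r(hours, score)    # close to +1 (direct relationship)
r_inverse = pearson_r(hours, errors)  # close to -1 (inverse relationship)
print(round(r_direct, 3), round(r_inverse, 3))
```

The first pair yields a coefficient just below +1 and the second a coefficient just above -1, matching the high-high and high-low patterns described above.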
Correlation Matrix:
- Table of correlation coefficients between different pairs of variables
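A correlation matrix can be sketched as a table with one row and one column per variable, where each cell holds the coefficient for that pair. The example below is an illustration only: the variable names (`height`, `weight`, `commute`) and their values are hypothetical, and `pearson_r` is a helper defined here, not part of any library.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical measurements on five people
data = {
    "height": [150, 160, 165, 172, 180],
    "weight": [52, 58, 63, 70, 79],
    "commute": [35, 10, 42, 18, 25],
}

names = list(data)
# One coefficient for every pair of variables, including each with itself
matrix = {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

# Print the matrix as a plain-text table
print(" " * 8 + "".join(f"{n:>9}" for n in names))
for a in names:
    print(f"{a:<8}" + "".join(f"{matrix[a][b]:9.2f}" for b in names))
```

The diagonal is always 1 (each variable correlates perfectly with itself) and the matrix is symmetric, so published tables often show only the lower or upper triangle.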
Different correlation coefficients are used depending on the type of variables:
Type of correlation coefficient / Var 1 / Var 2 / Remarks
Pearson’s Product-Moment (r) / Quantitative (interval or ratio); represents raw score / Quantitative (interval or ratio); represents raw score
Spearman’s Rho / Quantitative but represents rank / Quantitative but represents rank
Kendall’s Tau / Quantitative but represents rank / Quantitative but represents rank / Similar to Rho but differs in treatment of ties (Tau does a better job)
Point Biserial / Raw Scores / Dichotomous (1 or 0)
Biserial Correlation / Raw Scores / Artificial Dichotomy
Phi / Dichotomous (1 or 0) / Dichotomous (1 or 0)
Tetrachoric / Artificial Dichotomy / Artificial Dichotomy
Cramer’s V / Nominal / Nominal / If only two categories, Cramer’s V identical to Phi
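Several coefficients in the table are numerically the Pearson formula applied to recoded data: Spearman's Rho is Pearson's r computed on ranks (tied values share their average rank), while the Point Biserial and Phi coefficients are Pearson's r with one or both variables coded 1 or 0. The sketch below illustrates this under those assumptions; the helper names and all data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's Rho as Pearson's r on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Monotonic but non-linear data: the ranks agree perfectly
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)   # 1: the rank order is identical
r = pearson_r(x, y)        # below 1: the raw scores are not on a line

# Point Biserial: raw scores against a 1/0 variable (hypothetical data)
passed = [0, 0, 1, 1, 1]
score = [48, 55, 61, 70, 83]
r_pb = pearson_r(passed, score)

# Phi: two 1/0 variables (hypothetical data)
smoker = [1, 1, 1, 0, 0, 0, 1, 0]
cough = [1, 1, 0, 0, 0, 1, 1, 0]
phi = pearson_r(smoker, cough)

print(round(rho, 3), round(r, 3), round(r_pb, 3), round(phi, 3))
```

Kendall's Tau, the Biserial, the Tetrachoric, and Cramer's V use different formulas and are not covered by this sketch.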
Warnings about correlation:
- Correlation does not imply causation
- Coefficient of determination (square of the correlation coefficient)
  - Indicates the proportion of variability in either variable that is explained by the other variable
- Outliers
  - A few extreme data points can inflate or deflate the coefficient
- Linearity
  - Check whether the two variables have a linear relationship to begin with
  - If the relationship is curvilinear, Pearson’s correlation will underestimate the strength of the relationship
- Correlation and independence
  - The two instruments used to measure the variables should (ideally) be independent
  - If they are, the correlation between them should be low
- Relationship strength
  - The choice of words (weak, moderate, strong) is still subjective; check the raw data if necessary and possible
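Three of these warnings (linearity, outliers, and the coefficient of determination) can be illustrated with a short sketch. All data below are invented for the example, and `pearson_r` is a helper defined here rather than a library function.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Linearity: y is completely determined by x (y = x squared), but the
# relationship is U-shaped, so Pearson's r misses it entirely.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
r_curve = pearson_r(x, y)

# Outliers: a weak relationship becomes a very strong one after a single
# extreme point is added (hypothetical data).
a = [1, 2, 3, 4, 5, 6]
b = [3, 1, 4, 2, 5, 3]
r_without = pearson_r(a, b)
r_with = pearson_r(a + [30], b + [30])

# Coefficient of determination: the square of the correlation coefficient
r_squared_without = r_without ** 2
r_squared_with = r_with ** 2

print(round(r_curve, 3))
print(round(r_without, 3), round(r_with, 3))
print(round(r_squared_without, 3), round(r_squared_with, 3))
```

A scatter plot of the raw data would expose both problems at a glance, which is one reason to check the raw data rather than rely on the coefficient alone.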