Correlation Assignment

Your assignment is to measure the correlation, and produce a scatterplot, that shows the relationship between any two variables. The data need to be real rather than made up, I’d like you to use a minimum of 20 x,y pairs, and I’d like for there to be a question underlying the correlation that makes some kind of sense. The question does not need to be earth shattering and it does not matter whether to two variables end up correlating or not. You can get your data anywhere you want, but there is a ton of stuff on the web. One especially rich source of information is the CIA World Fact Book (https://www.cia.gov/library/publications/the-world-factbook/). Toward the top right you’ll find a pull-down list with every country on the planet. There’s all manner of information about each country; e.g., under ‘economy’ there is GDP, GDP per capita, purchasing power, poverty rate, etc. Under ‘people’ there’s all kinds of interesting information, such as literacy rate, birth rate, life expectancy, infant mortality, typical education levels (total and separately for males and females – “school life expectancy” as the CIA calls it). I pick these just as examples, but there are all kinds of interesting questions you could address with the data on this site. As a second example, just to give the idea of what I’d like you to do, it’s very easy to find data on how much professional athletes are paid (e.g., google ‘mlb team salaries’, without the quote marks). Do teams with high payrolls tend to win more games? Another sports example: a few years ago a student measured the correlation between nations’ GDP and the number of medals won by those nations at the summer Olympics. The findings were quite interesting.

What to turn in: (1) the Pearson r (see below), (2) a scatterplot (see below), (3) a brief explanation of what you conclude from your correlational study – a sentence or two will do, and (4) tell me where the data came from. Notes: (1) put your name on the dang thing, upper left is best, (2) if you turn in more than one page please staple the pages together, or use epoxy, or chicken wire, or a loop of clothes line, or put a nail through it – anything but loose pieces of paper, (3) if you can’t figure out how to make Excel put the axis labels on your scatterplot, just grab a pen and put them in by hand; I’m not that concerned with aesthetics. Important thing about your scatterplots: The convention is to plot the variable you’re trying to predict on the y axis and the predictor variable on the x. For example, if you’re measuring the relationship between students’ SAT scores and their college grade-point average, SAT scores go on the x (horizontal) and GPAs on the y.

Grading: pass/fail. If you’ve made a good faith effort (so far everyone has) and there’s something wrong with it I’ll ask you to fix the problem and resubmit it. If you don’t turn it in at all you’ll lose two points from your final grade.

______

How to Compute a Pearson r with MS Excel

1.  Enter your data into columns A and B; i.e., x values in A, y values in B. Let’s assume you have 20 paired values, meaning 2 columns, 20 rows.

2.  Go to any free cell on the worksheet and type:

=pearson(a1:a20,b1:b20)

The correlation coefficient will appear in that cell.

Note: This assumes you have 20 rows. If you have 30 rows, the command will look like this:

=pearson(a1:a30,b1:b30)

Excel will also draw the scatter plot:

1.  Go to Insert/Chart and pick XY Scatter, then Next.

2.  Under data range, type “=a1:a20,b1:b20” (again, assuming you have 20 rows).

3.  Under “Series in” pick columns.

4.  Hit ‘Next’ to label the axes and whatnot

5.  Hit Finish


A simple example is attached.

Note: I created these instructions with whatever version of Excel was current at the time. Microsoft does like to pointlessly change things when they release new versions, so it’s possible that you’ll have to do some digging around if your version doesn’t work exactly the way I’ve described it.

0 / 22 /
10 / 34
20 / 45
30 / 59
40 / 55
50 / 82
60 / 80
70 / 99
80 / 100
90 / 95
100 / 133
r = / 0.96984