LSP 121
Activity 3
Introduction to Correlation
Learning Goals for this Activity
- You will see how correlation can help explain relationships.
- You will begin to understand the difference between correlation and causality.
1. Using Excel, open the file StateSATs.xls (from the QRC website in the folder OlderData at the bottom of the page) which contains data on average SAT scores and the percent taking the SAT in each state in the US.
a.Sort the data to find which states have the highest average SAT scores and which have the lowest. List the three highest states and the three lowest states.
b.Make a XY-scatterplot of the average score and the percentage of students taking the test. Add a linear trendline and find the R-squared value. Paste this graph into your Word document.
c.Write a short paragraph describing the relationship between average SAT score and percentage student taking the test. Include a reasonable explanation for the type of correlation that is apparent.
d.How does Illinois compare in average SAT score? In percent taking the SAT? Make a conjecture why so few Illinois students take the SAT. How would go about testing your conjecture?
2.Using Excel, open the file TVLifeExpectancy.xls, which contains data on life expectancy and the number ofTV'sper person in selected countries.
a.Make a scatterplot of the data; adjust the scale so that 40 is the minimum on the y-axis. Add a linear trendline and find the R-squared value. Paste this graph into your Word document.
b.Is there a correlation between life expectancy and number of TV's per person?
c.Can we infer from the data that TV's promote (or cause) longevity? Can you name some common underlying causes for both longevity and higher rates of televisions per capita?
3. Using Excel, open the file Nielsen.xls (in the OlderData file), a file derived from data in the 1994 World Almanac and Book of Facts on the Nielsen TV ratings for the favorite syndicated programs for 1992-93. Determine the R-squared value for each of the following pairs. Note that you don't have to graph each one. You can type =RSQ(A7:A26, B7:B26) to calculate the R-squared for the data in columns A7 to A26 and B7 to B26.
- Women and Men
- Women and Teenagers
- Women and Children
- Men and Teenagers
- Men and Children
- Teenagers and Children
In a short well-written paragraph, explain what can be concluded from these correlations and absences of correlation.
4. This last exercise will use SPSS for calculating correlation. But before we import an Excel file into SPSS, let’s add a column to our Excel table. Using Excel, open the file CityCrimeRates.xls, which contains data on the crime rates (total and violent) and population. Add a new column – Population Density – which is calculated by dividing Population by Land Area. Save this Excel file to My Documents or the Desktop. Then run SPSS and import the Excel file. Give the variables appropriate names and change the necessary values from String to Numeric (with appropriate decimal places).
Determine the correlation coefficient for each of the following pairs of variables:
- Crime Rate and Population
- Violent Crime Rate and Population
- Crime Rate and Population Density (Number of people per square mile)
Paste the results of each correlation into your Word document. In a short well-written paragraph, explain what can be concluded from these correlations and absences of correlation.