PRR 475 Math refresher and summary of basic SPSS stat procedures
Review these if you are having trouble with SPSS exercise.
1. Exponential or scientific notation:
5.3E-02 is .053 - it stands for 5.3 * 10^-2
Simply move decimal point n places to left if E -n and n places to right if E +n.
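If you want to check a conversion, most programming languages read E notation directly. The short Python sketch below is only an illustration (Python is not part of the course software) and uses the same 5.3E-02 value plus a made-up E+02 example.

```python
# Python parses E notation directly: 5.3E-02 means 5.3 * 10**-2.
small = float("5.3E-02")   # decimal point moves 2 places to the left
large = float("5.3E+02")   # decimal point moves 2 places to the right

print(small)  # 0.053
print(large)  # 530.0
```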
2. Mean of a variable measured as 0-1. Many variables are dichotomous or binary, taking on only two values, e.g. true-false, yes-no, male-female, participate-don't. These are usually coded as 0 for no and 1 for yes.
Such variables can be treated as any measurement scale, so you can run frequencies, crosstabs or means on them. The mean of a 0-1 variable is simply the proportion of cases with the 1 response (yes); multiply by 100 to get the percent.
Run both FREQ and DESC on a binary variable such as GENDER or MVP, or any activity variable (participate or not). Make sure you understand what each is telling you.
Example: GENDER
FREQ / Pct
Male (0) / .55
Female (1) / .45
MEAN for GENDER / .45
Mean is the percentage of 1's in sample. In above example female was coded 1, so it is percentage of females.
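The GENDER example can be sketched in a few lines of Python. The 100 cases below are made up to match the .55 / .45 split above; they are not taken from HCMA95.SAV.

```python
# Made-up sample of 100 cases coded 0 = male, 1 = female (55 males, 45 females).
gender = [0] * 55 + [1] * 45

mean_gender = sum(gender) / len(gender)        # what DESC reports as the mean
share_female = gender.count(1) / len(gender)   # what FREQ reports for code 1

print(mean_gender, share_female)  # 0.45 0.45
```

Summing a 0-1 variable just counts the 1's, which is why the mean and the proportion coded 1 are the same number.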
3. Selected SPSS Statistical Procedures - runs on HCMA95.SAV file. You should try to replicate these results yourself and make sure you understand the interpretations. I've put those statistics you normally should report in bold on output tables. Also wrote brief interpretation.
a. FREQ - freq table
b. DESCRIPTIVES - means for interval scale variables
c. CORRELATION
d. CROSSTAB - income vs MVP95
e. MEANS HCMATOT (total visits for year) by MVP95 (annual permit)
f. Same run WEIGHTED
g. Independent Sample T-TEST
h. Another weighting example - FREQ on MVP95
i. SPSS Chart to show distribution of HCMATOT
Statistics in bold are usually the best ones to report.
a. FREQUENCIES ON INCOME
TOTAL HOUSEHOLD INCOME BEFORE TAXES
Frequency / Percent / Valid Percent / Cum Percent
Valid / UNDER $25,000 / 346 / 8.6 / 11.2 / 11.2
$25,000 TO $49,999 / 940 / 23.3 / 30.5 / 41.7
$50,000 TO $74,999 / 955 / 23.7 / 31.0 / 72.7
$75,000 OR MORE / 843 / 20.9 / 27.3 / 100.0
Total / 3084 / 76.5 / 100.0
Missing / CHOOSE NOT TO ANSWER / 592 / 14.7
System / 355 / 8.8
Total / 947 / 23.5
Total / 4031 / 100.0
Interpretation: Focus on the Valid pct column as the population estimates. In above case 11% have incomes under $25K, 27% $75K or more. Might report these four valid pcts in a simple pie chart to show income distribution. Should know how to do this in Excel or could try SPSS charts.
Raw freq are number in sample - not that useful. Can simply report sample size is 4,031 and 23% of sample didn't give income or checked "choose not to answer". Distribution is based on 3,084 subjects who gave their income.
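To see where the Percent and Valid Percent columns come from, here is a minimal Python sketch that recomputes them from the frequencies in the INCOME table. The variable names are my own; only the counts come from the output.

```python
# Frequencies copied from the INCOME table above.
valid_counts = {
    "UNDER $25,000": 346,
    "$25,000 TO $49,999": 940,
    "$50,000 TO $74,999": 955,
    "$75,000 OR MORE": 843,
}
sample_size = 4031                      # all cases, including missing

valid_n = sum(valid_counts.values())    # cases that gave an income
missing_n = sample_size - valid_n       # "choose not to answer" plus system missing

# Percent divides by the whole sample; Valid Percent divides by valid cases only.
percent = {k: round(100 * v / sample_size, 1) for k, v in valid_counts.items()}
valid_percent = {k: round(100 * v / valid_n, 1) for k, v in valid_counts.items()}

print(valid_n, missing_n)
for label in valid_counts:
    print(label, percent[label], valid_percent[label])
```

The only difference between the two columns is the denominator: 4,031 for Percent and 3,084 for Valid Percent.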
b. DESCRIPTIVES ON AGE - ask for SE Mean
Descriptive Statistics
Variable / N / Minimum / Maximum / Mean / Std. Error of Mean / Std. Deviation
AGE OF SUBJECT / 3650 / 16 / 89 / 44.01 / .24 / 14.75
Valid N (listwise) / 3650
Interpretation: Average age of visitors is 44. If you want can note this is based on sample of 3,650 cases and 95% confidence interval is 44 + or - .48 (two standard errors) or roughly (43.5, 44.5).
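The standard error and the rough confidence interval can be checked by hand. The sketch below assumes the usual formula SE = SD / sqrt(N) and the "two standard errors" rule of thumb used in the interpretation; the inputs are the AGE values from the output.

```python
import math

n, mean, sd = 3650, 44.01, 14.75     # values from the AGE output above

se_mean = sd / math.sqrt(n)          # standard error of the mean
half_width = 2 * se_mean             # "two standard errors" rule of thumb
ci = (mean - half_width, mean + half_width)

print(round(se_mean, 2))                  # 0.24
print(round(ci[0], 1), round(ci[1], 1))   # 43.5 44.5
```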
c. CORRELATIONS, PEARSON - ask for Pearson Corr
Correlations
PURCHASE AN ANNUAL MVP 95 / Total females in party / Total males in party / AGE OF SUBJECT
MVP 95 / Pearson Correlation / 1.000 / .312 / -.150 / .216
Sig. (2-tailed) / . / .000 / .000 / .000
N / 3842 / 3176 / 3737 / 3524
Females / Pearson Correlation / .312 / 1.000 / -.118 / .132
Sig. (2-tailed) / .000 / . / .000 / .000
N / 3176 / 3329 / 3210 / 3020
Males / Pearson Correlation / -.150 / -.118 / 1.000 / -.129
Sig. (2-tailed) / .000 / .000 / . / .000
N / 3737 / 3210 / 3893 / 3549
AGE OF SUBJECT / Pearson Correlation / .216 / .132 / -.129 / 1.000
Sig. (2-tailed) / .000 / .000 / .000 / .
N / 3524 / 3020 / 3549 / 3650
** Correlation is significant at the 0.01 level (2-tailed).
Interpretation: The table shows all bivariate correlations for 4 variables. I've picked just the age vs MVP95 correlation. Correlation is .216, which is a mild positive correlation, meaning those with annual permits (coded 1 vs 0 if not) tend to be older. Correlation is significant at the 95% confidence level (SIG is printed as .000, i.e. less than .001) - this means we can reject the null hypothesis that the correlation in the population is zero.
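For the curious, the significance test on a correlation can be approximated by hand. The sketch below uses the standard t statistic for a Pearson correlation and, as a simplifying assumption of mine, a normal approximation for the p-value (reasonable only because N is in the thousands). It is not how SPSS computes the figure internally.

```python
import math

r, n = 0.216, 3524                   # age vs MVP95, from the table above

# Test statistic for H0: the population correlation is zero.
t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# With n this large the t distribution is very close to standard normal,
# so a normal approximation to the two-tailed p-value is used here.
p_two_tailed = math.erfc(abs(t) / math.sqrt(2))

print(round(t, 2))
print(p_two_tailed < 0.001)   # True
```

Even a mild correlation like .216 is highly significant with a sample this large, so significance alone says little about the strength of the relationship.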
d. CROSSTAB - ask for Row and Col Pcts and Chi square
TOTAL HOUSEHOLD INCOME BEFORE TAXES * DID YOU PURCHASE AN ANNUAL MOTOR VEHICLE PERMIT IN 1995? Crosstabulation
TOTAL HOUSEHOLD INCOME BEFORE TAXES / MVP95 NO / MVP95 YES / Total
< $25,000 / Count / 151 / 178 / 329
% within INCOME / 45.9% / 54.1% / 100.0%
% within MVP95 / 11.7% / 10.6% / 11.1%
$25- 49.9K / Count / 405 / 510 / 915
% within INCOME / 44.3% / 55.7% / 100.0%
% within MVP95 / 31.4% / 30.3% / 30.8%
$50KTO 74.9 / Count / 397 / 513 / 910
% within INCOME / 43.6% / 56.4% / 100.0%
% within MVP95 / 30.8% / 30.5% / 30.6%
$75,000 + / Count / 337 / 481 / 818
% within INCOME / 41.2% / 58.8% / 100.0%
% within MVP95 / 26.1% / 28.6% / 27.5%
Total / Count / 1290 / 1682 / 2972
% within INCOME / 43.4% / 56.6% / 100.0%
Chi-Square Tests
Value / df / Asymp. Sig. (2-sided)
Pearson Chi-Square / 2.745 / 3 / .433
Likelihood Ratio / 2.749 / 3 / .432
Linear-by-Linear Association / 2.547 / 1 / .110
N of Valid Cases / 2972
a 0 cells (.0%) have expected count less than 5. The minimum expected count is 142.80.
Interpretation: First, the stat test. The chi square statistic for the table of income vs MVP95 is 2.745, which is not significant at the 95% confidence level (SIG = .433, which is not less than .05). So the conclusion is no evidence of a relationship: MVP purchase does not vary with income level. Note - a better stat is to run means as I do below. If there is a significant relationship, you should look for the pattern in the table by comparing the row percentages (% within INCOME) with those at the bottom in the Total row. The issue is whether each row looks the same as the totals. If so, there is no difference in patterns across income levels.
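The Pearson chi square can be rebuilt from the cell counts in the crosstab. The Python sketch below uses the textbook formula (observed minus expected, squared, over expected). The p-value line relies on a closed-form shortcut that only works for 3 degrees of freedom, which I chose to avoid needing a statistics library.

```python
import math

# Observed counts from the crosstab above:
# rows are the four income groups, columns are MVP95 NO / YES.
observed = [
    [151, 178],
    [405, 510],
    [397, 513],
    [337, 481],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence = row total * column total / N.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (4 - 1) * (2 - 1) = 3

# Upper-tail probability of a chi-square with 3 df
# (closed form that is specific to df = 3).
p_value = (math.erfc(math.sqrt(chi_square / 2))
           + math.sqrt(2 * chi_square / math.pi) * math.exp(-chi_square / 2))

print(round(chi_square, 3), df, round(p_value, 3))
```

The result should agree with the Pearson Chi-Square row of the SPSS output to rounding.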
e. MEANS HCMATOT by MVP95 - ask for ANOVA table and SE Mean - First Unweighted (*wrong results)
Report
MVP95 by hcmatot - Means on total HCMA Visits in 1995
MVP95 / Mean / N / Std. Deviation / Std. Error of Mean
NO / 17.39 / 1330 / 39.21 / 1.08
YES / 69.00 / 1846 / 96.01 / 2.23
Total / 47.39 / 3176 / 81.54 / 1.45
ANOVA Table
Sum of Squares / df / Mean Square / F / Sig.
MVP95 by HCMATOT / Between Groups / (Combined) / 2059325.526 / 1 / 2059325.526 / 343.087 / .000
Within Groups / 19051413.239 / 3174 / 6002.336
Total / 21110738.765 / 3175
Interpretation: Visitors who did not purchase an MVP last year averaged 17 visits to HCMA parks in 1995 compared to 69 visits for visitors with annual permits. Based on the F test, this difference is statistically significant at the 99% level (SIG < .001). Overall, visitors averaged 47 visits to HCMA parks in 1995. You can use the standard errors above to compute 95% confidence intervals for the subgroup means (mean plus or minus two standard errors). I.e. the NO group averages 17 visits plus or minus 2 (15, 19) and the YES group averages 69 plus or minus 4 (65, 73). Overall mean is 47 plus or minus 3 visits, or between 44 and 50 visits.
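The F statistic in the ANOVA table is just the ratio of the two mean squares, and each mean square is a sum of squares divided by its degrees of freedom. A minimal Python sketch using the numbers from the unweighted table:

```python
# Sums of squares and degrees of freedom from the unweighted ANOVA table above.
ss_between, df_between = 2059325.526, 1
ss_within, df_within = 19051413.239, 3174

ms_between = ss_between / df_between   # mean square between groups
ms_within = ss_within / df_within      # mean square within groups
f_ratio = ms_between / ms_within       # the F statistic

print(round(ms_within, 2))
print(round(f_ratio, 2))
```

A large F means the difference between the group means is big relative to the variation within the groups.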
f. Means weighted to represent visitors vs visits
Should find it odd that daily permit holders make 17 visits - surely more cost effective to buy permit if this many visits. Also 69 per year seems like a lot even for annual permit holders. Always ask yourself if results make sense and if not, see if something is wrong. Problem here is that cases represent visits NOT visitors and days of use per year is a characteristic of visitors (households) NOT individual Visits.
Solution is to apply weights to adjust sample to one of visitors not visits.
CORRECT procedure is to weight cases by the VISITOR2 variable - this weights cases inversely to number of visits so the sample represents people (visitors) not visits (entries to a park). To weight cases in SPSS, go to the DATA menu, choose Weight Cases, and select VISITOR2 as the weighting variable. Now rerun the above procedure.
STAT TEST BASICALLY GIVES SAME RESULT AS UNWEIGHTED, BUT NOTICE HOW DIFFERENT THE MEANS ARE. Average visits of 4.6 and 20.7 make more sense. Overall mean is 9.3.
MEANS
Report
HCMATOT - total visits in 95 to HCMA parks
MVP95 / Mean / N / Std. Deviation
NO / 4.62 / 2230 / 10.11
YES / 20.67 / 927 / 49.64
Total / 9.33 / 3157 / 29.13
ANOVA Table
Sum of Squares / df / Mean Square / F / Sig.
HCMATOT * MVP95 / Between Groups / (Combined) / 168803.095 / 1 / 168803.095 / 212.233 / .000
Within Groups / 2509387.245 / 3155 / 795.368
Total / 2678190.339 / 3156
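To see why weighting changes the means so much, here is a toy Python sketch. The three visitors and their visit counts are made up, and I assume the weight is simply 1 / visits. The handout only says VISITOR2 weights cases inversely to number of visits, so the exact construction of that variable may differ.

```python
# Made-up population of three visitors who made 1, 2 and 10 visits in a year.
# Sampling at the gate picks up one case per visit, so frequent visitors
# show up many times in the sample.
cases = [1] + [2] * 2 + [10] * 10      # visits per year recorded on each case

# Unweighted mean treats every case (visit) equally.
unweighted_mean = sum(cases) / len(cases)

# Weight each case by 1 / visits so that each visitor counts once in total.
# (Assumes every case has at least 1 visit.)
weights = [1 / v for v in cases]
weighted_mean = sum(w * v for w, v in zip(weights, cases)) / sum(weights)

print(round(unweighted_mean, 2))   # 8.08
print(round(weighted_mean, 2))     # 4.33
```

The weighted mean of 4.33 is the true average for the three visitors, (1 + 2 + 10) / 3, while the unweighted mean is pulled up by the frequent visitor who appears in 10 of the 13 cases.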
g. Independent Samples Test - This is another way of comparing two means. I recommend using SPSS Compare Means and asking for ANOVA test (Options) rather than using this procedure, but for the curious, here is brief explanation of independent samples T-test. Test is more involved as it lets you assume variances of two groups are the same or different. (This one is still unwtd). Here's how it works:
First test to see if variances of two groups are the same using Levene's test. If first SIG<.05 you reject equal variances and use second row for t-test. If SIG >.05 you accept equal variance assumption and use first row for t-test. Usually doesn't make any difference. For comparing means you use the t -test significance (in bold). Here we reject null that two means are same in either case. The 95% confidence intervals reported below are of difference in two means - fail to reject null (means the same or difference =0 ) if confidence interval includes 0. In this case it doesn't.
Group Statistics
MVP95 / N / Mean / Std. Deviation / Std. Error Mean
HCMATOT / NO / 1330 / 17.39 / 39.21 / 1.08
YES / 1846 / 69.00 / 96.01 / 2.23
HCMATOT: Levene's Test for Equality of Variances and t-test for Equality of Means
Levene's F / Levene's Sig. / t / df / Sig. (2-tailed) / Mean Difference / Std. Error Difference / 95% CI of the Difference Lower / 95% CI of the Difference Upper
Equal variances assumed / 433.415 / .000 / -18.523 / 3174 / .000 / -51.61 / 2.79 / -57.08 / -46.15
Equal variances not assumed / (blank) / (blank) / -20.813 / 2604.193 / .000 / -51.61 / 2.48 / -56.48 / -46.75
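Both rows of the t-test can be approximated from the Group Statistics alone. The sketch below uses the standard pooled-variance formula for the first row and the Welch (unequal variance) formula for the second. Because it starts from the rounded means and standard deviations, expect small differences in the last decimal compared with SPSS.

```python
import math

# Group statistics from the unweighted output above.
n1, mean1, sd1 = 1330, 17.39, 39.21    # MVP95 = NO
n2, mean2, sd2 = 1846, 69.00, 96.01    # MVP95 = YES
diff = mean1 - mean2

# Row 1: equal variances assumed (pooled variance).
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled
df_pooled = n1 + n2 - 2

# Row 2: equal variances not assumed (Welch / Satterthwaite).
v1, v2 = sd1**2 / n1, sd2**2 / n2
se_welch = math.sqrt(v1 + v2)
t_welch = diff / se_welch
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Approximate 95% interval for the difference; 1.96 is used because df is large.
ci_pooled = (diff - 1.96 * se_pooled, diff + 1.96 * se_pooled)

print(round(t_pooled, 2), df_pooled, round(se_pooled, 2))
print(round(t_welch, 2), round(df_welch), round(se_welch, 2))
print(round(ci_pooled[0], 1), round(ci_pooled[1], 1))
```

With only two groups, squaring the equal-variances t gives the F from the ANOVA table (about 343), which is why the two procedures lead to the same conclusion.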
h. Another weighting example : Here's another comparison of weighted vs unweighted to estimate percentage of visitors buying an annual MVP in 1995.
Weighted by VISITOR2 - FREQ on MVP95
DID YOU PURCHASE AN ANNUAL MOTOR VEHICLE PERMIT IN 1995?
Frequency / Percent / Valid Percent / Cumulative Percent
Valid / NO / 2606 / 64.7 / 67.4 / 67.4
YES / 1259 / 31.2 / 32.6 / 100.0
Total / 3866 / 95.9 / 100.0
Missing / System / 165 / 4.1
Total / 4031 / 100.0
Unweighted - FREQ on MVP95
DID YOU PURCHASE AN ANNUAL MOTOR VEHICLE PERMIT IN 1995?
Frequency / Percent / Valid Percent / Cumulative Percent
Valid / NO / 1682 / 41.7 / 43.8 / 43.8
YES / 2160 / 53.6 / 56.2 / 100.0
Total / 3842 / 95.3 / 100.0
Missing / System / 189 / 4.7
Total / 4031 / 100.0
Correct results are weighted by VSITORWT2. A third of HCMA visitors (32.6%) buy an annual permit. They account for 56% of visits (entries).
i. Graph of HCMATOT, weighted by VSTORWT2 and filter out cases >80.
Descriptive Statistics - filter at 80 visits per year (excludes large outliers)
Variable / N / Minimum / Maximum / Mean / Std. Error of Mean / Std. Deviation
HCMATOT / 3241 / 0 / 80 / 6.80 / .19 / 10.74
Descriptive Statistics - all cases
Variable / N / Minimum / Maximum / Mean / Std. Error of Mean / Std. Deviation
HCMATOT / 3286 / 0 / 950 / 9.32 / .50 / 28.90
Valid N (listwise) / 3286