PSY 395 Lab #2
Norms and Reliability

Norms

·  Psychological tests are usually relative – we compare one score, or group of scores, to another.

·  John is more agreeable than Sam.

·  Females tend to have higher verbal ability scores than do males.

·  Norm-based interpretation - when a person’s test score is interpreted by comparing it to the scores of a specific group of people. Many different kinds of groups can be used for this comparison (e.g. the U.S. population, all high school seniors in Wyoming, an MSU psychology class)

Norms are

·  Based on the means and standard deviations (descriptive statistics) of the sample used to represent the comparison group.

·  Expressed as raw scores, standard scores (e.g. z-scores, T-scores), or percentiles.

To calculate norms, we use standard scores, which transform raw scores into values that let us compare people’s scores across different scales. For instance, person A’s mechanical knowledge test may have scores that range from 0-200, while person B may take a similar test whose scores range from 0-50. Standardizing makes Person A’s and Person B’s scores comparable to one another by putting them on the same type of scale.

Examples of standard scores:

z-score – Expresses the value of a raw score on the standard normal distribution. Raw scores are transformed into z-scores using the formula z = (raw score – mean) / standard deviation.

The mean score always has z-score = 0. Scores one standard deviation above the mean = z-score of +1; one standard deviation below the mean = z-score of –1; two standard deviations above the mean = z-score of +2, etc. (So Person A might have a z-score of +1 on a measure of anxiety, and Person B might have a z-score of -1.5 on a different measure of anxiety, but these two scores can usefully be compared with one another, at least to the extent the anxiety measures are measuring pretty much the same thing.)

T-score – Another way to express z-scores, but T-scores use only positive numbers: The mean is always assigned a T-score of 50, and for each standard deviation away from the mean you add or subtract 10 from 50. Scores that are one standard deviation above the mean = T score of 60, one standard deviation below the mean = T score of 40, two standard deviations above the mean = T score of 70, etc.

T = z*10 + 50

We can also express norms in terms of percentiles: the score at or below which a specified percentage of scores in a distribution falls. There are percentiles that correspond to the mean and standard deviations of a distribution.

-  Here are example z-scores, T-scores, and percentiles for the normal distribution. These are always the same no matter what the raw scores were – so long as the data are normally distributed:

z:     -3     -2     -1      0     +1     +2     +3
T:     20     30     40     50     60     70     80
%:     1%     2%    16%    50%    84%    98%    99%
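
If you want to check your hand calculations later in this lab, here is a minimal Python sketch (not part of the SPSS instructions) that converts a raw score to a z-score, a T-score, and a percentile. The raw score, mean, and SD below are made-up example numbers; the percentile function simply reproduces the normal-curve area that a z table gives you.

    import math

    def z_score(raw, mean, sd):
        # z = (raw score - mean) / standard deviation
        return (raw - mean) / sd

    def t_score(z):
        # T = z*10 + 50: the mean becomes 50 and each SD is worth 10 points
        return z * 10 + 50

    def percentile(z):
        # Area under the standard normal curve at or below z (what a z table lists)
        return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

    # Made-up example: raw score of 36 on a scale with mean 28 and SD 8
    z = z_score(36, 28, 8)
    print(round(z, 2), round(t_score(z)), round(percentile(z)))   # 1.0 60 84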

Accessing Data

Open Internet Explorer (DO NOT use Netscape)

Go to class website

Click on Lab 2 Materials

Click on the IPIP-self dataset

When it asks you if you want to save or open this file, click SAVE TO DISK

Another window will open; browse to wherever you want to save the file (AFS space, floppy disk). SAVE YOUR WORK IN TWO PLACES: two disks, or AFS + disk, etc.

When it’s done saving, click OPEN – this will open the data in SPSS

Plotting Norms on National Norm Chart

1)  Need to find your raw score for each scale in the data file.

Find the last five digits of your PID in the data file. Scroll to the right to find your composite scale scores for each scale (‘neuro’ = neuroticism scale, ‘extra’ = extraversion scale, etc.)

2)  Need a national norm chart sheet

3)  Plot your raw scores on the national chart (for both genders).

4)  How many standard deviations from the mean is each of your scores? (Calculate your z-scores using the national norms for your gender; see the sketch after this list.)

5)  What are the T-scores for each of your scores? (Convert the z-scores from #4 above to T-scores.)

6)  At what percentile do you fall for each of the scales? (Use the z table to figure this out.)
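
If you would rather check steps 4-6 for all five scales at once, here is a hedged Python sketch. The variable names follow the ‘neuro’/‘extra’ pattern from the data file (the other three names are guesses following that pattern), and all of the raw scores and national norm means/SDs below are placeholders – substitute your own raw scores and the values from the national norm chart for your gender.

    import math

    # Placeholder numbers -- replace with your raw scores and the national
    # norms (mean, SD) for your gender from the norm chart.
    my_scores = {'neuro': 52, 'extra': 68, 'open': 74, 'agree': 80, 'consc': 61}
    norms = {'neuro': (50, 10), 'extra': (65, 12), 'open': (70, 11),
             'agree': (75, 10), 'consc': (66, 12)}

    for scale, raw in my_scores.items():
        mean, sd = norms[scale]
        z = (raw - mean) / sd
        t = z * 10 + 50
        pct = 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))   # z-table area below z
        print(f"{scale}: z = {z:.2f}, T = {t:.0f}, percentile = {pct:.0f}")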


Plotting Norms on Class Norm Chart

1)  Need to find the norms (means and standard deviations) for our class:

a) In SPSS, go to Analyze → Descriptive Statistics → Descriptives

b) Put each of the IPIP scale variables into the ‘variables’ box

c) Hit ‘OK’

2)  Use the means and standard deviations to create a class norm chart (a sketch for checking these numbers follows this list)

For example: Suppose the mean of the “neuroticism” scale was 28.46 with a standard deviation of 7.91

28.46 + 7.91= 36.37 – one standard deviation above the mean

28.46 + 7.91+ 7.91= 44.28 – two standard deviations above the mean

28.46 – 7.91= 20.55 – one standard deviation below the mean

28.46 – 7.91– 7.91= 12.64 – two standard deviations below the mean

3)  Plot your raw scores on the chart

4)  How many standard deviations from the mean is each of your scores?

5)  What are the T-scores for each of your scores?

6)  At what percentile do you fall for each of the scales?
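
As a cross-check on the SPSS Descriptives output, here is a small Python sketch that computes the class means and standard deviations and the ±1 and ±2 SD points for the class norm chart. It assumes the IPIP-self dataset has been exported to a CSV file; the file name and the scale column names other than ‘neuro’ and ‘extra’ are assumptions, so match them to the actual data file.

    import pandas as pd

    # Assumption: the IPIP-self data were exported from SPSS to a CSV file
    data = pd.read_csv('ipip_self.csv')
    scales = ['neuro', 'extra', 'open', 'agree', 'consc']   # assumed column names

    for scale in scales:
        mean = data[scale].mean()
        sd = data[scale].std()   # sample SD (n - 1), matching SPSS Descriptives
        print(f"{scale}: mean = {mean:.2f}, SD = {sd:.2f}")
        print(f"  -2 SD = {mean - 2*sd:.2f}, -1 SD = {mean - sd:.2f}, "
              f"+1 SD = {mean + sd:.2f}, +2 SD = {mean + 2*sd:.2f}")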


Reliability

What is reliability? Reliability is how consistent a test score is. This is the first requirement for good measurement: for a test to be an adequate measure of an attribute, it must be reliable – it must be consistent.

Different types of reliability estimates:

- test-retest – consistency of test scores when the same test is taken at two different times – consistency of scores over time. We estimate it by correlating scores on the test taken at time 1 with scores on the same test taken at time 2 (a short sketch follows this list).

- internal consistency (alpha) – consistency of a test across its items. It assesses whether all the items of a test are measuring the same thing, that is, whether they are consistent with each other. It is based on the average correlation among the test items, corrected for (increased by) the length of the test.


- interobserver (or interrater) – consistency of a measure across different raters (different people completing the measure). It is the correlation between test score of rater 1 and test score of rater 2. This is not interrater agreement (e.g., like kappa). How is reliability different from agreement?

- Why so many different types of reliability?
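
To make the test-retest idea concrete, here is a minimal Python sketch with invented scores: the reliability estimate is simply the Pearson correlation between the same people’s scores at time 1 and time 2.

    import numpy as np

    # Invented example: the same five people take the same test twice
    time1 = np.array([52, 61, 45, 70, 58])
    time2 = np.array([55, 60, 47, 68, 57])

    # Test-retest reliability = correlation between the two administrations
    r = np.corrcoef(time1, time2)[0, 1]
    print(round(r, 2))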


Acceptable Levels of Reliability

Reliability estimates are like correlations: the highest possible value is 1, so the closer a reliability estimate is to 1, the higher the reliability.

Usually, reliability estimates from .8 to 1.0 indicate high reliability, from .7 to .8 indicate moderate reliability, and under .7 indicate low reliability.

A reliability estimate of .8 indicates that 20% of the variability in the test scores is due to measurement error (because 1 - .80 = .20). A reliability estimate of .5 means that true scores and error have equal effects on test scores (because 1 - .50 = .50), which is not desirable.

Remember that the acceptable level of reliability also depends on what the test is used for. For example, for making rough preliminary decisions or for screening purposes (e.g., using the MMPI to screen out the lowest 1% of applicants for postal-service jobs), lower reliability levels might be okay.

Split Half Reliability and Spearman-Brown Correction

One way to assess reliability of your measure is to perform a split-half reliability test. To do this, you divide the test into two halves. Then, you correlate performance on one half of the test with performance on the other half of the test.

To the extent that the correlation coefficient is high (closer to 1), this indicates that your test is reliable. To the extent that the correlation coefficient is low (closer to 0), this indicates that your test is unreliable.

In the data set, we have split the conscientiousness scale into two halves (Consc1 & Consc2). One score is for one half of the items, and the other score is for the other half of the items. To assess the split-half reliability, we will need to correlate responses on one half with responses on the other.


In SPSS:

Go to Analyze → Correlate → Bivariate. Highlight the two halves of the conscientiousness scale and move them into the box on the right. Click OK.

The correlation coefficient, r, gives you the split-half correlation (the reliability of half the test).

What is it?

However, this value is the reliability coefficient for only half of the items. If we want the reliability coefficient for the whole test, we need to apply the Spearman-Brown correction formula:

r_full = 2*r_half / (1 + r_half), where r_half is the correlation between the two halves

Use the formula to calculate what the reliability would be for the full conscientiousness scale. Compare this to alpha when we calculate it below.
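
The same two steps can also be sketched outside SPSS. Here is a hedged Python example, assuming the dataset (with the Consc1 and Consc2 half-scores) has been exported to a CSV file; the file name is an assumption.

    import pandas as pd

    data = pd.read_csv('ipip_self.csv')   # assumed CSV export of the dataset

    # Split-half correlation: each person's score on half 1 vs. half 2
    r_half = data['Consc1'].corr(data['Consc2'])

    # Spearman-Brown correction: estimated reliability of the full-length scale
    r_full = 2 * r_half / (1 + r_half)

    print(f"half-test r = {r_half:.2f}, corrected full-test reliability = {r_full:.2f}")
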
Internal Consistency Reliability

Coefficient alpha

Alpha compares every item to every other item – it looks at the consistency across the items in a particular scale, so we use individual items in the computation (not scale scores, which are sums of item scores). Notice that the alpha of .63 in the example output is higher than any item correlation in the matrix, which is generally true because scales are more reliable than individual items.

Part of your homework assignment for this week is to calculate coefficient alpha for all 5 IPIP scales.


Calculating Alpha

Using SPSS, go to ‘Analyze’ → ‘Scale’ → ‘Reliability Analysis’

For each IPIP scale, put the items (not the scale scores) for each particular scale in the ‘items’ box (so, you’ll do this five separate times, once for each scale) – there are 20 items for each scale

Example: n1, n6, n11, n16, n21, etc., until you’ve entered all “n” items. Make sure you use the reverse-scored items where appropriate.

After entering the items for a scale, hit ‘OK’

You’ll do this 5 times, one time for each of the IPIP scales.
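
If you want to double-check the SPSS output, coefficient alpha can also be computed directly from the item scores. Here is a minimal Python sketch for the neuroticism scale; the CSV file name is an assumption, and the item list must be filled in with all 20 “n” items (reverse-scored where appropriate), following the pattern above.

    import pandas as pd

    data = pd.read_csv('ipip_self.csv')               # assumed CSV export
    neuro_items = ['n1', 'n6', 'n11', 'n16', 'n21']   # ...continue through all 20 "n" items

    items = data[neuro_items]
    k = items.shape[1]   # number of items in the scale

    # Coefficient alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total score)
    sum_item_vars = items.var(ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    alpha = (k / (k - 1)) * (1 - sum_item_vars / total_var)
    print(round(alpha, 2))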

Homework #2 [Ask your TA what to turn in and what to e-mail]

Norms

1.  National norms chart with z, T, and % reported for each scale.

2.  Class norms chart with z, T, and % reported for each scale.

3.  Pick one IPIP scale. Describe in 3-4 sentences why your scores differ when comparing the class norms to the national norms.

Reliability

4.  Report alphas for all 5 IPIP scales.

5.  Answer the following questions: What is alpha if the items in a scale are completely uncorrelated? What happens to alpha if you keep adding items to the scale that correlate positively with the other items?

6.  Answer the following questions: What do you conclude if you have a high alpha reliability coefficient? What do you conclude if alpha is low? (Optional question for thought: Can a test still be reliable if alpha is low?)
