4
Analyses of Reaction Time Data
A. General Issues:
- only read in data that are used in the later analyses; in most cases, the output file contains many dependent variables that are of no interest
- give the variables variable names that end in numbers (e.g., rt1, rt2, …, rt60); this procedure allows the researcher to work with vectors (SPSS) or arrays (SAS) later on when groups of variables have to be recoded into new variables; the particular meaning of the numbers should be specified in the codebook
- in SPSS, it is necessary to sort the variables in the datafile (using the SAVE OUTFILE … /KEEP command) so that variables with the same names are next to each other in the data file
B. Delete participants who are outliers:
- create an average error score for each participant; if all correct responses have one value and all incorrect responses another value, it is possible to simply average across all responses; if the correct response is sometimes one button ("Good") and sometimes the other ("Bad"), it is necessary to first create new response variables where all correct responses have one value and all incorrect responses another value
- create an average RT score for each participant by averaging across all RT's
- get univariate statistics for both average scores and plot them (histogram and/or stem-and-leaf)
- have the statistics software list the values of these two variables for all participants
- if one participant is suspect, check in detail what is going on: (a) Was s/he obsessed by accuracy (which would result in exceptionally slow responses and an error rate of close to 0%)? (b) Was s/he obsessed by speed (which would result in exceptionally fast responses and a very high error rate)? (c) Is his/her exceptionally short/long average response time due to a small number of responses or was there a general tendency for him/her to respond fast/slowly? (d) Is there a problem with the data file, i.e., is his/her exceptionally short/long average response time due to many "weird" values (e.g., 0000, 9999) which are clearly failures of the computer program?
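The participant-level screening in section B can be sketched as follows. This is a minimal illustration in Python rather than SPSS or SAS; the data, the dictionary layout, and the function name `participant_summary` are hypothetical, and the coding 1 = correct, 0 = incorrect is an assumption.

```python
from statistics import mean

# Hypothetical data: one entry per participant.
# "correct" holds 1 for a correct response and 0 for an error (assumed coding);
# "rt" holds the response latencies in msec.
participants = {
    1: {"correct": [1, 1, 1, 0, 1, 1], "rt": [612, 700, 655, 820, 590, 640]},
    2: {"correct": [1, 1, 1, 1, 1, 1], "rt": [1900, 2100, 1850, 2300, 2050, 1975]},
    3: {"correct": [1, 0, 1, 1, 1, 1], "rt": [580, 630, 710, 600, 665, 690]},
    4: {"correct": [0, 1, 0, 0, 1, 0], "rt": [310, 295, 330, 305, 320, 300]},
}

def participant_summary(data):
    """Return {id: (error_rate, mean_rt)} for every participant."""
    return {
        pid: (1 - mean(d["correct"]), mean(d["rt"]))
        for pid, d in data.items()
    }

summary = participant_summary(participants)

# List both average scores for all participants so suspect cases can be inspected.
for pid, (err, rt) in sorted(summary.items()):
    print(f"id={pid}  error rate={err:.2f}  mean RT={rt:.0f} msec")
```

In this made-up example, participant 2 combines very slow responses with no errors and participant 4 combines very fast responses with many errors, which are the two patterns described under (a) and (b).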
C. Find the appropriate transformation:
- data transformation is necessary because RT data are virtually always positively skewed; given that a normal distribution is one of ANOVA's assumptions, one can only trust the results of the inferential statistics if one has removed the positive skew prior to the analyses (removing the skew also increases statistical power); deletion of individual outlier responses is necessary because it is quite common that the participants either "fall asleep" or press the button box instead of the button on some trials; the issue of data transformation and the deletion of individual outlier responses is a tricky one because both are related; transforming data changes some outliers into valid observations and deleting outliers affects the skew; the basic idea is to choose the appropriate transformation using only the observations that are clearly not outliers and then to do the outlier analysis on the transformed data that include all observations
- create a new data set (e.g., "newdata") where each response latency corresponds to a different observation (e.g., if 40 participants each produced 60 responses, the new data set has 2400 observations); the new data set should contain the following variables: observation number ("idrl", here: from 1 to 2400), participant number ("id", here: from 1 to 40), response number ("respnumb", here: from 1 to 60), the response ("resp", in its new form, i.e., "correct" or "incorrect", probably coded 0 and 1), the response latency ("rl", in msec)
- sort the data based on response latency; in SAS, have the stats software list the smallest 5% and the largest 10% of the observations (here: the first 120 and the last 240 observations)
- check for response latencies that are clearly due to failures of the computer program (e.g., 0000 and 9999) and change them into missing values; check for other inconsistencies
- do univariate statistics on the response latencies to get the mean, the standard deviation and observations at the extremes of the distribution
- create a new response latency variable (e.g., "nrl") which corresponds to the old response latency variable ("rl")
- in the new response latency variable, change the smallest 1% of the observations into missing values (here: the fastest 24 response latencies); if the cut-off point is below 300 msec change all observations below 300 msec into missing values
- change also the largest 5% of the observations into missing values (here: the slowest 120 response latencies); if the cut-off point is more than 4 standard deviations above the mean change all observations higher than 4 standard deviations above the mean into missing values
- create several new variables that correspond to the new response latency variable ("nrl") but that have been subjected to different transformations; the most common transformations are:
a) nv1 = ov**.5 = sqrt(ov) (square root transformation)
b) nv2 = ln(ov) (logarithmic transformation, natural logarithm)
c) nv3 = ov**-.5 = 1/sqrt(ov) (square root reciprocal transformation)
d) nv4 = ov**-1 = 1/ov (reciprocal transformation)
e) nv5 = ov**-2 = 1/ov**2 (squared reciprocal transformation)
- in order to maintain the rank ordering of observations and for easier handling of the data it is suggested to linearly rescale the transformations (which does not affect their distributional properties):
a) nv1 = ov**.5 ( COMPUTE nrlt1 = nrl**.5 . )
b) nv2 = ln(ov) ( COMPUTE nrlt2 = LN(nrl) . )
c) nv3 = ((ov**-.5)*-1000)+60 ( COMPUTE nrlt3 = ((nrl**-.5)*-1000)+60 . )
d) nv4 = ((ov**-1)*-10000)+40 ( COMPUTE nrlt4 = ((nrl**-1)*-10000)+40 . )
e) nv5 = ((ov**-2)*-1000000)+20 ( COMPUTE nrlt5 = ((nrl**-2)*-1000000)+20 . )
- do univariate statistics and plot the newly created variables (histogram and/or stem-and-leaf) in order to find out which of the transformations yields the most normal distribution; decide on a distribution that has a slight positive skew
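The trimming and transformation steps of section C can be sketched as follows, starting from the long-format latency column. This is only an illustration in Python with simulated latencies; the helper names `skewness` and `trim` are hypothetical, and treating the 300 msec and 4 SD rules as an additional floor and ceiling on top of the 1% and 5% trimming is one reading of the text, not a confirmed one. The rescaling constants are those of the COMPUTE statements above.

```python
import math
import random
from statistics import mean, stdev

def skewness(values):
    """Third standardized moment (population form); 0 for a symmetric distribution."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Rescaled transformations, same constants as the COMPUTE statements in the text.
TRANSFORMS = {
    "nrlt1": lambda rl: rl ** 0.5,                      # square root
    "nrlt2": lambda rl: math.log(rl),                   # natural logarithm
    "nrlt3": lambda rl: (rl ** -0.5) * -1000 + 60,      # square root reciprocal
    "nrlt4": lambda rl: (rl ** -1) * -10000 + 40,       # reciprocal
    "nrlt5": lambda rl: (rl ** -2) * -1000000 + 20,     # squared reciprocal
}

def trim(latencies, low_pct=0.01, high_pct=0.05, floor=300, n_sd=4):
    """Drop the fastest 1% and slowest 5% of the valid latencies, and also
    anything below `floor` msec or more than `n_sd` SDs above the mean
    (assumed reading of the cut-off rules)."""
    valid = sorted(v for v in latencies if v is not None)
    n = len(valid)
    kept = valid[int(low_pct * n): n - int(high_pct * n)]
    ceiling = mean(valid) + n_sd * stdev(valid)
    return [v for v in kept if floor <= v <= ceiling]

# Simulated, positively skewed latencies in msec (illustration only).
random.seed(1)
rl = [300 + random.lognormvariate(6.0, 0.5) for _ in range(2400)]
rl[0], rl[1] = 0, 9999                      # two simulated program failures

# Program failures become missing values (None) before anything else.
rl = [None if v in (0, 9999) else v for v in rl]
nrl = trim(rl)

# Compare the skew of the raw and the transformed latencies.
print("raw  ", round(skewness(nrl), 2))
for name, f in TRANSFORMS.items():
    print(name, round(skewness([f(v) for v in nrl]), 2))
```

Each step down the list is a stronger transformation, so the skew shrinks from one line of output to the next; the histograms or stem-and-leaf plots recommended in the text remain necessary for the final choice.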
D. Delete observations that are outliers:
- create yet another new response latency variable ("newrl") that corresponds to the original response latency variable ("rl"), apply the chosen transformation, and do outlier analysis, i.e., change observations that are more than 3 standard deviations above or below the mean into missing values; if the mean is used as the indicator for central tendency within cells (see below), it is advised to apply a stricter exclusion criterion, e.g., to exclude all observations that are more than 2 standard deviations above or below the mean
- calculate the cut-off points and retransform them into the millisecond metric so that one can find out the true value of the cut-off points; check if these values "make sense": the lower cut-off point should be between 300 and 500 msec for fast tasks (word pronunciation, lexical decision) and between 400 and 700 msec for slow tasks (adjective evaluation, category inclusion); the upper limit should be between 1500 and 3000 msec for fast tasks and between 3000 and 6000 msec for slow tasks
- if there is a between-participants variable of theoretical interest check if the response latencies were affected by the experimental manipulation; if there is a doubt do a t-test
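The cut-off calculation of section D can be sketched as follows, assuming for illustration that the logarithmic transformation was chosen. The function names `log_cutoffs` and `apply_cutoffs` and the latencies are hypothetical; missing values are represented as `None`.

```python
import math
from statistics import mean, stdev

def log_cutoffs(latencies, n_sd=3.0):
    """Cut-offs at mean +/- n_sd SDs of ln(latency), retransformed into msec."""
    logs = [math.log(v) for v in latencies if v is not None]
    m, sd = mean(logs), stdev(logs)
    return math.exp(m - n_sd * sd), math.exp(m + n_sd * sd)

def apply_cutoffs(latencies, lower, upper):
    """Copy of the latencies with values outside [lower, upper] set to None."""
    return [
        v if v is not None and lower <= v <= upper else None
        for v in latencies
    ]

# Hypothetical latencies in msec; None marks an already missing observation.
rl = [640, 712, 588, 830, 695, 760, 701, 9200, None, 655, 910, 620, 745]

lower, upper = log_cutoffs(rl, n_sd=2.0)    # stricter 2 SD criterion
newrl = apply_cutoffs(rl, lower, upper)

# Report the cut-offs in the millisecond metric to check that they make sense.
print(f"lower cut-off: {lower:.0f} msec, upper cut-off: {upper:.0f} msec")
```

Because the cut-offs are computed in the transformed metric and then exponentiated, they are not symmetric around the mean in milliseconds, which is why the text asks for them to be checked against plausible ranges for the task.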
E. Check the error rates:
- create a new response variable ("newresp") that corresponds to the old response variable ("resp"), and change all observations that have been considered outliers in the previous step into missing values; then, calculate the overall error rate; error rates of 0% and error rates of 10% and more are suspect
- if there is a between-participants variable of theoretical interest check if the error rate is more or less the same in the different experimental conditions; if there is a doubt do a chi-square test
- create the "ultimate" response latency variable ("ultimrl") that corresponds to the previous response latency variable ("newrl") and change all observations with incorrect responses into missing values
- at the end, check how many invalid observations (i.e., missing values) there are in the data set; calculate the percentage (which should be reported in the article); if 5% or less of the RT's have been deleted, everything is OK; if 10% or more of the observations have missing values there is a serious problem; if only very few observations (1% or less) have been changed to missing values consider adopting a stricter criterion for the exclusion of outliers
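The bookkeeping in section E can be sketched as follows. The variable names follow the text ("resp", "newrl", "newresp", "ultimrl"); the data, the helper names, and the coding 1 = correct, 0 = incorrect are assumptions made for this illustration.

```python
def error_rate(responses):
    """Proportion of errors among non-missing responses (1 = correct, 0 = error)."""
    valid = [r for r in responses if r is not None]
    return 1 - sum(valid) / len(valid)

def percent_missing(latencies):
    """Percentage of observations that are missing (None)."""
    return 100 * sum(v is None for v in latencies) / len(latencies)

# Hypothetical long-format columns; None in newrl marks a deleted outlier.
resp = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
newrl = [640, 712, 588, None, 695, 760, 701, 655, None, 620]

# newresp: responses on outlier trials become missing.
newresp = [r if v is not None else None for r, v in zip(resp, newrl)]

# ultimrl: latencies of incorrect (or missing) responses become missing.
ultimrl = [v if r == 1 else None for r, v in zip(newresp, newrl)]

print(f"error rate: {error_rate(newresp):.0%}")
print(f"invalid observations: {percent_missing(ultimrl):.0f}%")
```

With so few observations the percentages here are far above the thresholds given in the text; the example only shows how the two variables are derived from each other.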
F. Calculating the dependent variables:
- go back to the original data set and change all RT's that are due to failures of the computer program, that are outliers, or that correspond to incorrect responses into missing values (work with arrays/vectors and conditional expressions!!)
- apply the appropriate transformation
- calculate the median response time for each of the within-participants experimental conditions (e.g., male primes, positive target adjectives); the median is considerably better than the mean which is used by most researchers because the median is less affected by outliers; much of the checking described above and below can be handled lightly if the median is used as the dependent variable; if the researcher decides to use the mean instead of the median, s/he should devote considerable effort to detecting outliers, especially if there are fewer than 8 observations per cell
- note that reported means should refer to raw latencies (in the millisecond metric) and not to transformed variables; it is suggested to do all the analyses on the transformed data and to transform them back into the millisecond metric when one wants to report the means (note however, that this is only possible when the dependent variables are raw latencies; when the dependent variables are derived from facilitation scores – i.e., difference scores – a different procedure has to be employed, see below)
- when response latencies are used as an individual difference measure (e.g., an implicit prejudice score) the individual difference score is calculated by subtracting the appropriate within-condition means from each other; for example, in a study examining implicit sexism of male participants, there may have been male and female primes (MP and FP) and positive and negative target adjectives (postarg and negtarg); the individual difference score is calculated as follows:
sexism = (RT MP_negtarg – RT MP_postarg) – (RT FP_negtarg – RT FP_postarg)
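The cell-wise calculation and the individual difference score of section F can be sketched as follows for a single participant. The trial data and the function name `cell_medians` are hypothetical; the sketch uses the median within cells, as recommended above, and works on raw latencies in msec purely for readability, so the transformation step is left out.

```python
from statistics import median

# Hypothetical trials for one participant: (prime, target, latency in msec).
# None marks a latency that was set to missing in the earlier steps.
trials = [
    ("MP", "negtarg", 610), ("MP", "negtarg", 650), ("MP", "negtarg", None),
    ("MP", "postarg", 700), ("MP", "postarg", 720), ("MP", "postarg", 690),
    ("FP", "negtarg", 730), ("FP", "negtarg", 710), ("FP", "negtarg", 750),
    ("FP", "postarg", 640), ("FP", "postarg", None), ("FP", "postarg", 660),
]

def cell_medians(rows):
    """Median latency per (prime, target) cell, ignoring missing values."""
    cells = {}
    for prime, target, rl in rows:
        if rl is not None:
            cells.setdefault((prime, target), []).append(rl)
    return {cell: median(values) for cell, values in cells.items()}

m = cell_medians(trials)

# Individual difference score, following the formula in the text.
sexism = (m[("MP", "negtarg")] - m[("MP", "postarg")]) - (
    m[("FP", "negtarg")] - m[("FP", "postarg")]
)
print(sexism)
```

The same function would be applied once per participant; replacing `median` with `mean` gives the mean-based variant, for which the text advises stricter outlier exclusion.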
G. The inclusion of baseline ratings:
- baseline ratings are generally used to calculate facilitation scores; the question is to what extent the access to a particular target adjective (e.g., "lazy") is facilitated when it is preceded by a particular prime (e.g., "BLACKS") compared to when it is not primed or preceded by a neutral prime; note that negative facilitation scores represent inhibition
- note that facilitation scores are formed with untransformed response latencies that no longer contain outliers; however, the determination of the cut-off points for outliers implies that the data have been transformed previously; the analyses therefore consist of several steps: create new data sets, find an appropriate transformation, delete outliers, calculate facilitation scores on raw latencies, find the appropriate transformation for the facilitation scores and apply it (often there is no need to transform facilitation scores because they tend to be unskewed); those who prefer to use the mean (and not the median) of all observations coming from one cell, might consider doing another outlier analysis this time on the transformed facilitation scores
- when baseline ratings are simply derived from neutral primes (e.g., "XXXXX") the baseline RT's tend to be quite similar to those of the experimental trials and the outlier analyses are relatively simple; the baseline RT's should be included in the new data sets ("newdata1" and "newdata2") and the choice of the appropriate transformation and the deletion of outliers should be done with all response latencies
- when baseline ratings are substantially faster (or slower) than experimental ratings, as is the case in Fazio et al.'s (1995) Adjective Evaluation Task, baseline response latencies tend to possess quite different characteristics than experimental response latencies; this is a problem because the appropriate cut-off points for outliers may differ between the two: it is suggested to create separate data sets for baseline response latencies and for experimental response latencies and to do the outlier analyses for each data set separately
- it is suggested to first calculate the facilitation scores, to then transform them (if necessary), and finally, to take the median (or the mean) within cells; it is not suggested to first transform the data, to then take the median within cells, and finally, to subtract the medians from each other to form facilitation scores because the medians may be based on different adjectives
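The recommended order of operations (facilitation score per adjective first, central tendency within cells second) can be sketched as follows. The prime labels, adjectives, latencies, and the function name `facilitation_medians` are hypothetical, and defining facilitation as baseline latency minus primed latency is an assumption chosen so that negative scores represent inhibition, as stated above; any transformation of the facilitation scores is omitted.

```python
from statistics import median

# Hypothetical baseline latency (msec) for each target adjective,
# already cleaned of outliers.
baseline = {"lazy": 820, "smart": 760, "kind": 700}

# Hypothetical primed trials: (prime, adjective, latency in msec or None).
primed = [
    ("prime_A", "lazy", 730), ("prime_A", "smart", 790), ("prime_A", "kind", None),
    ("prime_B", "lazy", 800), ("prime_B", "smart", 700), ("prime_B", "kind", 650),
]

def facilitation_medians(trials, base):
    """Per-adjective facilitation (baseline minus primed latency), then the
    median within each prime condition; missing latencies are skipped."""
    cells = {}
    for prime, adjective, rl in trials:
        if rl is not None and base.get(adjective) is not None:
            cells.setdefault(prime, []).append(base[adjective] - rl)
    return {prime: median(scores) for prime, scores in cells.items()}

scores = facilitation_medians(primed, baseline)
print(scores)
```

Because each facilitation score pairs a primed latency with the baseline for the same adjective, a missing trial removes that adjective from both sides, which avoids the problem of medians being based on different adjectives.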