Chapter 14 – Data and Information Analysis (pp. 348-387)

Overall teaching objective: To introduce undergraduate criminal justice research method students to various quantitative and qualitative analytical techniques and to demonstrate their applications.

  • Analysis is a rather useful word. It can mean some form of examination, study, investigation, scrutiny, and/or testing.
  • In research, we use the term ‘analysis’ to describe the process by which researchers evaluate the data they gather and formulate an answer to their research questions or hypotheses.
  • Analysis occurs at the end of the research process, after we have conducted our literature review, designed our research method and collected our data.
  • But our planning of the analysis phase of research should start at the very beginning of the research process. Without an eye on how we plan to analyze our data, we cannot develop an effective research design. Indeed, concerns about analysis underlie the entire research process since analysis is the essential task in research. Here is an example.
  • The purpose of this chapter is to describe the tools that researchers use to analyze data.
  • The chapter is divided into two parts.
  • The first part focuses on the analysis of quantitative data.
  • The second part focuses on the analysis of qualitative data.
  • The analytical tools used by quantitative and qualitative researchers are different because the data these researchers collect are different. But generally speaking, analysis is analysis.
  • In both quantitative and qualitative research, considerations about analysis should be made during the design phase of the research process to ensure that the proper analysis can take place.
  • During the analysis phase, researchers evaluate the data they gather to answer their research questions or hypotheses. Even though analysis occurs near the end of the research process, considerations of analysis should occur earlier in the research process.

Making Research Real 14.1 – Anticipating Analysis (p. 348)

  • A high school counselor and school resource officer are interested in the relationship (correlation) between illegal drug use and juvenile delinquency.
  • They decide to distribute a survey that asks, among other things, “Do you use drugs?” (yes or no) and “Before your 18th birthday, did you violate the law?” (yes or no).
  • The most accurate way to measure correlation is with the Pearson r statistic.
  • Unfortunately for our researchers, this option is not available because this statistical technique requires interval or ratio level data. Their data are nominal.
  • Had the researchers considered earlier the level of data they needed to perform this analysis (and answer their research question), they could have revised their questions and response sets so that they yielded interval or ratio level data.

Quantitative Data Analysis (p. 349)

  • Statistics summarize large amounts of data into a single number and enable us to communicate information efficiently.
  • There are two general types of statistics:
  • descriptive statistics and
  • inferential statistics.

Descriptive Statistics (p. 350)

  • Descriptive statistics describe the current state of something. An important set of descriptive statistics are known as the measures of central tendency. These measures include the mean, median, and mode.

Measures of central tendency (mean, median, mode) (p. 350)

  • The mean is calculated by adding together all of the values for a particular variable and dividing that sum by the total number of cases. Although it is a good measure of central tendency, it is sensitive to extreme values, or outliers.
  • The median is referred to as the middlemost value because it is the value that is situated in the middle, with half the cases equal to or greater than and half the cases equal to or lesser than this value. It is less susceptible to extreme values or outliers than the mean.
  • The mode is the most frequently occurring value in a population or sample. Like the median, the mode is less susceptible to extreme values or outliers than the mean.
  • The decision about which measure of central tendency to use should be based on two factors:
  • whether the data are skewed toward extreme scores, and
  • what level the variables are measured at.
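
The three measures can be illustrated with a short Python sketch. The ages below are hypothetical values chosen only to show how a single outlier pulls the mean away from the median and mode; they are not data from the chapter.

```python
from statistics import mean, median, multimode

# Hypothetical ages at first offense; 35 is a deliberate outlier.
ages = [11, 12, 12, 12, 13, 14, 35]

age_mean = mean(ages)        # sum of all values divided by the number of cases
age_median = median(ages)    # middle value once the cases are sorted
age_modes = multimode(ages)  # most frequent value(s), returned as a list

print("mean:", round(age_mean, 2))
print("median:", age_median)
print("mode(s):", age_modes)
```

With these made-up values the median and mode both stay at 12, while the mean is pulled upward by the single outlier.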

Table 14.1 - Level of measurement and measures of central tendency. (p. 355)

Level of Measurement / Measure of Central Tendency / Example
Nominal / Mode / With the exception of the driver, the most frequently injured occupant in a vehicle crash (the mode) is the person in the front passenger seat.
Ordinal / Mode, Median / A review of the top scores in the Sergeant’s Promotional Exams indicates that the patrol officers who most frequently place near the top of the list (the mode) are those with 10 or more years of experience. Of the eleven officers who took the latest promotional exam, six scored 95 percent or higher and six scored 95 percent or lower, meaning that 95 percent is the median score among these test takers. According to this department’s promotional policy, only the six officers who scored 95 percent or higher are eligible for further promotional consideration.
Interval/Ratio / Mode, Median, Mean / The most frequently occurring age (the mode) at which juveniles begin offending is 12 years of age. The median age at which juveniles begin offending is 12 years of age. Half of all juvenile offenders begin offending at 12 years of age or younger; half begin offending at 12 years of age or older. The average (mean) age at which juveniles begin offending is 12 years old.

Variability (range, standard deviation, percentages, percentiles, percent change) (p. 356)

  • Measures of variability are descriptive statistics that tell us how much variation exists within a sample or population.
  • Among the measures of variability is the range, which is the difference between the highest and lowest value in a sample or population. This descriptive statistic, like the mean, is susceptible to extreme scores or outliers.
  • The range is computed by subtracting the smallest value from the largest value.
  • The standard deviation is a descriptive statistic that describes how much variability exists within a sample or population. Because the standard deviation considers both the mean and the total number of cases in the sample or population, it is a much more stable statistic than the range.
  • A percentage is a descriptive statistic that describes a portion of a sample or population. Percentages are calculated by dividing the number of like cases by the total number of cases, then multiplying that quotient by 100.
  • A percentile is a statistic that tells us where a value ranks within a distribution. Sometimes this is referred to as the percentile rank. We calculate the percentile rank by dividing the number of cases below the value by the total number of cases and then multiplying that quotient by 100.
  • Percent change is a descriptive statistic that indicates how much something changed from one time to the next. We calculate the percent change by subtracting the original number from the new number, dividing that difference by the original number and then multiplying that quotient by 100.
  • Rates are descriptive statistics that enable us to compare similar behaviors across multiple locations. Rates factor in population size and report incidents per n units of population (for example, crimes per 100,000 residents).
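
The calculations described above can be sketched in Python as follows. The function names and all input values are hypothetical, chosen for illustration. The sketch assumes the population form of the standard deviation (`pstdev`); for a sample, `statistics.stdev` would be the usual choice.

```python
from statistics import pstdev

def value_range(values):
    # Largest value minus smallest value.
    return max(values) - min(values)

def percentage(like_cases, total_cases):
    # Like cases divided by all cases, times 100.
    return like_cases / total_cases * 100

def percentile_rank(values, value):
    # Cases below the value divided by all cases, times 100.
    below = sum(1 for v in values if v < value)
    return below / len(values) * 100

def percent_change(original, new):
    # (New minus original) divided by original, times 100.
    return (new - original) / original * 100

def rate(incidents, population, per=100_000):
    # Incidents per `per` units of population.
    return incidents / population * per

scores = [70, 75, 80, 85, 90]  # hypothetical exam scores

print("range:", value_range(scores))
print("standard deviation:", round(pstdev(scores), 2))
print("percentage:", round(percentage(2, 5), 1))
print("percentile rank of 85:", round(percentile_rank(scores, 85), 1))
print("percent change 200 -> 250:", round(percent_change(200, 250), 1))
print("rate per 100,000:", round(rate(150, 50_000), 1))
```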

The normal distribution (p. 361)

  • In normally distributed data, the mean, median and mode are equal because the data are distributed symmetrically around a single central value.
  • In a normal distribution:
  • about 68.3 percent of all cases fall within one standard deviation of the mean,
  • about 95.4 percent of all cases fall within two standard deviations of the mean, and
  • about 99.7 percent of all cases fall within three standard deviations of the mean.
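
The share of cases within a given number of standard deviations can be checked with Python's standard library. This is a minimal sketch that uses the standard normal curve (mean 0, standard deviation 1); the helper name `share_within` is an assumption made for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of cases within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, "standard deviation(s):", round(share_within(k) * 100, 1), "percent")
```

Rounded to one decimal place, the shares come out to 68.3, 95.4, and 99.7 percent.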

Making Research Real 14.2 – How Intelligent are the Inmates in our System? (p. 362)

  • Using what he knows about normally distributed data, a researcher evaluates the results of intelligence tests administered to inmates.
  • Using this information enables a correctional administrator to make more appropriate assignments (i.e. to programs) for inmates.

Inferential Statistics (p. 362)

  • Inferential statistics enable analysts to determine the probability of certain outcomes. When reading inferential statistics, we are concerned with statistical significance, which estimates how likely a result at least as large as the one observed would be if chance alone were at work. If the statistical significance of a statistic is .05 or less, researchers conventionally conclude that the result is unlikely to be due to chance.

Statistical significance (p. 363)

  • Statistical significance estimates how likely a result at least as large as the one observed would be if chance alone were at work. As a general rule, if the statistical significance of a statistic is .05 or less, researchers conclude that the result is unlikely to be due to chance. At the .05 level, a result this large would be expected about 5 times in 100 from chance alone.

t-tests (p. 363)

  • The t-test is a statistical technique used to determine whether or not two groups are different with respect to a single variable. T-tests can only be run using interval or ratio level data. If the statistical significance of the t-score is .05 or less, it can be concluded that the difference between the two groups is not due to chance.
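
As a rough illustration, the t-score for two independent groups can be computed by hand. This sketch assumes the pooled-variance (Student's) form of the test, and the self-esteem scores are hypothetical. It computes only the t-score and degrees of freedom; the statistical significance would normally come from statistical software or a t table.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Return the pooled-variance t-score and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df

# Hypothetical post-treatment self-esteem scores.
treatment = [32, 35, 31, 36, 34, 33]
control = [28, 30, 27, 31, 29, 29]

t_score, df = pooled_t(treatment, control)
print("t =", round(t_score, 2), "df =", df)
```

With 10 degrees of freedom, the two-tailed critical value at the .05 level is 2.228, so a t-score larger than that in absolute value would be treated as statistically significant.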

Making Research Real 14.3 – Improving the Self Esteem of Juvenile Offenders

  • In this research a juvenile probation officer compares the results of an experiment that tested the effect of a new program designed to improve the self esteem of probationers in her caseload.
  • She uses a t-test to determine whether the difference in self esteem scores between the experimental and control groups (post treatment) is statistically significant (i.e., not due to chance).

Analysis of variance (ANOVA) (p. 363)

  • The analysis of variance (ANOVA) model allows analysts to compare two or more groups to see if they are different with respect to a single variable measured at the interval or ratio level.
  • An ANOVA produces an F-ratio statistic. If the statistical significance of the F-ratio is .05 or less, it can be concluded that the difference between at least two of the groups is not due to chance.
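
The F-ratio compares the variation between group means with the variation within the groups. The following is a minimal one-way ANOVA sketch with hypothetical scores for three groups; the function name `f_ratio` is an assumption, and the significance of the result would normally be read from software or an F table.

```python
from statistics import mean

def f_ratio(*groups):
    """Return the one-way ANOVA F-ratio with its two degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual cases around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical post-treatment scores for three groups.
group_1 = [4, 5, 6]
group_2 = [6, 7, 8]
group_3 = [8, 9, 10]

f, df_b, df_w = f_ratio(group_1, group_2, group_3)
print("F =", f, "df =", (df_b, df_w))
```

For 2 and 6 degrees of freedom, the critical F value at the .05 level is about 5.14, so the F-ratio from these made-up scores would be treated as significant.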

Making Research Real 14.4 – Improving the Self Esteem of Juvenile Offenders – Part II (p. 364)

  • In this research a juvenile probation officer compares the results of an experiment that tested the effect of a new program designed to improve the self esteem of probationers in her caseload.
  • In this case she divided her caseload into three groups.
  • She used an ANOVA to determine whether the differences in self esteem scores among the three groups (post treatment) are statistically significant (i.e., not due to chance).

Chi Square (p. 365)

  • The Chi Square test is used to determine whether there is a statistically significant difference between what we expect to happen and what actually happens.
  • The operative statistic is called the chi-square statistic.
  • If the statistical significance of the chi square statistic is .05 or less, it can be concluded that the difference between what actually happened and what was expected to happen was not due to chance.
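
The chi-square statistic sums, across categories, the squared gap between observed and expected counts divided by the expected count. The sketch below uses hypothetical counts and hypothetical baseline percentages for two groups; none of these numbers come from the chapter.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected across categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts of people searched, by group.
observed = [30, 70]
# Hypothetical share of each group among all travelers, in percent.
baseline_percent = [20, 80]

total = sum(observed)
# What we would expect if searches mirrored the baseline.
expected = [total * pct / 100 for pct in baseline_percent]

statistic = chi_square(observed, expected)
print("chi-square =", statistic)
```

With two categories there is 1 degree of freedom, and the critical value at the .05 level is 3.841. A statistic above that value would be treated as statistically significant.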

Making Research Real 14.5 – Profiling at the Airport (p. 365)

  • In this research a researcher attempts to determine whether racial or ethnic minorities (particularly Muslim-appearing travelers) are selected for more invasive searches at airports.
  • The researcher calculates the percentage of each racial and ethnic group that comes through the security gate.
  • The researcher then compares these baseline figures (what is supposed to happen) with the population of individuals (by race or ethnicity) who were actually searched (what actually happened).
  • The researcher concluded that Muslim-appearing individuals were searched more frequently.

Pearson r (p. 367)

  • The Pearson r is used to determine whether two variables measured at the interval or ratio level are correlated.
  • The Pearson r coefficient ranges from -1 to +1. The closer it is to -1 or +1, the higher the level of correlation between the two variables.
  • Positive Pearson r coefficients indicate a positive correlation.
  • Negative Pearson r coefficients indicate a negative correlation.
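
The Pearson r coefficient can be computed directly from its definition: the covariation of the two variables divided by the product of their individual variation. The following sketch uses hypothetical weekly activity hours and grade point averages.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)

# Hypothetical data: weekly extracurricular hours and grade point average.
activity_hours = [0, 2, 4, 6, 8]
gpa = [2.0, 2.4, 2.9, 3.1, 3.6]

r = pearson_r(activity_hours, gpa)
print("r =", round(r, 2))
```

A value close to +1 for these made-up numbers indicates a strong positive correlation; reversing the order of one list would produce a negative coefficient of the same size.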

Making Research Real 14.6 – Keeping Kids Involved (p. 368)

  • This research attempts to determine the relationship (correlation) between participation in extracurricular activities, illegal drug use and grades.
  • The researchers determine that there is some correlation; however, in some cases the relationships are weak.

Spearman rho (p. 370)

  • The Spearman rho statistic is similar to the Pearson r, but it indicates the level of correlation between variables measured at the ordinal level. Like the Pearson r, it ranges from -1 to +1.
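
One common way to compute Spearman rho is to convert each variable to ranks and then apply the Pearson formula to those ranks. The sketch below takes that approach, giving tied values the average of their ranks. The function names and data are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)

def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical ordinal ratings (1 = lowest, 5 = highest) from six respondents.
trust_in_police = [1, 2, 2, 3, 4, 5]
willingness_to_report = [1, 1, 2, 3, 5, 4]

rho = spearman_rho(trust_in_police, willingness_to_report)
print("rho =", round(rho, 2))
```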

Multiple regression (p. 370)

  • Multiple regression enables the analyst to measure the individual and combined effects of various independent variables on a dependent variable. A multiple regression requires data collected at the interval or ratio levels.
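
To give a sense of what a multiple regression estimates, the sketch below fits an intercept and one coefficient per independent variable using ordinary least squares. It is a bare-bones illustration, not a substitute for statistical software: it reports no significance tests and assumes the independent variables are not perfectly correlated with one another. The data are hypothetical and were constructed to follow y = 1 + 2(x1) - 0.5(x2) exactly so that the result is easy to verify.

```python
def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] by solving the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 estimates the intercept
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(p)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical cases: (x1, x2) are two independent variables.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
y = [1.0, 3.0, 0.5, 2.5, 4.5, 2.0]  # dependent variable

intercept, b1, b2 = fit_ols(rows, y)
print(round(intercept, 3), round(b1, 3), round(b2, 3))
```

Each coefficient describes the expected change in the dependent variable for a one-unit change in that independent variable while the other is held constant.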

Making Research Real 14.7 – Keeping Kids Involved – Part II (p. 371)

  • In a continuation of the study described in Making Research Real 14.6, researchers attempt to determine if grade point average can actually be predicted by involvement in extracurricular activities and drug use.
  • The researchers conclude that in-school activities do have an effect on grades while out-of-school activities do not.

Selecting an appropriate inferential statistical technique (p. 372)

  • The decision as to which inferential statistical technique to use depends on the level at which the data are measured and the type of hypothesis that the study is testing.

Table 14.2 - Commonly used inferential statistical techniques. (p. 373)

Level of measurement / Type of hypothesis / Appropriate statistical technique
Nominal / Association / NA
Nominal / Difference / Chi-Square
Ordinal / Association / Spearman rho
Ordinal / Difference / NA
Interval/Ratio / Association / Pearson r (without prediction); Regression (with prediction)
Interval/Ratio / Difference / t-test (two groups); ANOVA (three or more groups)
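
The selection logic in Table 14.2 amounts to a lookup on two inputs: level of measurement and type of hypothesis. A minimal sketch follows; the function and variable names are hypothetical, and `None` stands in for the table's NA entries.

```python
# Keys are (level of measurement, type of hypothesis), as in Table 14.2.
TECHNIQUES = {
    ("nominal", "association"): None,
    ("nominal", "difference"): "Chi-Square",
    ("ordinal", "association"): "Spearman rho",
    ("ordinal", "difference"): None,
    ("interval/ratio", "association"):
        "Pearson r (without prediction) or regression (with prediction)",
    ("interval/ratio", "difference"):
        "t-test (two groups) or ANOVA (three or more groups)",
}

def choose_technique(level, hypothesis):
    """Look up the technique for a level of measurement and hypothesis type."""
    return TECHNIQUES[(level.lower(), hypothesis.lower())]

print(choose_technique("Nominal", "Difference"))
print(choose_technique("Interval/Ratio", "Association"))
```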

Qualitative Data Analysis (p. 373)

  • Qualitative researchers focus more on analyzing words than they do numbers; they attempt to explain the ‘how’ and ‘why’ of social processes.

Making Research Real 14.8 – It Wasn’t What He Said; It Was How He Said It! (p. 373)

  • A highway patrolman gets in trouble for telling a motorist to “Have a nice day.”
  • Objectively, this comment seems benign, even nice.
  • But the way the trooper said it appears to have angered the woman.
  • In the end the lesson is not in what is said, but in how it is said, which is an important part of qualitative analysis.

Transcription (p. 374)

  • The process of producing a written transcript of interviews that have been video- or audio-taped is known as transcription.
  • These transcripts provide the written data that qualitative researchers analyze.

Memoing (p. 375)

  • Qualitative researchers use a process called memoing to record their thoughts and ideas on the research data.
  • Memoing is typically on-going throughout the data collection process.

Segmenting (p. 376)

  • Segmenting is a process used by researchers to organize or categorize qualitative data.
  • This stage of qualitative data analysis occurs after the researcher has familiarized themselves with the data.

Making Research Real 14.9 – A Typology of Violence (p. 376)

  • In this study the researcher develops an inventory to classify behaviors in terms of their level of violence.
  • This is an example of how qualitative researchers use segmenting.

Coding (p. 377)

  • After segmenting the data, qualitative researchers go through their data and code it.
  • Coding refers to a process whereby researchers identify recurring themes, label these themes with a descriptive word or phrase (“codes”), and organize their notes or transcripts according to these themes.
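
Once a researcher has assigned codes, organizing the material by code is a simple bookkeeping task. The sketch below shows one way to tally codes and group transcript segments under them. The segments and code labels are invented for illustration, and the interpretive work of choosing the codes still belongs to the researcher.

```python
from collections import Counter, defaultdict

# Hypothetical transcript segments, each paired with a researcher-assigned code.
coded_segments = [
    ("He never looked at me during the stop.", "disrespect"),
    ("The officer explained why I was pulled over.", "transparency"),
    ("His tone made it sound like an insult.", "disrespect"),
]

# How often each code (theme) recurs.
code_counts = Counter(code for _, code in coded_segments)

# Segments organized under the code they were given.
segments_by_code = defaultdict(list)
for text, code in coded_segments:
    segments_by_code[code].append(text)

for code, count in code_counts.most_common():
    print(code, count, segments_by_code[code])
```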

Diagramming (p. 378)

  • Diagramming is a process by which researchers develop flow charts or hierarchical diagrams to illustrate relationships between different parts of their qualitative data.

Matrices (p. 378)

  • Researchers also use matrices, or tables, to illustrate such relationships.

Getting to the Point (Chapter Summary) (p. 380)

  • During the analysis phase, researchers evaluate the data they gather to answer their research questions or hypotheses. Even though analysis occurs near the end of the research process, considerations of analysis should occur earlier in the research process.
  • Statistics summarize large amounts of data into a single number and enable us to communicate information efficiently. There are two general types of statistics: descriptive statistics and inferential statistics.
  • Descriptive statistics describe the current state of something. An important set of descriptive statistics are known as the measures of central tendency. These measures include the mean, median, and mode.
  • The mean is calculated by adding together all of the values for a particular variable and dividing that sum by the total number of cases. Although it is a good measure of central tendency, it is sensitive to extreme values, or outliers.
  • The median is referred to as the middlemost value because it is the value that is situated in the middle, with half the cases equal to or greater than and half the cases equal to or lesser than this value. It is less susceptible to extreme values or outliers than the mean.
  • The mode is the most frequently occurring value in a population or sample. Like the median, the mode is less susceptible to extreme values or outliers than the mean.
  • The decision about which measure of central tendency to use should be based on two factors: (1) whether the data are skewed toward extreme scores, and (2) what level the variables are measured at.
  • Measures of variability are descriptive statistics that tell us how much variation exists within a sample or population. Among the measures of variability is the range, which is the difference between the highest and lowest value in a sample or population. This descriptive statistic, like the mean, is susceptible to extreme scores or outliers.
  • The standard deviation is a descriptive statistic that describes how much variability exists within a sample or population. Because the standard deviation considers both the mean and the total number of cases in the sample or population, it is a much more stable statistic than the range.
  • A percentage is a descriptive statistic that describes a portion of a sample or population. Percentages are calculated by dividing the number of like cases by the total number of cases, then multiplying that quotient by 100.
  • A percentile is a statistic that tells us where a value ranks within a distribution. Sometimes this is referred to as the percentile rank. We calculate the percentile rank by dividing the number of cases below the value by the total number of cases and then multiplying that quotient by 100.
  • Percent change is a descriptive statistic that indicates how much something changed from one time to the next.