
Basic Data Analysis Guidelines

for Research Students

Isaac V. Gusukuma, Ph.D., LMSW-IPR, ACSW

University of Mary Hardin-Baylor

Social Work Program

January 30, 2012

© 2003 Isaac V. Gusukuma

Table of Contents

Introduction

Organization of the Guide

Basic Guidelines for Constructing a Survey Question

Constructing Your Response Categories - Establishing Your Level of Measurement

Associating Response Categories of a Question to Statistical Procedures

Basic Guidelines for Analyzing Data

Data Analysis: Making Sense of Those Numbers

Check To Be Sure Your Data Is Accurate

Conducting a Frequencies Analysis for Each Variable

Example of a Survey Question and SPSS Frequencies Output for the Variable SEX

Univariate Data Analysis

Analysis of a Nominal Level Variable

Example of a survey question and SPSS output for a nominal level variable

Analysis of an Ordinal Level Variable

Example of a survey question and SPSS output for an ordinal level variable

Analysis of an Interval/Ratio Level Variable

Example of a survey question and SPSS output for an interval level variable

Bivariate (2 variables) Data Analysis

Chi Square (Goodness of Fit) Test

Example 1 - Chi square test

Example 2 - Chi square test

t-Test (Difference of Means Test)

Example 1 - One sample t-test

Example 2 - Independent samples t-test

Example 3 - Paired samples t-test

Analysis of Variance (ANOVA) Test

Example of a one-way ANOVA

Pearson’s Product Moment Correlation (r)

Example - Pearson’s (r)

Conclusion

Appendices: SPSS Output Screens

Appendix 1 Frequencies SPSS Screens

Appendix 2 Crosstab and Chi Square SPSS Screens

Appendix 2 t-Test SPSS Screens

One Sample t-Test Screens

Independent Samples t-Test Screens

Paired Samples t-Test Screens

Appendix 3 Analysis of Variance (one-way) SPSS Screens

Appendix 4 Pearson’s r SPSS Screens

References

Basic Data Analysis Guidelines for Research Students

Introduction

Research and statistics are inseparable. Knowing this is one thing; understanding and using this relationship is another, especially for a research student. A common oversight among research students is waiting until late in the research process to consider the relationship among the problem statement, research question, hypotheses, the kinds of data to be collected, and the statistical analysis of those data.

This basic guide for analyzing data is presented to encourage you to consider, early rather than late in the research process, the relationship among the questions asked on a survey, the response categories and the data they generate, and the statistical procedures available to make sense of the collected data. Thinking about data and their analysis should be part of the first steps in developing a research proposal and, like many other parts of the research process, should be continually revisited, updated, and refined as your project draws to a conclusion.

This guide provides examples of univariate (single variable) and bivariate (two variables) analysis. It begins by encouraging you to be certain that your data set is accurate and “error free,” then discusses several basic univariate and bivariate data analysis procedures. Univariate procedures are essentially what you already know as descriptive statistics. The bivariate statistical procedures presented in this guide include the chi square test, the t-test, analysis of variance (ANOVA), and Pearson’s r (correlation). This guide does not discuss multivariate (more than two variables) statistical analysis procedures.

Organization of the Guide

This guide begins with two very brief sections, one on constructing questions for a survey and one with general reminders about data analysis. The points in these two sections should serve as “memory joggers” as you begin to consider the relationship between your research design and statistical analysis. The Data Analysis section re-introduces you to the important task of ensuring your data is “clean” by conducting a Frequencies procedure. Once you are fairly certain your data is accurate, you can begin the statistical analysis, first conducting univariate data analysis and then moving on to bivariate procedures.

This guide for data analysis assumes an understanding of basic statistics, as well as basic skills and experience with SPSS™.

Basic Guidelines for Constructing a Survey Question

Though this guide will not present all aspects of designing a research project, you may find it helpful to have a few reminders about constructing questions for a survey instrument. These reminders will help you stay mindful that how you construct a question and its response categories determines what you can do with it statistically.

When constructing survey questions, or when selecting questions from a standardized instrument, you may want to keep the following questions in mind:

  1. What is the purpose of my research? Am I trying to describe, explain, predict, or evaluate some occurrence? Given that purpose, will I need to generate descriptive statistics, inferential statistics, or both?
  2. For each question on the survey instrument, does the question provide information about the independent variable(s), the dependent variable, or the control variables, or is it on the survey to provide demographic information about the respondents?
  3. Which variables/questions do I intend to analyze together (e.g., gender of the respondents by their education level)?
  4. What is the best or most appropriate level of measurement (nominal, ordinal, interval/ratio) for this variable? Should I create response categories so that I get nominal, ordinal, or interval/ratio level data?
  5. Will I have a random or nonrandom sample, and is my sample large enough that I can assume a normal (or approximately normal) sampling distribution?
  6. What is my anticipated sample size and will I have a sample of sufficient size such that I can conduct the statistical procedures I have planned to run?

How you answer these questions will, to a degree, influence the questions you ask on your survey and help establish the response categories for those questions. Most importantly, your answers will influence the kinds of statistical procedures you are able to conduct for your study.

Constructing Your Response Categories - Establishing Your Level of Measurement

If you are constructing your own data collection instrument, you have the opportunity to establish the level of measurement for many of your variables. For example, the variable education can be constructed so that your data are a nominal, ordinal, or interval/ratio measure.

Education as a nominal measure:

Do you have a high school diploma?

____Yes ____No

Education as an ordinal measure:

What is your current class standing?

___Senior ___Junior ___Sophomore ___Freshman

Education as an interval/ratio measure:

How many years of education do you have?

______Years

As the examples above show, a question about a respondent’s level of education can be constructed in several ways. Designing and constructing a survey instrument is both a science and an art, and you should think of each question in terms of its response categories and level of measurement. The next section further illustrates the importance of the response categories of your questions.

Associating Response Categories of a Question to Statistical Procedures

This section presents the relationship between the level of measurement of a question’s response categories and the basic statistical procedures you can conduct. As noted earlier, and as illustrated in the sections still to come, you should think in terms of both univariate and bivariate data analysis. The tables below provide a basic guide to the types of univariate and bivariate data analysis you can conduct, based on the measurement level of your variables. In these tables, measurement level refers to the response categories for a given survey question.

Table 1: Univariate Procedures

Measurement Level / Examples / Basic Statistical Procedures
Nominal / gender; ethnicity; religious preference / Mode, Percentages, Ratios
Ordinal / socioeconomic status as high, medium, and low; class standing as Senior, Junior, Sophomore, Freshman / Mode, Median, Percentages, Ratios, Quartiles
Interval/ratio / age in years; income in dollars; test scores / Mode, Median, Percentages, Ratios, Quartiles, Mean, Standard deviation
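SPSS produces these statistics for you. To illustrate how Table 1 adds statistics as the measurement level rises, here is a minimal Python sketch using only the standard library. It computes the level-appropriate statistics for a list of numeric codes or scores. The function name `describe`, the level labels, and the sample data are hypothetical. Ratios are left out. Quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from SPSS's percentile definitions.

```python
import statistics
from collections import Counter

LEVELS = ("nominal", "ordinal", "interval/ratio")


def describe(values, level):
    """Return the Table 1 statistics that suit the given level of measurement.

    values: numeric response codes (nominal/ordinal) or scores (interval/ratio)
    level:  one of LEVELS (hypothetical labels)
    """
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}")
    n = len(values)
    # Every level: the mode and the percentage of responses in each category
    result = {
        "mode": statistics.mode(values),
        "percentages": {v: 100 * c / n for v, c in sorted(Counter(values).items())},
    }
    if level in ("ordinal", "interval/ratio"):
        # Ordered categories add the median and the three quartile cut points
        result["median"] = statistics.median(values)
        result["quartiles"] = statistics.quantiles(values, n=4)
    if level == "interval/ratio":
        # Equal intervals add the mean and the sample standard deviation
        result["mean"] = statistics.mean(values)
        result["standard_deviation"] = statistics.stdev(values)
    return result


ages = [20, 22, 22, 25, 31]  # hypothetical ages in years
print(describe(ages, "interval/ratio"))
print(describe([1, 1, 2, 1], "nominal"))  # hypothetical codes: 1 = Yes, 2 = No
```

For the nominal call, only the mode and percentages come back. Asking for a median of religious preference codes, for example, would be meaningless.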

In Table 2, Bivariate Statistical Procedures, you will notice a row and a column labeled Dichotomous. Dichotomous variables are a special category of variables that have only two meaningful response categories. For the purpose of this guide, dichotomous variables will be treated as nominal level variables. Examples of dichotomous variables include Sex (Male/Female), US Citizen (Yes/No), Race (White/Nonwhite), and Religion (Christian/NonChristian).

Table 2 also provides recommendations about statistical procedures you may want to conduct when examining two variables. To read Table 2, find the intersection of the row and column that represent the levels of measurement of your two variables. Thus, if you have two interval level variables (interval x interval), you should probably conduct a Pearson’s r (correlation).

Table 2: Bivariate Statistical Procedures

Rows: measurement level of the first variable. Columns: measurement level of the second variable.

First Variable / Dichotomous / Nominal / Ordinal / Interval/Ratio
Dichotomous / Chi square, Phi
Nominal / Chi square, Cramer’s V, Lambda / Chi square, Cramer’s V, Lambda
Ordinal / Chi square, t-test (for interval-like data) / One-way ANOVA (for interval-like data) / Gamma, Somers' d, Tau B, Tau C, Spearman’s rho, Pearson’s r (for interval-like data)
Interval/Ratio / t-test (independent, paired, and one-sample) / One-way ANOVA / One-way ANOVA, Pearson’s r (for interval-like data) / Pearson’s r
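Table 2 fills in only the lower half of the grid. One way to encode it is as a lookup keyed on the unordered pair of measurement levels, assuming (as that layout suggests) that the order of the two variables does not matter. The Python sketch below is only a reading aid for the table. `TABLE_2` and `suggest_procedures` are hypothetical names, and the sketch does not check the "interval-like data" conditions noted in some cells.

```python
# Table 2 as a lookup keyed on an unordered pair of measurement levels.
# frozenset({"x", "x"}) == frozenset({"x"}), so same-level pairs work too.
TABLE_2 = {
    frozenset({"dichotomous"}): ["Chi square", "Phi"],
    frozenset({"nominal", "dichotomous"}): ["Chi square", "Cramer's V", "Lambda"],
    frozenset({"nominal"}): ["Chi square", "Cramer's V", "Lambda"],
    frozenset({"ordinal", "dichotomous"}): ["Chi square", "t-test (for interval-like data)"],
    frozenset({"ordinal", "nominal"}): ["One-way ANOVA (for interval-like data)"],
    frozenset({"ordinal"}): ["Gamma", "Somers' d", "Tau B", "Tau C", "Spearman's rho",
                             "Pearson's r (for interval-like data)"],
    frozenset({"interval/ratio", "dichotomous"}): ["t-test (independent, paired, one-sample)"],
    frozenset({"interval/ratio", "nominal"}): ["One-way ANOVA"],
    frozenset({"interval/ratio", "ordinal"}): ["One-way ANOVA",
                                               "Pearson's r (for interval-like data)"],
    frozenset({"interval/ratio"}): ["Pearson's r"],
}


def suggest_procedures(level_1, level_2):
    """Return the Table 2 cell for two measurement levels, in either order."""
    key = frozenset({level_1.lower(), level_2.lower()})
    if key not in TABLE_2:
        raise ValueError(f"unknown level pair: {level_1!r}, {level_2!r}")
    return TABLE_2[key]


print(suggest_procedures("Interval/Ratio", "interval/ratio"))
print(suggest_procedures("dichotomous", "interval/ratio"))
```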

Basic Guidelines for Analyzing Data

Before you actually begin to conduct your data analysis, there are a few preliminary points to consider that may impact your statistical analysis. The statements below are for you to consider once you have collected your surveys and as you enter and begin the statistical analysis of your data.

  1. “Junk in, junk out”: if your data is not entered accurately (is not “clean”), the conclusions drawn from your statistical analysis may not be correct.
  2. You are generally more likely to find statistical significance with larger samples. Thus, if you have a small sample (exactly what “small” means should be covered in a research methods course), you are less likely to find significance, which leads to the next point.
  3. While an alpha level of .05 (level of significance, α = .05) is standard for most social science research, you may decide to set a higher or lower alpha based on your research design, question, and sample size. Consult your professor or a statistical consultant about the alpha to establish for your analysis. The important point to remember is that you should establish your alpha before you conduct your statistical analysis.
  4. In statistical analysis, a relationship is either significant or not significant. No relationship can be described as “highly significant” or “strongly significant.” If you have established your alpha as .05, then whether the computed probability (p) is .049 or .0001, you can only state that you have a “significant” relationship.
  5. Remember that a high or “strong” correlation is not the same as causation.
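Points 2 and 4 can be illustrated with a short Python sketch. It uses a one-sample z test, which is a normal-approximation simplification and not one of the SPSS procedures covered in this guide. The mean difference of 2 points and standard deviation of 10 are hypothetical. With these values, the same difference gives p ≈ 0.32 with 25 respondents but p < .001 with 400 respondents. The decision rule only ever returns "significant" or "not significant."

```python
from statistics import NormalDist


def two_sided_p(mean_diff, sd, n):
    """Two-sided p-value of a one-sample z test (normal approximation)."""
    standard_error = sd / n ** 0.5
    z = mean_diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decision(p, alpha=0.05):
    """Alpha is set before the analysis; the result is simply significant or not."""
    return "significant" if p < alpha else "not significant"


# Hypothetical: the same 2-point difference (SD = 10) at two sample sizes
for n in (25, 400):
    p = two_sided_p(mean_diff=2, sd=10, n=n)
    print(f"n = {n}: p = {p:.4f} -> {decision(p)}")
```

Note that p = .049 and p = .0001 both map to the same word, "significant," which is the point made in item 4.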

Data Analysis: Making Sense of Those Numbers

Check To Be Sure Your Data Is Accurate

One of the first steps in data analysis is to ensure the information in your data file is accurate. In other words, you should have some level of certainty that the data entered into your SPSS data file are correct. One way to check for errors in data entry is to run the Frequencies procedure. This will help you identify one type of data entry error: a numeric value that does not represent a valid response code. For example, for the variable Sex, you have the numeric codes 1 for “Male” respondents, 2 for “Female” respondents, and 99 for responses that are “Not Answered.” Upon running the Frequencies procedure, you note that a 7 has been entered for the variable. The 7 is a data entry error because you should only have codes of 1, 2, or 99 for the variable Sex.

The Frequencies procedure, however, will only help you identify one type of data entry error. The output from a Frequencies procedure will not identify errors where, for the variable Sex, you entered a code of 1 for a respondent when it should have been a 2. In other words, you miscoded the respondent as “Male” instead of “Female,” but the numeric code you entered, 1, is a valid code for the variable Sex. Identifying and correcting this and other types of data entry errors will require other procedures and processes on the part of the researcher or the person entering the data.

Conducting a Frequencies Analysis for Each Variable

Check for the following:

  1. Is the total number of responses (the number of records entered) correct for each variable? For example, if you entered 40 records, do you have 40 in the data file for each variable, counting both valid responses and those you have identified as “system missing”?
  2. Are all the numeric codes entered correctly? For example, if you are only supposed to have 1’s for Males, 2’s for Females, and 99’s for Not Answered (NA), did you check to ensure no other numeric value was entered for that variable?
  3. If you note errors in the data, correct them before you conduct your statistical analysis, then rerun “Frequencies” for those variables where corrections were made.
  4. Frequencies is not appropriate for string variables that contain alphanumeric characters, such as street addresses and names.
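Outside SPSS, the first two checks can be sketched in a few lines of Python. This is a hypothetical helper, not an SPSS feature. `None` stands in for a system-missing value. The sample data use the 1/2/99 coding for Sex described above, with one invalid 7 entered by mistake.

```python
from collections import Counter


def frequency_check(values, valid_codes, expected_n):
    """Tally one variable's codes and flag checklist points 1 and 2.

    values:      one entry per record; None stands in for "system missing"
    valid_codes: the codes allowed by your codebook, e.g. {1, 2, 99}
    expected_n:  the number of records you entered
    """
    counts = Counter(values)
    invalid = sorted(c for c in counts if c is not None and c not in valid_codes)
    return {
        "counts": dict(counts),
        "n_matches": len(values) == expected_n,  # point 1: 40 records -> 40 entries
        "invalid_codes": invalid,                 # point 2: codes outside the codebook
    }


# Hypothetical data for Sex: 1 = Male, 2 = Female, 99 = Not Answered
sex = [1] * 17 + [2] * 21 + [99] + [7]
report = frequency_check(sex, valid_codes={1, 2, 99}, expected_n=40)
print(report["n_matches"], report["invalid_codes"])
```

This prints `True [7]`. The record count is right, but one case carries a code that does not exist in the codebook. As in SPSS, a valid but wrong code (a 1 that should have been a 2) passes unnoticed.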

Example of a Survey Question and SPSS Frequencies Output for the Variable SEX

Example of a survey question about the respondent’s sex with pre-coded responses:

1. What is your sex?

____ 1 Male ____ 2 Female

Example of SPSS Frequencies output for the variable Sex:

Statistics

RESPONDENTS SEX

N (Valid) / 40
N (Missing) / 0

RESPONDENTS SEX

Status / Code / Frequency / Percent / Valid Percent / Cumulative Percent
Valid / 0 / 1 / 2.5 / 2.5 / 2.5
Valid / 1 MALE / 17 / 42.5 / 42.5 / 45.0
Valid / 2 FEMALE / 22 / 55.0 / 55.0 / 100.0
Valid / Total / 40 / 100.0 / 100.0

In this output, the single case coded 0 is a data entry error of the kind described above, because the only valid codes for Sex are 1, 2, and 99. Though the Frequencies procedure will not totally eliminate data entry errors, it will help reduce them. The Frequencies procedure can also generate basic descriptive statistics that allow you both to check your data for errors and to begin developing a sense of the distribution of scores for your variables. The next section discusses univariate statistical procedures that can be conducted as you run the Frequencies procedure.

Univariate Data Analysis

Univariate data analysis is the analysis of a single variable, as opposed to analysis using two (bivariate) or more (multivariate) variables. The term “descriptive statistics” is most often associated with summarizing the characteristics of a variable or a set of variables. Another general term, “measures of central tendency,” refers to the statistical procedures that describe the typical or central value of the responses to a single variable; these include the mode, median, and mean. Other information that further describes the distribution of scores for a variable includes the range, upper and lower limits, variance, standard deviation, and confidence interval.
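To make the spread measures concrete, the Python sketch below computes the range, variance, standard deviation, and a 95% confidence interval for the mean. Two assumptions apply. First, the sample variance and standard deviation divide by n - 1. Second, the interval uses a normal (z) critical value for simplicity. SPSS bases its confidence interval for a mean on the t distribution, so for small samples SPSS's interval will be somewhat wider. The function name and test scores are hypothetical.

```python
import statistics
from statistics import NormalDist


def spread_summary(scores, confidence=0.95):
    """Range, variance, standard deviation, and a z-based CI for the mean."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD: divides by n - 1
    # Normal critical value, about 1.96 for 95% (a simplification; SPSS uses t)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sd / n ** 0.5
    return {
        "minimum": min(scores),
        "maximum": max(scores),
        "range": max(scores) - min(scores),
        "variance": statistics.variance(scores),  # sample variance = sd squared
        "standard_deviation": sd,
        "ci_for_mean": (mean - margin, mean + margin),
    }


test_scores = [72, 85, 90, 66, 78, 88, 95, 70]  # hypothetical test scores
print(spread_summary(test_scores))
```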

Analysis of a Nominal Level Variable

A nominal variable is a categorical variable that is measured in such a way that the categories indicate differences among respondents with no hierarchy or rank order implied in those differences. When constructing a survey question with nominal level response categories, the response categories should be mutually exclusive and exhaustive. Common examples of nominal level variables are Sex (Male/Female), Ethnic Background (Anglo, Hispanic, African American, Asian, Pacific Islander, etc.), and Religion (Protestant, Catholic, Jewish, Islamic, Buddhist, etc.).

The following statistics may be appropriate for nominal variables/data:

  • Frequencies (mode)
  • Percentages
  • Ratios

Example of a survey question and SPSS output for a nominal level variable

Example of a survey question with pre-coded nominal response categories:

1. What is your religious preference?

___1 Protestant ___2 Catholic ___3 Jewish ___4 None ___5 Other

Example of SPSS output for the variable Religious Preference:

Statistics

RELIGIOUS PREFERENCE

N (Valid) / 1477
N (Missing) / 9
Mode / 1

RELIGIOUS PREFERENCE

Status / Code / Frequency / Percent / Valid Percent / Cumulative Percent
Valid / 1 PROTESTANT / 886 / 59.6 / 60.0 / 60.0
Valid / 2 CATHOLIC / 367 / 24.7 / 24.8 / 84.8
Valid / 3 JEWISH / 26 / 1.7 / 1.8 / 86.6
Valid / 4 NONE / 146 / 9.8 / 9.9 / 96.5
Valid / 5 OTHER / 52 / 3.5 / 3.5 / 100.0
Valid / Total / 1477 / 99.4 / 100.0
Missing / 9 NA / 9 / .6
Total / / 1486 / 100.0
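The three percentage columns follow simple rules:

- Percent divides each count by all 1,486 respondents, including the 9 coded NA.
- Valid Percent divides by the 1,477 respondents who answered.
- Cumulative Percent is the running total of Valid Percent.

The Python sketch below reproduces these columns from the counts in the table above. The function name, and the choice to treat code 9 as the only missing code, are assumptions for illustration, not SPSS internals.

```python
def frequency_table(counts, missing_codes):
    """Rebuild Frequency, Percent, Valid Percent, and Cumulative Percent columns.

    counts:        {code: frequency}, including any missing-value codes
    missing_codes: codes that SPSS would list under "Missing"
    """
    total = sum(counts.values())
    valid_total = sum(f for code, f in counts.items() if code not in missing_codes)
    rows, cumulative = [], 0.0
    for code in sorted(counts):
        freq = counts[code]
        percent = 100 * freq / total  # share of all respondents
        if code in missing_codes:
            # Missing codes get a Percent but no Valid or Cumulative Percent
            rows.append((code, freq, round(percent, 1), None, None))
            continue
        valid_percent = 100 * freq / valid_total  # share of those who answered
        cumulative += valid_percent
        rows.append((code, freq, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows


# Counts from the Religious Preference output; 9 = NA (missing)
religion = {1: 886, 2: 367, 3: 26, 4: 146, 5: 52, 9: 9}
for row in frequency_table(religion, missing_codes={9}):
    print(row)
```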

Example of SPSS pie graph with percentages for the variable Religious Preference:

Brief Interpretation of an Analysis of the Variable Religious Preference Using the Mode

The most frequently reported religious preference (the mode) among the 1,486 respondents in this survey was Protestant, followed by Catholic.

Brief Interpretation of an Analysis of the Variable Religious Preference Using Percentages

Of the 1,486 total respondents, 59.6% reported they were Protestant, 24.7% Catholic, and 1.7% Jewish, while 9.8% reported no religious preference, 3.5% reported another religious preference, and 0.6% were “missing,” meaning they did not respond to the question.

Brief Interpretation of an Analysis of the Variable Religious Preference Using a Ratio

Slightly less than three of every five respondents reported they were of the Protestant faith.
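The ratio statement is simple arithmetic on the counts in the table: 886 Protestants out of 1,486 respondents. The minimal Python sketch below shows how "slightly less than three of every five" is obtained. The helper name `per_k` is hypothetical.

```python
def per_k(count, total, k):
    """Express count/total as 'how many out of every k' respondents."""
    return k * count / total


protestant_per_5 = per_k(886, 1486, 5)  # counts from the Religious Preference table
print(f"{protestant_per_5:.2f} of every 5 respondents were Protestant")
```

This prints 2.98, just under 3 of every 5. With the 1,477 valid responses as the base, the same function gives about 3.00, so it helps to state which base a ratio uses.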