Analyzing Data from an Experiment

Now that I have my experimental data, how do I begin to understand them?

Contents

I. Once I have my experimental data, how do I begin to understand them?

II. What type of data do I have?

Discrete data:

Continuous data:

Categorical data:

Normal Data

III. Is my data paired or unpaired?

IV. Do I want to compare my values to the mean or to the standard deviation?

V. Testing a Hypothesis

Why p < 0.05 ?

VI. Using an Unpaired T-Test

One Population Sample

Two Populations

VII. Using a Chi-Square Test for Variance

VIII. Using a Paired T-Test

IX. Using a Chi-Square Test for Goodness-of-Fit

Appendix 1: Critical Values for a T-Test

Appendix 2: Critical Values for a Chi-Square Test

I. Once I have my experimental data, how do I begin to understand them?

The point of analyzing any quantitative data that you collect in an experiment is to see whether they support your original hypothesis. One way to determine this is to use a statistical test, which can tell you whether or not your data are statistically significant.

There are many different types of data that you can collect in your experiment, and each type has its own statistical test to determine significance. However, it can sometimes be difficult to figure out exactly what test to use, and exactly how to use that particular test.

Follow this flow chart and use the resources below to match the type of data you have to the appropriate statistical test!

II. What type of data do I have?

Data can be collected in a variety of forms.

When data are presented as numbers (known as numerical or quantitative data), instead of words (known as categorical, descriptive, or qualitative data), they can be either continuous or discrete.

Discrete data:

  • Discrete data can be counted. It is made of numbers that cannot be split up into an infinite number of values.
  • Some examples of variables that would be discrete are shoe sizes or the counts of an item (number of tomatoes on a vine, number of people in a room). Something like a size 9.452 shoe or 12.8516 tomatoes is not a meaningful piece of data.
  • This is in contrast with continuous data. If your data is discrete, the statistical tests on this page are not appropriate for your calculations.

Continuous data:

  • Continuous data is data that can be measured. It has an infinite number of points that exist within a range of values, and any of the values have a specific meaning.
  • For example, consider 0 to 100, -17.5 to 9371, or -infinity to +infinity as possible ranges.
  • In the range of 0 to 100, you can get whole numbers like 1, 2, 3, or 4, but you can also get an unlimited number of values between each whole number, like 1.1, 1.001, 1.0001, and so on. If you were to visualize a set of continuous data, you would be able to draw it on a graph without picking up your pencil.
  • Some examples of data sets that would be continuous:
      • growth measures (height, finger length, foot width)
      • things that depend on time (acceleration of a car, walking speed)
      • distances (how far each person walks on an average day)
  • Continuous data can be represented on a coordinate plane or scatterplot.
  • Some examples of how a continuous graph looks:

On the other hand, when data is presented as named categories, with numbers describing each category (such as how many items fall into it), it is categorical data.

Categorical data:

  • Some examples of data that would be categorical:
      • types of music played on the radio and how many stations play each type
      • the number of each species of animal in an area
      • the number of hotels in each state of the US
  • Categorical data can be represented on a bar graph, pie chart, or a dot chart (a chart that shows the percentage of each category in the data). The data is split into different categories (hence the name!) and the numbers describe each category.
  • Some examples of how a graph of categorical data looks:

Normal Data

  • Many statistical tests are valid only if the population (the body from which measurements were taken) has a normal distribution. For example, if a study was done on the heights of kindergarteners in a class, the population would be all kindergarteners in general. Normally distributed populations have certain measures associated with symbols that are commonly used in statistical calculations.
  • A normal distribution is data that looks like a bell curve when plotted as a density curve (the independent axis showing the values of the data points and the dependent axis showing the frequency of those values). The bell curve is symmetric and can describe many bodies of data, with a majority of the subjects in the middle range and fewer and fewer subjects in the lower and higher ranges.
  • A normal distribution looks like this:

III. Is my data paired or unpaired?

Paired data is data in a study that includes two sets of measurements about the same observed subject.

Some general situations involving paired data:

  • before and after (the height of a group of plants before applying a fertilizer and the height of the same group of plants after applying the fertilizer)
  • comparing closely related things (weight gained over a span of time in sets of twins, the number of nights experiencing insomnia in married couples)

The body of subjects in a set of paired data is called a matched population, because each subject in the population is matched with another subject.

Independent populations, on the other hand, contain data that do not affect each other. Two or more sets of measurements are not influenced in any way by each other. These measurements would be unpaired.

Some situations involving unpaired data:

  • comparing a sample to an expected standard (average temperature of a city for 100 years compared to data taken over the last two years)
  • comparing a treatment group to a control group (sleep time of a group treated by a sleeping pill compared to sleep time of a group not treated by a sleeping pill)

One population samples versus multiple population samples

Sometimes data comes from one population while other times it comes from two or more populations.

Examples of situations that would come from one population:

  • comparing sample data to a population (income of one city versus national income, fur color in one community of squirrels versus fur color in all squirrels of the same species)
  • matched paired data (two measurements of the same population at different times or under different conditions – for example, before and after a hurricane)

Situations that would come from more than one population would be comparing data from two different groups (comparing lifespan of men versus women, comparing an experimental group versus a control group).

IV. Do I want to compare my values to the mean or to the standard deviation?

A population includes all possible objects of interest, whereas a sample includes only a portion of the population. For example: in a study of United States middle school students, the population is every middle school student in the United States and a sample would be the middle schoolers in one school.

The mean (represented by μ ("mu") for a population and x̄ ("x-bar") for a sample) is calculated like any arithmetic mean: by adding all of the values and dividing the sum by the total number of values.

The standard deviation (represented by σ ("sigma") for a population and s for a sample) is a number that describes the spread of the numbers in a set of data, or how far apart the numbers are. The larger the value of the standard deviation, the more spread out the numbers. It is calculated by taking the square root of the average squared distance of each point from the mean.

Standard deviation is calculated using these formulas:

Population: σ = √( Σ(x - μ)² / N )
Sample: s = √( Σ(x - x̄)² / (n - 1) )

Here N is the number of values in the population and n is the number of values in the sample.

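The degrees of freedom for a single sample is one less than the total number of subjects. For instance, if 100 people were surveyed in a study, the degrees of freedom would be 99. (When two independent groups are compared with an unpaired t-test, the degrees of freedom is n1 + n2 - 2, as described in Section VI.)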
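As a rough illustration of the quantities defined in this section, here is a minimal Python sketch. It reuses the five heights from the one-sample example in Section VI; the variable names are only illustrative.

```python
import math
import statistics

# Heights in inches, taken from the one-sample example in Section VI.
heights = [68, 62, 67, 68, 69]

n = len(heights)
mean = sum(heights) / n  # add all of the values, divide by how many there are

# Sum of the squared distances of each value from the mean.
squared_distances = sum((x - mean) ** 2 for x in heights)

sample_sd = math.sqrt(squared_distances / (n - 1))  # s: divide by n - 1
population_sd = math.sqrt(squared_distances / n)    # sigma: divide by n

degrees_of_freedom = n - 1  # one less than the number of subjects

# The standard library computes the same two standard deviations.
assert math.isclose(sample_sd, statistics.stdev(heights))
assert math.isclose(population_sd, statistics.pstdev(heights))

print(f"mean = {mean:.2f}, s = {sample_sd:.3f}, "
      f"sigma = {population_sd:.3f}, df = {degrees_of_freedom}")
```

The sample formula is the one to use when your measurements are only a portion of the population, which is the usual situation in an experiment.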

V. Testing a Hypothesis

Before performing a statistical analysis, you must first state the hypotheses that will be tested for statistical significance.

The null hypothesis (represented by H0) is the statement being tested and usually assumes that the expected effect in a study is NOT true.

The alternate hypothesis (represented by H1) is the statement of the expected effect and is accepted if the null hypothesis can be rejected based on the statistical calculations.

For instance, in determining whether there is a significant difference in the average grades of one class versus another class, the hypotheses would look like this:

  • Null hypothesis (H0): There is no difference in the average grades of the two classes.
  • Alternate hypothesis (H1): There is a difference in the average grades of the two classes.

An alpha value (α) is used to determine what qualifies as statistically significant. The alpha value is the cutoff, chosen before the test, for how small a p-value must be to count as significant.

To determine the p-value for a given statistical test, you must compare your statistic to a table of critical values. For example, for a t-test, you would compare your t-statistic, given your specific degrees of freedom, to the table to find the range your p-value falls in.

  • A table of critical values for T-tests can be found in Appendix 1.
  • A table of critical values for Chi-Square tests can be found in Appendix 2.

Why p < 0.05?

An alpha value (α) is set at a certain value to determine what qualifies as statistically significant.

The p value is the probability of getting a difference at least as large as the one found in the calculations by chance alone, if the null hypothesis is true.

The conventionally accepted alpha value for statistical significance is 0.05, which means accepting a 0.05, or 5%, likelihood of seeing such a difference by chance when the null hypothesis is true. So if you were to find a p value lower than 0.05, your calculations would show the research is statistically significant.

In testing for the validity of an alternate hypothesis, you need to decide whether to use a one- or two-tailed test.

  • A one-tailed test determines whether the sample in question is either statistically higher than an expected value or statistically lower than an expected value. The test only determines whether it is ONE of these extremes and does not account for statistical differences in the other extreme.
  • A two-tailed test determines whether the sample in question is either statistically higher OR lower than an expected value. The test determines whether it is either one of these extremes in the same test.

If you find that your results are statistically significant, then you can reject the null hypothesis and accept the alternate hypothesis. If you do not find that your results are statistically significant, then you cannot reject the null hypothesis. This does not prove that the null hypothesis is true; it only means that your statistical analysis was not conclusive.
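The decision rule described above can be written as a short Python sketch. The function names are hypothetical, and the doubling rule for a two-tailed p value assumes a symmetric distribution such as the t-distribution.

```python
def decide(p_value, alpha=0.05):
    """Return the conclusion of a significance test for a given p value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


def two_tailed_from_one_tailed(one_tailed_p):
    """Double the smaller tail area of a symmetric distribution (capped at 1)."""
    return min(1.0, 2.0 * one_tailed_p)


# The same tail area can be significant in a one-tailed test
# but not in a two-tailed test.
print(decide(0.03))
print(decide(two_tailed_from_one_tailed(0.03)))
```

This is why the choice between a one-tailed and a two-tailed test should be made before looking at the data.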

VI. Using an Unpaired T-Test

One Population Sample

Null Hypothesis: The sample mean is equal to the population mean.

How the test works:

  • The test statistic is calculated using a formula that has the difference between the means in the numerator (for one sample, the sample mean minus the expected population mean); this makes the test statistic get larger as the means get further apart. The denominator is the standard error, which gets smaller as the sample variance decreases or the sample size increases. Thus the test statistic gets larger as the means get farther apart, the variance gets smaller, or the sample size increases.
  • The probability of getting the observed test statistic value under the null hypothesis is calculated using the t-distribution. The shape of the t-distribution, and thus the probability of getting a particular test statistic value, depends on the number of degrees of freedom. The degrees of freedom for a one-sample t-test is the number of observations minus 1, or n - 1; for a two-sample t-test it is the total number of observations in the groups minus 2, or n1 + n2 - 2.

Assumptions: The t-test assumes that the observations within each group are normally distributed and, when two groups are compared, that the variances are equal in the two groups.

Example: Over the last 5 years, students in the 5 p.m. section of my Biology class had an average height of 64.6 inches. This year there were 5 students with the following heights. Is the average height of this year's students significantly different from the average height of the previous sections?

Here is the data:

5 p.m. / 68 / 62 / 67 / 68 / 69

  1. Determine the mean.

Mean: (68 + 62 + 67 + 68 + 69) / 5 = 66.8

  2. Determine the standard deviation.

Σ(x - x̄)² = 30.8, so s = √( 30.8 / 4 ) ≈ 2.775

  3. Calculate the t-statistic.

t = (66.8 - 64.6) / (2.775 / √5) ≈ 1.77

  4. Find the degrees of freedom.

df = n - 1 = 4

  5. Use the degrees of freedom and your t-statistic to determine your p-value based on a chart of critical values.
    According to this table, 0.05 < p < 0.10, and thus the null hypothesis can NOT be rejected.
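The five steps above can be sketched in Python as follows. This is a minimal illustration using only the standard library; the two critical values are assumptions copied from a standard t-table for 4 degrees of freedom, not taken from this guide's appendix.

```python
import math
import statistics

heights = [68, 62, 67, 68, 69]  # this year's 5 p.m. section, in inches
expected_mean = 64.6            # average height of the previous sections

n = len(heights)
sample_mean = statistics.mean(heights)
sample_sd = statistics.stdev(heights)      # sample formula, divides by n - 1
standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - expected_mean) / standard_error
df = n - 1

# Assumed critical values for df = 4 from a standard t-table.
T_CRIT_ONE_TAILED_0_05 = 2.132
T_CRIT_TWO_TAILED_0_05 = 2.776

print(f"t = {t_stat:.3f}, df = {df}")
print("significant, one-tailed:", abs(t_stat) > T_CRIT_ONE_TAILED_0_05)
print("significant, two-tailed:", abs(t_stat) > T_CRIT_TWO_TAILED_0_05)
```

Under these assumptions the t-statistic falls below both critical values, which agrees with the conclusion that the null hypothesis cannot be rejected.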

Two Populations

When to use: An unpaired t-test is used when you are comparing the means for two different groups or populations. It is usually used when you are comparing the mean from an experimental group to the mean of a control group. So the individuals in group 1 are not the same individuals in group 2.

How to do an unpaired t-test: Suppose you expose caterpillars to different temperatures (25°C and 33°C) to see if they develop into the next instar (stage) faster at the higher 33°C temperature.

  1. First, identify null and alternative hypotheses:

Null hypothesis: The mean time for a caterpillar to develop into the next instar is the same in the control and experimental groups.

Alternative hypothesis: The mean time for a caterpillar to develop into the next instar is shorter in the experimental group than in the control group.

  2. Now calculate the means for each group:
    Control group: (2+4+3)/3 = 3
    Experimental group: (3+2+3)/3 = 2.67
  3. Now add the squares of the group mean minus each data value for both the control and experimental group:
    Control group: (3-2)² + (3-4)² + (3-3)² = 2
    Experimental group: (2.67-3)² + (2.67-2)² + (2.67-3)² = 0.67
  4. Now calculate variance (pooled across both groups, since the test assumes equal variances):
    Variance: (2 + 0.67) / (3 + 3 - 2) = 0.67
  5. Now perform an unpaired t-test:
    t = (3 - 2.67) / √( 0.67 × (1/3 + 1/3) ) ≈ 0.5
  6. Now look at the table of critical values of t-distributions and determine the range the p-value must fall in (p < 0.05 being significant) (degrees of freedom = n1 + n2 - 2).

Degrees of freedom = 3 + 3 - 2 = 4
p > 0.25
The means of the control and experimental groups were NOT significantly different; therefore, you cannot reject the null hypothesis.
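The caterpillar example can be sketched in Python as follows. This is a minimal illustration that assumes the pooled-variance (equal variances) form of the unpaired t-test, which matches the degrees of freedom n1 + n2 - 2 used above; the helper name sum_of_squares is hypothetical.

```python
import math
import statistics

control = [2, 4, 3]       # development times at 25 degrees C
experimental = [3, 2, 3]  # development times at 33 degrees C


def sum_of_squares(values):
    """Add the squares of the group mean minus each data value."""
    group_mean = statistics.mean(values)
    return sum((group_mean - x) ** 2 for x in values)


n1, n2 = len(control), len(experimental)
mean1 = statistics.mean(control)
mean2 = statistics.mean(experimental)
df = n1 + n2 - 2

# Pooled variance: combine both groups' sums of squares (assumes equal variances).
pooled_variance = (sum_of_squares(control) + sum_of_squares(experimental)) / df
standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

# Control minus experimental, so a positive t means faster development at 33 degrees C.
t_stat = (mean1 - mean2) / standard_error

print(f"means = {mean1:.2f} and {mean2:.2f}, t = {t_stat:.2f}, df = {df}")
```

With only three caterpillars per group, the difference between the means is small compared with the spread within the groups, so the t-statistic is small.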

VII. Using a Chi-Square Test for Variance

When to do this test?

  • This test is useful if you have data that follows a normal distribution. This test compares the spread (variance) of your data to a known or expected standard deviation. To use this test you need continuous data.

What is the equation?

  • χ² = (n - 1) s² / σ²
  • Where:
      • n = the total number of data
      • s² = sample variance (the sample standard deviation, s, squared)
      • σ² = population variance (the known or hypothesized variance)

What does it tell you?

  • The chi-squared test for variance is similar to the chi squared test for goodness of fit. You still have a null hypothesis and an alternative hypothesis. This test will tell you if the variance in a sample matches the hypothesized variance.
  • When you complete the chi-square test for variance on a data set, you will compare your final chi-square value and your degrees of freedom to a chart of critical values. Your critical value will then determine whether or not you can reject your null hypothesis. A table of critical values for the chi-square test can be found in Appendix 2.

Why is it useful?

  • Imagine you work in a factory and know that the boxes get filled accurately to 50 pounds with an average standard deviation of about 1 pound. You hire a new set of packers and want to make sure they are as accurate as the last. You would use the chi-squared test for variance to compare the standard deviation of the current packers to the known average standard deviation of the factory.
  • Overall, this test compares the variance of a sample with the variance you expect, and tells you whether the two differ by more than chance alone would explain.

Example:

A cereal manufacturer wishes to test the claim that the variance of sugar content of its cereals is 0.644. Sugar content is measured in grams and is assumed normally distributed. A sample of 20 cereals has a standard deviation of 1.00 gram. At α = 0.05, is there enough evidence to reject the manufacturer’s claim?

  1. State the hypotheses and identify the claim.

H0: σ² = 0.644 (claim)
H1: σ² ≠ 0.644

  2. Compute the test value.

χ² = (n - 1) s² / σ² = (20 - 1)(1.00)² / 0.644 ≈ 29.5

    Since the standard deviation s is given in the problem, it must be squared for the formula.
  3. Find the critical values.

The degrees of freedom are 19. Since this test is a two-tailed test at α = 0.05, the rejection area is split between the two tails, with α/2 = 0.025 in each. The critical values for 0.025 and 0.975 must be found; hence, the critical values are 32.852 and 8.907, respectively. Find these using a chi-square table. A table showing critical values for a chi-square test can be found in Appendix 2.

  4. Make the decision.

Do not reject the null hypothesis, since the test value falls between the critical values.

  5. Summarize the results.

There is not enough evidence to reject the manufacturer’s claim that the variance of the sugar content of the cereals is equal to 0.644.
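The cereal example can be checked with a short Python sketch. It uses the usual chi-square statistic for a variance, (n - 1) s² / σ², and takes the two critical values directly from the example rather than computing them.

```python
n = 20                    # number of cereals sampled
sample_sd = 1.00          # sample standard deviation, in grams
claimed_variance = 0.644  # the manufacturer's claimed variance

df = n - 1

# The standard deviation must be squared to get the sample variance.
chi_square = df * sample_sd ** 2 / claimed_variance

# Two-tailed critical values for df = 19 at alpha = 0.05, as given in the example.
lower_critical = 8.907
upper_critical = 32.852

reject_null = chi_square < lower_critical or chi_square > upper_critical

print(f"chi-square = {chi_square:.2f}, df = {df}")
print("reject the null hypothesis:", reject_null)
```

The test value lies between the two critical values, so the sketch reaches the same decision as the worked example.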

VIII. Using a Paired T-Test

Purpose: The paired t-test usually compares the mean of a population before and after a treatment. For a paired t-test, you must take the mean of the differences between the paired measurements (before and after). This test assumes that the difference between pairs is normally distributed.

How to do a paired t-test: Suppose you take the blood pressure of three sixth graders before they eat a piece of cake and again 30 minutes after they eat the cake and get the following:

Sixth Grader / Blood Pressure Before Eating Cake / Blood Pressure After Eating Cake
Student 1 / 115 / 117
Student 2 / 122 / 123
Student 3 / 118 / 121

You want to know if their blood pressure is significantly higher after they eat the cake than before, so you decide to use a paired t-test.

  1. First you must find the differences:
    Student 1: 117 - 115 = 2
    Student 2: 123 - 122 = 1
    Student 3: 121 - 118 = 3
  2. Now calculate the mean:
    2 + 1 + 3 = 6
    6 / 3 = 2
  3. What is the null hypothesis?

The null hypothesis assumes that the mean difference between the pairs is zero.
So H0: μ = 0, where μ is the mean difference; in the equation, this 0 is subtracted from the mean difference you calculated in step 2.

  4. What is the alternative hypothesis?

The alternative hypothesis assumes that the mean difference in blood pressure will be higher after eating the cake. Therefore, the alternative hypothesis would be H1: μ > 0.
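The steps so far can be sketched in Python. The last few lines go one step further and compute a t-statistic from the differences, assuming the usual paired formula (the mean difference minus 0, divided by the standard error of the differences) with n - 1 degrees of freedom; treat that part as an illustration rather than this guide's own worked answer.

```python
import math
import statistics

before = [115, 122, 118]  # blood pressure before eating cake
after = [117, 123, 121]   # blood pressure 30 minutes after eating cake

# Step 1: the difference for each student (after minus before).
differences = [a - b for b, a in zip(before, after)]

# Step 2: the mean of the differences.
mean_difference = statistics.mean(differences)

# Assumed continuation: one-sample t-test on the differences against H0: mu = 0.
n = len(differences)
sd_difference = statistics.stdev(differences)  # sample formula, divides by n - 1
standard_error = sd_difference / math.sqrt(n)
t_stat = (mean_difference - 0) / standard_error
df = n - 1

print(f"differences = {differences}, mean difference = {mean_difference}")
print(f"t = {t_stat:.3f}, df = {df}")
```

The resulting t-statistic and degrees of freedom would then be compared with the table of critical values for a t-test in Appendix 1.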