Content Materials – Excel Tutorial
Using Statistical Analysis to Verify Conclusions
Overview
This tutorial will provide students with instructions on how to organize their data into a contingency table. Students will also be able to calculate expected frequencies using HIV mortality data. This tutorial will also guide students through the calculation of the chi-square statistic.
Tutorial
Open Excel
Create a title for your contingency table. Enter the title in the first row.
Use the printout for the data table from California to create your own contingency table. Enter the variables you will be considering in your table.
Once the data is entered, click on B11 and then the summation symbol to calculate row totals. Press the enter key when you are ready to have the total calculated. Do this for each of the columns and rows. Totals should be in the following cells: C11 (total for 35-44 age group), D11 (total for 45-54 age group), E9(total for number of males), E10(total for number of females), and E11(grand total).
Enter totals as follows:
Now that the observed and total frequencies have been entered, the expected frequencies must be calculated.
Create a similar table below, modifying it to include the categories representing the expected frequencies.
The new table should look like this:
You may link the data by cell referencing. This is beneficial in case you decide to make modifications to the information in your contingency table. An example of cell referencing is as follows:
Press enter for each cell reference to be recorded.
The table should look like this at this stage.
Now the expected observations will be calculated as follows:
((row total for specific gender)X(column total for a specific age group))/grand total of population
For example, Expected Male, 25-34 age group would be calculated as: ((454)(101))/529. To calculate this in Excel, select the Exp M, 25-34 cell and type in the = symbol. Type in a left parentheses, click on the first cell to be referenced, include the multiplication symbol and click on the second cell to be referenced and then type in a right parentheses. To complete the calculation, type the division symbol and reference the grand total of the population.
This process is done for each cell. The table should look like this:
The calculation for the chi-square statistic is found in cell E14.
This formula is (Observed frequency-Expected frequency)^2/Expected frequency.
This is applied to each cell. After entering the calculation in each cell, press the enter key.
The table should look like the following:
To calculate the Chi-Square statistic for the data in the contingency table, the sum of all of the calculated differences between the observed and expected values (cells: E14, F14, G14, E16, F16, G16) is computed and compared to the Chi-Square critical value at a p-value of .005 or any p-value of your choice. For any comparison, the Chi-Square critical value is determined by the degrees of freedom (df) for the contingency table. The rule is the following:
df = (number of rows minus one) x (number of columns minus one),
not including the “total” rows or columns.
For our California HIV mortality rate data, the df = (3-1) x (3-1), therefore our degrees of freedom is 4. Now, you can refer to the appendix of any statistics text for critical values of the chi-square distribution or to the Chi-Square distribution table from Camosun College, Victoria, BC, Canada website. This table may be found in the content materials folder.
The cells used to calculate the chi-square statistic is found below.
The final value for the calculated chi-square statistic is shown below:
Compare the expected frequencies with the observed frequencies. What can you deduce from this information? Do you believe that the two variables will be independent of each other? Why or why not?
Compare the critical value to the calculated chi-square value. If the chi-square value is larger than the critical value, the data describes a statistically significant relationship between the variables chosen in your table. In comparing the critical value of 14.860 with 4 degrees of freedom at a p-value of .005 and calculated chi-square statistic, 4.318699213 in our example, there is reason to believe that there is no association between age group and gender with respect to mortality counts due to HIV in California in 2001.