Data Analysis from Questionnaires

In this tutorial, you will learn simple research using MS excel Data Analysis – Descriptive Statistics and Pivot Table to analyze your data from questionnaire survey. You may download the spreadsheet example of this tutorial here.

Click any topics below to enter the tutorial:

Data Example & Research Questions

Data Coding

Descriptive Statistics Data Analysis

Contingency Table

Data Example & Research Questions

We have the following data from Parks visitors (extracted from several Parks study, the actual data may include hundreds of families and hundreds of variables to more than 10 parks in the city). Here is the meaning of each variable

·  Family is the observation number of the questionnaire. One questionnaire was filled by for one family, not for individual.

·  Time is a quantitative data type, measured in minutes. It is the activity time of the family in the park

·  Mode is a nominal data type consist of four choice of mode to go to park:

1.  walk,

2.  car,

3.  cycling,

4.  bus.

The choice is mutually exclusive, that is only one mode for one family.

·  Activity is a nominal data type consist of 6 choices of activity in the park:

1.  sport,

2.  picnic,

3.  reading,

4.  walk (including with the dog),

5.  meditation,

6.  jog.

The choices are multiple choices, that one family may have several activities in the park.

·  Satisfaction is an ordinal scale with 5 values:

o  -2 = Very dissatisfied,

o  - 1 = dissatisfied,

o  0 = indifference,

o  1 = satisfied,

o  2 = Very satisfied.

It measures family satisfaction toward the park's services.

·  Playground is a nominal scale (Yes or No) about the existence of children playground

The data is show in this table

Family / Time / Mode / Activity / Satisfaction / Playground
1 / 30 / 1 / 1, 2, 3 / 0 / N
2 / 30 / 3 / 4,6 / 1 / Y
3 / 60 / 2 / 1, 2 / 2 / Y
4 / 45 / 1 / 5 / -1 / N
5 / 30 / 1 / 6 / 1 / N
6 / 60 / 2 / 2 / 2 / Y
7 / 30 / 3 / 4 / 1 / N
8 / 45 / 2 / 3, 4 / -1 / N
9 / 15 / 1 / 6 / 1 / Y
10 / 60 / 2 / 2 / 2 / Y
11 / 180 / 4 / 1, 2, 3, 4 / 2 / Y
12 / 120 / 2 / 1,2,4 / 2 / Y

With these very limited data, we want to know the following research questions :

In general, what is the level of visitors' satisfaction toward parks in the city?

1.  What activity most people do in the park?

2.  What mode most people use to go to park?

3.  Is the existence of children playground in the park may gain higher visitors satisfaction ?

4.  Is there any relationship between people activity and their activity time of the family in the park

5.  Is there any relationship between activity time and their mode to go to park?

How are you going to do to answer those research questions? Of course, with this simple data, you can even use calculator to analyze the data manually. However, we will use MS Excel to demonstrate what kind of tools is available in MS Excel for your research.

We assume the data above is valid sample (though they are actually too little to be real sample) for the shake of demonstration purposes only.

Data Coding from Questionnaires

We are going to use Microsoft Excel. Firstly, we need to transform the data into another format that Microsoft Excel could understand, that is to convert them into numbers one for each of value. Only variable Activity and Playground need modifications while the others are already in the correct format. Variable Activities need to be converted to binary format (1 and 0, 1 = chose that activity, 0 = did not do that activity) for each value. Thus, we will convert it into 6 columns according to the number of values in the variable Activity . Variable Playground need to be converted to binary (1 = yes, and 0 = no) so that we can use Data Analysis tool in MS excel that can only accept numerical data. You may download the spreadsheet example of this tutorial here.

Activity
Family / Time / Mode / 1 / 2 / 3 / 4 / 5 / 6 / Satisfaction / Playground
1 / 30 / 1 / 1 / 1 / 1 / 0 / 0 / 0 / 0 / 0
2 / 30 / 3 / 0 / 0 / 0 / 1 / 0 / 1 / 1 / 1
3 / 60 / 2 / 1 / 1 / 0 / 0 / 0 / 0 / 2 / 1
4 / 45 / 1 / 0 / 0 / 0 / 0 / 1 / 0 / -1 / 0
5 / 30 / 1 / 0 / 0 / 0 / 0 / 0 / 1 / 1 / 0
6 / 60 / 2 / 0 / 1 / 0 / 0 / 0 / 0 / 2 / 1
7 / 30 / 3 / 0 / 0 / 0 / 1 / 0 / 0 / 1 / 0
8 / 45 / 2 / 0 / 0 / 1 / 1 / 0 / 0 / -1 / 0
9 / 15 / 1 / 0 / 0 / 0 / 0 / 0 / 1 / 1 / 1
10 / 60 / 2 / 0 / 1 / 0 / 0 / 0 / 0 / 2 / 1
11 / 180 / 4 / 1 / 1 / 1 / 1 / 0 / 0 / 2 / 1
12 / 120 / 2 / 1 / 1 / 0 / 1 / 0 / 0 / 2 / 1

Descriptive Statistics Data Analysis

Microsoft Excel has excellent Data Analysis menu and one of them contain Descriptive statistics. To use the data analysis menu (suppose your MS excel do not have it), you need to install the Add-ins: Go to menu Tools – Add Ins – check the Analysis Tool pack and then press OK button. Next time when you open the Tools menu again, you will see Data Analysis in the bottom of Tools menu.

To use Descriptive statistics tool from the Data Analysis:

·  Click menu Tools – Data Analysis and you will see Data Analysis dialog. Scroll down and you will see Descriptive Statistics. Select it and click OK button.

You will get Descriptive Statistics dialog form. In the Input range, select range of your data that you want to be analyzed. Include the label in the first row and check that check box. Check also the Summary statistics check box and then click OK button.

The result of Descriptive Statistics tool, after a few formatting, is shown in the figure below. You may download the spreadsheet example of this tutorial here.

There are many results of data analysis, but do not be confused with all those numbers. Be focus only on your research questions . (That is why you need to formulate your research questions before to do any research as an important step). Which one of your research questions could be answered using the results of your analysis?

From the figure above, you see that the mean satisfaction is 1 (from the scale - 2 to 2 ). You may conclude that people are quite satisfied with the park conditions. This answers the first research question of “In general, what is the level of visitors' satisfaction toward parks in the city?”

You may also see that the sum of values in each activity may produce frequency that you may graph it further.

Activity / frequency / Relative frequency
(1) sport / 4 / 18.2%
(2) picnic / 6 / 27.3%
(3) reading / 3 / 13.6%
(4) walk (including with the dog) / 5 / 22.7%
(5) meditation / 1 / 4.5%
(6) jog / 3 / 13.6%
Sum / 22 / 100.0%

We may conclude that the majority of the family who visit the parks are doing picnic or walking with their dog. This answers the second research question of “What activity most people do in the park?”

From the Mode of Mode, you may conclude that majority of family who visit the parks are using car. This answers the third research question of “What mode most people use to go to park?”

Contingency Table

Using the data that we have formatted, we want to answer the next research questions:

4. Is the existence of children playground in the park may gain higher visitors satisfaction ?

5. Is there any relationship between people activity and their activity time of the family in the park?

6. Is there any relationship between activity time and their mode to go to park?

Because those research problems are about relationship between two variables, we need to do what is called Cross Tabulation. Cross Tabulation (or CrossTab for short) is a frequency table between two or more variables. For readability, it is normally involve less than 4 variables. Cross Tabulation table has many name for different people. Actually they refer to the same thing. Some statistician called it Contingency Table while MS excel call it Pivot Table .

In Microsoft Excel, CrossTabs can be automated using Pivot Table. You may use either Pivot Table icon in the toolbar or using MS Excel Menu Data – Pivot Table and Pivot Chart Report .

When you click the toolbar or menu, Pivot Table wizard will pop up, click Next

In the step 2 of the wizard, you highlight the data including the label of the data in the top as shown in the following figure

In step 3 of the Pivot Table Wizard, select Layout button

To answer the relationship between variable Playground and Satisfaction , drag and drop the name of the variables on the right into the diagram. Put Satisfaction button in the row and Playground button in the column and make another drop to put Satisfaction once again to the Data . It will appear as Sum of Satisfaction . After that, double click the last button ( Sum of Satisfaction ) and Pivot Table Field dialog will appear. Select summarized by Count and then click the OK button twice.

When you go back to the Step 3 of Pivot table wizard, click Finish button.

MS excel will automatically create the Cross Tabulation table. Personally, I don't like to use it directly because it may contain very long formula. Thus, I prefer to highlight this Pivot Table and use Menu Edit Copy (CTRL-C).

Then select another cell, and use menu Edit - Paste Special . Click Values options and click OK button

After we reformat, we need to count the Independent values of the table. We need to do so because we want to know whether variable Playground has relationship with variable Satisfaction , or not. We will perform some simple test, called Chi-square test . If the result of the chi-square test shows that, variable Playground is independent from variable Satisfaction, and then we cannot conclude any relation between the two variables. Otherwise, we can conclude the relationship. Sound easy, isn't it?

To get the independent values, we to compute

Don't bother with the formula if you don't understand notation . It is just a symbol for summation. (click here to learn more about this sigma notation) The meaning is like this: to get the cell of independent table, we need to multiply the total of rows with the total of columns then divide this with the total of all data.

Example, for Satisfaction = 1 and Playground = 1, we have data of 2 respondents. The total of rows is 7 and total of columns is 4, the total of all data is 12. Then, the independent value for that cell is 7*4/12 = 2.333. Do the same thing for all cells in the table we get the table of independent values. This table has meaning that if variable Playground is 100% independent from variable Satisfaction , then the contents of the cells must be equal to these values.

To make sure that variable Playground has relationship (not independent) from variable Satisfaction , we need to put the degree of independent as small as possible, say less than 5%. The number of 5% is called the error or mistake that may happen by chance (who can avoid that we get it out of luck). Some people called it significant level .

The problem now is how to get the index to indicate degree of independent? The mathematician are very smart, they invented probability to represent that “degree of independent”. To determine that probability, we need to compute the difference between the observe values (from the Pivot table) and the expected values (from the Independent table), square this difference, and then divide with the expected value (to get the same unit back) and we sum for all entries of the table. In short,

To get the probability, we need to compute degree of freedom (df), that is

The probability can be obtained using MS Excel function = CHIDIST( , ). Now the Chi-square test is sound like this:

If probability is lower than 0.05, the two variables have relationship, otherwise we cannot conclude any relationship between the two variables in the contingency table.

As shown in the figure above, the value of probability is 0.048, thus we may conclude that there is a relationship between existences of playground with satisfaction level of parks' visitors. This answers the research question of “Is the existence of children playground in the park may gain higher visitors satisfaction ?” positively.

Note that if follow old statistical book, it will ask you to compare the Chi-square value ( ) with the Chi-square from the table. You can get Chi-square value from the table in MS Excel using function =CHIINV(probability, degree of freedom). If you put the probability 0.05 and 3 degree of freedom, you will get 7.8147 (look at your Chi square table if you have it, whether it is the same number). Because you have put probability here, you should use the value of the Chi-square to compare. The chi-square test will sound like this

If your computed chi-square is larger than the Chi-square value from the table, the two variables has relationship, otherwise we cannot conclude any relationship between the two variables in the contingency table.

Another note that MS excel also provide function =CHITEST (actual-range, expected range) to simplify the computation of probability. In this case, you don't need to create the third table to compute the value of Chi-square. The result is directly probability that you can compare with the value of significant level (i.e. 0.05).