An Introduction to Business Analytics Using Excel

An Introduction to Business Analytics using Excel

Class 4: Turning customer feedback into actionable insights

Dr. Curry Guinn

Computer Science

Contents

Exercise 4.1: Using a Column Chart for Customer Ratings 1

Exercise 4.2: Comparing Customer Responses on Multiple Products 3

Exercise 4.3: Understanding the Results from a Single Response 4

Exercise 4.4: Are the Differences Between Two Means Significant? 5

Exercise 4.5: An Example of a Bimodal Distribution 6

Exercise 4.6 Sentiment Analysis Using an Annotated List of Words 7

Exercise 4.7: Word Frequency in Excel 9

Exercise 4.8: Semantria 11

Data files can be found here: http://people.uncw.edu/guinnc/Day4_CustomerFeedbackData.zip

Exercise 4.1: Using a Column Chart for Customer Ratings

A column chart is a useful visualization tool to display customer preferences that are expressed in scales. Column charts can be augmented with labels and error bars for readability and expressiveness.

Steps:

a. In Excel, open the file “SupraSedan.xlsx”. Column A gives a unique number to each customer feedback response. There are 10 customer feedback responses in this file. The scale of responses is between 1 and 5 for each attribute. Let’s assume 1 is the lowest rating and 5 is the highest rating.

b. Let’s calculate the mean (average) for each column Overall Satisfaction, Price, Performance, and Attractiveness. Here’s one way to do this:

a. Click on B12. On the Home menu, select AutoSum – Average. Notice that Excel will automatically select the appropriate range.

b. Now, using B12, copy (or fill) that value to the right in columns C, D, and E.

c. Let’s put a label in cell A12: “Average”.

c. Now, we will create a column chart using row 1 and row 12. Here’s one way to do this:

a. Select elements 1A:1E and 12A:12E. To select two non-adjacent rows, hold down the CTRL key while selecting.

b. In the Insert menu, choose Column. Select the first 2D column icon.

c. To change the look of the graph, notice the “Chart Layouts” menu on the ribbon. If you pull down the menu of choices, you can choose any that you like. For this presentation, I chose the 2nd one on the 2nd row.

d. You can change the title of the graph by selecting it and typing a new title “Supra Sedan”.

e. You can change the y-axis label too by selecting it and typing. It’s weird typing sideways.

d. Instead of computing the average (mean) rating, you could choose the median rating. What’s the difference?

a. To calculate the median, click on B13. Enter the formula “=MEDIAN(B2:B11)”. Be careful NOT to include B12.

b. If you fill right, you can calculate that result for all columns.

c. Are the rows 12 (Mean) and 13 (Median) different? Yes, they are.

e. A third way of looking at the numbers is to look at the MODE. This is the number that occurs most frequently.

a. To calculate the mode, click on B14. Enter the formula “=MODE(B2:B11)”. Be careful NOT to include B12.

b. If you fill right, you can calculate that result for all columns.

c. Are the rows 12 (Mean) and 13 (Median) and 14 (Mode) different?

f. Which is best? There is no right answer to this question. You could view all three in a column chart and see if there are any substantial differences. In a customer survey like this one, it is unlikely that mean and median will be considerably different.

g. Adding Error Bars. Remember that in calculating these statistics, we are only using a limited amount of data. How confident can we be in the results? Adding Error Bars gives an indication of how far off the “actual” answer might be.

a. Click on your graph.

b. Under Chart Tools – Layout, select Error Bars – Error Bars with Standard Error.

Exercise 4.2: Comparing Customer Responses on Multiple Products

Steps:

Column charts are particularly good at illustrating the difference between ratings such as those we find in customer surveys.

a. Open the spreadsheet named “SupraAndMega.xlsx”. Rows 2 through 11 contain customer ratings for the Supra Sedan (there are 10 ratings) and Rows 15 through 26 contain customer ratings for the Mega SUV (there are 12 ratings).

b. First, let’s calculate the mean (average) ratings for each vehicle individually.

a. In B12, calculate the average of B2:B11. Copy right through column E. Put the label “Supra Sedan” in A12.

b. In B27, calculate the average of B15:B26. Copy right through column E. Put the label “Mega SUV” in A27.

c. To be able to contrast customer ratings of each product, we can put both sets of data on the same column chart.

a. Select A1:E1 (first row), A12:E12, and A27:E27.

b. In the Insert menu, choose Column, and select the first 2D column icon.

c. If you want, you can change the design. Click on the chart, in Design, there are different “Chart Layouts”. I like the 2nd one on the 2nd row.

d. Change the title and axis labeled as desired.

e. Add error bars, if desired.

d. Visually, it’s easy to contrast the results.

Exercise 4.3: Understanding the Results from a Single Response

Examining the average response for a question is only a summary. What if you want a better understanding of how customers feel? A histogram can give you a better sense about how customers are responding to a single item.

Steps:

a. Open the spreadsheet labeled “Mega.xlsx”. Let’s see how customers responded to Performance.

b. A histogram reports how many responses are in a certain “bin”. A bin can be a range of numbers or a particular number. In this case, we have 5 bins (1, 2, 3, 4, and 5) – the possible responses. Let’s make a column to store those bin values:

a. Click on G1. Type “Ratings”. In G2, type 1. In G3, type 2. In G4, 3. G5, 4. G6. 5.

c. Under the Data portion of the Excel ribbon, on “Data Analysis”. Choose Histogram.

d. Select the column of interest for the input range: in this case, Column D2:D13. For the Bin Range, choose G1:G6. For the output range, choose I1. Choose Chart Output (if not already checked).

e. In columns I and J you can see the frequency count for each bin. It should also have created the Histogram.

f. We can edit that a bit to clean it up.

a. Removing the “More” from the chart by grabbing the bottom right corner of cell J7 and moving the bottom right corner to J6.

b. You could change the title if you wanted.

Exercise 4.4: Are the Differences Between Two Means Significant?

The purpose of this exercise is to use a T-Test to determine whether the difference between two means is statistically significant.

Steps:

a. Let’s use “SuperAndMegaTTest.xlsx”.

b. Under the Data menu, click on Data Analysis. The analysis we want to do is “t-Test: Two-Sample Assuming Unequal Variances.”

a. Let’s look at the customer’s opinion of Price first between the Supra Sedan and the Mega SUV.

b. For variable 1 range, select C2:C11.

c. For variable 2 range, select C15:C26.

d. You can click on any cell for Output range.

e. The two-tailed p-value is 2.15739E-06 or 0.000002157 or 0.0002157%. That means the chance that these two means are the same is very, very low. Therefore, we would conclude that the difference between these two means IS significant.

c. Now try the same analysis on Overall Satisfaction.

a. Variable 1 range is B2:B11. Variable 2 range is B15:B26.

b. The probability that these two means are the same is quite high (80%). Therefore, we would conclude that the difference between these two means is NOT significant.

d. How about Attractiveness?

a. It is different.

Exercise 4.5: An Example of a Bimodal Distribution

In this exercise we will look at the histogram that results from a bimodal distribution and compare it with the mean and median.

Steps:

a. Open the “Obamacare” spreadsheet. This is totally made up data. Imagine that subjects were asked this question: “How do you feel about Obamacare on a scale of 1 to 7 where 1 means ‘Hate It!’ while 7 means ‘Love It!’.

b. Compute the mean of Column B (the ratings).

c. Compute the median of Column B.

d. Now, use Data – Data Analysis – Histogram to produce a histogram.

a. The bins can be found in Column E.

b. What do you see?

c. Is the mean misleading? How about the median?

Exercise 4.6 Sentiment Analysis Using an Annotated List of Words

In this exercise we’ll use a freely available macro from Mike Alexander.

Steps:

a. Open “LinksysSentiment”.

b. Column A contains the customer’s review.

c. B contains the value output by the formula: GetSentimentValue.

d. Just for fun, type some of your own text in A2 and see what sentiment score you get.

e. If you want to see how that value is calculated, press Alt-F11 and take a look at the code. Notice that it contains numerous lists of words and their positive or negative values.

f. Now, suppose you want to know what percentage are very negative (less than -1), somewhat negative (less than -.04), neutral (between -0.4 and 0.4), somewhat positive (greater than 0.4 but less than 1), and very positive (>1). To do this, we use the countif and countifs functions.

a. To count the number less than -1 use the formula: =COUNTIF(B2:B201, "<-1")

b. To count the number between -1 and -0.4, use =COUNTIFS(B2:B201,">=-1", B2:B201, "<-0.4")

c. Between -0.4 and 0.4, use =COUNTIFS(B2:B201, ">=-0.4", B2:B201, "<=0.4")

d. Between 0.4 and 1, use =COUNTIFS(B2:B201, ">0.4", B2:B201, "<=1")

e. And greater than 1, use =COUNTIF(B2:B201, ">1")

f. Use your favorite graph (Column, Bar or even Pie Chart to visualize the results).

g. Imagine these reviews were arranged in order of time. If you select Column B and insert a Line Chart, you can observe whether there are any patterns over time.

Exercise 4.7: Word Frequency in Excel

We’ll approach this two ways: First, we will use the Pivot Table. It’s fast and simple. Then we will use a macro that does a bit more work for us.

Steps:

a. Open “LinksysDataEmpty”. Not much there. We are going to input the text in the file words.txt.

a. Under Data menu, choose From Text. Choose words.txt to import. Hit “Finish”.

b. Now, we will use Pivot Table to count the frequency of each word.

c. Select the “Words” cell (A1). Insert a Pivot Table. Go ahead and put it in a new worksheet. It’s less confusing that way.

d. In the Pivot Table, make sure “Words” is checked in the top box. Now, drag “Words” to the box labeled “Row Labels”. Now, also draw “Words” to the box labeled “Values”. Voila! We have a count.

e. To sort by most frequent, it’s probably easiest to copy those two columns “Row Labels” and “Count of Words” into two new columns (pasting by value). Then select those two new columns. Let’s sort them. Choose “Data” – Sort. Sort by the column containing the word count. Choose Largest to Smallest.

f. The most common word is “the”. But as you scroll down you can start seeing what other words are common.

g. The presence of all of those common words like “the” and “a” and “at” is annoying. How can we get rid of them? There’s a handy macro online that will do all of this for us.

h. Open “WordCount”. This Spreadsheet has three tabs. The first tab contains the reviews. The third tab (“Stoplist”) contains the words that will be removed and not counted. The results will be displayed in the second tab (“Issue”). To run the macro, choose View – Macros. Select “WordCount” and choose Run. The second tab now contains a list of all the word frequencies. Notice that with this list it’s much easier to identify the sorts of issues that customers are talking about.

Exercise 4.8: Semantria

Semantria is a commercial product that allows a limited number of free analyses per month. It will give you an idea of what the capabilities of these sorts of software packages can do. There are other alternatives such as Repustate, Text2Data, and QI Macros.

Steps:

a) Open “LinksysSemantria”. I’ve already produced the output from the Semantria software that is an add-on to Excel.

b) Sheet 1 provides the original data.

c) LinksysBasic sheet shows the affect score for each review. I’ve created a few plots to show how reviews have changed over time. The histogram is particularly revealing. Look at the changes around January 11.

d) The “PositiveAndNegative” sheet illustrates Semantria’s ability to identify facets (topics) and show and analyze each according to sentiment. In addition, if there are common attributes (adjectives usually) associated with those facets, they can be determined. The column chart has some interesting findings.

e) In “UserQueries”, I’ve demonstrated the ability to define topics by supplying relevant keywords. The Queries Breakdown chart shows the relative number of reviews that discuss those topics. The Query Sentiment Chart uses a Bubble Chart to chow sentiment of each category. The size of the bubble is related to the number of queries for each topic.