- List the detailed business questions that you would like the Customer Support Click Stream data mart to be able to answer. Specify clearly the measures and constraints for each question.
Business questions about causal relationships between customer support efforts and sales
1)What are the total and average customer support task times and total number of tasks by product, product category, customer demographics, agent demographics, task begin time and task category?
Measures: Task time and the number of tasks
Constraints:
Product name, product category name;
Customer demographics: name, gender, city, state, income_level, education_level and Internet_usage_level;
Agent demographics: name, gender, position, experience_level, hiring_date, manager;
Product name and product category name;
Task and task category;
Task begin time: hour, day of week, day of month, month, week, quarter and year.
2)What’s the sales statistics by customer demographics, product, and login_time?
Measures: Sales amount, quantity and shipping fee
Unit_price, sales amount, qty and shipping fee
Constraints:
Customer Login_time - date, day of week, data of month, month, quarter and year;
Customer - name, gender, city, state, country, income_level, education_level and Internet_usage_level;
Product name and product category name
3)What’s the ratio of sales to task time and the ratio of sales to the number of tasks for customers who purchased and received customer support by customer demographics, agent demographics, customer login time, task begin time, task category and product infor?
Measures: Sales amount, quantity and shipping fee; the task time; the number of tasks.
Constraints:
Login_time - date, day of week, data of month, month, quarter and year
Customer who purchased and received customer support - name, gender, city, state, country, income_level, education_level and Internet_usage_level
Product name and product category name
Agent demographics: name, gender, position, experience_level, hiring_date, manager;
Task and task category;
Task begin time: hour, day of week, day of month, month, week, quarter and year.
Business question about the usefulness of web pages for agents to perform their customer support tasks
1)What’s the total and average customer support agent visiting time by web page, page type, task, task type, product, product category, customer demographics, agent demographics and time parameters?
Measures: Web page visiting time.
Constraints:
PageURL and page type;
Product name, product category name;
Customer demographics - name, gender, city, state, income_level, education_level and Internet_usage_level;
Agent demographics - name, gender, position, experience_level, hiring_date, manager;
Task and task category;
Entry_Time - hour, day of week, day of month, month, week, quarter and year.
2)What’s the number of effective clicks for customer support pages by web page, page type, task, task type, product, product category, customer demographics, agent demographics and time parameters? An effective click for customer support page is a click lasting more than 15 seconds and its status should be OK.
Measures: the number of effective clicks
Constraints:
PageURL and page type;
Product name, product category name;
Customer demographics - name, gender, city, state, income_level, education_level and Internet_usage_level;
Agent demographics - name, gender, position, experience_level, hiring_date, manager;
Task and task category;
Entry_Time - hour, day of week, day of month, month, week, quarter and year.
3)What’s the number of page visits in the pages required by the actions in a task by agent, task type and year?
Measures: The number of page visits in the page required of an action
Constraints:
Agent demographics - name, gender, position, experience_level, hiring_date, manager;
Task and task category;
Task begin time - hour, day of week, day of month, month, week, quarter and year.
- List any data that is needed to answer your business questions but is not available from the source data identified so far. Give and justify your recommendations for what to do with the unavailable data.
1)Assume that task time is
Task.End_Time - Task.Begin_Time
2)Assume that web page visiting time is
Agent_Click.Exit_Time – Agent_Click.Entry_Time
3)The number of page visits in web pages required by the actions in a task is not available because we cannot determine which action an agent performs when visiting a page. To get this information, the source data needs to collect the information by revising the ER model and the relational schema as in Figures 1 and 2.
- Write down the steps you take to design the star schema for the Customer Support Click Stream data mart.
Step 1: determine the business questions in focus and source information systems
We have already described business questions for customer support systems above. Additionally, the source data schema is given in the problem (page 5-7). Accordingly, we can design star schema for the customer support click stream data mart from these given information.
Step 2: determine the facts and their grain
Step 2.1: determine the facts
In terms of business questions, we plan to analyze customer support effort, sales and usefulness of customer support web pages used by agents. Furthermore, we can find customer support effort analysis is based on the customer support task analysis, and the usefulness of web pages analysis is based on the agent click and analysis. Therefore, three facts should be considered, task fact, agent click fact, and sales fact. However, since sales fact is already included in the existing SYA.com data warehouse, we concentrate on the rest two facts.
Step 2.2 determine grains of the facts
Usually, data warehouse stores facts at their lowest level of detail. According to the source data schema, we conclude:
- Grain of task fact is task, which is the lowest level of detail for the records in the Task table.
- Grain of agent click fact is agent click, which is the lowest level of detail for the records in the Agent_Click table.
Step 3: determine the dimension tables and keys
According to the business questions, customer efforts should be analyzed by product, product category, customer demographics, agent demographics, and task begin time, and task category. Hence, we need product, customer, agent, begin time and task dimensions for customer efforts analysis. Usefulness of web pages should be analyzed by web page, page type, task, task type, product, product category, customer demographics, agent demographics and entry time. Therefore, we need product, customer, agent, task, page, and entry time dimensions for usefulness of web page analysis. In summary, we need four conformed dimensions (customer, product, agent, and task) for both analyses. We also need begin time dimension for customer support effort analysis. The page and entry time dimension are required for the usefulness of web pages analysis. We also notice that customer and product dimensions are also the conformed dimensions for existing data mart. The dimension tables and their natural keys are list as below:
- CUSTOMER –> CID
- PRODUCT –> PID
- AGENT –> AID
- TASK –> TID
- BEGIN_TIME –> BEGIN_TIME
- PAGE -> PAGE_URL
- ENTRY_TIME -> ENTRY_TIME
Their corresponding data warehouse keys are:
- CUSTOMER –> CUSTOMER_KEY
- PRODUCT –> PRODUCT_KEY
- AGENT –> AGENT_KEY
- TASK –> TASK_KEY
- BEGIN_TIME –> BEGIN_TIME_KEY
- PAGE -> PAGE_KEY
- ENTRY_TIME -> ENTRY_TIME_KEY
Step 4: determine fact attributes and dimension attributes
Step 4.1: determine the keys for the fact tables.
First, we find the natural keys that can uniquely identify records in the fact tables. Since the grain of the task and agent click facts are the grain of Task and Agent_Click table in the transaction databases, task fact (TASKFACT) and agent click fact (AGENTCLICKFACT) can be identified by
- TASKFACT -> TID
- AGENTCLICK -> TID, Entry_Time
The corresponding data warehouse keys are:
- TASKFACT -> TASK_KEY
- AGENTCLICK -> TASK_KEY, ENTRY_TIME_KEY
Then, we add all the data warehouse keys from dimensions that constraint the fact table. Accordingly, the keys for fact tables are:
- TASKFACT -> TASK_KEY, CUSTOMER_KEY, PRODUCT_KEY, AGENT_KEY, BEGIN_TIME_KEY
- AGENTCLICK -> TASK_KEY, ENTRY_TIME_KEY, CUSTOMER_KEY, PRODUCT_KEY, AGENT_KEY, PAGE_KEY, ENTRY_TIME_KEY
Step 4.2: determine the measurements for the fact tables.
In terms of business question, we obtain following measurements for the fact tables:
- TASKFACT
- Task time – an attribute (TASK_TIME) in the fact table
- Number of tasks – no corresponding attribute in the fact table, it can be obtained by counting the number of records in the fact table.
- AGENTCLICKFACT
- Web page visiting time – an attribute (VISITING_TIME) in the fact table
- the number of effective clicks – can be calculated by counting the records that VISITING_TIME > 15 seconds and STATUS = 200. No additional attribute is required.
- the number of page visits in the pages required by the actions in a task – can be obtained by counting the records in the fact table (STATUS = 200). No additional attribute is required.
- Attribute STATUS should be added for counting the clicks.
So far, we get the two fact tables as follow:
Step 4.3: determine the attributes for the dimension tables
Usually, the attributes of each dimension table consist of the attributes extracted from corresponding relational table in the source data schema and the data warehouse key we determined in the step 3
Therefore, the dimension tables are presented as below:
Step 5: build the star schema
We associate the dimension tables with their related fact table to build the star schema for customer support data mart (see answer to sub problem 4)
- Draw the star schema using Microsoft Word