Week 5 Individual Assignment
QNT 561
Here are the directions posted in the course syllabus:
Using the research question and the two variables that your learning team developed for the Week 2 Business Research Project Part 1 assignment, write a paper and create a spreadsheet that includes the following:
- State the research question.
- State the variables of interest that will be measured or analyzed in the study.
- Include the (mock) data as an Appendix or in the spreadsheet.
- Select a hypothesis test (test of means, test of proportions, Chi Squared test, etc.) that can be used to test the research question.
- Conduct the hypothesis test on the data in Excel. Summarize the results of the test in the paper.
- Use the level of significance that you think is appropriate for the task (α = 0.05 or 0.01 are commonly used).
- Interpret the results and share your findings as they relate to the research question.
Format your paper consistent with APA guidelines. The paper should be no more than 350 words.
Submit both the spreadsheet and the paper.
Click the Assignment Files tab to submit your assignment.
For instructional purposes, I have decided to do one example for each team in no particular order. These are not the only ideas that you could have used, but they are ideas that fit the research plan described by the teams in earlier weeks as well as fit the requirement listed above.
TEAM 1
Organization: NRG Oil & Gas
Research Question: Several suggestions were made by the team. They all deal with oil spills, their causes, and potential adverse effects on the coastal population. For example: (1) For those residents affected by a spill, is the average health care cost for spill related treatment more than $300,000? (2) Do all causes of the spill – Fire, Collision, Grounding, Hull Failure – have an equal chance of occurring? (3) Is the mean spill volume the same for each cause?
Population and Sample: These will also vary based on the research question. The population may be all residents of coastal areas. It may be all documented oil spills in the last xx years. In any case, a sample should be taken using a probabilistic technique – random sampling, cluster sampling, stratified random sampling, etc – that seeks to be representative of the population. The sample should be large enough to minimize sampling error, likely in the hundreds of elements. (385??)
Summary of Variables: The variables being studied differ for each research question. Cause of spill is a nominal variable (Fire, Collision, Grounding, Hull Failure). Spill Volume is a quantitative variable at the ratio level. Health Care Cost is a quantitative variable at the ratio level.
Hypothesis Test: For the purposes of this example, I chose the research question: For those residents affected by a spill, is the average health care cost for spill related treatment more than $300,000?
Suppose I have data for 300 residents in coastal Texas, Louisiana, and Mississippi. The average health care cost for the members of this sample was $302,435 with a standard deviation of $23,210.
Since this is a large sample (300 is far greater than 30), and I want to test the sample mean against the hypothesized value of $300,000, I will use a z-test for the mean. See section 6.3 (p. 331) in Statistics for Business and Economics.
Choose a level of significance: α = 0.05
For a one tailed test of hypothesis, the critical value for the normal distribution with α = 0.05 is z = 1.645.
The test statistic is z = (302,435 − 300,000) / (23,210 / √300) ≈ 1.82. Since the test statistic is greater than the critical value, it falls in the rejection region. Therefore we say that the sample contains sufficient evidence to conclude that the mean health care cost for the entire population is greater than $300,000.
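As a check on the z-test for Team 1, here is a minimal Python sketch that recomputes the test statistic from the summary numbers given above. It assumes the hypotheses are H0: μ = 300,000 and Ha: μ > 300,000 (inferred from the research question), and the variable names are my own for illustration.

```python
import math
from statistics import NormalDist

# Summary statistics from the example (mock data)
n = 300                  # sample size
sample_mean = 302_435.0  # average health care cost in the sample
sample_sd = 23_210.0     # sample standard deviation
mu0 = 300_000.0          # hypothesized mean (assumed H0: mu = 300,000)
alpha = 0.05             # level of significance

# z statistic for a large-sample test of one mean
standard_error = sample_sd / math.sqrt(n)
z = (sample_mean - mu0) / standard_error

# One-tailed (upper) critical value and p-value from the standard normal
z_critical = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)

reject_null = z > z_critical
print(f"z = {z:.3f}, critical value = {z_critical:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

Comparing z to the critical value and comparing the p-value to α are two views of the same decision, so they should always agree.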
Any questions?
TEAM 2
Organization: Wally’s Discount SuperMart
Research Question: Again, several suggestions were made by the team. They all deal with the average customer wait time and the number of lanes that are open for business. For example: Is there a correlation between the average wait time at registers and the number of lanes in operation?
I believe there is an obvious answer to this question, but it is still worthwhile from a statistics viewpoint to complete the analysis. Clearly if there is a line in a store and then another lane opens, the total wait time will decrease.
Population and Sample: The population here is all customers of Wally’s and a sample must be taken so that measurements are made at various stores and at various times of the day, week, month, or year. Depending on how the question is specifically framed, the population and sample may be a collection of cash register lanes.
Summary of Variables: The number of lanes open for business is a quantitative variable at the ratio level. The wait time per customer is also a quantitative variable at the ratio level.
Hypothesis Test: Correlation by itself is not a test of hypothesis. It is a descriptive statistic based on the sample of data that makes up the scatter plot. However, we can test if the correlation coefficient is sufficiently different from 0.
(See page 10-12!)
I will start this by entering the data in Excel and calculating the correlation. Construct a scatter plot and add a trendline.
So if we round r to -0.8, the test statistic is t = r√(n − 2) / √(1 − r²) = (−0.8)√28 / √(1 − 0.64) ≈ −7.06, with n − 2 = 28 degrees of freedom.
The p-value is found with Excel by =T.DIST.2T(7.06, 28), which is extremely small. The null hypothesis would be rejected for any reasonable level of significance. Thus, there is a statistically significant correlation between these variables.
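For anyone who wants to check the Team 2 numbers outside Excel, the sketch below recomputes the test statistic for a correlation coefficient and approximates the two-tailed p-value. It assumes n = 30 (inferred from the 28 degrees of freedom used in the Excel formula). The function names are hypothetical, and the p-value is obtained by numerically integrating the t density with Simpson's rule, which is a stand-in for Excel's T.DIST.2T rather than the same algorithm.

```python
import math

def t_statistic(r, n):
    """t statistic for testing whether a correlation differs from 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, width=200.0, steps=20000):
    """Approximate two-tailed p-value: twice the upper-tail area beyond |t|.

    Simpson's rule over [|t|, |t| + width]; the area past that point is
    negligible for the degrees of freedom used here. steps must be even.
    """
    a = abs(t)
    h = width / steps
    total = t_pdf(a, df) + t_pdf(a + width, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(a + i * h, df)
    return 2 * total * h / 3

r = -0.8   # rounded sample correlation from the example
n = 30     # assumed sample size (df = n - 2 = 28)

t = t_statistic(r, n)
p = two_tailed_p(t, n - 2)
print(f"t = {t:.2f}, df = {n - 2}, two-tailed p-value = {p:.2e}")
```

The sign of t only reflects the direction of the relationship; the two-tailed p-value depends on its absolute value, which is why 7.06 is entered in Excel as a positive number.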
TEAM 3
Organization: CellTel, a wireless service provider
Research Question: Again, several suggestions were made by the team. They all deal with the dilemma of a high customer churn rate, or the percentage of subscribers to a service that discontinue their subscription to that service in a given time period. In order for a company to expand its clientele, its growth rate must exceed its churn rate.
For the purposes of this example, let me choose the research question: Does the plan type affect the frequency of people who discontinue service?
Population and Sample: The population here is all customers of CellTel. A sample should be taken so that customers of each plan type being investigated are included in the sample. A stratified random sample is likely best. For the sake of argument, suppose there are three plans: Plan A, Plan B, and Plan C.
Summary of Variables: The plan type is one categorical variable. Most likely it is nominal, though you could frame it as ordinal if the plans were ordered by something like cost or data/minute/text allowance. The other variable of interest is another qualitative variable called “Renew.” We can count the frequency of the customers who choose to renew or discontinue service.
Hypothesis Test: The data suggests that a Chi-Squared test should be used. The data (made up for this purpose) can be summarized in a contingency table like:
Plan A / Plan B / Plan C
Renew / 100 / 105 / 60
Cancel / 50 / 45 / 90
H0: The observed frequencies are all equal (statistically equal at the α level of significance).
Ha: At least one of the frequencies is different.
Alpha has not been specified but 0.05 and 0.01 are common values. The user can set the level. If you want to require more evidence to reject the null hypothesis, choose a smaller alpha value.
The CHISQ.TEST function in Excel returns the p-value for the test. If the p-value is less than alpha, then the null hypothesis is rejected.
Here, 5.19689E-05 is scientific notation for 0.0000519689. So the p-value is very small.
We can reject the null hypothesis and conclude that the observed frequency of cancelation for at least one of the plans is significantly different. While this test doesn’t tell us which plan is different, the counts point to Plan C, whose cancelations are much higher than the others.
So, perhaps CellTel needs to change the services or price for Plan C to retain more customers.
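The reported p-value for Team 3 can be reproduced with a short Python sketch. It rests on an assumption of mine: that the test was applied to the Cancel row alone (50, 45, 90) against equal expected counts, which matches the null hypothesis as stated and gives the same p-value as above. The function names are hypothetical. With three categories there are 2 degrees of freedom, and for exactly 2 degrees of freedom the Chi-Squared upper-tail probability is exp(−x/2), so no statistics library is needed.

```python
import math

def chi_square_equal_frequencies(observed):
    """Chi-squared statistic against the hypothesis of equal frequencies."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

def p_value_two_df(statistic):
    """Upper-tail p-value; this closed form is valid only for df = 2."""
    return math.exp(-statistic / 2)

# Cancel counts for Plan A, Plan B, Plan C (mock data from the table);
# using only this row is an assumption, see the note above
cancel_counts = [50, 45, 90]

chi_sq = chi_square_equal_frequencies(cancel_counts)
p_value = p_value_two_df(chi_sq)

alpha = 0.05
print(f"chi-squared = {chi_sq:.4f}, p-value = {p_value:.6g}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Note that running a Chi-Squared test on the full two-row table would be a test of independence between plan type and renewal, which is a different test and would return a different p-value.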
Questions?