Analysis and Interpretation Questions

Chapter 1 questions

Multiple choice questions

1. a  2. b  3. c  4. b  5. c

6. d  7. c  8. b  9. a  10. c

11. c  12. c  13. a  14. a  15. c

Analysis and interpretation questions

Problem 1

a. For each observation we have only the range of possible values. Therefore we cannot compute the exact value of the sample mean. We can, however, compute an interval that has to contain the true value of the sample mean. You can think of it as a “100% confidence interval”.

We need to consider two scenarios. To compute the lower bound for the mean we will assume that all observations in the sample take the lowest possible value in each bin.

In a similar way we can compute average over upper bounds of bins.

We can be sure that the sample mean lies between these two bounds.
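The two scenarios can be sketched in a few lines of Python. The bins and counts below are hypothetical placeholders, not the data from the problem; the function name mean_bounds is also introduced here only for illustration.

```python
# Hypothetical binned data: (lower edge, upper edge, number of observations).
# These values are made up for illustration; they are not the problem's data.
bins = [(0, 50, 10), (50, 100, 25), (100, 150, 40), (150, 200, 25)]


def mean_bounds(bins):
    """Return the smallest and largest sample mean consistent with the bins."""
    n = sum(count for _, _, count in bins)
    # Lower bound: every observation sits at the lower edge of its bin.
    lower = sum(lo * count for lo, _, count in bins) / n
    # Upper bound: every observation sits at the upper edge of its bin.
    upper = sum(hi * count for _, hi, count in bins) / n
    return lower, upper


lower, upper = mean_bounds(bins)
print(lower, upper)
```

Whatever the exact values inside each bin are, the sample mean cannot fall outside the interval returned by this function.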

b. This is a much harder question. We can find lower and upper bounds for the sample variance, and therefore we can also find bounds for the sample standard deviation.

In order to find bounds for the sample variance, we can use a technique similar to the one used in part a. For each possible value of the mean we can find bounds for the sample variance. The most important difference is that now, for the variance lower bound, we need to put all observations as close as possible to the mean. Therefore for some bins we need to assume that all observations take the smallest possible value, in some other bins we need to assume that all observations take the largest possible value, and finally in one bin all observations may be equal to the sample mean.

As an illustration, consider the case of the sample mean equal to $130. The lower bound for the variance is equal to

Notice that for the bin containing $130, assuming that all observations are as close as possible to the mean is equivalent to assuming that they are equal to the mean. Similarly we can compute the upper bound

This procedure would need to be repeated for all possible values of the mean from part a. We can use Excel Solver to find the maximal upper bound and the minimal lower bound. Notice that the proposed method does not need to give the tightest bounds possible, since the allocation of observations may not be consistent with the considered sample mean.

Problem 2

For computations presented in this solution you can use Excel. All calculations are presented for illustration purposes only.

a. On average consumer buys

bagels.

b. Average number of bagels purchased by a consumer who purchased at least one bagel is equal to

c. Sample variances of the number of purchased bagels and the number of purchased bagels conditional on purchasing at least one bagel are equal to

d. Sample probabilities of purchasing at least one bagel or at least two bagels are equal to

e. The sample size is . Therefore standard errors are

For we have

With 95% confidence, the probability that a consumer entering the store purchases at least one bagel is between 0.5392 and 0.6836.

Problem 3

a. Assuming that the number of shopping customers has a normal distribution implies that, with positive probability, we can have a negative number of shopping consumers. This makes the model less realistic but at the same time computationally simple.

b. We need to find the value such that the probability that the number of shopping consumers is larger than this value is equal to 0.05, that is, the 95th percentile of the assumed normal distribution. In this case the desired parking lot size is equal to 897.

c. Since only 80% of consumers come in their own car, it is enough to find 80% of the value found in part b. In this case the desired parking lot size is 718.
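The percentile calculation in parts b. and c. can be sketched with Python's standard library. The mean and standard deviation below are hypothetical placeholders (the problem's own parameters are not reproduced here), and rounding up to a whole parking space is an assumption of this sketch.

```python
import math
from statistics import NormalDist

# Hypothetical parameters: placeholders, not the values given in the problem.
mean_shoppers = 800.0
sd_shoppers = 60.0

shoppers = NormalDist(mu=mean_shoppers, sigma=sd_shoppers)

# 95th percentile: the number of shoppers exceeded with probability 0.05.
x_95 = shoppers.inv_cdf(0.95)

# Round up, because a fraction of a parking space cannot be built (assumption).
lot_if_all_drive = math.ceil(x_95)
lot_if_80pct_drive = math.ceil(0.8 * x_95)

print(lot_if_all_drive, lot_if_80pct_drive)
```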

Problem 4

a. The sample probability of purchase is equal to. The standard error is equal to.

b. and the confidence interval is

c. and the confidence interval is

d. and the confidence interval is

e. The 99% confidence interval is the widest. All of those intervals are concentrated around the same middle point. Notice that the 90% confidence interval has to be a subset of the 95% confidence interval which has to be a subset of the 99% confidence interval.

f. A lower bound below zero is not consistent with the interval being an estimate of a purchase probability. Therefore we can truncate the confidence interval at zero.
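The nesting of the intervals and the truncation at zero can be illustrated with a short sketch. It assumes the normal approximation for a sample proportion, and the counts (3 purchases among 50 customers) are hypothetical, not the data from the problem; the helper name proportion_ci is introduced here for illustration.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, level):
    """Normal-approximation confidence interval for a proportion, truncated to [0, 1]."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


# Hypothetical sample: 3 purchases among 50 customers.
for level in (0.90, 0.95, 0.99):
    low, high = proportion_ci(3, 50, level)
    print(f"{level:.0%}: [{low:.4f}, {high:.4f}]")
```

With such a small sample proportion, the untruncated lower bounds of the wider intervals would be negative, which is why the function clips them at zero.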

Exam questions

Problem 1

a. If you produce 70 units, then the probability that the unit cost is below 5 equals 0.854. If you produce 80 units, then the probability that the unit cost is below 5 equals 0.987. Therefore, assuming that you can produce either 70 or 80 units but nothing between those quantities, you need to produce at least 80 units.

To find those probabilities you can use the Excel function NORMDIST(5, MEAN, SQRT(VARIANCE), 1).

b. Using similar logic, you need to produce at least 60 units to make the probability that the unit cost is below 5 equal to 0.5.

c. It is enough to check the necessary initial cost only for the smallest size of the production. Using the Excel function NORMINV(0.7,B2,SQRT(C2)), we get an initial cost equal to 10.7866.
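For readers working outside Excel, the two spreadsheet functions have direct counterparts in Python's standard library. The mean and variance below are hypothetical stand-ins for the spreadsheet cells, so the printed numbers are not the answers to this problem.

```python
import math
from statistics import NormalDist

# Hypothetical inputs: placeholders for the MEAN and VARIANCE cells.
mean_cost = 4.5
variance_cost = 0.25

unit_cost = NormalDist(mu=mean_cost, sigma=math.sqrt(variance_cost))

# Counterpart of NORMDIST(5, MEAN, SQRT(VARIANCE), 1): P(unit cost <= 5).
prob_below_5 = unit_cost.cdf(5)

# Counterpart of NORMINV(0.7, MEAN, SQRT(VARIANCE)): the 70th percentile.
cost_70th_percentile = unit_cost.inv_cdf(0.7)

print(round(prob_below_5, 4), round(cost_70th_percentile, 4))
```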

Problem 2

a. The sample mean of monthly sales is equal to 118.3584.

b. The sample mean of the annual sales has to be exactly 12 times the sample mean of the monthly sales. To see why notice that
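One way to write out the argument, assuming the sample covers Y complete years and writing x_{y,m} for sales in month m of year y (this notation is introduced here only for illustration):

```latex
\bar{x}_{\text{annual}}
  = \frac{1}{Y}\sum_{y=1}^{Y}\sum_{m=1}^{12} x_{y,m}
  = 12 \cdot \frac{1}{12\,Y}\sum_{y=1}^{Y}\sum_{m=1}^{12} x_{y,m}
  = 12\,\bar{x}_{\text{monthly}}
```

With the monthly mean from part a., this gives 12 × 118.3584 = 1420.3008.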

c. d. We can use Kstat to compute necessary statistics.

month / mean / standard deviation / standard error of the mean / 90% confidence interval (lower bound) / 90% confidence interval (upper bound)
Jan / 101.17 / 7.74 / 3.46 / 93.79 / 108.55
Feb / 118.09 / 6.46 / 2.89 / 111.93 / 124.24
Mar / 116.78 / 11.02 / 4.93 / 106.27 / 127.29
Apr / 130.05 / 6.95 / 3.11 / 123.42 / 136.68
May / 124.33 / 5.67 / 2.54 / 118.92 / 129.73
Jun / 125.65 / 6.27 / 2.80 / 119.68 / 131.63
Jul / 135.35 / 7.32 / 3.27 / 128.37 / 142.33
Aug / 125.41 / 11.69 / 5.23 / 114.26 / 136.56
Sep / 121.08 / 8.18 / 3.66 / 113.28 / 128.88
Oct / 114.15 / 8.67 / 3.88 / 105.88 / 122.41
Nov / 108.13 / 10.88 / 4.86 / 97.76 / 118.50
Dec / 100.12 / 5.81 / 2.60 / 94.59 / 105.66

The sample means in each month are different because we use only the subset of the sample to compute those means. Notice that standard errors of the mean are also different and therefore confidence intervals differ in length.
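As a check on how the table columns relate to each other, the January row can be recomputed by hand. This sketch assumes n = 5 observations per month, which is inferred from the ratio of the standard deviation to the standard error in the table rather than stated in the solution, and it hard-codes the standard t-table critical value for 4 degrees of freedom.

```python
import math

# January row of the table above.
mean_jan = 101.17
sd_jan = 7.74

# Assumption: n = 5 observations per month, inferred from (sd / se)^2.
n = 5
# Critical value of the t distribution with n - 1 = 4 degrees of freedom
# for a two-sided 90% interval (standard t-table value).
t_crit = 2.1318

se_jan = sd_jan / math.sqrt(n)
lower = mean_jan - t_crit * se_jan
upper = mean_jan + t_crit * se_jan

print(round(se_jan, 2), round(lower, 2), round(upper, 2))
```

Under these assumptions the sketch reproduces the January entries 3.46, 93.79 and 108.55.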

e. We implicitly assumed that the sample was constructed with a good sampling procedure – all observations are drawn independently from the same probability distribution.

Chapter 2 questions

Multiple choice questions

1. a, b  2. a  3. b  4. c  5. b

6. d  7. a  8. c  9. d  10. a

11. a  12. b  13. a  14. b  15. b

Analysis and interpretation questions

Problem 1

a. b. Kstat output is

Saturday / Sunday
Mean / 50.4 / 53.4
standard deviation / 4.03732585 / 3.36154726
standard error of the mean / 1.80554701 / 1.50332964

c. We want to test vs. . The test statistic is equal to

d. We want to test vs. . The test statistic is equal to

e. The test statistic has a t distribution with 8 degrees of freedom.

f. In this case we should replace the sample variances by the known population variances. The test statistic then has a standard normal distribution.

Problem 2

a. b. With Kstat we get the following results.

July / August
Mean / 75.500 / 73.267
standard deviation / 2.384 / 2.737
standard error of the mean / 0.973 / 1.117

c., . See section 2.2 in Chapter 2 for explanation.

d., .

e. Both test statistics have t distribution with 185 degrees of freedom.

f., .

Problem 3

a. With Kstat we get the following results.

Weekdays / Weekends
Mean / 262.1000 / 281.4000
standard deviation / 7.7093 / 5.4813
standard error of the mean / 2.4379 / 1.7333

b. The t statistic for the 95% confidence interval is 2.2622 (9 degrees of freedom).

c. d. The marketing department wants to test the following hypothesis:

The value of the test statistic is

with 18 degrees of freedom. p-value in this case is equal to 0.00000226 (TDIST(6.4521,18,1)) so we have to reject the null hypothesis at and at .
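The value 6.4521 used in the TDIST call can be recovered from the Kstat table in part a. The sketch below assumes the statistic is the difference of the two sample means divided by the standard error of that difference, with the difference taken as weekends minus weekdays; the variable names are introduced here for illustration.

```python
import math

# Summary statistics from the Kstat table in part a.
mean_weekdays, se_weekdays = 262.1000, 2.4379
mean_weekends, se_weekends = 281.4000, 1.7333

# Standard error of the difference of two independent sample means.
se_diff = math.sqrt(se_weekdays**2 + se_weekends**2)

# Test statistic for the difference in means (weekends minus weekdays).
t_stat = (mean_weekends - mean_weekdays) / se_diff

print(round(t_stat, 4))
```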

e. The lowest significance level at which the marketing department's claim is still supported by the test is equal to the computed p-value.

Problem 4

a. The Kstat output is presented below. Notice that we have a different number of observations for comedies and dramas, and therefore we cannot use Kstat to compute everything at once. More importantly, the t-statistics used for the confidence intervals are different.

Comedies / Dramas
mean / 38.9474 / 34.1429
standard deviation / 2.9716 / 3.1588
standard error of the mean / 0.6817 / 0.8442
number of observations / 19 / 14
t-statistic for computing 95%-confidence intervals / 2.1009 / 2.1604

c. We want to test vs. . The value of the test statistic is

The p-value in this case is equal to 0.00011 (TDIST(4.4276,31,2)), and therefore we have to reject the null hypothesis.

d. e. We want to test vs. . The value of the test statistic is the same as before. p-value in this case is equal to 0.000055 (TDIST(4.4276,31,1)) and therefore we have to reject the null hypothesis for and for .

Exam questions

Problem 1

a. b. After introducing additional incentives the mean number of orders in the sample increased from 114.12 to 124.24.

Incentives / No incentives
Mean / 124.24 / 114.12
standard deviation / 9.12085522 / 9.51016999
standard error of the mean / 1.82417104 / 1.902034

c. The confidence intervals do not overlap.

d. e. We want to test vs. . The value of the test statistic is

p-value for this t is close to 0 and we have to reject the null hypothesis. If we reject the null at then we will also reject the null hypothesis at because p-value remains the same but is higher.

Problem 2

a. Sales of paperbacks are higher on average.

Paperback / Hardcover
Mean / 10533.3214 / 9748.9643
standard deviation / 580.2444 / 474.8376
standard error of the mean / 109.6559 / 89.7359

b. 95% confidence intervals for both means are

c. We want to test vs. . The value of the test statistic is

with p-value (TDIST(29.2917,54,1)) close to 0. We have to reject the null hypothesis.

d. The publishing company should publish only paperbacks if we assume that all consumers who purchased hardcover editions would purchase paperback edition when hardcover is not available.

e. If a hardcover edition brings 10% higher profits, then each hardcover book is worth 1.1 of a paperback book. Therefore the means have to differ by

We want to test vs. . The value of the test statistic is

with p-value (TDIST(22.1760,54,1)) close to 0. We have to reject the null hypothesis again – paperback editions still bring higher total profits.

Chapter 3 questions

Multiple choice questions

1. c  2. a  3. a  4. b  5. a

6. c  7. c  8. a  9. d  10. b

Analysis and interpretation questions

Problem 1

a. One thousand dollars more spent on advertising should increase mean sales by 408.046572.

b. The 95% confidence interval for the value of the Advertising coefficient is

c. The p-value for the Advertising coefficient is equal to 0.0000%, and we have to reject the hypothesis that the Advertising coefficient is equal to 0.

d. From part c. we know that the Advertising coefficient is significant, and the R-squared statistic is close to 100%. Therefore we can conclude that there exists a significant relationship between the volume of sales and the advertising expenditure.

Problem 2

a. One more hour spent on the task should decrease the number of units that require service under warranty by 1.0293 percent on average.

b. 95% confidence interval for the value of the Hours coefficient is

c. p-value for the Hours coefficient is equal to 0.0000% and we have to reject the hypothesis that Hours coefficient is equal to 0.

d. From part c. we know that the Hours coefficient is significantly different from zero, and the R-squared statistic is quite high. Therefore we can conclude that there exists a significant relationship between the dependent and the independent variables.

Problem 3

a. Each additional test should decrease the mean share of faulty units by 0.2987 percentage points.

b. 95% confidence interval for the value of the Effort coefficient is

c. The p-value for the Effort coefficient is equal to 0.7611%, so even at a 1% significance level we can reject the hypothesis that the Effort coefficient is equal to 0.

d. From part c. we know that the Effort coefficient is significantly different from zero, but the R-squared statistic is quite low. Therefore it is not clear whether there exists a significant relationship between the dependent and the independent variables.

Problem 4

a. b. The scatterplot suggests that the relationship between the dependent and the independent variables is not linear. The estimated coefficient of the Price variable would be too low for small values of the Price variable and too high for large values of the Price variable. The scatterplot suggests that we should use dummy and slope dummy variables.

Problem 5

a. b. The scatterplot suggests that the relationship between the dependent and the independent variables is not linear. Using a linear regression model without additional variables does not seem to be justified in this case.

The estimated coefficient of the Investment variable would probably be positive but close to zero. The scatterplot suggests that we should use dummy and slope dummy variables.

Exam questions

Problem 1

a. b. The scatterplot is presented below. The relationship between those two variables seems to be linear. We can expect that estimated coefficient of Advertising will be positive.

c. The linear regression equation is

d. Estimated regression equation is

Regression: Profits
constant / Advertising
coefficient / 4.70321174 / 2.04817526
std error of coef / 0.85183886 / 0.13719116
t-ratio / 5.5212 / 14.9294
p-value / 0.0001% / 0.0000%
beta-weight / 0.8908
standard error of regression / 3.03999575
R-squared / 79.35%
adjusted R-squared / 79.00%
number of observations / 60
residual degrees of freedom / 58
t-statistic for computing 95%-confidence intervals / 2.0017

e. p-value of the Advertising coefficient is close to 0 so we have to reject the hypothesis that the Advertising coefficient is equal to 0.

f. The 95% confidence interval for the Advertising coefficient is equal to
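The interval can be recomputed from the regression table as the coefficient plus or minus the tabulated t-statistic times the standard error of the coefficient. The sketch below uses only numbers from the table; the variable names are introduced here for illustration.

```python
# Values taken from the "Regression: Profits" table (Advertising column).
coef = 2.04817526
se_coef = 0.13719116
t_crit = 2.0017  # t-statistic for computing 95%-confidence intervals

# The t-ratio reported in the table is the coefficient over its standard error.
t_ratio = coef / se_coef

# 95% confidence interval: coefficient +/- t_crit * standard error.
margin = t_crit * se_coef
ci_lower = coef - margin
ci_upper = coef + margin

print(round(t_ratio, 4), round(ci_lower, 4), round(ci_upper, 4))
```

The interval does not contain zero, which agrees with the conclusion of part e.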

g. Predicted profits are equal to

Problem 2

a. b. After looking at the scatterplot we can conclude that the relationship between those two variables is possibly linear.

c. The linear regression equation is

Regression: Quality
constant / # of employees
coefficient / 78.04953 / 0.18089236
std error of coef / 0.77985798 / 0.01267079
t-ratio / 100.0817 / 14.2763
p-value / 0.0000% / 0.0000%
beta-weight / 0.8823
standard error of regression / 2.84088661
R-squared / 77.85%
adjusted R-squared / 77.46%
number of observations / 60
residual degrees of freedom / 58
t-statistic for computing 95%-confidence intervals / 2.0017

e. p-value of the ‘# of employees’ coefficient is close to 0 so we have to reject the hypothesis that this coefficient is equal to 0.

f. The 95% confidence interval for the ‘# of employees’ coefficient is equal to

g. h. The dependent variable can take any value between 0 and 100. Numbers larger than 100 are not feasible since the dependent variable measures the share (multiplied by 100) of units that didn’t have to be repaired under warranty.

Since the estimated coefficient for the ‘# of employees’ variable is positive, for a very large value of the independent variable the predicted value of the dependent variable will be larger than 100, which is not feasible.

For large values of the independent variable, the model gives predictions that are not in the range of feasible values of the dependent variable.
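Using the estimated coefficients from the "Regression: Quality" table, a short sketch shows where the fitted line crosses the upper limit of 100. The function name predict_quality is introduced here for illustration.

```python
# Coefficients from the "Regression: Quality" table.
intercept = 78.04953
slope = 0.18089236  # coefficient of '# of employees'


def predict_quality(employees):
    """Fitted value of the linear regression for a given number of employees."""
    return intercept + slope * employees


# Number of employees at which the fitted line reaches the infeasible value 100.
threshold = (100 - intercept) / slope

print(round(threshold, 2), predict_quality(121), predict_quality(122))
```

The fitted value stays below 100 for 121 employees and exceeds 100 from 122 employees on, so predictions beyond that point fall outside the feasible range.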

Chapter 4 questions

Multiple choice questions

1. a  2. c  3. b  4. d  5. a

6. a  7. c  8. d  9. c  10. c

11. b  12. d  13. c

Analysis and interpretation questions

Problem 1

a. If we want to find out whether the sales depend on the weekly temperature, then the regression equation should be

b. When the average weekly temperature goes up by one degree, the mean change in weekly sales is 196.20 gallons.

Regression: Sales
constant / Temperature
coefficient / 3.46896574 / 0.19620474
Std error of coef / 0.85568891 / 0.01427431
t-ratio / 4.0540 / 13.7453
p-value / 0.0213% / 0.0000%
beta-weight / 0.9045
standard error of regression / 2.67864523
R-squared / 81.81%
adjusted R-squared / 81.38%
number of observations / 44
residual degrees of freedom / 42
t-statistic for computing 95%-confidence intervals / 2.0181

c. d. The value of R-squared statistic implies that 81.81% of variation of the dependent variable is explained by the independent variable. The Temperature variable explains substantial part of the variation in the Sales variable.

e. It is enough to check the p-value of the Temperature coefficient. In this case we have to reject the hypothesis that the Temperature coefficient is equal to 0 at any reasonable significance level, since the p-value is close to 0.0000%.

Problem 2

a. If we want to find out whether the sales depend on the price, then the regression equation should be

b. When the price goes up by $1, the mean demand goes down by 495.09 units. Notice that this result is consistent with economic theory: ceteris paribus, in most cases an increased price should cause a decrease in demand.

Regression: Sales
constant / Price
coefficient / 4972.2 / -495.09091
std error of coef / 32.8270629 / 5.29056023
t-ratio / 151.4665 / -93.5801
p-value / 0.0000% / 0.0000%
beta-weight / -0.9995
standard error of regression / 48.0538997
R-squared / 99.91%
adjusted R-squared / 99.90%
number of observations / 10
residual degrees of freedom / 8
t-statistic for computing 95%-confidence intervals / 2.3060

c. d. The value of the R-squared statistic implies that 99.91% of the variation of the demand is explained by the price. This is a very strong relationship. We can say that the only thing that really matters in determining the demand for your product is its price.

e. It is enough to check p-value of the Price coefficient. In this case we have to reject the hypothesis that the Price coefficient is equal to zero at any reasonable significance level since the p-value is close to 0.0000%.

Problem 3

a. In the table we have results of the estimation of the following regression equation:

b. After spending additional $1000 on advertising the mean sales should go up by 182.31.

c. In order to increase the mean sales by one unit we need to spend additional 1000/182.31=5.49 dollars.

d. From the results of part c. we know that to sell one more unit on average we need to spend an additional $5.49, but we will earn only $5. Therefore the expected profit from increasing the advertising expenditure is negative, and it is not profitable to increase the advertising expenditure.
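The arithmetic behind parts c. and d. can be written out as a short sketch. The numbers come from the solution above; the variable names are introduced here for illustration.

```python
# From part b.: an additional $1000 of advertising raises mean sales by 182.31 units.
extra_units_per_1000_dollars = 182.31
# From part d.: each additional unit sold earns $5.
earnings_per_unit = 5.0

# Part c.: advertising dollars needed to sell one more unit on average.
cost_per_extra_unit = 1000 / extra_units_per_1000_dollars

# Part d.: expected profit from one additional unit sold through advertising.
profit_per_extra_unit = earnings_per_unit - cost_per_extra_unit

print(round(cost_per_extra_unit, 2), round(profit_per_extra_unit, 2))
```

The second number is negative, which is the sense in which additional advertising is not profitable.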

e. 87.37% of variation of the demand is explained by the variation in the advertising expenditure.

f. p-values for the constant and the Advertising coefficients are close to 0.0000%. Therefore we can conclude that both of those coefficients are significantly different from zero.

g. We have to remember that in the dataset, the sales were expressed in hundreds of units while the advertising expenditure was expressed in thousands of dollars. Using the estimated values of the parameters we have

The predicted mean volume of sales is equal to 2258.73 units.

Problem 4

a. In the table we have results of the estimation of the following regression equation:

b. The mean profits from the project increase by $1810.63 after the team spends an additional hour working on a project.

c. 78.05% of variation of the profitability is explained by the variation in the time spent on the project.

d. p-value of the constant is above 10%. Therefore we cannot reject the null hypothesis that the true constant is equal to zero.

e. p-value of the coefficient of Time variable is below 5%. Therefore we have to reject the null hypothesis that the true coefficient of Time variable is equal to zero.

f. The mean profits from the project in this case are equal to $184,987.40 because

g. vs. .

Exam questions

Problem 1

a. b. The results of regressing Competitor vs. Market are presented in the table below. We have .

Regression: Competitor
constant / Market
Coefficient / -0.3300166 / 1.46260027
std error of coef / 0.7919562 / 0.25047361
t-ratio / -0.4167 / 5.8393
p-value / 67.7577% / 0.0000%
beta-weight / 0.4558
standard error of regression / 8.94399943
R-squared / 20.78%
adjusted R-squared / 20.17%
number of observations / 132
residual degrees of freedom / 130
t-statistic for computing 95%-confidence intervals / 1.9784

A 95% confidence interval for this estimate is