DS 303
Spring 2005
Exam # 3
Name: ______Key______
1. The information below represents the relationship between the selling price (Y, in $1000) of a home, the square footage of the home (X1), and the number of bedrooms in the home (X2). The data represent 65 homes sold in a particular area of a city and were analyzed using simple linear regression for each independent variable. Use the information to answer the following questions.
Summary measures
Multiple R / 0.8148
R-Square / 0.6640
Standard Error / 8.5572
Regression coefficients
Coefficient / Std Err / t-value / p-value
Constant / 52.157 / 7.4784 / 6.9744 / 0.0000
Square Footage / 4.646 / 0.4164
Summary measures
Multiple R / 0.6487
R-Square / 0.4208
Standard Error / 11.2344
Regression coefficients
Coefficient / Std Err / t-value / p-value
Constant / 100.628 / 5.2324 / 19.2316 / 0.0000
Number of Bedrooms / 11.035 / 1.6310 / 6.7660 / 0.0000
a) Is there evidence of a linear relationship between the selling price and the square footage of the homes? State the null and alternative hypotheses, the test statistic, the decision criterion at α = 5%, and your decision.
ANSWER:
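Yes. H0: β1 = 0 versus Ha: β1 ≠ 0. The test statistic is t = 4.646 / 0.4164 ≈ 11.16 with n - 2 = 63 degrees of freedom. At α = 5% (two-tailed) the critical value is about 2.00, so reject H0. The fitted line is Ŷ = 52.157 + 4.646 × (square footage). This model shows that homes in this area start at an average of $52,157 and the selling price increases by approximately $4,646 for each additional unit of square footage.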
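The slope test for part a) can be checked with a few lines of Python. This is only a sketch: the coefficient and standard error are copied from the square-footage output, and the critical value is an assumed table look-up (roughly 2.00 for about 60 degrees of freedom), not something the code computes.

```python
# Values copied from the square-footage regression output.
slope = 4.646      # estimated coefficient (Y is in $1000)
std_err = 0.4164   # standard error of the coefficient
n = 65             # number of homes in the sample

# Test statistic for H0: beta1 = 0 against Ha: beta1 != 0.
t_stat = slope / std_err
df = n - 2         # simple regression estimates two parameters

# Assumption: two-tailed 5% critical value read from a t-table,
# rounded to 2.00 for roughly 60 degrees of freedom.
t_crit = 2.00

reject_h0 = abs(t_stat) > t_crit
print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
```

Because the statistic is far beyond any reasonable critical value, the decision does not depend on the exact table entry used.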
b) Identify and interpret the coefficient of determination (R²) and the standard error of the estimate (Sy.x) for the model in the above question.
ANSWER:
R² = 0.6640: 66.4% of the variation in selling price is explained by this regression equation. se = 8.5572: this is the standard deviation of the residuals, so predictions from the model are typically off by about $8,557 (Y is in $1000).
c) Is there evidence of a linear relationship between the selling price and the number of bedrooms of the homes? If so, interpret the least squares line and characterize the relationship (i.e., positive, negative, strong, weak, etc.).
ANSWER:
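Yes. The slope has a t-value of 6.7660 with a p-value of 0.0000, so the relationship is significant at the 5% level. The fitted line is Ŷ = 100.628 + 11.035 × (number of bedrooms). This model shows that homes in this area start at an average of $100,628 and the selling price increases by approximately $11,035 for each additional bedroom. The relationship is positive and moderately strong (Multiple R = 0.6487).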
d) Identify and interpret the coefficient of determination (R²) and the standard error of the estimate (Sy.x) for the model in Question c.
ANSWER:
R² = 0.4208: 42.08% of the variation in selling price is explained by this regression equation. se = 11.2344: this is the standard deviation of the residuals, so predictions from the model are typically off by about $11,234 (Y is in $1000).
e) Which of the two variables, the square footage or the number of bedrooms, has the stronger relationship with home selling price? Justify your choice.
ANSWER:
Square footage has the stronger relationship with the selling price. With square footage as the explanatory variable, the R² value is higher (0.6640 > 0.4208) and the se value is lower (8.5572 < 11.2344). This indicates that the first model (using square footage) fits the data better.
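The comparison in part e) can be laid out as a short Python sketch. The numbers are copied from the two summary tables; the dictionary layout and variable names are illustrative only. The loop also confirms that, in simple regression, R-Square equals the square of Multiple R up to rounding.

```python
# Summary measures copied from the two regression outputs.
models = {
    "square footage": {"multiple_r": 0.8148, "r_square": 0.6640, "std_error": 8.5572},
    "number of bedrooms": {"multiple_r": 0.6487, "r_square": 0.4208, "std_error": 11.2344},
}

# In simple regression, R-Square is Multiple R squared (allowing for rounding).
for name, m in models.items():
    assert abs(m["multiple_r"] ** 2 - m["r_square"]) < 1e-3

# The better-fitting model has the higher R-Square and the lower standard error.
best_by_r2 = max(models, key=lambda k: models[k]["r_square"])
best_by_se = min(models, key=lambda k: models[k]["std_error"])
print(best_by_r2, "|", best_by_se)
```

Both criteria point to the same model here, which is why the answer can cite either one.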
2. The following time series plot shows the monthly data on new home sales in the United States.
To check the data for trend and seasonality, we also produced a correlogram for the new home sales.
Based upon examination of the time-series plot and correlogram of new home sales, are the data seasonal? Is there an underlying trend? Explain.
ANSWER:
Both plots indicate a trend in the data: new home sales are gradually increasing. The ACF value at lag 12 is significant at the 5% level, indicating a seasonal component. The time series plot shows that new home sales are lowest in the 12th month. There is also a cyclical component, as indicated by the irregular fluctuations around the underlying trend.
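The lag-12 check read off the correlogram can be illustrated in Python. The sketch below uses a synthetic monthly series, not the exam's data, and the common approximate 5% limit of 2/√n. The function name acf and the series parameters are assumptions made for illustration.

```python
import math

def acf(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((y - mean) ** 2 for y in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom

# Synthetic monthly series (NOT the exam data): linear trend plus a 12-month cycle.
n = 120
sales = [100 + 0.5 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(n)]

bound = 2 / math.sqrt(n)  # approximate 5% significance limit
r12 = acf(sales, 12)
print(f"r12 = {r12:.3f}, bound = {bound:.3f}, significant: {abs(r12) > bound}")
```

With a trend present, autocorrelations at nearby lags are also large, so in practice the seasonal signal shows up as a bump at lag 12 relative to its neighbors.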
1. In choosing the “best-fitting” line through a set of points in linear regression, we choose the one with the:
- smallest sum of squared residuals**
- largest sum of squared residuals
- smallest number of outliers
- largest number of points on the line
- none of the above
2. The regression line ŷ = -3 + 2.5x has been fitted to the data points (28,60), (20,50), (10,18), and (25,55). The sum of the squared residuals will be:
- 20.25
- 16.00
- 49.00
- 94.25**
- none of the above
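The keyed value can be verified directly. The following Python sketch evaluates the fitted line at each data point and sums the squared residuals; the helper name predict is illustrative.

```python
# Data points and fitted line taken from the question.
points = [(28, 60), (20, 50), (10, 18), (25, 55)]

def predict(x):
    """Fitted value from the line y-hat = -3 + 2.5x."""
    return -3 + 2.5 * x

# Residual = actual y minus fitted y.
residuals = [y - predict(x) for x, y in points]
sse = sum(e ** 2 for e in residuals)

print(residuals)  # [-7.0, 3.0, -4.0, -4.5]
print(sse)        # 94.25
```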
3. If an estimated regression line has a y-intercept of –7.5 and a slope of 2.5, then when x = 3, the actual value of y is:
- 0
- 5
- 10
- –20
- unknown**
4. In a test of the distribution of the anti-fungus activity of a chemical compound, fungus is grown in petri dishes with different concentrations of the compound and the diameter of the fungus colonies is measured after one day. There are 20 dishes, two at each of 10 concentrations. A plot of diameter against concentration shows a straight-line pattern, with higher concentrations giving smaller diameters. Least squares regression is used to analyze the data. What distribution is used in the test of the hypothesis that concentration has no effect on diameter?
A) t-distribution with 9 degrees of freedom.
B) t-distribution with 8 degrees of freedom.
C) t-distribution with 19 degrees of freedom.
D) t-distribution with 18 degrees of freedom.**
E) None of the above.
5. Stepwise regression is an approach to choosing the independent variables to be included in a multiple regression equation.
A) True**
B) False
C) Not enough information
6. A time series can consist of four different components: trend, seasonal, cyclical, and random (or noise).
a. True**
b. False
7. The Y-intercept of the simple regression model
A) rarely has a useful interpretation. **
B) almost always has a useful interpretation.
C) is always a positive number.
D) is always positive when the correlation between the dependent and independent variable is positive.
E) All the above.
8. The following regression equation was estimated: Y = -2.0 + 4.6X. This indicates that
A) there has been an error since "b" cannot be a negative number.
B) there is a negative relationship between the two variables.
C) Y equals 44 when X is 10. **
D) the correlation coefficient for Y and X will be negative.
E) None of the above.
9. Visual inspection of the data will help the forecaster identify
A) trend.
B) seasonality.
C) linearity.
D) nonlinearity.
E) All the above. **
10. A multiple regression model using 200 data points (with three independent variables) has how many degrees of freedom for testing the statistical significance of individual slope coefficients?
A) 199.
B) 198.
C) 197.
D) 196. **
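The degrees-of-freedom rule behind this answer is n - k - 1, where n is the number of observations and k the number of independent variables. A minimal Python sketch, with a hypothetical helper name, applies it to this question and to the 20-dish simple regression in question 4:

```python
def residual_df(n_obs, n_predictors):
    """Degrees of freedom for t-tests on slope coefficients: n - k - 1."""
    return n_obs - n_predictors - 1

print(residual_df(200, 3))  # 196 (question 10)
print(residual_df(20, 1))   # 18  (question 4)
```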
11. Which time-series component is said to fluctuate around the long-term trend and is fairly irregular in appearance?
A) Trend.
B) Cyclical. **
C) Seasonal.
D) Irregular.
E) None of the above.
12. The difference between seasonal and cyclical components is:
A) Duration.
B) Source.
C) Predictability.
D) Frequency.
E) All the above. **
13. When a time series contains no trend, it is said to be
A) nonstationary.
B) seasonal.
C) nonseasonal.
D) stationary. **
E) filtered.