Chapter 16: General Linear Model

§16.1 General Linear Model

E(Y) = Y/X =0 + 1Z1 + 2Z2+ ….. + pZp

Y = 0 + 1Z1 + 2Z2+ ….. + pZp + 

Where Zj = Xj or some modification of Xj
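Because the model is linear in the β's, fitting it only requires building one column per $Z_j$ from the raw X values. The minimal Python sketch below assumes each observation is stored as a dictionary of named X values; the helper `design_matrix` and the chosen terms are hypothetical illustrations.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def design_matrix(rows: Sequence[Row],
                  terms: Dict[str, Callable[[Row], float]]) -> List[List[float]]:
    """Build one row [1, Z1, ..., Zp] per observation.

    Each Z_j is computed by a term function of the raw X values, so the
    model stays linear in the betas even when Z_j is nonlinear in X.
    """
    matrix = []
    for row in rows:
        # The leading 1.0 is the column that multiplies the intercept beta_0.
        matrix.append([1.0] + [f(row) for f in terms.values()])
    return matrix


# Hypothetical terms: Z1 = X1, Z2 = X1^2, Z3 = X1 * X2.
terms = {
    "X1": lambda r: r["X1"],
    "X1^2": lambda r: r["X1"] ** 2,
    "X1*X2": lambda r: r["X1"] * r["X2"],
}
rows = [{"X1": 2.0, "X2": 3.0}, {"X1": 4.0, "X2": 0.5}]
Z = design_matrix(rows, terms)
print(Z)  # [[1.0, 2.0, 4.0, 6.0], [1.0, 4.0, 16.0, 2.0]]
```

Any least-squares routine can then be applied to these columns unchanged; only the choice of terms differs between the curvilinear, interaction, and first-order models.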

Modeling Curvilinear Relationships

Y = 0 + 1X1 + 2+

Example:

Reynolds, Inc., is a manufacturer of industrial scales and laboratory equipment. Managers at Reynolds want to investigate the relationship between length of employment of their salespeople and the number of electronic laboratory scales sold.

Dependent variable (Y) = Scales sold

Independent variable (X) = Months employed

Model: Y = 0 + 1X + 2X2 +

Interaction

Y = 0 + 1X1 + 2X2 + 3+ 4+ 5X1X2 + 

Example:

Tyler Personal Care conducted a regression study for one of its new shampoo products. Two factors believed to have the most influence on sales are unit selling price and advertising expenditure. To investigate the effects of these two variables on sales, prices of $2.00, $2.50, and $3.00 were paired with advertising expenditures of $50,000 and $100,000 in 24 test markets.
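A sketch of how an interaction term changes interpretation. It assumes the simpler interaction model $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$, with $X_1$ = price and $X_2$ = advertising expenditure in thousands of dollars. The sales values are hypothetical, generated from assumed coefficients (100, -20, 4, -1); they are not the Tyler test-market results. `solve`, `ols`, and `price_slope` are illustrative helpers.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(Z, y):
    """Least-squares estimates from the normal equations (Z'Z) b = Z'y."""
    p = len(Z[0])
    ztz = [[sum(row[i] * row[j] for row in Z) for j in range(p)] for i in range(p)]
    zty = [sum(row[i] * yi for row, yi in zip(Z, y)) for i in range(p)]
    return solve(ztz, zty)


# The 3 x 2 grid of price and advertising levels from the study design.
prices = [2.00, 2.50, 3.00]
adverts = [50, 100]  # thousands of dollars
rows = [(p, a) for p in prices for a in adverts]
# Hypothetical sales from the assumed model 100 - 20 P + 4 A - 1 P*A.
sales = [100 - 20 * p + 4 * a - 1 * p * a for p, a in rows]

Z = [[1.0, p, a, p * a] for p, a in rows]  # last column is the X1*X2 term
b0, b1, b2, b3 = ols(Z, sales)


def price_slope(advert):
    """Change in expected sales per $1 price increase at a given advertising level."""
    return b1 + b3 * advert


print(price_slope(50), price_slope(100))
```

With the interaction term, the effect of a one-dollar price increase is $b_1 + b_3 X_2$, so it depends on the advertising level; in this hypothetical fit it is more negative at the higher advertising level.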

Transformations Involving the Dependent Variable

Ln(Y) = 0 + 1X1 + 

This transformation is useful when the variance of the error term $\varepsilon$ is not constant.

Example:

Consider the data in Table 16.4, which shows the miles-per-gallon ratings and weights for 12 automobiles.

Y = MPG rating

X = Weight of the automobile

The scatter diagram in Figure 16.8 indicates a negative linear relationship between these two variables. Therefore, we use a simple first-order model to relate the two variables.

Model 1: Simple linear model: $Y = \beta_0 + \beta_1 X + \varepsilon$

The residual plot for Model 1 shows nonconstant variance, so the dependent variable is transformed:

Model 2: $\ln(Y) = \beta_0 + \beta_1 X + \varepsilon$
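A sketch of Model 2: regress ln(MPG) on weight, then exponentiate fitted values to get predictions back on the original MPG scale. The weights and MPG values are hypothetical, generated from an assumed curve $\text{MPG} = e^{4.0 - 0.0004\,\text{weight}}$; they are not the Table 16.4 data. `simple_ols` and `predict_mpg` are illustrative helpers.

```python
import math


def simple_ols(x, y):
    """Intercept and slope of the least-squares line y = b0 + b1 x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Hypothetical data: weight in pounds, MPG from an assumed exponential curve.
weight = [2000, 2400, 2800, 3200, 3600, 4000]
mpg = [math.exp(4.0 - 0.0004 * w) for w in weight]

# Model 2: fit ln(Y) = b0 + b1 X on the log scale.
b0, b1 = simple_ols(weight, [math.log(m) for m in mpg])


def predict_mpg(w):
    """Back-transform the fitted ln(MPG) to the original MPG scale."""
    return math.exp(b0 + b1 * w)


print(round(b0, 4), round(b1, 6), round(predict_mpg(3000), 2))
```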

Nonlinear Models That Are Intrinsically Linear

E(Y) = 01X

Ln(Y) = Ln(0) + Ln(1)X

Y’ = 0’ + 1’X andŶ’ = b0’ + b1’X

Then Ŷ = eŶ’

This model is appropriate when the dependent variable $Y$ increases or decreases by a constant percentage, rather than by a fixed amount, as $X$ increases.
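A sketch of estimating the exponential model by regressing ln(Y) on X and transforming back: the estimates of $\beta_0$ and $\beta_1$ are $e^{b_0'}$ and $e^{b_1'}$, and $(e^{b_1'} - 1) \times 100$ is the estimated percentage change in $E(Y)$ per one-unit increase in $X$. The data are hypothetical, generated from an assumed $E(Y) = 50 \times 1.02^X$; `simple_ols` and `predict` are illustrative helpers.

```python
import math


def simple_ols(x, y):
    """Intercept and slope of the least-squares line y = b0 + b1 x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Hypothetical data that grow by exactly 2% per unit of X.
x = [0, 6, 12, 18, 24, 30, 36]
y = [50 * 1.02 ** xi for xi in x]

b0p, b1p = simple_ols(x, [math.log(v) for v in y])  # b0' and b1' on the log scale
b0, b1 = math.exp(b0p), math.exp(b1p)               # back to the beta_0, beta_1 scale
pct_change = (b1 - 1) * 100                         # % change in E(Y) per unit of X


def predict(xv):
    """Y-hat = e^(Y-hat'), where Y-hat' = b0' + b1' X."""
    return math.exp(b0p + b1p * xv)


print(round(b0, 3), round(b1, 4), round(pct_change, 2))
```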

Example:

Reynolds, Inc., is a manufacturer of industrial scales and laboratory equipment. Managers at Reynolds want to investigate the relationship between length of employment of their salespeople and the number of electronic laboratory scales sold.

Dependent variable (Y) = Scales sold

Independent variable (X) = Months employed

Model: E(Y) = 01X

§16.3 Analysis of a Larger Problem

Use a multicollinearity analysis (for example, examine the sample correlations between pairs of independent variables) and eliminate independent variables until the multicollinearity problem is removed.
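One simple multicollinearity check is to compute the sample correlation coefficient for every pair of independent variables and flag pairs whose |r| exceeds a rule-of-thumb cutoff. The cutoff of 0.7 used here is a commonly cited value, treated as an assumption; the data and the helpers `pearson` and `flag_pairs` are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Sample correlation coefficient r between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_pairs(data, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| > threshold."""
    flagged = []
    for a, b in combinations(sorted(data), 2):
        r = pearson(data[a], data[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical independent variables: X2 is roughly 2 * X1, X3 is unrelated.
data = {
    "X1": [1, 2, 3, 4, 5, 6],
    "X2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
    "X3": [5, 3, 6, 2, 7, 4],
}
flagged = flag_pairs(data)
print(flagged)
```

A flagged pair suggests keeping only one of the two variables (or otherwise combining them) before interpreting individual coefficients.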

§16.4 Variable Selection Procedures

Best-Subsets Regression

Determine the best-fitting regression equation for each model size, from 1 independent variable up to all k independent variables.

Step 1: Run all regressions with 1 independent variable and select the one with the best fit (for example, the highest R²).

Step 2: Run all regressions with 2 independent variables and select the one with the best fit.

Repeat this process until you have found the best regression of every size, up to the model containing all k independent variables.
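A brute-force sketch of best-subsets regression that scores every subset by R² and keeps the best one for each size. It fits $2^k - 1$ regressions, so it is practical only for a modest number of variables. The data and the helpers `solve`, `ols`, `r_squared`, and `best_subsets` are hypothetical illustrations.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(Z, y):
    """Least-squares estimates from the normal equations (Z'Z) b = Z'y."""
    p = len(Z[0])
    ztz = [[sum(row[i] * row[j] for row in Z) for j in range(p)] for i in range(p)]
    zty = [sum(row[i] * yi for row, yi in zip(Z, y)) for i in range(p)]
    return solve(ztz, zty)


def r_squared(cols, data, y):
    """R^2 of the regression of y on an intercept plus the named columns."""
    Z = [[1.0] + [data[c][i] for c in cols] for i in range(len(y))]
    b = ols(Z, y)
    fitted = [sum(bj * zj for bj, zj in zip(b, row)) for row in Z]
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst


def best_subsets(data, y):
    """Return {size: (best subset, its R^2)} for sizes 1..k."""
    names = sorted(data)
    best = {}
    for size in range(1, len(names) + 1):
        scored = [(r_squared(s, data, y), s) for s in combinations(names, size)]
        r2, subset = max(scored)
        best[size] = (subset, r2)
    return best


# Hypothetical data: y is built mainly from x1, partly from x2; x3 is unused.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [3, 1, 4, 1, 5, 9, 2, 6],
    "x3": [2, 7, 1, 8, 2, 8, 1, 8],
}
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1]
y = [2 + 3 * a + 0.5 * b + e for a, b, e in zip(data["x1"], data["x2"], noise)]

for size, (subset, r2) in best_subsets(data, y).items():
    print(size, subset, round(r2, 4))
```

R² never decreases as variables are added, so comparisons across sizes usually rely on adjusted R² or a similar criterion rather than raw R².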

Forward selection:

Step 1: Run all regressions with 1 independent variable and select the one with the best fit.

Step 2: For each remaining variable, compute the p-value of its individual significance test when it is added to the current model, and add the variable with the smallest p-value to the regression estimate.

Repeat Step 2 until no remaining variable has an individual significance p-value <= Alpha to Enter. If you set Alpha to Enter = 100%, forward selection stops only after all the variables have been added.
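The selection logic can be sketched independently of the regression routine. Here `pvalues` is a hypothetical callback that, given a set of variables, returns the p-value of each variable's individual significance test in the regression on exactly that set; the lookup table standing in for it contains made-up p-values, not output from any real data.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence

# Hypothetical callback type: {variable: p-value} for the regression on a set.
PValueFn = Callable[[FrozenSet[str]], Dict[str, float]]


def forward_selection(candidates: Sequence[str], pvalues: PValueFn,
                      alpha_enter: float = 0.05) -> List[str]:
    model: List[str] = []
    remaining = list(candidates)
    while remaining:
        # p-value each remaining variable would have if added to the model
        trial = {c: pvalues(frozenset(model + [c]))[c] for c in remaining}
        best = min(trial, key=lambda c: trial[c])
        if trial[best] > alpha_enter:  # nothing qualifies to enter: stop
            break
        model.append(best)
        remaining.remove(best)
    return model


# Made-up p-values standing in for real regression output.
TABLE = {
    frozenset({"X1"}): {"X1": 0.001},
    frozenset({"X2"}): {"X2": 0.010},
    frozenset({"X3"}): {"X3": 0.200},
    frozenset({"X1", "X2"}): {"X1": 0.002, "X2": 0.030},
    frozenset({"X1", "X3"}): {"X1": 0.001, "X3": 0.400},
    frozenset({"X1", "X2", "X3"}): {"X1": 0.003, "X2": 0.040, "X3": 0.600},
}

print(forward_selection(["X1", "X2", "X3"], TABLE.__getitem__))  # ['X1', 'X2']
print(forward_selection(["X1", "X2", "X3"], TABLE.__getitem__, alpha_enter=1.0))  # ['X1', 'X2', 'X3']
```

The second call shows the Alpha to Enter = 100% case: every variable eventually qualifies, so all of them are added.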

Backward elimination:

Step 1: Run the regression with all the independent variables.

Step 2: Determine the p-value for the individual significance test of each variable in the model. If the largest p-value is > Alpha to Remove, remove that variable from the regression estimate.

Repeat Step 2 until no variable in the model has an individual significance p-value > Alpha to Remove. If you set Alpha to Remove = 0%, backward elimination ends only when there is only one variable left in the regression estimate.
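A matching sketch of backward elimination, again with a hypothetical `pvalues` callback and a table of made-up p-values. As an assumption mirroring the note on Alpha to Remove = 0%, the sketch never drops the last remaining variable.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence

# Hypothetical callback type: {variable: p-value} for the regression on a set.
PValueFn = Callable[[FrozenSet[str]], Dict[str, float]]


def backward_elimination(candidates: Sequence[str], pvalues: PValueFn,
                         alpha_remove: float = 0.05) -> List[str]:
    model = list(candidates)  # Step 1: start from the full model
    while len(model) > 1:     # assumption: keep at least one variable
        current = pvalues(frozenset(model))
        worst = max(model, key=lambda v: current[v])
        if current[worst] <= alpha_remove:  # every variable is significant
            break
        model.remove(worst)
    return model


# Made-up p-values standing in for real regression output.
TABLE = {
    frozenset({"X1", "X2", "X3"}): {"X1": 0.003, "X2": 0.040, "X3": 0.600},
    frozenset({"X1", "X2"}): {"X1": 0.002, "X2": 0.030},
    frozenset({"X1"}): {"X1": 0.001},
}

print(backward_elimination(["X1", "X2", "X3"], TABLE.__getitem__))  # ['X1', 'X2']
print(backward_elimination(["X1", "X2", "X3"], TABLE.__getitem__, alpha_remove=0.0))  # ['X1']
```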

Stepwise regression

Step 1: Run all regressions with 1 independent variable and select the one with the best fit.

Step 2: For each remaining variable, compute the p-value of its individual significance test when it is added to the current model, and add the variable with the smallest p-value to the regression estimate.

Step 3: Determine the p-value for the individual significance test of each variable already in the regression estimate. If the largest p-value is > Alpha to Remove, remove that variable from the regression estimate.

Repeat Steps 2 and 3 until no remaining variable has an individual significance p-value <= Alpha to Enter and no variable in the model has an individual significance p-value > Alpha to Remove.
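A sketch of stepwise regression that alternates one entry step and one removal step per pass, following the order of the steps above. `pvalues` is again a hypothetical callback, and the table holds made-up p-values chosen so that a variable entered early (X1) is later removed. As an assumed safeguard against a variable cycling in and out, the sketch requires Alpha to Enter <= Alpha to Remove and caps the number of passes.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence

# Hypothetical callback type: {variable: p-value} for the regression on a set.
PValueFn = Callable[[FrozenSet[str]], Dict[str, float]]


def stepwise(candidates: Sequence[str], pvalues: PValueFn,
             alpha_enter: float = 0.05, alpha_remove: float = 0.10,
             max_passes: int = 100) -> List[str]:
    if alpha_enter > alpha_remove:
        raise ValueError("alpha_enter should not exceed alpha_remove")
    model: List[str] = []
    for _ in range(max_passes):
        changed = False
        # Entry step: add the outside variable with the smallest p-value.
        outside = [c for c in candidates if c not in model]
        if outside:
            trial = {c: pvalues(frozenset(model + [c]))[c] for c in outside}
            best = min(trial, key=lambda c: trial[c])
            if trial[best] <= alpha_enter:
                model.append(best)
                changed = True
        # Removal step: drop the weakest variable already in the model.
        if model:
            current = pvalues(frozenset(model))
            worst = max(model, key=lambda v: current[v])
            if current[worst] > alpha_remove:
                model.remove(worst)
                changed = True
        if not changed:  # nothing entered and nothing removed: stop
            break
    return model


# Made-up p-values standing in for real regression output.
TABLE = {
    frozenset({"X1"}): {"X1": 0.001},
    frozenset({"X2"}): {"X2": 0.004},
    frozenset({"X3"}): {"X3": 0.010},
    frozenset({"X1", "X2"}): {"X1": 0.300, "X2": 0.002},
    frozenset({"X1", "X3"}): {"X1": 0.001, "X3": 0.020},
    frozenset({"X2", "X3"}): {"X2": 0.001, "X3": 0.030},
    frozenset({"X1", "X2", "X3"}): {"X1": 0.500, "X2": 0.001, "X3": 0.040},
}

print(stepwise(["X1", "X2", "X3"], TABLE.__getitem__))  # ['X2', 'X3']
```

Here X1 enters first, becomes insignificant once X2 is in the model, and is removed; X3 then enters and the procedure stops.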

Example:

Y = Total sales credited to the sales representative

X1 = Length of time employed in months

X2 = Market potential; total industry sales in units for the sales territory*

X3 = Advertising expenditure in the sales territory

X4 = Market share; weighted average for the past four years

X5 = Change in the market share over the previous four years

X6 = Number of accounts assigned to the sales representative*

X7 = A weighted index based on annual purchases and concentrations of accounts

X8 = Sales representative overall rating on eight performance dimensions; an aggregate rating on a 1-7 scale