FUNCTIONAL FORM OF REGRESSION RELATION

Sometimes theory indicates the appropriate functional form. EX: The concentration of a drug in the blood as a function of time after intake often follows a negative exponential curve. More often the functional form is derived from the data. The regression function may be approximated by a linear function, a quadratic, a higher-degree polynomial, or a combination of linear functions ("splines").

In the next example the relationship is clearly not linear, but a straight line can still represent a useful first approximation of the relationship.

DATA FOR REGRESSION ANALYSIS

1. Observational Data

Data are obtained from nonexperimental studies, so that the values of X are not controlled. Observational data do not directly offer strong support for causal interpretations.

2. Experimental Data

Experimental units are randomly assigned to treatments (i.e., different values of the independent variable(s)). Experimental data allow stronger causal inferences.

  • ML Estimation of the Mean 

Assume Y is normally distributed with known standard deviation (σ = 10); estimate the mean μ from a sample with n = 3 (Y1 = 250, Y2 = 265, Y3 = 259). Given an estimate of μ, the likelihood of the sample (Y1, Y2, Y3) is the product of the probability densities of the Yi given the value of the estimated mean.

The normal probability density function is f(Y) = (1/(σ SQRT(2π))) exp(-(1/2)((Y - μ)/σ)^2) (see NKNW A.34 in Appendix A).

EX: If μ = 230, f(Y1) = .005399, f(Y2) = .000087, f(Y3) = .000595, so that the likelihood L(μ = 230) = (.005399)(.000087)(.000595) = .279 x 10^-9.

Similarly, L(μ = 259) = (.026609)(.033322)(.039894) = .0000354. The likelihood of μ = 259 is greater than the likelihood of μ = 230.

One could calculate the likelihood of the sample over a range of closely spaced values of μ and graph the resulting likelihood function of μ.

The value of μ corresponding to the maximum of the likelihood function (here μ = 258) is the ML estimate of μ. In practice the value(s) of the parameter(s) that maximize the likelihood function are found by iterative numerical optimization methods.
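The grid search described above can be sketched in a few lines of Python. This is only an illustration under the example's assumptions (normal Y, σ = 10, the three observations); the function name `likelihood`, the grid range 230 to 280, and the step of 0.1 are choices made here, not part of the original example.

```python
from math import prod
from statistics import NormalDist

SIGMA = 10.0                      # known standard deviation from the example
SAMPLE = [250.0, 265.0, 259.0]    # Y1, Y2, Y3


def likelihood(mu, sample=SAMPLE, sigma=SIGMA):
    """Product of the normal densities of the observations for a trial mean mu."""
    dist = NormalDist(mu, sigma)
    return prod(dist.pdf(y) for y in sample)


# Evaluate the likelihood on a grid of closely spaced trial means.
# The range (230 to 280) and the step (0.1) are arbitrary illustration choices.
grid = [230 + 0.1 * k for k in range(501)]
mu_hat = max(grid, key=likelihood)

print(f"L(230) = {likelihood(230):.3e}")
print(f"L(259) = {likelihood(259):.3e}")
print(f"grid maximum near mu = {mu_hat:.1f}")
```

Because the densities are multiplied at full precision rather than after rounding, the printed likelihoods may differ from the hand calculation in the last digit. A real application would normally maximize the log-likelihood with a numerical optimizer instead of scanning a grid.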

8. USING SYSTAT'S CALCULATOR IN LIEU OF STATISTICAL TABLES

SYSTAT provides cumulative, density, inverse and random variate functions for the 13 distributions listed in the table below. The functions are named systematically with 3-letter names ending in the suffix -CF, -DF, -IF, or -RN, according to the type of function.

Cumulative distribution functions (suffix -CF) compute the probability that a random value from the specified distribution falls below or is equal to the given value.

Density functions (suffix -DF) return the height at x of the density curve of the specified distribution.

Inverse (cumulative) distribution functions (suffix -IF) take a specified alpha (a probability value between zero and one) and return the critical value below which lies that proportion of the specified distribution.

Random variate functions (suffix -RN) generate pseudo-random variates from the specified distribution.

Exhibit: Table of SYSTAT's distribution functions (SYSTAT 6.0 DATA p. 143)

1. Cumulative Distribution Functions

Use the -CF functions to obtain probabilities (i.e., p-values) associated with observed sample statistics.

EX: calculate the 2-sided p-value of the slope of a simple regression model (i.e., with 1 independent variable plus a constant) with t* = 1.79 and n=27. Use the Student t distribution with n-2 = 25 df:
calc 2*(1-tcf(1.79,25))
0.085575
This calculates the 2-sided p-value as twice the area under the curve above 1.79.

EX: calculate the 2-sided p-value of a regression coefficient in a multiple regression model with p-1 = 5 independent variables plus a constant (so that p=6), with t* = 1.79 and n=27. Use the Student t distribution with n-p = 27-6 = 21 df:
calc 2*(1-tcf(1.79,21))
0.087885
Note this is slightly larger than in the previous example because of fewer df (21 versus 25).

EX: calculate the 2-sided p-value of a regression coefficient in a regression model (simple or multiple, it doesn't matter) when n is large (say n= 100), with t* = 1.79. Use the standard normal distribution:
calc 2*(1-zcf(1.79))
0.073454
Note that you can use the t distribution tcf with any n. The tcf result will automatically converge to the zcf result when n becomes large.

EX: calculate the p-value for an F test. F* is 4.14; the F distribution has 3 and 7 df.
calc 1 - fcf(4.14,3,7)
0.055480
Here the result is not multiplied by 2 because the F test is one-sided.

EX: calculate the one-sided p-value of a regression coefficient from a multiple regression when n is large (say above 100), so one can assume the sampling distribution is normal, t* = -2.033, and the research hypothesis is that the regression coefficient is negative:
calc zcf(-2.033)
0.021026
Here the result is not multiplied by 2, because one wants the one-sided p-value. Note also that (unlike normal tables) the zcf function returns the probability for negative values of the sample statistic.
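For readers without SYSTAT, the normal and t calculations above can be approximated with Python's standard library. The following is a rough sketch, not SYSTAT's algorithm: `NormalDist().cdf` stands in for zcf, and `t_cdf` is a hypothetical helper written here that integrates the Student t density numerically with Simpson's rule. The F test example is not covered.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule.

    steps must be even for Simpson's rule.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3          # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


z = NormalDist()                  # standard normal, plays the role of zcf

p_t25 = 2 * (1 - t_cdf(1.79, 25))     # two-sided, simple regression, 25 df
p_t21 = 2 * (1 - t_cdf(1.79, 21))     # two-sided, multiple regression, 21 df
p_z = 2 * (1 - z.cdf(1.79))           # two-sided, large-sample normal
p_one = z.cdf(-2.033)                 # one-sided, lower tail

print(round(p_t25, 6), round(p_t21, 6), round(p_z, 6), round(p_one, 6))
```

The printed values can be compared with the SYSTAT results shown in the examples. In practice a statistics library with a built-in t distribution would be preferable to the hand-rolled integration.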

2. Density Functions

Use the -DF density functions to calculate the probability density at x.

EX: NKNW (pp. 30-32) illustrate the calculation of the likelihood of a sample of 3 observations given values of the mean μ, assuming normally distributed errors. Y1 = 250, σ = 10, μ = 259. The standard normal density at the standardized value of Y1 is
calc zdf((250-259)/10)
0.266085
Note that zdf returns the density of the standardized variable z = (Y1 - μ)/σ. The density of Y1 itself is this value divided by σ, that is 0.266085/10 = 0.026609, which is the value given by NKNW (p. 31).
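The relation between the density of the standardized value and the density of Y on its original scale can be checked numerically. The sketch below uses Python's standard library rather than SYSTAT, with the example's values (Y1 = 250, μ = 259, σ = 10); the variable names are illustrative only.

```python
from statistics import NormalDist

y1, mu, sigma = 250.0, 259.0, 10.0

z = (y1 - mu) / sigma                       # standardized value, -0.9
std_density = NormalDist().pdf(z)           # standard normal density at z
y_density = NormalDist(mu, sigma).pdf(y1)   # density of Y1 under N(mu, sigma)

# The two are linked by the change of scale: f(y) = phi(z) / sigma.
print(round(std_density, 6), round(y_density, 6))
```

The second number is the quantity that enters the likelihood of the sample, since the likelihood is built from the densities of the Yi themselves.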

3. Inverse (Cumulative) Distribution Functions

Use the -IF inverse cumulative distribution functions to calculate critical values given alpha and to construct confidence intervals.

EX: In a simple regression model with n = 17 and α = 0.05, what is the critical value of the t-ratio t* such that t* greater than this value indicates that the regression coefficient is significantly different from zero at the .05 level (2-tailed)? Use the inverse t distribution with n-2 = 15 df.
calc tif(0.975,15)
2.131450
OR
calc tif(0.025,15)
-2.131450

EX: in a multiple regression analysis a regression coefficient is 3.77 with s.e. 0.23. Calculate the 95% CI assuming that n is large, so the sampling distribution can be assumed normal.
calc 3.77 + zif(0.975)*0.23
4.220792
calc 3.77 - zif(0.975)*0.23
3.319208

EX: same thing, but now n = 24, and there are p-1 = 3 independent variables plus a constant term, so that p = 4. You now need the t distribution with n-p = 24-4 = 20 df:
calc 3.77 + tif(0.975,20)*0.23
4.249772
calc 3.77 - tif(0.975,20)*0.23
3.290228
The CI is a bit wider, as one would expect.
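The large-sample confidence interval above can also be reproduced outside SYSTAT. This is a minimal sketch using Python's standard library, where `NormalDist().inv_cdf` plays the role of zif; the helper name `normal_ci` is made up for this illustration. The t-based interval is not shown because the standard library has no inverse t distribution.

```python
from statistics import NormalDist


def normal_ci(estimate, se, level=0.95):
    """Large-sample confidence interval: estimate +/- z * se."""
    alpha = 1 - level
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. the 0.975 quantile
    return estimate - z_crit * se, estimate + z_crit * se


z_crit = NormalDist().inv_cdf(0.975)
lower, upper = normal_ci(3.77, 0.23)        # coefficient and s.e. from the example
print(f"z = {z_crit:.6f}, 95% CI = ({lower:.6f}, {upper:.6f})")
```

Changing `level` to 0.99 gives the wider 99% interval with the same function.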

4. Random Variate Functions

The -RN functions generate pseudo-random numbers distributed according to the particular distribution. They are mostly useful for generating large samples of random observations for Monte Carlo studies, using SYSTAT's programming language. However, there may be situations when you want to get single random values.

EX: pick a uniformly distributed random number between 0 and 1
calc urn(0,1)
0.179755

EX: assign yourself a random IQ score (with mean = 100 and sd = 15)
calc zrn (100, 15)
95.609537 (ouch!)

EX: flip a coin
calc nrn (1,0.5)
1.000000
(nrn(1,0.5) is the binomial function with 1 trial and p=0.5)
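As a rough parallel to the three examples above, the same kinds of draws can be made with Python's `random` module. This is a sketch only: the seed value is arbitrary, the generator is not SYSTAT's, and the numbers produced will differ from those shown above.

```python
import random

rng = random.Random(209)                  # fixed, arbitrary seed so runs are repeatable

u = rng.random()                          # uniform on [0, 1), like urn(0,1)
iq = rng.gauss(100, 15)                   # normal with mean 100 and sd 15, like zrn(100,15)
coin = 1 if rng.random() < 0.5 else 0     # one Bernoulli trial, like nrn(1,0.5)

# A small Monte Carlo check: the share of heads in many flips should be near 0.5.
flips = [1 if rng.random() < 0.5 else 0 for _ in range(10_000)]
share_heads = sum(flips) / len(flips)

print(u, iq, coin, share_heads)
```

The last line illustrates the main use mentioned above: generating many random observations and summarizing them.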

Module 4 - MATRIX REPRESENTATION OF REGRESSION MODEL

  • Using matrices we can represent the simple and multiple regression model in a compact form.
  • Definition of Matrix: A matrix is a rectangular array of elements arranged in rows and columns. In compact notation, a matrix with 3 rows and 2 columns is written A = [aij] for i = 1, 2, 3; j = 1, 2. (In the expression aij the first subscript always refers to the row index and the second subscript to the column index.)
  • A square matrix is a matrix with the same number of rows and columns.
  • A (column) vector is a matrix with one column. A row vector is a matrix with one row. "Vector" alone refers to a column vector.
  • The transpose of a matrix A = [aij] is the matrix A' = [aji] in which the row and column indexes have been exchanged. An alternative notation found for the transpose A' is A^T.
  • Equality of Matrices: The matrices A and B are equal if they have the same dimensions and all corresponding elements are equal.
  • The matrices involved in addition or subtraction must have the same dimensions. In general, with A (r x c) = [aij] and B (r x c) = [bij],

A + B = [aij + bij] (r x c)
A - B = [aij - bij] (r x c)

  • Multiplication by a Scalar. In general, if A = [aij] and λ is a scalar (= an ordinary number, or a symbol representing an ordinary number), then λA = Aλ = [λaij]. To multiply by a scalar, multiply each element of the matrix by the scalar. Note that one can factor out a scalar that is a common factor of every matrix element. The order of multiplication by a scalar does not matter.
  • Multiplication of a Matrix by a Matrix
  • Symmetric Matrix: A is symmetric if A = A' In regression analysis, many expressions of the type X'X or Y'Y are symmetric.
  • Diagonal Matrix: Only diagonal elements are non-zero.
  • Identity Matrix I (3 x 3). The diagonal elements are 1 and everything else is 0. In general AI = IA = A, so that I can be dropped in simplifying expressions.
  • A scalar matrix is a diagonal matrix with all diagonal elements the same. Multiplying by λI is the same as multiplying by the scalar λ.
  • Vectors and Matrix with All Elements Unity. l (r x 1) is a column vector with all elements 1.
  • J (r x r) is a square matrix with all elements 1.

These matrices may seem strange but they are very useful for representing sums and means in matrix notation. EX: l'l = n (where l is n x 1); ll' = J (where l is n x 1 and J is n x n).

  • Zero Vector: A vector 0 composed entirely of zeroes.
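Several of the definitions in this module can be checked with a few lines of plain Python, using lists of rows as matrices. This is a small sketch for illustration: the helper names (`transpose`, `matmul`, `scale`, `identity`) and the 3 x 2 matrix A are made up here, and a numerical library would normally be used for real work.

```python
def transpose(a):
    """Exchange rows and columns: A' = [aji]."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Matrix product AB; the number of columns of A must equal the rows of B."""
    if len(a[0]) != len(b):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scale(lam, a):
    """Multiply every element of A by the scalar lam."""
    return [[lam * x for x in row] for row in a]


def identity(n):
    """n x n matrix with 1 on the diagonal and 0 elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


# A hypothetical 3 x 2 matrix (values chosen only for illustration).
A = [[1, 4],
     [2, 5],
     [3, 6]]

AtA = matmul(transpose(A), A)          # 2 x 2 product of the type X'X
ones = [[1], [1], [1]]                 # unity column vector with n = 3
ltl = matmul(transpose(ones), ones)    # 1 x 1 matrix holding n
J = matmul(ones, transpose(ones))      # n x n matrix with all elements 1

print(AtA, ltl, J)
```

Comparing `AtA` with `transpose(AtA)` shows the symmetry of products of the type X'X, and `matmul(A, identity(2))` returns A unchanged.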

M5

4. Historical Note.

The use of multiple regression analysis as a means of controlling for possible confounding factors that may spuriously produce an apparent relationship between two variables was first proposed by G. Udny Yule in the late 1890s. In a pathbreaking 1899 paper, "An Investigation into the Causes of Changes in Pauperism in England, chiefly in the last Two Intercensal Decades", Yule investigated the effect on change in pauperism (poverty rate) of a change in the ratio of the poor receiving relief outside as opposed to inside the poorhouses (the "out-relief ratio"). This was a hot topic of policy debate in Great Britain at the time. Charles Booth had argued that increasing the proportion of the poor receiving relief outside the poorhouses did not increase pauperism in a union (unions are British administrative units). Using correlation coefficients (a then entirely novel technique that had just been developed by his colleague and mentor Karl Pearson), Yule had discovered that there was a strong association between change in the out-relief ratio and change in pauperism, contrary to Booth's impression. The 1899 paper is the first published use of multiple regression analysis. In it Yule uses multiple regression to confirm the relationship between pauperism and out-relief by controlling for other possible causes of the apparent association, specifically change in proportion of the old (to control for the greater incidence of poverty among the elderly) and change in population (using population increase in a union as an indicator of prosperity). It is hard to improve on Yule's description of the logic of the method. (See also Stigler 1986: 345-361.)

Exhibit (forthcoming): Algebra of specification bias.

M6

  • EX: the model with response function E{Y} = 1,740 - 4(x1)^2 - 3(x2)^2 - 3(x1)(x2) yields the following quadratic response surface. The following exhibit is another example of a quadratic surface.
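To get a feel for the shape of this surface without the exhibit, the response function can be evaluated on a grid. The sketch below assumes the response function is E{Y} = 1,740 - 4(x1)^2 - 3(x2)^2 - 3(x1)(x2), reading the run-together digits in the formula as squared terms; the function name `mean_response` and the grid from -10 to 10 are arbitrary choices made for illustration.

```python
def mean_response(x1, x2):
    """E{Y} = 1,740 - 4*x1^2 - 3*x2^2 - 3*x1*x2 (assumed reading of the formula)."""
    return 1740 - 4 * x1**2 - 3 * x2**2 - 3 * x1 * x2


# Evaluate the surface on a coarse grid; each printed row holds x1 fixed.
grid = range(-10, 11, 5)
for x1 in grid:
    print([mean_response(x1, x2) for x2 in grid])

# Locate the grid point with the highest mean response.
best = max(((x1, x2) for x1 in grid for x2 in grid),
           key=lambda point: mean_response(*point))
print("highest grid point:", best, mean_response(*best))
```

The interaction term makes the surface asymmetric: points where x1 and x2 have opposite signs lie higher than points where they have the same sign.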