from Encyclopedia of Social Science Research Methods (Lewis-Beck, Bryman, and Liao, eds.). Thousand Oaks, CA: Sage 2003.

Normalization

by Paul T. von Hippel

In many popular statistical models, we assume that some component of a variable Y has a normal distribution. For example, in the linear regression model $Y = \alpha + \beta X + \varepsilon$, we typically assume that the error term $\varepsilon$ is normal. Although minor departures from normality may be acceptable, distributions with heavier-than-normal tails can compromise statistical estimates. In such cases, it may be preferable to transform Y so that the pertinent component is closer to normality. Transforming a variable in this way is called normalization.

If the pertinent component of Y has one heavy tail (skew), then we often apply a power transformation. True to their name, power transformations raise Y to some power p (i.e., they transform Y into $Y^p$). Powers greater than 1 reduce negative skew; an example is the quadratic transformation $Y^2$ (p = 2). Powers between 0 and 1 reduce positive skew; an example is the square-root transformation $\sqrt{Y}$ or $Y^{1/2}$ (p = .5), which is common when Y represents counts or frequencies. For a power of 0, the power transformation is defined to be log(Y), which reduces positive skew in much the same way as a very small power. Negative powers have the same effect as positive powers applied to the reciprocal and are used when the reciprocal has a natural interpretation, as when Y is a rate (events per unit time), so that $1/Y$ is the time between events.

In sum, the family of power transformations can be written as follows:

$$
f(Y; p) =
\begin{cases}
Y^p & \text{if } p \neq 0 \\
\log(Y) & \text{if } p = 0
\end{cases}
$$

Power transformations assume that Y is positive; if Y can be zero or negative, we commonly make Y positive by adding a constant. There are formal procedures for estimating the best constant to add, as well as the power p that yields the best approximation to normality (Box & Cox, 1964). However, the optimal power and additive constant are usually treated only as rough guidelines.
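As a rough illustration of how smaller powers pull in a long right tail, the following sketch applies p = 1, p = .5, and p = 0 to a small made-up sample and compares the sample skewness. The data, the function names `power_transform` and `skewness`, and the moment-based skewness formula are assumptions chosen for illustration; the sketch does not implement the Box and Cox (1964) estimation procedure.

```python
import math


def power_transform(y, p):
    """Return y**p for p != 0 and log(y) for p == 0.

    Assumes every value in y is strictly positive, as the power
    family requires.
    """
    if any(v <= 0 for v in y):
        raise ValueError("power transformations assume positive values")
    if p == 0:
        return [math.log(v) for v in y]
    return [v ** p for v in y]


def skewness(y):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(y)
    mean = sum(y) / n
    m2 = sum((v - mean) ** 2 for v in y) / n
    m3 = sum((v - mean) ** 3 for v in y) / n
    return m3 / m2 ** 1.5


# Hypothetical positively skewed counts (illustration only).
counts = [1, 1, 2, 2, 3, 4, 5, 8, 13, 40]

for p in (1, 0.5, 0):
    transformed = power_transform(counts, p)
    print(f"p = {p}: skewness = {skewness(transformed):.2f}")
```

For this sample the skewness shrinks as p moves from 1 toward 0, although it stays positive, which is consistent with treating the chosen power as a rough guideline rather than a guarantee of normality.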

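If the pertinent component of Y has two heavy tails (excess kurtosis), we may use a modulus transformation (John & Draper, 1980),

$$
f(Y; p) =
\begin{cases}
\operatorname{sign}(Y)\,\dfrac{(|Y| + 1)^p - 1}{p} & \text{if } p \neq 0 \\
\operatorname{sign}(Y)\,\log(|Y| + 1) & \text{if } p = 0
\end{cases}
$$,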

which is a modified power transformation applied to each tail separately. Non-negative powers p less than 1 reduce kurtosis, while powers greater than 1 increase kurtosis. Again, there are formal procedures for estimating the optimal power p (John & Draper, 1980). If Y is symmetric around 0, then the modulus transformation will change the kurtosis without introducing skew. If Y is not centered at 0, it may be advisable to add a constant before applying the modulus transformation.
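The effect on kurtosis can be checked numerically. The sketch below assumes the form usually given for the John and Draper (1980) modulus transformation, sign(Y)((|Y| + 1)^p - 1)/p for p ≠ 0 and sign(Y) log(|Y| + 1) for p = 0. The sample values and the helper names `modulus_transform` and `excess_kurtosis` are hypothetical.

```python
import math


def modulus_transform(y, p):
    """Modulus transformation, assumed form:

    sign(v) * ((|v| + 1)**p - 1) / p   for p != 0
    sign(v) * log(|v| + 1)             for p == 0
    """
    out = []
    for v in y:
        sign = 0.0 if v == 0 else math.copysign(1.0, v)
        if p == 0:
            out.append(sign * math.log(abs(v) + 1))
        else:
            out.append(sign * ((abs(v) + 1) ** p - 1) / p)
    return out


def excess_kurtosis(y):
    """Moment-based excess kurtosis: m4 / m2**2 - 3."""
    n = len(y)
    mean = sum(y) / n
    m2 = sum((v - mean) ** 2 for v in y) / n
    m4 = sum((v - mean) ** 4 for v in y) / n
    return m4 / m2 ** 2 - 3


# Hypothetical heavy-tailed sample, symmetric around 0 (illustration only).
heavy = [-30, -6, -2, -1, -0.5, 0, 0.5, 1, 2, 6, 30]

for p in (2, 1, 0.5, 0):
    transformed = modulus_transform(heavy, p)
    print(f"p = {p}: excess kurtosis = {excess_kurtosis(transformed):.2f}")
```

Because the sample is symmetric around 0 and the transformation is odd in Y, the transformed values remain symmetric, so only the tail weight changes. With p = 1 the transformation returns the data unchanged.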

Other normalizations are typically used if Y represents proportions between 0 and 1: the arcsine or angular transformation $\sin^{-1}(\sqrt{Y})$, the logit or logistic transformation $\log(Y / (1 - Y))$, and the probit transformation $\Phi^{-1}(Y)$, where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function. The logit and probit are better normalizations than the arcsine. On the other hand, the arcsine is defined when Y = 0 or 1, whereas the logit and probit transformations are not.
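The three transformations for proportions can be sketched with the Python standard library alone. The function names below are illustrative, and the arcsine is taken in its common form, the inverse sine of the square root of Y; `statistics.NormalDist().inv_cdf` supplies the inverse of the standard normal cumulative distribution function.

```python
import math
from statistics import NormalDist


def arcsine(y):
    """Arcsine (angular) transformation; defined on the closed interval [0, 1]."""
    return math.asin(math.sqrt(y))


def logit(y):
    """Logit transformation; defined only for 0 < y < 1."""
    return math.log(y / (1 - y))


def probit(y):
    """Probit transformation; defined only for 0 < y < 1."""
    return NormalDist().inv_cdf(y)


# Hypothetical proportions (illustration only).
for y in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(f"Y = {y}: arcsine = {arcsine(y):.3f}, "
          f"logit = {logit(y):.3f}, probit = {probit(y):.3f}")

# The arcsine is still defined at the boundaries, where the other two fail.
print(arcsine(0.0), arcsine(1.0))
```

Calling `logit` or `probit` with 0 or 1 raises an error, which mirrors the point that these two transformations are undefined at the boundaries.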

Even the best transformation may not provide an adequate approximation to normality. Moreover, a transformed variable may be hard to interpret, and conclusions drawn from it may not apply to the original, untransformed variable (Levin, Liukkonen, & Levine, 1996). Fortunately, modern researchers often have good alternatives to normalization. When working with non-normal data, we can use a generalized linear model that assumes a different type of distribution. Or we can make weaker assumptions by using statistics that are “distribution-free” or nonparametric.

In addition to the definition given here, the word normalization is sometimes used when a variable is standardized. It is also used when constraints are imposed to ensure that a system of simultaneous equations is identified (e.g., Greene, 1997).

Paul T. von Hippel

References

Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society (B), 26(2), 211–252.

Cook, R. D., & Weisberg, S. (1996). Applied regression including computing and graphics. New York: Wiley.

Greene, W. H. (1997). Econometric analysis (3rd ed.). Upper Saddle River, NJ: Prentice Hall.

John, J. A., & Draper, N. R. (1980). An alternative family of transformations. Applied Statistics, 29(2), 190–197.

Levin, A., Liukkonen, J., & Levine, D. W. (1996). Equivalent inference using transformations. Communications in Statistics, Theory and Methods, 25(5), 1059–1072.

Yeo, I.-K., & Johnson, R. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87, 954–959.