13 – Consequences of the Log Transformation
13.1 - Estimating the typical value of a single population
Example: Mercury Concentrations of Minnesota Walleyes
Data File: Walleyes (1990-1998) Major Waterways
These data come from walleyes sampled from major waterways in Minnesota during the years 1990–1998. One of the major characteristics of interest to fishery biologists is the mercury contamination (in parts per million, or ppm) found in the tissues of walleyes.
We begin by examining a histogram and summary statistics for the mercury concentrations found in the sampled walleyes.
Clearly the distribution of mercury concentrations is extremely skewed to the right. For variables with markedly skewed distributions, the median is generally a better measure of typical value than the mean, because the mean is inflated by the extreme cases in the tail of the distribution. The median mercury concentration found in the sampled walleyes is .25 ppm, while the mean is .365 ppm. When a distribution is extremely skewed to the right, it is common practice to analyze the characteristic of interest in the logarithmic scale; the base of the logarithm is unimportant.
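The effect of a long right tail on the mean can be seen with a few lines of Python. This is a minimal sketch using a small set of made-up readings (they are NOT the Walleyes data); a single large value is enough to pull the mean well above the median.

```python
import statistics

# Hypothetical right-skewed mercury readings (ppm); NOT the Walleyes data
hg_ppm = [0.08, 0.12, 0.15, 0.19, 0.22, 0.25, 0.28, 0.34, 0.45, 0.70, 1.60]

mean_hg = statistics.mean(hg_ppm)      # pulled upward by the 1.60 ppm reading
median_hg = statistics.median(hg_ppm)  # middle (6th of 11) ordered value

print(f"mean   = {mean_hg:.3f} ppm")
print(f"median = {median_hg:.3f} ppm")
```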
To transform a variable in JMP you can use the JMP Calculator, which allows you to perform a variety of data transformations and manipulations. To create a new column containing a function of another column, double-click to the right of the last column to add a new column to the spreadsheet. Next, double-click at the top of the column to obtain the Column Info window. In the window, change the name of the new column to log10(Hg), select Formula from the New Property pull-down menu, and click Edit Formula.
The Column Info box in JMP
The JMP Calculator should then appear on the screen. To take the base 10 logarithm of the HGPPM variable, first select Transcendental from the menu to the right of the calculator keypad, because the logarithm is a transcendental (non-algebraic) function. In the list that appears in the rightmost menu, select the base 10 logarithm (i.e. log10). In the formula window you should now see log10. Next, supply the name of the variable you wish to take the logarithm of, which is HGPPM in this case, by selecting it from the variable list on the left of the calculator window.
The JMP Calculator
When finished, the formula window should look like:
Log10(HGPPM)
Finally, click Apply and close the calculator window. The new column you created should now contain the base 10 logarithm of the mercury concentrations. The histogram and summary statistics for the log10(Hg) readings are shown below. We can clearly see that approximate normality has been achieved through the log transformation.
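Outside of JMP, the same new column can be built by applying log10 to every row. The sketch below is a rough Python analogue of the formula Log10(HGPPM); the readings are hypothetical, and the list name hgppm merely stands in for the JMP column.

```python
import math

# Hypothetical HGPPM values (ppm) standing in for the JMP column
hgppm = [0.08, 0.12, 0.15, 0.19, 0.22, 0.25, 0.28, 0.34, 0.45, 0.70, 1.60]

# Row-by-row analogue of the JMP formula Log10(HGPPM).
# log10 is only defined for positive values, so zero readings would need handling.
log10_hg = [math.log10(x) for x in hgppm]

for x, y in zip(hgppm, log10_hg):
    print(f"{x:5.2f} ppm -> {y:7.3f}")
```

Because log10 is an increasing function, the transformed column keeps the same ordering as the original readings; only the spacing between values changes.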
Histogram, Boxplot, and Normal Quantile Plot for log10(Hg)
Summary Statistics for log10(Hg)
Here we see that both the median and the mean are approximately -.600 log base 10 ppm.
Back-Transforming the Mean and Median to the Original Scale
We can back-transform the mean and median values for the log base 10 mercury level as follows:
Median back-transformed to the original scale $= 10^{\text{sample median of log10(Hg)}} = .250$, which is the median we found when looking at the data in the original scale above! This is an extremely important fact.
Mean back-transformed to the original scale $= 10^{\text{sample mean of log10(Hg)}} = .2513$, which is well below the sample mean of .365 ppm in the original scale above! This is a very important observation as well.
What we have seen is that the median of the data in the original scale is the same as the back-transformed median of the data in the log scale. Put another way, the log base 10 of the sample median in the original scale is the same as the sample median of the data in the log base 10 scale. (This equality is exact when the sample size is odd; with an even sample size the median averages the two middle values, so the two can differ slightly.)
However, the mean in the original scale is NOT the same as the back-transformed mean of the data in the log scale. In other words, we see that the log base 10 of the sample mean in the original scale is NOT the same as the sample mean of the data in the log base 10 scale.
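A quick numerical check of both facts, as a sketch on hypothetical readings (NOT the Walleyes data): the back-transformed log-scale median matches the original median, while the back-transformed log-scale mean equals the geometric mean, which sits below the arithmetic mean for right-skewed data. The last lines show the even-sample-size case, where the median averages two middle values and the equality is no longer exact.

```python
import math
import statistics

# Hypothetical right-skewed readings (ppm), NOT the Walleyes data; n = 11 is odd
hg_ppm = [0.08, 0.12, 0.15, 0.19, 0.22, 0.25, 0.28, 0.34, 0.45, 0.70, 1.60]
log_hg = [math.log10(x) for x in hg_ppm]

# Median: back-transforming the log-scale median recovers the original median
median_back = 10 ** statistics.median(log_hg)
print(statistics.median(hg_ppm), round(median_back, 6))

# Mean: back-transforming the log-scale mean gives the geometric mean,
# which is smaller than the arithmetic mean for right-skewed data
mean_back = 10 ** statistics.fmean(log_hg)
print(statistics.fmean(hg_ppm), mean_back, statistics.geometric_mean(hg_ppm))

# With an even n the median averages the two middle values, so the
# equality is no longer exact: the median is .25, the back-transform is .20
pair = [0.1, 0.4]
print(statistics.median(pair), 10 ** statistics.median([math.log10(v) for v in pair]))
```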
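If we define the following, with $x$ a value in the original scale and $y = \log_{10}(x)$ the same value in the log scale:
$\mu$ = pop. mean (original scale)   $\tilde{\mu}$ = pop. median (original scale)
$\mu^*$ = pop. mean (log scale)   $\tilde{\mu}^*$ = pop. median (log scale)
and write $\bar{x}$, $\tilde{x}$ for the sample mean and median in the original scale and $\bar{y}$, $\tilde{y}$ for the sample mean and median in the log scale,
For the median we have:
$\log_{10}(\tilde{x}) = \tilde{y}$
or equivalently,
$\tilde{x} = 10^{\tilde{y}}$
This also holds true for the population medians: $\log_{10}(\tilde{\mu}) = \tilde{\mu}^*$ and $\tilde{\mu} = 10^{\tilde{\mu}^*}$.
In contrast, for the sample mean we have $\log_{10}(\bar{x}) \neq \bar{y}$, and for the population mean, $\log_{10}(\mu) \neq \mu^*$.
If the distribution is symmetric after log transformation, the median in the log scale is the same as the mean in the log scale. Thus any inferences (e.g. CIs and hypothesis tests) made for the mean in the log scale can be thought of as inferences for the median in the log scale as well!
Using the notation above we have:
$\mu^* = \tilde{\mu}^*$ and therefore $\tilde{\mu} = 10^{\tilde{\mu}^*} = 10^{\mu^*}$
if the distribution in the log scale is symmetric. For example, if the log-transformed values are approximately normal then we have symmetry, because the normal distribution is symmetric.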
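A simulation makes the symmetry argument concrete. The sketch below draws from a lognormal population with Python's random.lognormvariate, whose log is exactly normal (hence symmetric). That function is parameterized in natural-log units rather than base 10; since the base is unimportant, the conclusions carry over. The values mu = 0, sigma = 1, the seed, and the sample size are arbitrary assumptions, not quantities from the walleye data.

```python
import math
import random
import statistics

random.seed(1)
mu, sigma = 0.0, 1.0   # assumed parameters of the normal distribution in the (natural) log scale
x = [random.lognormvariate(mu, sigma) for _ in range(100_001)]  # original scale
y = [math.log(v) for v in x]                                    # log scale

# Log scale: the distribution is symmetric, so the mean and median nearly agree (both near mu)
print(statistics.fmean(y), statistics.median(y))

# Back-transforming the log-scale mean estimates the original-scale MEDIAN, exp(mu) = 1
print(math.exp(statistics.fmean(y)), statistics.median(x))

# ... but not the original-scale MEAN, which is exp(mu + sigma**2 / 2), about 1.65 here
print(statistics.fmean(x), math.exp(mu + sigma**2 / 2))
```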
Confidence Interval for the Typical Mercury Level
For our example, a 95% CI for $\mu^*$, the population mean Hg concentration in the log scale (and hence for the population median $\tilde{\mu}^*$ in the log scale), is given by (-.633, -.567).
From JMP
Back-transforming the endpoints of this interval to the original scale gives the interval (.233 ppm, .271 ppm). THIS IS A CONFIDENCE INTERVAL FOR THE POPULATION MEDIAN IN THE ORIGINAL SCALE! (Again, this is because the median in the original scale is the back-transformed median in the log scale, $\tilde{\mu} = 10^{\tilde{\mu}^*}$.)
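The back-transformation of the interval endpoints is simple arithmetic. This sketch reproduces it from the log-scale interval reported above; because $10^x$ is an increasing function, transforming each endpoint keeps the interval in the right order.

```python
# 95% CI for the log10-scale mean (= median under symmetry), as reported above
log_ci = (-0.633, -0.567)

# Back-transform each endpoint: this is a CI for the original-scale median (ppm)
median_ci_ppm = tuple(10 ** end for end in log_ci)
print([round(v, 3) for v in median_ci_ppm])   # [0.233, 0.271]
```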
So we estimate that the median mercury level found in walleyes in major fisheries in Minnesota is between .233 ppm and .271 ppm with 95% confidence.
Hypothesis Testing
Suppose we wish to test:
$H_0$: The typical mercury level of walleyes in MN $\le .20$ ppm
$H_a$: The typical mercury level of walleyes in MN $> .20$ ppm
Because our data are so right-skewed, the typical mercury level is best measured by the population median. To make an inference about the median of right-skewed data, we can use the log transformation again. Restating our hypotheses in the log scale we have:
(Note: $\log_{10}(.20) = -.699$)
$H_0$: The typical log mercury level of walleyes in MN $\le -.699$ log base 10 ppm
$H_a$: The typical log mercury level of walleyes in MN $> -.699$ log base 10 ppm
Using the Test Mean... option from the log10(Hg) pull-down menu we obtain the following results.
We have extremely strong evidence against the null hypothesis in favor of the alternative hypothesis. Hence we would conclude that the median Hg concentration (original scale) found in Minnesota walleyes exceeds .20 ppm.
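The Test Mean calculation in the log scale amounts to a one-sample t statistic computed with the null value $\log_{10}(.20)$. The sketch below shows that computation; the helper name one_sample_t and the readings are hypothetical (NOT the Walleyes data), so the resulting t is illustrative only. The Python standard library has no t-distribution CDF, so the upper-tail p-value is not computed here; with SciPy installed, scipy.stats.t.sf(t_stat, df) would give it.

```python
import math
import statistics

def one_sample_t(values, null_value):
    """Hypothetical helper: one-sample t statistic for the mean against null_value."""
    n = len(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    return (statistics.fmean(values) - null_value) / std_err

# Null value restated in the log scale: log10(.20) is about -.699
null_log = math.log10(0.20)

# Hypothetical readings (ppm), NOT the Walleyes data; the test is done in the log10 scale
hg_ppm = [0.08, 0.12, 0.15, 0.19, 0.22, 0.25, 0.28, 0.34, 0.45, 0.70, 1.60]
log_hg = [math.log10(x) for x in hg_ppm]

t_stat = one_sample_t(log_hg, null_log)
df = len(log_hg) - 1   # degrees of freedom for the t reference distribution
print(f"null (log scale) = {null_log:.3f}, t = {t_stat:.2f}, df = {df}")
```

A large positive t (small upper-tail p-value) is evidence that the log-scale mean, and hence the original-scale median, exceeds the null value.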
13.2 - Comparative Analyses in the Log Scale
We have seen that the consequence of the log transformation for single-population inference is that our inferences are made about the median in the original scale rather than the mean. When comparing two (or more) populations where the variable of interest has a right-skewed distribution, the log transformation is again frequently used. The consequences of the log transformation for comparative analyses are similar in nature to the single-population case discussed above: our inferences will be about how the population medians compare in the original scale.
Example: Mercury Levels in Walleyes from Fish Lake vs. Island Lake
Data File: Walleyes Fish vs. Island
The key property of logarithms we will be using in our discussion is as follows:
$\log(x) - \log(y) = \log\left(\frac{x}{y}\right)$
i.e. the difference of two variables, x and y, in the log scale is equivalent to the log of their ratio.
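A short numerical check of this property, using two made-up median values (they are NOT values from the notes): the difference of the logs equals the log of the ratio, and back-transforming that difference gives the ratio itself.

```python
import math

# Hypothetical medians (ppm) for two lakes; NOT values from the notes
median_a, median_b = 0.8, 0.2

diff_of_logs = math.log10(median_a) - math.log10(median_b)
log_of_ratio = math.log10(median_a / median_b)
print(round(diff_of_logs, 4), round(log_of_ratio, 4))   # 0.6021 0.6021

# Back-transforming a log-scale difference gives a ratio in the original scale
print(round(10 ** diff_of_logs, 4))   # 4.0
```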
Comparative Analyses in JMP
Comparative Display and Summary Statistics in the Original Scale
Comparative Display and Summary Statistics in Log 10 Scale
Comparing the Population Variances/Standard Deviations (log 10 scale)
Independent Samples Test for Comparing Means/Medians (log 10 scale)
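A 95% CI for $\mu^*_{\text{Island}} - \mu^*_{\text{Fish}}$, or equivalently $\tilde{\mu}^*_{\text{Island}} - \tilde{\mu}^*_{\text{Fish}}$ (the difference in population means, or medians, in the log scale), is given by
(.484, .728). Using the fact that $\tilde{\mu}^* = \log_{10}(\tilde{\mu})$ and the difference of logarithms property, we find that this is also a confidence interval for the following:
$\tilde{\mu}^*_{\text{Island}} - \tilde{\mu}^*_{\text{Fish}} = \log_{10}(\tilde{\mu}_{\text{Island}}) - \log_{10}(\tilde{\mu}_{\text{Fish}}) = \log_{10}\left(\frac{\tilde{\mu}_{\text{Island}}}{\tilde{\mu}_{\text{Fish}}}\right)$
So (.484, .728) is a confidence interval for the log base 10 of the ratio of the population median Hg level for Island Lake to the population median Hg level for Fish Lake Flowage. If we back-transform the endpoints of this interval, we obtain a confidence interval for the ratio of medians in the original scale, i.e. $\tilde{\mu}_{\text{Island}} / \tilde{\mu}_{\text{Fish}}$.
Doing this we obtain:
$(10^{.484}, 10^{.728}) = (3.05, 5.35)$. Therefore we estimate, with 95% confidence, that the median Hg level found in walleyes from Island Lake is between 3.05 and 5.35 times larger than the median Hg level found in walleyes from Fish Lake Flowage. In other words, a typical walleye in Island Lake Reservoir has between 3.05 and 5.35 times as much mercury in its tissues as a typical walleye in Fish Lake Flowage.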
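The back-transformation of the comparative interval can be reproduced directly from the log-scale endpoints reported above. Each endpoint of the CI for the log10 ratio of medians (Island over Fish) is raised as a power of 10, giving a CI for the ratio itself.

```python
# 95% CI for log10(median_Island / median_Fish), as reported above
log_ratio_ci = (0.484, 0.728)

# Back-transform each endpoint to get a CI for the ratio of medians
ratio_ci = tuple(10 ** end for end in log_ratio_ci)
print([round(r, 2) for r in ratio_ci])   # [3.05, 5.35]
```

Note that a difference in the log scale becomes a multiplicative ("times as large") comparison in the original scale, which is why the conclusion is stated as a ratio rather than a difference in ppm.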
We will see this type of comparative analysis of data in the logarithmic scale again when we examine pairwise comparisons in ANOVA in Section 14 of the notes.