REGRESSION OF NON-LINEAR FUNCTIONS:
Not all analytical methods yield straight-line plots. Regression analysis can and should still be performed on curvilinear functions as well as straight-line functions. There are basically two ways to do this.
- Mathematically convert the data such that it plots as a straight line and perform linear regression as we have learned. For example plot [x2 vs. y], [x vs. y] [log x vs. y], [log x vs. log y], etc. instead of simply plotting [x vs. y]. Often a curved plot can be rendered linear after one of these operations.
- Excel (and most scientific calculators) can perform non-linear regression directly on non-linear data. This route is usually easier.
LINEAR REGRESSION OF NON-LINEAR DATA (after converting the data):
We will consider one example here.
A very common example is a multipoint calibration with specific ion electrodes. These instruments can accurately measure concentrations of analytes over wide concentration ranges. For example, a single calibration curve can range from 1ppm analyte to 10,000ppm (a factor of 105)!
It is seen that &
According to the Nernst equation, electrodes for univalent ions, e.g., F-, Na+, etc. display (at 25ºC) 59.2 mV change for a decade (10-fold) change in analyte concentration. Divalent ion electrodes (e.g., Ca+2, SO4-2) display half this response (29.6 mV per decade).
Traditionally a linear graph can be obtained by plotting concentration on the logarithmic axis of semi-log graph paper and mV on the linear axis. The plot is linear over most of the range cited above.
We can linearize the data for use in linear regression by using the log10 conc. rather than just conc. Then linear regression can be used to solve for concentrations of unknown samples whose mV readings are measured.
In industrial laboratories both graph paper plots and regression calculators and computer programs are used. Graphs are used at regular intervals and kept to demonstrate proof of calibration. For everyday analysis, regression analysis is commonly used.
Exercise:
Using an NH3 gas-sensing electrode, the following calibration data was obtained:
NH3 as ppm N / log10 N / reading (mV)0.1 / + 95
1.0 / + 38
10.0 / - 20
100 / - 78
- Complete the middle column and enter this data on an Excel spreadsheet. Note that the independent variable (x) is log10 (ppm N) and the dependent variable (y) is mV.
- Use the Trend function to determine the conc. (ppm N) of samples. Note you must enter the
log conc. (log x) not conc. (x) values or results will be incorrect. The Trend function returns predicted log conc. values. To get sample concentrations, you must take the antilog of these values, i.e., 10log x. Be sure to include at least 2 measured sets of (log x, y) data and treat it as if it were sample data. The estimated conc. values determined from these must be the same or very close to the original data. If they are not very close, check your work for an error.
- Now fill in the following table from your spreadsheet. Assume these are sample data.
mV / log10 (ppm N) / ppm N
+ 80
0
- 25
- 70
- Apply LINEST to obtain regression results. Again note that LINEST must be applied to linear data, i.e., (log x, y), not (x, y) data. Compare it to the spreadsheet above.
- Obtain regression data via Data Analysis. Data analysis is a linear regression tool. Before applying this tool, calculate the log10 of conc. values and select these values as x-values in the Data Analysis dialog box. Eliminate the Analysis of Variance data. Compare the results to those obtained by LINEST.
PLOT THE GRAPH OF LOG Conc. vs. mV:
Select the log conc. and mV data (log x and y data) for NH3 calibration and use the Chart wizard to plot a graph. Select XY(Scatter) without any connecting lines.
Use the add Trendline tool to obtain a Linear type and include the R2 and ‘display equation on chart’ options. Set the Trendline name to custom and label it Linear Reg. Line.
To move the X-axis scale from the middle of the graph to lie below the graph, click to select the Y-axis. Note: After clicking once you must see the selection handles of the Y-axis appear and the pop-up label appear saying ‘Value (Y) axis’. Then choose Format, Selected axis. On the Format Axis dialog box, under the Scale tab, change ‘Value (X) axis crosses at:’ to the same value listed as ‘Minimum’, i.e., -150, then OK.
To move the Y-axis scale from the middle of the graph to the left of the graph, click to select the X-axis. Note: After clicking once you must see the selection handles of the X-axis appear and the pop-up label appear saying ‘Value (X) axis’. Then choose Format, Selected axis. On the Format Axis dialog box, under the Scale tab, change ‘Value (Y) axis crosses at:’ to the same value listed as ‘Minimum’, i.e., -2, then OK.
PLOT A SEMI-LOG CURVE, ADD A NON-LINEAR REGRESSION TRENDLINE:
You have just worked through the process of converting conc. values to log conc. for regression analysis and ultimately you converted the estimated log conc. of samples back to conc. This was necessary because we chose to use linear regression.
Excel also performs non-linear regression. The various types available in Excel are shown in the Add Trendline dialog box, i.e., Linear, Logarithmic, Polynomial, Power, Exponential, and Moving Average.
LOGARITHMIC REGRESSION:
From the Nernst equation, we know that ion selective electrodes will exhibit a semi-log mathematical relationship between concentration and mV. See the graph below.
Select the conc. and mV data (x and y data) for the NH3 calibration and use the Chart wizard to plot a graph. Select XY(Scatter) without any connecting lines.
Use the add Trendline feature selecting Logarithmic type and the including the R2 and ‘display equation on chart’ options. Set the Trendline name to custom and label it Log Reg. Line.
Note that Excel gives the equation of the line as y = cLn(x) + b where b is the y-intercept and c is the coefficient of Ln(x). The equation can be converted from ln to log. Recall that (2.3026log x = ln x).
Caution: Log (or Ln) can only be applied to numbers greater than 0. It is impossible to take the log of zero or negative numbers.
Recall that the log of a number is the exponent of base 10 that gives that number.
Similarly, the ln of a number is the exponent of base e that gives that number.
For example 103 = 1000. The exponent (the log) of 10 that gives 1000 is 3, i.e., log10 1000 = 3
Similarly, e2 = 7.389. The exponent (the ln) of e that gives 7.389 is 2, i.e., lne 7.389 = 2.
Recall that e is a constant of value = 2.7183 (approx.).
Any base (10, e, etc.) raised to any exponent (+ or -) cannot yield 0 or a negative number.
When your x-axis data contains a zero or negative value, the logarithmic add Trendline option is not available. Zero or negative values can only be y-values for logarithmic regression in Excel. In the case of ISE analysis, plot mV values on the y-axis and do not enter a concentration value of zero.
LOGARITHMIC REGRESSION AND A SEMI-LOG PLOT:
We are accustomed to plotting semi-log relationships on semi-log paper to produce straight-line graphs. This is easily done in Excel.
- After creating the graph shown above, drag a copy of the graph and reformat the X-axis as follows. Click to select the X-axis. Note: After clicking once you must see the selection handles of the X-axis appear and the pop-up label appear saying ‘Value (X) axis’.
- Choose Format, Selected Axis. On the Format Axis dialog box, under the Scale tab, select Logarithmic scale, OK.
Note that the regression equation is the same in the last two plots. By changing the axis from linear to logarithmic, we have only changed the appearance of the graph.
Exercise:
- Plot the following calibration data for Na+ conc. analysed by ISE on an XY(Scatter) plot. Plot two graphs, one with a linear X-axis and one with a log scale for the X-axis. You can drag and drop (CTRL+ drag) to copy a chart.
Na+ Calibration Data
meas. x / meas. y
ppm Na+ / log ppm Na+ / mV
0.1 / 20
1 / 77
10 / 134
100 / 191
1000 / 248
- Perform non-linear regression by adding a logarithmic Trendline and display R2 and the regression equation. Your chart should look like the one below. Format all its components to look the same as this example.
LOGARITHMIC REGRESSION PLOTS FOR Na+
Logarithmic Regression with a linear X-axis scale.
Logarithmic Regression with a logarithmic X-axis scale.
- Obtain regression analysis via Data Analysis for the Na+ ion calibration data.
Data analysis is a linear regression tool only. Before applying this tool, calculate the log10 of conc. values and select these values as x-values in the Data Analysis dialog box.
Na+ Calibration Datameas. x / meas. y
ppm Na+ / log ppm Na+ / mV
0.1 / -1 / 20
1 / 0 / 77
10 / 1 / 134
100 / 2 / 191
1000 / 3 / 248
SUMMARY OUTPUT
Regression Statistics
Multiple R / 1.00000
R Square / 1.00000
Adjusted R Square / 1.00000
Standard Error / 0.00000
Observations / 5.00000
Coefficients / Standard Error / t Stat
Intercept / 77 / 0 / 65535
log ppm Na+ / 57 / 0 / 65535
RESIDUAL OUTPUT
Observation / Predicted mV / Residuals / Standard Residuals
1 / 20 / 0 / 65535
2 / 77 / 0 / 65535
3 / 134 / 0 / 65535
4 / 191 / 0 / 65535
5 / 248 / 0 / 65535
The analysis of variance (ANOVA) cells have been deleted. Note the R2 is exactly 1.0 and the residuals are all zero, indicating that the data is a perfect fit. (The data is not real; it was simply calculated).
The equation of the line is y = 57x + 77, but recall that x is really log10conc.,
so the true eqn. is: mV = 57log10conc. + 77
The residuals plot (not shown here) shows all data points have zero deviation from the predicted trend line.
PREDICTING CONC. VALUES WITH THE GROWTH FUNCTION:
The ‘GROWTH’ function can be used to predict sample conc. (x) values from logarithmic data similar to the Trend function with linear data. It is actually designed only for exponential data but also works for log data provided you enter the data exactly as shown below (opposite to the prompts in the dialog box).
In the NH3 spreadsheet, use the GROWTH function to estimate ppm N concentrations (x-estimates). We will do this with the sample data used to predict sample concentration via TREND function but now we will do it directly on (x,y) values, i.e., (conc., mV) values rather than (log x, y) values, i.e., (log conc., mV).
- Select an array of cells where the estimated conc. values will appear. Click in the formula bar, then click Insert, Function, Statistical, Growth, OK.
- In the Growth dialog box, enter the cell addresses in the following order: known_x’s, known_y’s, measured (sple)_y’s, const = 1, CTRL+SHIFT + ENTER.
- This should give the same sample conc. values as obtained using the TREND function.
Caution: The GROWTH function will give incorrect predictions of mV (y-axis) values. If an estimated y-value (mV) is desired, it can be readily calculated from the regression equation.
Exercise: Use the growth function to calculate Na+ ion sample concentration values using the ‘sample data’ given previously. Your results should be the same as shown below.
Calibration Data / Sample Datappm Na+ / sple mV / sple ppm Na+ / sple mV
0.1 / 20 / 0.122 / 25
1 / 77 / 1.00 / 77
10 / 134 / 19.1 / 150
100 / 191 / 144 / 200
1000 / 248 / 1000 / 248
The GROWTH function only works for log (using reversed syntax, i.e., x,y,y,1x) and for exponential functions (using syntax as prompted, i.e., y,x,x y). Thus there will be occasions where you must use your basic algebra and rearrange an equation to solve for an unknown.
PREDICTING VALUES BY REARRANGING AND SOLVING EQUATIONS:
Consider the NH3 calibration equation, y = -25.02LN(x) + 37.60
It is a simple matter to solve this equation for values of y given values of x, since y is the isolated term.
For example, calculate y when x = 1. Do this in your head, confirm it on your calculator, and finally do it on an Excel spreadsheet (using a cell reference in place of x). (Ans. = 37.60)
In order to solve this equation for values of x (given values of y), we must rearrange the equation to isolate x. The rearranged equation is: LN(x) = (y-37.6)/-25.02 which further rearranges to x = e(y-37.6)/-25.02.
The rearranged equation can be copied into a cell and Excel will calculate values of x with the x-values you give it.
For example, calculate x when y = 37.60. Do this in your head, confirm it on your calculator and finally do it on an Excel spreadsheet (using a cell reference in place of y. (Ans. = 1)
Exercise: Enter the appropriate equation for conc. Na+ on your spreadsheet and calculate sample conc. Na+ from the sample mV readings shown above.
PREDICTING VALUES USING THE ‘GOAL SEEK’ TOOL:
As an alternative to rearranging an equation to solve for a non-isolated term, e.g., solving for x (given y) in the equation: y = -25.02LN(x) + 37.60, one can use ‘Goal Seek’ from the Tools menu.
- In an Excel spreadsheet, select a cell, e.g., B1. Click in the formula bar and type in an ‘=’ sign followed by the formula, e.g., =-25.02*LN(A1)+37.60. Note that a cell reference (A1) is used in place of x.
- In the referenced cell (A1), enter any value, e.g., 1. Note that the value in cell B1becomes 37.60, i.e., the solution of the formula (the value of y) using 1 as the x-value for A1 in the formula.
- In the Tools menu, click Goal Seek. In the Goal Seek dialog box, the cell address of the current position of the mouse is displayed in the box labeled ‘Set cell’. Replace this address with the absolute cell reference of the formula, i.e., $B$1, in this case.
- Click the mouse in the box labeled ‘To value’. Then enter a desired y-value, e.g., -20 mV, then click in the box labeled ‘By changing cell’ and enter the absolute cell reference for the current x-value ($A$1) as by clicking in cell A1; then click OK.
- The x-value that appears in cell A1 is your answer. This is the x-value that when entered in the equation will yield the target y-value. Cell B1shows the target y-value you entered. Note that you have solved the equation for an x-value without rearranging the equation to isolate x.
Goal Seek keeps changing the x-value (cell A1) until the equation yields the target y-value that you enter. Excel performs repeated calculations (‘iterations’) until the change in the result is less than a certain default value of the function. You must adjust the sensitivity to a very small (fine) setting to ensure accuracy of the calculation. This is adjusted in the Calculation tab of the Options dialog box found in the Tools menu (shown below). As a matter of course, set the Maximum change setting to 1E-15 rather than the default value of 0.001.
Exercise:
- Use the Goal Seek tool to calculate the Na+ ion sample concentrations using the previously given sample data.
- Use the Rule of Crafts and Goal Seek to calculate the normal bp (@760 mmHg) of water given that a student measured a bp of 98.9°C at 730 mmHg.
CHOOSING THE BEST TYPE OF REGRESSION FOR UNKNOWN RELATIONSHIPS:
It is important to examine a scatterplot as an aid in selecting the appropriate non-linear form. The figure below shows four ‘single-bulge’ non-linear patterns that might be observed as a scatterplot. Each pattern has a label indicating possible applicable regression types.
Equations of Functions:
Linear: y = mx + b
Logarithmic:y = cLn(x) + b
Exponential:y = cebx
Power:y = cxb
2nd Order Polynomial:y = ax2 + bx + c(a quadratic equation)
3rd Order Polynomial:y = ax3 + bx2 + cx + d(a cubic equation)
Excel can solve data for polynomial fits up to 6th order, however it is unlikely that you will ever need anything beyond a 3rd order polynomial.
Note that a 2nd order polynomial, i.e., a quadratic such as ax2 + bx + c = 0, may work for all single-bulge patterns.
If the scatterplot shows two bulges (an S shape), a cubic function (polynomial of order 3) may be appropriate, e.g., ax3 + bx2 + cx + d = 0.
To determine the best fit:
- Enter the data into a spreadsheet (x data in a column to the left of y data).
- Plot an XY(Scatter) diagram without joining lines.
- Select a data point on the chart, choose Chart, Add Trendline. From the Add Trendline dialog box choose a type and under the options tab, choose display R2 on chart and choose display equation on chart.
- Repeat this process for all possible types and compare the R2 values and observe how well the regression equation fits the data points.
Note that it is not necessary to recreate the graph each time. The chart can be copied using a control-drag mouse action and the current Trendline can be changed by clicking on it and repeating the command sequence: Chart, Add Trendline, etc.
Exercise:
Using the following data set, create a series of charts of the all types available in the Add Trendline dialog box including both 2nd and 3rd order polynomials. Identify the best-fit regression type.
CALIBRATION CURVES FOR SPECTROPHOTOMETRY:
Linear calibration curves can be handled easily using a linear Trendline. However calibration curves derived from atomic absorption spectroscopy, flame emission spectroscopy and turbidimetry often show some curvature (or are linear over very short ranges).
Exercise:
- Graph the following x and y data obtained for a flame photometer calibration curve for Na.
- Add Trendlines to determine the best-fit equation. Try all available types. Compare your results with the graphs shown below.
- Use the Goal Seek tool to check each point’s fit to the regression equation.
Na by Flame Photometer / Check
x / x2 / x3 / y / Cubic Fit/Goal Seek
ppm Na / ppm^2 / ppm^3 / Rdg / ppm / Rdg
0 / 0 / 0 / 0 / 0
5 / 25 / 125 / 0.062 / 0.062
10 / 100 / 1000 / 0.115 / 0.115
15 / 225 / 3375 / 0.16 / 0.16
20 / 400 / 8000 / 0.2 / 0.2
25 / 625 / 15625 / 0.233 / 0.233
Note that Log, Exponential and Power Trendlines are not available because the data includes the point (0,0). This is a valid data point. It was obtained by zeroing the spectrophotometer with a blank solution. The blank contained all the same reagents as the samples but no analyte was added. Explain why each of these regression types is not available.