Modeling Response Curves/Spectra with JMP - Discovery Summit Europe 2016

Silvio Miccio

The following are two detailed examples of how to model response curves in JMP Pro (please also see the presentation for reference). The JMP files that were used cannot be shared.

1. Modeling Curves/Spectra with PLS

The data are from an experiment for modeling the effect of raw material, equipment and process set-up on the force required to push a product through a packing machine. PLS itself will not be explained in detail.

1.1. Data

Factors (all categorical):

E = Equipment setting

R = Raw Material

P = Process setting

Responses:

1 – 57 = position of the product in the machine (usually such column names are wavelength, position, time, ...)

1.2. Visualize the force curves

Select Tables→Transpose

Select Columns 1 – 57 (columns with the response of the curves/spectra) → Create

Select Graph→ Overlay Plot

Select all transposed columns → Y → OK

The curves show the force needed to transport the product in the machine direction. The objective is to find the system set-up that minimizes the required force over the entire process.
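For readers without JMP, the transpose-and-overlay view can be sketched in Python as well. This is only an analogue of the steps above; the file name and the column layout (force readings in columns 1 – 57, one row per run) are assumptions mirroring the table described here.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed layout: one row per run, force readings in columns "1".."57".
df = pd.read_csv("force_curves.csv")          # hypothetical file name
force_cols = [str(i) for i in range(1, 58)]   # the 57 positions in the machine

# "Transpose": positions become the x-axis, one line per experimental run.
for _, row in df.iterrows():
    plt.plot(range(1, 58), row[force_cols].astype(float), alpha=0.6)

plt.xlabel("Position in machine")
plt.ylabel("Force")
plt.title("Force curves by run (overlay)")
plt.show()
```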

1.3. Create a Validation Column in the initial data table

From each experimental condition, one sample will be used for testing the model.

Select Cols→ Modeling Utilities→ Make Validation Column

Add desired data split (in this case Training Set 0.825 and Validation Set 0.175)

Select Stratified Random→ID→OK

JMP creates a new column called Validation, with values Training and Validation (confirm that one run per experimental condition is assigned to Validation). Select all rows assigned to Validation and exclude them. They will be used later as an independent test set.
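As a rough analogue outside JMP, a stratified hold-out split could be sketched with scikit-learn. The ID column (identifying the experimental condition) and the file name are assumptions, and the exact split JMP produces will of course differ.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("force_curves.csv")  # hypothetical file name

# Stratify on the experimental condition so that (roughly) one run per condition
# ends up in the hold-out set, mirroring the 0.825 / 0.175 split above.
train_df, test_df = train_test_split(
    df, test_size=0.175, stratify=df["ID"], random_state=1
)

df["Validation"] = "Training"
df.loc[test_df.index, "Validation"] = "Validation"
```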

1.4. Model the Curves

Select Analyze→Fit Model

Add factors E, R and P

Add all responses from 1 - 57

Select Personality→Partial Least Squares→Run

Select Validation Method (in this case 7-fold cross validation) → OK

A model with eight latent factors was selected (by seven-fold cross validation), explaining about 90% of the variation in the responses.

The Cumulative Y plot indicates no improvement of the model fit after extraction of 4 latent factors. My personal preference is to use as few latent factors as reasonable, accepting that less of the variation in X is explained. Extracting too many latent factors may overfit the model.
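The cross-validated choice of the number of latent factors can be imitated outside JMP with scikit-learn. The dummy coding of the categorical factors, the column names and the file name are assumptions, so the numbers will not match JMP's NIPALS fit exactly; this only illustrates the selection idea.

```python
import pandas as pd
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_score

df = pd.read_csv("force_curves.csv")                      # hypothetical file name
X = pd.get_dummies(df[["E", "R", "P"]], drop_first=True)  # dummy-code the categorical factors
Y = df[[str(i) for i in range(1, 58)]].values             # the 57-point force curves

# Score models with 1..8 latent factors by 7-fold cross validation (R^2 averaged over folds).
cv = KFold(n_splits=7, shuffle=True, random_state=1)
for a in range(1, 9):
    scores = cross_val_score(PLSRegression(n_components=a), X, Y, cv=cv, scoring="r2")
    print(f"{a} latent factors: mean CV R^2 = {scores.mean():.3f}")
```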

Re-run PLS model with 4 latent factors

The model with four latent factors fits the data as well as the more complex model.

Select red triangle next to NIPALS Fit with four Factors for additional information/diagnostics

The Percent Variation Plots show how much of the variation in X and Y is explained by the latent factors. Four latent factors only partially explain the variation in the predictors, but are doing well in explaining the variation of the responses.

The Variable Importance Plot shows the importance of each variable in the model and can be used for variable selection (also consider the absolute size of the coefficients, i.e. inspect the VIP vs. Coefficients plots).

Use the Diagnostics Plots to check for e.g. missing high-order terms, homoscedasticity, autocorrelation… Notice that PLS has “soft” distributional assumptions.

Of particular interest for PLS response curve modeling is the Spectral Profiler. It enables prediction and visualization of the entire response curve as function of the model factors.

The red box in the profiler shows the entire response curve. At the given system set-up the force at position 26 is 5.6 units.

Setting the process set-up to V5_1 lifts the entire force curve, indicating an undesired operating condition.

Based on the Spectral Profiler, the optimum set-up of the system requires equipment option V3_0 or V3_2 in combination with process set-up V5_1. Raw material effects can be ignored (too small). Please notice that for more complex systems, optimization can be based on Desirability Functions.
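The profiler idea — predict the whole 57-point curve for a candidate set-up and compare the overall force — can be mimicked as a continuation of the previous Python sketch. Here df, X and Y are the (assumed) objects built above, and taking the sum of the predicted curve as an "overall force" criterion is my own simplification, not part of the original example.

```python
import itertools
import pandas as pd
from sklearn.cross_decomposition import PLSRegression

# Continuing from the earlier sketch: df (raw table), X (dummy-coded factors), Y (57-point curves).
pls = PLSRegression(n_components=4).fit(X, Y)

best = None
for e, r, p in itertools.product(df["E"].unique(), df["R"].unique(), df["P"].unique()):
    setup = pd.DataFrame([{"E": e, "R": r, "P": p}])
    # Encode the candidate set-up with the same dummy columns as the training matrix.
    x_new = pd.get_dummies(setup).reindex(columns=X.columns, fill_value=0)
    curve = pls.predict(x_new)[0]   # predicted force at all 57 positions
    total = curve.sum()             # crude "overall force" criterion
    if best is None or total < best[0]:
        best = (total, (e, r, p))

print("Lowest predicted overall force for set-up (E, R, P):", best[1])
```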

After completing the model assessment and concluding that the model makes sense, Save Script to Data Table and Save Prediction Formula (for each modeled point of the response curve, one column is added to the data table).

1.5. Model Validation

Include the test set (called Validation in this example)

Select Analyze→Modeling→Model Comparison

Add all prediction formulas as Y, Predictor→Group the data by the Validation column

Above is a part of the model comparison table (up to position 9). For each position on the response curve, goodness-of-fit measures are displayed, grouped by Training (data used to fit the model) and Validation (in this example actually an independent test set, used only to test the predictive capability of the model, not for fitting or selecting it).

The R-square for the training set describes how well the model fits the data; the R-square for the test set (called Validation here) provides information about the expected capability of the model to predict new data. Since these values are quite high (apart from the start and end of the curve, which can be ignored), there is high confidence that the model is capable of predicting the system behavior. The test data can further be used for a formal model validation (e.g. matched pairs test, checking slope and intercept of the actual-by-predicted plots…).
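Computed by hand, the per-position R-square comparison amounts to scoring each of the 57 response columns separately on the training and test rows. A sketch continuing the earlier snippets is shown below; pls is assumed to have been refitted on the training rows only, and X_train/Y_train, X_test/Y_test denote the corresponding design matrices and curve matrices built the same way as X and Y above.

```python
import numpy as np
from sklearn.metrics import r2_score

def per_position_r2(model, X_part, Y_part):
    pred = model.predict(X_part)
    # One R^2 value per curve position instead of a single averaged value.
    return r2_score(Y_part, pred, multioutput="raw_values")

r2_train = per_position_r2(pls, X_train, Y_train)
r2_test = per_position_r2(pls, X_test, Y_test)
print("Training R^2, positions 1-9:  ", np.round(r2_train[:9], 3))
print("Validation R^2, positions 1-9:", np.round(r2_test[:9], 3))
```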

Remark:

To repeat this task, it is suggested to use the JMP example file Baltic.jmp and do it the other way round: model the spectral emission intensities v1 – v27 as a function of the three pollutants ls, ha and dt, which matches quite well what has been done here.

2. Functional Data Analysis

The data are based on a designed experiment for optimizing a three-component mixture in combination with one process setting. The objective is to maximize the response.

2.1. Data

Factors:

F1, F2, F3 = Mixture Components

P = Process Settings

Responses:

0, 10, 20, 60, 300 = values of the response at different points in time

2.2. Visualization of Curves

As done before, transpose the data and use e.g. the Overlay Plot.

Curves are expected to be smooth. Hence, it should be possible to quantify parameters of the response curve via nonlinear fit.

2.3. Stack Data Table

For enabling the nonlinear fit, the data table has to be stacked.

Select Tables→Stack

Select response columns (0 – 300) →Stack Columns→call Stacked Data Columns “Y”→call Source Label Columns “Time” → OK

The data are grouped by ID for Time and Y, to enable the non-linear fit by experimental condition (run). Change the modeling type of Time to numeric continuous!
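In pandas terms, the stacking step is a reshape into long format. A sketch follows; the file name, the ID column and the factor column names are assumptions matching the description above.

```python
import pandas as pd

df = pd.read_csv("mixture_experiment.csv")            # hypothetical file name
time_cols = ["0", "10", "20", "60", "300"]            # response columns = time points

# Stack the response columns into two columns: "Time" (source label) and "Y" (value).
stacked = df.melt(
    id_vars=["ID", "F1", "F2", "F3", "P"],
    value_vars=time_cols,
    var_name="Time",
    value_name="Y",
)
stacked["Time"] = stacked["Time"].astype(float)       # make Time numeric continuous
```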

2.4. Nonlinear Fit

Select Analyze→ Modeling→Nonlinear

Select Model Library

Check which model best describes your data. In this case, the Michaelis-Menten model fits quite well.

Select Show Graph→Show Points

Graph shows all data points of the experiment. Use the slider bars to find a set-up that fits the data (these will be the starting points for the non-linear fit). Once satisfied with the initial fit select Make Formula→Close.

Add saved formula as X, Prediction Formula→ID→By→OK

For each ID (experimental condition), a nonlinear fit panel is displayed. To fit all models simultaneously, hold Ctrl and select Go.

For each ID (experimental condition), a non-linear fit is made. Inspect all models carefully. When you are satisfied with them, save the coefficients of the non-linear fits.

Go to one of the parameter estimates tables → right mouse click →Make Combined Data Table

A table with all parameters of the non-linear fits is created.
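The per-run fits can also be reproduced with scipy. The Michaelis-Menten parameterization Y = theta1·t / (theta2 + t) and the starting values below are assumptions standing in for the slider settings chosen in JMP's model library, and the `stacked` table comes from the previous sketch.

```python
import pandas as pd
from scipy.optimize import curve_fit

def michaelis_menten(t, theta1, theta2):
    # Y = theta1 * t / (theta2 + t): theta1 is the asymptote, theta2 the half-saturation time.
    return theta1 * t / (theta2 + t)

# `stacked` from the previous sketch: columns ID, Time, Y.
rows = []
for run_id, grp in stacked.groupby("ID"):
    # The starting values play the role of the slider settings in the JMP model library.
    (theta1, theta2), _ = curve_fit(
        michaelis_menten, grp["Time"], grp["Y"], p0=[grp["Y"].max(), 10.0]
    )
    rows.append({"ID": run_id, "theta1": theta1, "theta2": theta2})

params = pd.DataFrame(rows)   # one row of fitted parameters per experimental condition
```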

2.5. Add Nonlinear Fit Data to Initial Data Table

To match the data table containing the nonlinear model parameters with the initial data table, it has to be split.

Select Tables→Split

Select Parameters→Split By→ Current Value→Split Columns→ ID →Group→OK

The data table of nonlinear fit parameters now has the same format as the initial data table and can be joined.

Select the initial data table (Example FDA) → Tables→Join → select the split data table (in this case Untitled 29) → ID in both tables → Match (ID=ID) → other settings as shown above → OK

Parameters of nonlinear fit are added to the initial data table.
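In pandas terms, this split-and-join sequence is a pivot (if the parameters sit in a long combined table) followed by a merge on ID. A short sketch, reusing `df` and `params` from the previous snippets; the combined-table file and its column names are hypothetical.

```python
import pandas as pd

# If the fit parameters are already one row per ID (as in the curve-fit sketch), a merge is enough:
joined = df.merge(params, on="ID", how="left")   # Match (ID = ID)

# If instead they sit in a long "combined" table (one row per ID and parameter), pivot first:
# long_params = pd.read_csv("combined_parameters.csv")  # hypothetical: columns ID, Parameter, Estimate
# wide = long_params.pivot(index="ID", columns="Parameter", values="Estimate").reset_index()
# joined = df.merge(wide, on="ID", how="left")
```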

2.6. Modeling the Response Curve

For modeling the entire curves, the coefficients theta are modeled as a function of the DOX factors instead of the actual response. Since the experiment is based on a mixture design (the mixture factors are correlated) and the non-linear fit parameters theta are highly correlated, PLS is used again (it is a coincidence that PLS is used again; this type of modeling can also be done with other regression methods).

Select Analyze→ Fit Model → Personality → Partial Least Squares

Since the experimental design was made in JMP, the parametric model (which is saved as a script) is automatically added.

Unfortunately, PLS does not support mixture effects.

Remove the mixture attribute and Run the PLS.

Notice that this is not critical, because the mixture information is still available in the column properties (you will see this later when building the spectral profiler).

Since the data are from a DOX, I prefer to use leave-one-out cross validation instead of k-fold.

JMP identified a PLS model with four latent factors

As mentioned before, I prefer to be more conservative and will use a model with only two latent factors, which still has an R-square comparable to the more complex model.

By carefully inspecting the model, it is evident from the residual plots that interactions are missing.

Add interactions with process setting P and re-run PLS

After adding the interaction terms, the residual plots look OK. In fact, you may also consider dropping some variables that are not important for the model (not shown here).
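A rough scikit-learn counterpart of this step — modeling theta1 and theta2 as a function of the mixture factors, the process setting and their interactions with P, selected by leave-one-out cross validation — is sketched below. It assumes P is numeric (a categorical P would need dummy coding first) and reuses the `joined` table from the earlier sketches, so it is an illustration rather than the JMP fit itself.

```python
import pandas as pd
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# `joined` (from the join sketch) holds the mixture factors F1-F3, the process
# setting P and the fitted parameters theta1, theta2 used as responses.
X = joined[["F1", "F2", "F3", "P"]].copy()
for f in ["F1", "F2", "F3"]:
    X[f"{f}*P"] = X[f] * X["P"]        # interactions of the mixture components with P
Y = joined[["theta1", "theta2"]]

# Leave-one-out cross validation (data come from a DOX), comparing 1-4 latent factors.
for a in range(1, 5):
    pred = cross_val_predict(PLSRegression(n_components=a), X, Y, cv=LeaveOneOut())
    print(f"{a} latent factors: LOO R^2 = {r2_score(Y, pred):.3f}")

# Final, more conservative model with two latent factors.
pls_theta = PLSRegression(n_components=2).fit(X, Y)
```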

After carefully inspecting the final model →Save Script for All Objects→Save Prediction Formula

2.7. Manually Create the Spectral Profiler

Make a new column called “Time” and enter the low and high value

Create an additional column called “Y” and enter the Michaelis-Menten equation, using the prediction formulas for theta!

Select Apply→OK

Select Graph→Profiler→ add column Y as Y, Prediction Formula→ check box Expand Intermediate Formulas→OK

You have manually created a spectral profiler, showing in the first column the entire response curve as a function of time and the DOX variables.
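The same idea — plug the predicted thetas back into the Michaelis-Menten formula over a time grid — can be sketched in Python as a continuation of the previous snippet. The factor setting used for the plot is hypothetical and `pls_theta` is the model fitted above, so this only illustrates the mechanics, not JMP's profiler.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def predicted_curve(f1, f2, f3, p, t_grid):
    # Build one row with the same columns (and order) used to fit `pls_theta`.
    x = pd.DataFrame([{"F1": f1, "F2": f2, "F3": f3, "P": p,
                       "F1*P": f1 * p, "F2*P": f2 * p, "F3*P": f3 * p}])
    theta1, theta2 = pls_theta.predict(x)[0]
    return theta1 * t_grid / (theta2 + t_grid)    # Michaelis-Menten with predicted thetas

t = np.linspace(0, 300, 200)
plt.plot(t, predicted_curve(0.3, 0.3, 0.4, 1.0, t))  # hypothetical mixture setting (sums to 1)
plt.xlabel("Time")
plt.ylabel("Predicted Y")
plt.show()
```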

Notice: Since the mixture constraints for F1, F2 and F3 are in the column properties, the JMP Profiler is constrained, although the mixture attribute was removed for PLS modeling! Constrained means that when changing one of the mixture factors, the other mixture factors are automatically changed accordingly.

In the profiler below, only F3 was changed, but as you can see, F1 and F2 were automatically adjusted due to the mixture constraints in the column properties.

Unfortunately, I cannot share an example for this application.

Silvio Miccio