AN EXTENDED THEORY FOR SOFT MODELLING IN CHEMOMETRICS - EXPERIMENTALLY INTERPRETED

L. Munck a*, B. Møller Jespersen a, Å. Rinnan a, H. Fast Seefeldt a,b, M. Møller Engelsen a, L. Nørgaard a and S. Balling Engelsen a

a University of Copenhagen, Faculty of Life Sciences, Department of Food Science, Frederiksberg, Denmark; b Aarhus University, Faculty of Agricultural Sciences, Department of Food Science, Aarslev, Denmark. *Corresponding author

ABSTRACT

An extension of chemometric theory was explored experimentally to explain the physiochemical basis of “the very high unreasonable” efficiency of soft modelling of data from nature by including the physiochemical view of self-organisation. Soft modelling was interpreted in vivo by studying the emergence of the unique chemical patterns of mutants in an isogenic barley model of endosperm tissue/cell development. Extremely reproducible, differential Near Infrared (NIR) spectral patterns gave a specific overview of the effect of each mutant cause on cell composition. Extended Canonical Variates Analysis (ECVA) classified the spectra into wild type, starch mutants and protein mutants. The spectra were interpreted by chemometric data analysis and by pattern inspection in relation to morphological, genetic, molecular and chemical information. Deterministic chemical reactions were defined in the glucan pathway. A drastic mutation in a gene controlling the starch/β-glucan composition changed the water activity, which introduced a diffusive, stochastic effect on the catalysis of all active enzymes. It is concluded that “decision making” in self-organisation is autonomous and performed by the soft modelling of the chemical deterministic and stochastic reactions in the endosperm cell as a whole. Uncertainty in the analysis of endosperm emergence was experimentally delimited as the “indeterminacy” in local molecular path modelling “bottom up” and the “irreducibility” of the phenomenological NIR spectra “top down”. The experiment confirmed Ilya Prigogine’s interpretation of self-organisation by his dynamic computer model programmed with a soft non-local extension of quantum mechanics (QM), which offers the necessary mathematical background. Because of the irreversibility of chemical reactions his model had to include soft modelling.
It implies a change from mechanistic, local hard modelling guided by the researcher to the realisation that “the decision maker” is intrinsic to the autonomous objects in nature and acts through global soft modelling. The significance of soft modelling in self-organisation in nature, introduced by Prigogine and here interpreted in an independent experiment, introduces a paradigm shift in macroscopic science that provides a major argument for soft mathematical modelling and chemometrics to obtain full scientific legitimacy.

This new view on soft modelling introduces a paradigm shift in science that provides a major argument for chemometrics to obtain increased scientific legitimacy. In this paper soft modelling of self-organisation was experimentally interpreted. Endosperms in an isogenic barley mutant experiment were surveyed by Near Infrared Spectroscopy (NIRS) and classified by Extended Canonical Variates Analysis (ECVA). Extremely reproducible, differential spectral patterns specifically summarised the physiochemical gene expression for each mutant cause. The spectral data were validated against chemical composition and further evaluated by spectral inspection, single wavelength correlations and Partial Least Squares Regression (PLSR) predictions. The uncertainty in self-organisation that limits access to data was experimentally defined as the “irreducibility” of the phenomenological NIR spectra and the “indeterminacy” in local molecular path modelling. In the future the goal should be to integrate the mathematics of chemometrics, statistics, thermodynamics and quantum mechanics in a model that works unsupervised directly on experimental data in biology.

KEYWORDS: Soft modelling as a universal model; Physiochemical interpretation of statistical elements in data; High Content Analysis of Gene Expression; Chemistry in chemometric theory; Self-organisation in endosperm modelled by NIR spectroscopy; Chemical interpretation of stochastic, deterministic, uncertainty and causal elements in soft modelling; Extension of chemometric models; Endosperm Near Infrared Spectroscopy (NIRS) Mutant Model; Scientific Legitimacy of Chemometrics.

1. INTRODUCTION

1.1. Keeping the chemical aspect alive and vital in chemometric modelling

This paper deals with chemometric theory related to experimental data from natural phenomena, aiming to form a platform for bridging the void between the micro and macro aspects from physics and chemistry to biology. Definitions of the derived concepts are displayed in Table 1. The mathematical physicist Eugene Wigner, focusing in 1960 on simplicity in physics, was intrigued by “The unreasonable effectiveness of mathematics in the natural sciences” [1]. Soft-modelled chemometric data analysis, including measurement equipment, is a successful technology [2]. If chemometrics in its historical development had been limited to following current scientific (and statistical) theories, there would have been minimal progress towards its wide applications today. The pragmatic theory of chemometrics is simple and effective. The real world is under indirect observation. A data matrix, with for example spectral wavelengths as X variables and chemical variables as Y variables, can be resolved into orthogonal linear patterns (principal components (PCs), functional factors) that can be handled by classical statistics and visualised in graphs. Leading experienced chemometricians see themselves as chemists and are aware of the simplifications introduced by chemometrics. They have an extremely fair, honest and humble attitude to solving complex problems in nature. However, chemometrics may tempt inexperienced scientists to use its tools mechanically without any deeper knowledge of chemical and biological theory. The warning from Wold and Sjöström in Chemolab in 1998 [3] is more pressing than ever: “To continue the excitement of chemometrics, we must be sure not to separate chemometrics from chemistry”.

How can chemistry, widely defined, keep a stronghold in the validation of results from chemometric data analysis when chemometric models deal with latent variables that seem to hide chemical information? The answer may lie in explaining the reason for “the unreasonable efficiency” of chemometric applications in physical, chemical and biological terms and including that knowledge in chemometric theory. By merging chemometric and chemical theory in an extension of the present pragmatic chemometric theory of soft modelling, a new scientific legitimacy could be reached.

1.2. Interpreting the possibilities and limits of mathematical models to represent natural phenomena

There is a strong anthropocentric tradition in hard, deterministically modelled mathematics [4 p.5-6] of theoretical construction as a stand-alone analogy to natural phenomena. It is opposed to the inductive, participative research strategy in soft-modelled chemometrics of measuring first and hypothesising afterwards, including the outlier detection that makes induction scientific [5]. The deterministic tradition is focused on the free incentive of the researcher to hypothesise, with strong emphasis on the purification of the underlying mechanics of the seemingly reproducible deterministic phenomena and laws of nature by hard mathematical modelling. It creates a theoretical, virtual, stand-alone world of hard mathematical modelling in science far from biological meaning, not unlike that in Atomic Physics [6], Complex Systems Mathematics [7] and Systems Biology [8]. It further causes a deep mistrust of getting involved in the macro state of life by measuring and evaluating data tapped from nature by soft modelling [5]. This is far from the excitement and optimism in chemometrics in approaching data from nature [3].

On the other side, at the roots of chemometrics introduced by the early economists, there is the concept of autonomy: “…that the gates in a diagram represents independent mechanisms” (Pearl 2000) [9]. Biological networks (individuals, samples) are composed of many different sub-systems that together should be characterised as unique objects. Herman Wold solved the problem of modelling unique objects with his discovery of the self-modelling NIPALS (Non-Linear Iterative Partial Least Squares) algorithms [10]. They facilitate the classification of more or less autonomous self-organised objects characterised as patterns of variables, which is central in chemometrics. Behind the autonomy in nature, physically, chemically and biologically interpreted, there is the underlying force of chemical affinity and catalysis that was named and anticipated already by J.J. Berzelius [11] in 1837. Ilya Prigogine, Nobel laureate in chemistry 1977, explained in his physical-chemical theory of dissipative systems [12 p.1-56, 13] how deterministic individuals (e.g. a plant species) could self-organise as irreducible phenomena, formed by irreversible probabilistic reactions through chemical affinity.
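The self-modelling idea behind NIPALS can be illustrated with a minimal sketch of its principal component variant, written here in plain Python. This is an illustration under stated assumptions and not the implementation used in this work: the function name `nipals_pca`, the tolerance settings and the toy data are all hypothetical. Each component is found by alternately regressing the variables on a score vector and the samples on the resulting loading, after which the extracted pattern is subtracted from the data.

```python
import math

def nipals_pca(rows, n_components, tol=1e-12, max_iter=1000):
    """Extract principal components one at a time from a list of sample rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    # E is the column-centred residual matrix; it is deflated after each component.
    E = [[row[j] - means[j] for j in range(p)] for row in rows]
    scores, loadings = [], []
    for _component in range(n_components):
        # Start from the residual column with the largest sum of squares.
        start = max(range(p), key=lambda j: sum(E[i][j] ** 2 for i in range(n)))
        t = [E[i][start] for i in range(n)]
        for _iteration in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading: project every variable (column) onto the current score vector.
            load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
            norm = math.sqrt(sum(v * v for v in load))
            load = [v / norm for v in load]
            # Score: project every sample (row) onto the unit-length loading.
            t_new = [sum(E[i][j] * load[j] for j in range(p)) for i in range(n)]
            change = math.sqrt(sum((a - b) ** 2 for a, b in zip(t_new, t)))
            t = t_new
            if change <= tol * math.sqrt(sum(v * v for v in t)):
                break
        scores.append(t)
        loadings.append(load)
        # Deflate: subtract the pattern just found so the next one is orthogonal to it.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
    return scores, loadings

# Hypothetical toy data: two orthogonal patterns plus a constant offset per variable.
t1, t2 = [2.0, 2.0, -2.0, -2.0], [1.0, -1.0, 1.0, -1.0]
p1, p2 = [1 / 3, 2 / 3, 2 / 3], [2 / 3, 1 / 3, -2 / 3]
offset = [5.0, 1.0, 3.0]
X = [[t1[i] * p1[j] + t2[i] * p2[j] + offset[j] for j in range(3)] for i in range(4)]

scores, loadings = nipals_pca(X, n_components=2)
```

In this toy case the two loadings recover the patterns `p1` and `p2` up to sign, and the score vectors are mutually orthogonal. The sign of a component is arbitrary, so only the direction of a pattern should be interpreted.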

The classical mathematician Hermann Weyl [14] argued that theoretical construction is not the only approach to the phenomena of life. Another approach, that of “understanding from within by interpretation”, is open to us. We aim, in a designed experiment, to interpret “from within” physiochemical analogies to elements in the mathematical modelling of data. The analogies are defined in the Definitions Box in Table 1 and visualised in Figure 7. They represent the trivial phenomenon of self-organisation (e.g. a plant endosperm) “from within” by chemical and molecular analyses and concepts and from outside by spectral phenomenological overviews in a “bottom up-top down” dialogue mediated by chemometrics and data inspection.

It is now clear that the best way “not to separate chemometrics from chemistry” is by experimental design [2,3] where specific chemical/biological questions are asked, e.g. by spectroscopy, of plant phenotypes in nature. Chemometric models are, like all mathematical models, approximate. However, Near Infrared (NIR) spectra [4] from a population of cells (tissues) of an organism constitute a phenomenological physical model of self-organisation that is an almost deterministic expression of nature when genetics and environment are controlled.

A NIR spectrum [4, 5] is a “stand alone”, unique, finely tuned physiochemical expression of a natural phenomenon, recorded by the hardware of the spectrometer and gently polished by the mathematics of scatter corrections such as MSC (Multiplicative Scatter Correction), SNV (Standard Normal Variate) and the 2nd derivative [6]. The surprisingly high representational power of each single NIR spectral pattern has been overlooked because of simplistic statistical, probabilistic reasoning that tends to disregard individual, seemingly deterministic elements. However, the validity of drawing chemical conclusions from single NIR-IR spectra of samples measured under controlled conditions has long been accepted by spectroscopists [7].
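To make the “gentle polishing” by scatter correction concrete, the following is a minimal sketch in plain Python of two of the pre-treatments named above. It rests on assumptions: the function names and the five-point toy spectrum are hypothetical, and the second derivative is approximated by a simple central finite difference as a stand-in for the smoothed derivatives normally used in practice. SNV removes an additive offset and a multiplicative scaling from each spectrum individually, and the second difference removes a linear baseline.

```python
import math

def snv(spectrum):
    """Standard Normal Variate: centre and scale one spectrum by its own mean and standard deviation."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]

def second_difference(spectrum):
    """Crude second derivative by central finite differences (unit wavelength spacing assumed)."""
    return [spectrum[i - 1] - 2 * spectrum[i] + spectrum[i + 1]
            for i in range(1, len(spectrum) - 1)]

# Hypothetical absorbance values at five equally spaced wavelengths.
raw = [0.1, 0.4, 0.9, 0.4, 0.1]
# The same band shape with a multiplicative scatter effect and an additive offset.
scattered = [2.0 * v + 0.5 for v in raw]
# The same band shape sitting on a sloping linear baseline.
tilted = [v + 0.3 + 0.05 * i for i, v in enumerate(raw)]

snv_raw, snv_scattered = snv(raw), snv(scattered)  # agree to rounding error
d2_raw, d2_tilted = second_difference(raw), second_difference(tilted)  # agree to rounding error
```

Both pre-treatments act on one spectrum at a time, which is consistent with the view of each single spectrum as a self-contained pattern.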

1.2. A paradigm shift in science from the first principles of mathematics to the autonomous soft modelling in self-organisation

There is a strong anthropocentric tradition in deterministically modelled mathematics [8 p.5-6] that is opposed to the inductive research strategy in chemometrics of measuring first and hypothesising afterwards. This tradition is focused on the free incentive of the researcher in command, with strong emphasis on the purification of the laws of nature by hard modelling. This, however, creates a theoretical, virtual, stand-alone world of hard mathematical modelling in science, not unlike that in Atomic Physics [9] and Systems Biology [10]. It further causes a deep mistrust of getting involved in the macro state of life by measuring and evaluating data tapped from nature by soft modelling [4]. This is far from the excitement and optimism in chemometrics in approaching data from nature [2]. At the roots of chemometrics introduced by the early economists there is the concept of autonomy: “…that the gates in a diagram represents independent mechanisms” (Pearl 2000) [11]. On the other hand, biological networks (individuals, samples) are composed of many different sub-systems that together should be characterised as unique objects. Herman Wold solved the problem of modelling unique objects with his discovery of the self-modelling NIPALS (Non-Linear Iterative Partial Least Squares) algorithms [12]. They facilitate the classification of more or less autonomous objects characterised as patterns of variables, which is central in chemometrics.

Behind this autonomy in nature, physically, chemically and biologically interpreted, there is the underlying force of chemical affinity and catalysis that was named and anticipated already by J.J. Berzelius [13] in 1837. Ilya Prigogine, Nobel laureate in chemistry 1977, explained by his theory of self-organisation [14 p.1-56, 15] how deterministic individuals (e.g. a plant species) could arise as irreducible phenomena formed by irreversible probabilistic chemical reactions through affinity in autonomous objects. Prigogine realised that both the classical trajectory/pathway and the quantum mechanics (QM) theories were incomplete [14 p.1-56, 5]. In Bohr’s classical modelling of local QM the action of an observer was introduced to choose the hard deterministic solutions of the Schrödinger equation for wave functions. The introduction of irreversible, persistent chemical reactions in self-organisation, leading to the formation of unique structures, called for a transformation from local wave functions in QM to a non-local, statistical, non-distributional description of QM ensembles. Prigogine comments [14 p.131]: “We now arrive at a realistic description of quantum theory because we now know that the transition from wave functions to ensembles can be understood as Poincaré resonances (section 4.2.1) without the mysterious intervention of an observer.” The actor is “probability itself” [14 p.132], which was confirmed in computer simulations [15]. It explains why there is no longer a sharp distinction in nature between causal necessity and chance [14].

Prigogine’s mathematical extension of classical QM [14 p.129-151] closed the void between the micro and macro states in the theory of physics by formulating a dynamic theoretical mathematical model of the creative force of irreversibility in self-organisation. The concept of self-organisation is a paradigm shift away from classical hard modelled, deterministic, anthropocentric mathematics [8,10] in science to soft modelling by “probability itself” in the autonomous self-modelling networks of nature [14 p.1-56]. This paradigm shift is fully comparable with the change away from the geocentric model of the place of planet earth in the universe in the 17th century. We aim to fulfil Prigogine’s theoretical work by demonstrating in a designed experiment how the concepts of self-organisation are able to extend the theory of chemometrics by merging mathematical modelling with physical/chemical understanding.

1.3. From the first principles of mathematics to the autonomous soft modelling in self-organisation

We here give a short background on how Prigogine arrived at his time-irreversible dynamic computer model of the physics of self-organisation, including detailed references to his most recent book [12]. It is a sign of the strength of the classical hard mathematical modelling strategy of separate analogies to nature that Prigogine succeeded in digging deep into the general theory of self-organisation relevant to biology and was able to break out of classical hard reductionist science from within by realising the constructive role of soft modelling. Prigogine realised that both the classical trajectory/pathway and the quantum mechanics (QM) theories were incomplete [12 p.1-56, 5]. A key concept in Prigogine’s mathematics of self-organisation was to find a solution for the indeterminacy of deterministic trajectories (pathways) in classical Newtonian dynamics, building on Poincaré’s theorem “on the non-integrability of dynamic systems due to resonances between the degrees of freedom” [12 p.39]. This concept is explained as follows: