# Functional Neuroimaging Data Mining

2018

Vol.3 No.2:6 iMedPub Journals

Journal of Translational Neurosciences

ISSN 2573-5349

DOI: 10.21767/2573-5349.100019

Radu Mutihac*


Department of Physics, University of Bucharest, Bucharest 077125, Romania

Abstract


Data mining, also known as knowledge discovery from data, is a relatively young and fast-growing interdisciplinary scientific field. This contribution critically outlines the main approaches and trends in data mining applied to functional neuroimaging analysis.

Corresponding author: Radu Mutihac

mutihac@gmail.com

Department of Physics, University of Bucharest, Bucharest 077125, Romania.

Keywords: Data mining; Exploratory data analysis; Cluster analysis; Multivariate regression; Functional brain imaging; Statistical parametric mapping

Tel: +4072-702-0772

Fax: +4021-315-9249

Received: March 21, 2018; Accepted: May 07, 2018; Published: May 14, 2018

Citation: Mutihac R (2018) Functional Neuroimaging Data Mining. J Transl Neurosci Vol.3 No.2:6

© Under License of Creative Commons Attribution 3.0 License

Introduction

Exploring the physical world aims to discover the structure in experimental data and to reveal the underlying processes from which the acquired data originated. In most practical cases, useful information emerges only by processing raw data.

Besides, real-life measurements provide unknown mixtures of interesting and uninteresting signals. A popular saying goes that "we are living in the information age", yet we are actually living in the "data age" [1]. Still, data mining may be regarded as a direct development of information technology looking for novel approaches in data processing.

Functional neuroimaging data yield valuable information on physiological processes, yet they are often affected by several artifacts including noise, and are generally acquired as collections of unknown mixtures of signals variably summing up in time and/or space. In several cases, even the nature of the signal sources is a matter of debate. In most cases, reliable and full information is missing, so that a reasonable estimation of plausible solutions to the identification of the original signals falls in the large class of blind source separation (BSS) methods [2].

Fundamental Concepts

Some fundamental concepts in data analysis are loosely defined hereafter in order to introduce the general context of our discussion.

Computer Science (CS): Though controversial debates exist on the meaning of "computer science", we accept CS as the scientific discipline and practical approach to computation and its applications [3]. It entails a systematic exploration of the feasibility, structure, expression, and formal reasoning of the protocols that reflect the acquisition, representation, processing, storage, communication of, and access to information, whether that information is encoded as bits in a computer memory or transcribed in genes and protein structures in a human cell.

Nevertheless, disputes have stemmed from the etymology, in the sense that there is "no science" in CS as far as CS is not concerned with observing nature. In this respect, parts of CS are engineering (more practical), and parts are mathematics (more theoretical). By contrast, science refers to laws of nature and natural phenomena, whereas the phenomena involved in CS are man-made [4]. Although no consistent definition is unanimously agreed upon, CS is commonly accepted to consist of some theoretical and practical subdomains, as follows.

a) Computational complexity theory: addressing fundamentals of computational and intractable problems; it is highly abstract;

b) Computer graphics: dealing with real-world computer-assisted visual applications;

c) Programming language theory: investigating analytical approaches to programming;

d) Computer programming: exploring programming languages and complex systems;

e) Human-computer interaction: designing usable computers and computations universally accessible to humans.

The field of Artificial Intelligence (AI) has emerged and developed assuming that a specific property of humans, that is, intelligence, can be sufficiently well described to the extent that it can be mimicked by a machine. As such, philosophical issues arise on the nature of the human mind and, furthermore, on the ethics of creating artificial systems endowed with human-like intelligence, issues which have been addressed by myth, fiction, and philosophy since antiquity [5]. AI has been the subject of optimism since its inception, yet it has also suffered setbacks ever since. For the time being, AI constitutes a major component of technology and poses several challenging problems at the forefront of research in CS.

Mechanical or formal reasoning has likewise been developed by philosophers and mathematicians since antiquity. The study of logic established a milestone in our society by the creation of the digital electronic computer. The starting point was marked by the Turing machine [6]. Turing's theory of computation demonstrated that a programmable machine may simulate any act of mathematical deduction by manipulating simple symbols like "0" and "1". At the same time, discoveries in cybernetics and information theory, together with significant advances in neurology, oriented the interest of the scientific community towards evaluating the feasibility of designing an electronic brain.

Data Mining (DM) equates to the extraction of implicit, not a priori known, and potentially valuable information from raw data [7]. The underlying idea in DM is to build computer programs that search databases for regularities or patterns. However, real data are imperfect, incomplete, corrupted, contingent on accidental coincidences, and in part of no interest whatsoever, leading to spurious and inexact predictions. Some exceptions will still exist to all rules, as well as cases not covered by any rule. Therefore, the algorithms involved in DM must be robust enough to cope with imperfect data, yet capable of identifying inexact but useful regularities [8]. As an analytic process, DM is conceptually designed to work in three stages, as follows.

a) Initial exploration of data in search of consistent patterns, as well as systematic relationships among variables. The process may involve data cleaning, data transformations, and data space reduction to a proper subspace by feature selection, thereby reducing the number of variables to a meaningful and manageable range;

b) Developing models based on pattern identification and statistical assessment of findings by means of detected patterns applied to new data subsets. Ranking models and choosing the best one based on their predictive performance, that is, explaining the variability in data and providing consistent results across samples;

c) Deployment and prediction by using the model previously chosen as best to disclose knowledge structures aiming to guide further decisions under conditions of limited (if any) accessibility and/or certainty.

Machine Learning (ML) is conceptualized as the technical basis of DM aiming to discover and describe structural patterns in data. In other words, ML is perceived as the acquisition of structural descriptions from examples, further employed for prediction, explanation, and understanding. Historically, ML was considered by Arthur Samuel "the subfield of Computer Science (CS) that gives computers the ability to learn without being explicitly programmed" [9]. Specifically, ML refers to the construction and study of computer algorithms that are automatically self-improving through experience in that they can learn and make predictions on data. Such algorithms go beyond following strictly static program instructions by making data-driven predictions or decisions, through building a model from sample inputs. The core of ML deals with representation and generalization. Feature learning or representation learning is a set of techniques that learn a transformation of "raw" inputs to a representation that can be effectively further exploited in supervised/unsupervised learning tasks. Generalization is the property that the system will perform well on unseen data instances; the conditions under which this can be guaranteed are a key object of study in the subfield of computational learning theory. By contrast, traditional statistical techniques are not adaptive but typically process all training data simultaneously before being used with new data.

In a similar key, Tom Mitchell, another learning researcher, proposed back in 1997 a more precise definition for ML in the case of a well-posed learning problem: "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E" [10]. In a broader view, ML is closely related to and overlaps with computational statistics, which also focuses on prediction-making through the use of computers. It is also related to mathematical optimization, which provides theory, methods, and application domains to the field. ML is sometimes conflated with DM regarding prediction, although the latter overlaps more with exploratory data analysis (EDA), sharing much in common in terms of goals and methods.

Methods

In practice, raw data per se are generally of little (if any) immediate use. It is only when information is extracted via processing that data become meaningful.

Data analysis

Back in the 1960s, Tukey advocated that "classical statistics leaning on analyzing small, homogeneous, stationary data by means of known distributional models and assumptions will prove inappropriate to deal with the problems raised by the analysis of large amount and complex data" [11]. The reason invoked was the qualitative difference, rather than strictly the size, that may exist between the ever larger data sets met in practice and the common smaller ones [12]. Consequently, functional neuroimaging data analysis should primarily rely on methods circumscribed to both DM and EDA. Moreover, Huber stated that "... there are no panaceas in data analysis" [12]. It follows that an optimal choice is domain-dependent (Figure 1). To make this clear, mathematical problems are considered ill-posed if they fail to satisfy at least one of the following three criteria: a) a solution exists, b) it is unique, and c) it depends continuously on the initial data.

To solve ill-posed problems, well-posedness must be restored by restricting the class of admissible solutions [13].
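As an illustration of restoring well-posedness by restricting the admissible solutions, the sketch below uses Tikhonov (ridge) regularization on a nearly rank-deficient least-squares problem. This is only one common choice and is not taken from the article; the function name `tikhonov_solve`, the matrix `A`, the data `b`, and the penalty weight `lam` are all assumptions made for the example.

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Minimize ||A x - b||^2 + lam * ||x||^2 via the regularized normal equations."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# Hypothetical example: nearly collinear columns, so the unregularized
# solution is not stable with respect to small changes in the data.
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-8],
              [1.0, 1.0 - 1e-8]])
b = np.array([2.0, 2.0, 2.0])
b_noisy = b + np.array([1e-4, -1e-4, 0.0])   # small perturbation of the data

# The penalty favors small-norm solutions, which restricts the admissible class.
x_clean = tikhonov_solve(A, b, lam=1e-3)
x_noisy = tikhonov_solve(A, b_noisy, lam=1e-3)
print(x_clean, x_noisy)
```

With the penalty in place, the two solutions stay close to each other, which is the continuous dependence on the data required by criterion c). The value of `lam` trades stability against bias and would have to be tuned for real data.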


Data mining and exploratory data analysis

DM is an iterative process of exploring and modeling large amounts of data aiming to discover baseline patterns and relationships among significant variables. As such, DM is called upon to identify trends, predict future events, and assess various courses of action that improve system performance. DM is a multidisciplinary field importing and boosting ideas and concepts from diverse scientific areas like statistics, signal and image processing, pattern recognition, mathematical optimization, and computer vision. Extracting non-explicit knowledge is fostered by advances in several disparate and often incongruous domains, such as bioinformatics, DNA sequencing, e-commerce, knowledge management, remote sensing images, stock investment and prediction analysis, and real-time decision making [14].

The main applications of DM span a large range of issues in signal processing like adaptive system modeling and information mining including applications on biomedical data [15], visual data mining [16], scale-space analysis with applications in image segmentation [17], chemometrics including artificial neural networks (ANNs) [18], characterization of protein secondary structure [19], and many more.

Exploratory data analysis (EDA) consists of a large set of techniques that deal with data informally and disclose structure quite straightforwardly. Data probing is primarily stressed, in many cases prior to comparison with any particular probabilistic model. Such methods are optimal compromises in many circumstances and quite near to the optimal solution for each individual case. In a typical exploratory approach, several variables are critically considered and thoroughly compared by means of diverse techniques in search of systematic patterns in data. In a more general sense, computational exploratory data analysis comprises various methods from a large spectrum ranging from simple basic statistics to advanced multivariate exploratory techniques. Basic statistical exploratory analysis includes techniques like:

a) inspecting the distribution of variables,

b) comparing the coefficients of the correlation matrices with meaningful thresholds, and

c) inspecting multi-way frequency tables.

Some frequent approaches in multivariate exploration are listed hereafter.

a) Principal Component Analysis (PCA) [20]

b) Independent Component (Subset) Analysis (ICA) [21]

c) (Fuzzy) Cluster Analysis (FCA) [22,11]

d) Factor Analysis (FA) [23]

e) Projection Pursuit (PP) [24]

f) Discriminant Function Analysis (DFA)

g) Partial Least Squares (PLS)

h) Multidimensional Scaling

i) Log-linear Analysis

j) Canonical Variate Analysis

k) Correspondence Analysis

l) Time Series Analysis

m) Classification Trees

n) Stepwise Linear and Nonlinear Regression

o) Continuum Regression

p) Multivariate Linear Model (MLM)

q) General Linear Model (GLM)

Analytical techniques encompass graphical data visualization techniques that can identify relations, trends, and biases usually hidden in unstructured data.
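One of the basic exploratory steps mentioned in this section, comparing correlation coefficients with a threshold, might be sketched as follows. The helper `correlated_pairs`, the threshold of 0.8, and the synthetic variables are illustrative assumptions, not values from the article.

```python
import numpy as np

def correlated_pairs(X, threshold=0.8):
    """Return (i, j, r) for variable pairs whose |Pearson r| exceeds the threshold."""
    R = np.corrcoef(X, rowvar=False)          # columns of X are variables
    n = R.shape[0]
    return [(i, j, float(R[i, j]))
            for i in range(n) for j in range(i + 1, n)
            if abs(R[i, j]) > threshold]

# Hypothetical data: b depends strongly on a, c is unrelated to both.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = 2.0 * a + 0.1 * rng.normal(size=500)
c = rng.normal(size=500)
X = np.column_stack([a, b, c])

pairs = correlated_pairs(X, threshold=0.8)
print(pairs)
```

Only the pair of variables built to be related passes the threshold here. What counts as a meaningful threshold depends on the sample size and the domain, so the value used above is a placeholder.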

In functional brain imaging, EDA methods can identify interesting features reporting on brain activations which may be unanticipated or even missed by the investigator. Whereas EDA performs flexible searching for evidence in data, confirmatory data analysis (CDA) is concerned with evaluating the available evidence. In imaging neuroscience, there is a permanent dynamic interplay between hypothesis generation on one hand, and hypothesis testing on the other hand, that can be regarded as a Hegelian synthesis of EDA and CDA [25]. Furthermore, artifactual behavior easily identified by EDA may raise questions on

1. data appropriateness,

2. the necessity of additional preprocessing steps, or

3. the introduction of spurious effects by the preprocessing employed.

By all means, confirmatory methods are a must for controlling both type I (false positives) and type II (false negatives) errors, yet their statistical significance is meaningful only if both the chosen model and the distributional assumptions are correct.

Figure 1: Hypothesis-driven versus model-driven univariate/multivariate data analysis with benefits (+) and pitfalls (-): PCA, FA, PP, CR, ICA, CA, FCA, CVA, PLS, MLM, GLM (abbreviations explained in text).

DM is heavily based on statistical concepts including EDA and modeling and, consequently, it shares with them some components. Nevertheless, an important difference exists in the goal and purpose between DM and traditional EDA in that DM is oriented towards applications to a larger extent rather than the underlying phenomena. That is, DM is less concerned with identifying the specific relations between the involved variables; rather, its focus is on producing a solution that can generate useful predictions. Therefore, DM comprises traditional EDA techniques, as well as techniques like ANNs that can come out with valid predictions while still lacking the resources to identify the specific nature of the variable interrelations on which the predictions are made.

Exploratory methods

The central interest in functional brain studies resides in the electrical activity of neurons, which cannot be directly investigated by any magnetic resonance imaging (MRI) [26]. The human brain electrical activity is of paramount interest for both understanding and modeling the human brain, and for medical diagnosis and treatment as well, especially for developing automated patient monitoring, computer-aided diagnosis, and personalized therapy [26].

In data analysis, a widespread task consists in finding an appropriate representation of multivariate data aiming to facilitate subsequent processing and interpretation. Transformed variables are hoped to be the underlying components, which best describe the intrinsic data structure and highlight the physical causes responsible for data generation. Linear transforms like PCA and ICA are often envisaged to accomplish this task due to their computational and conceptual simplicity [26]. Generally, methods of unsupervised learning fall in the class of data-based (hypothesis-driven) analysis, such as eigenimage analysis [27,28] or self-organizing maps (SOM) [29].

PCA and ICA: PCA is defined by the eigenvectors of the covariance matrix of the input data. In PCA, data are represented in an orthonormal basis determined by the second-order statistics (covariances) of the input data. Such a representation is adequate for Gaussian data [30]. PCA is a means of encoding second-order dependencies in data by rotating the orthogonal axes to correspond to the directions of maximum variance (Figure 2). As a linear transform, PCA is optimal in terms of least mean square errors over all projections of a given dimensionality. PCA decorrelates the input data but does not address the high-order dependencies. Decorrelation means that variables cannot be predicted from each other using a linear predictor, yet nonlinear dependencies between them can still exist. Edges, for instance, defined by phase alignment at multiple spatial scales, constitute an example of high-order dependency in an image, as do shape and curvature [26]. Second-order statistics capture the amplitude spectrum of images but not the phase [31]. Coding mechanisms that are sensitive to phase play an important role in organizing a perceptual system [32]. The linear stationary PCA and ICA processes can be introduced on the basis of a common data model. The ICA model is a data-driven multivariate exploratory approach based on the covariance paradigm and formulated as a generative linear latent variables model [21]. ICA comes out with typical components like task-related, transiently task-related, and function-related activity without reference to any experimental protocol, including unanticipated or missed activations (Figure 3).

Figure 2: Stationary PCA model. The goal of PCA is to identify the dependence structure in each dimension and to come out with an orthogonal transform matrix $W$ of size $L \times N$ from $\mathbb{R}^N$ to $\mathbb{R}^L$, where $L \le N$, such that the $L$-dimensional output vector $y(t) = W x(t)$ sufficiently represents the intrinsic features of the input data $x(t)$. Consequently, the reconstructed input data are given by $\hat{x}(t) = W^T W x(t)$. [Diagram: input $x(t)$ ($N \times T$), transform $W$ ($L \times N$), output $y(t)$ ($L \times T$), back-projection $W^T$, reconstruction $\hat{x}(t)$ ($N \times T$).]

Figure 3: Stationary noiseless linear ICA model. Here $s(t)$, $x(t)$, and $A$ denote the latent sources, the observed data, and the (unknown) mixing matrix, respectively, whereas $a_i$, $i = 1, 2, \dots, M$ are the columns of $A$. Mixing is supposed to be instantaneous, so there is no time delay between the source variables $\{s_i(t)\}$ mixing into observable (data) variables $\{x_j(t)\}$, with $i = 1, 2, \dots, M$ and $j = 1, 2, \dots, N$.

The assumptions behind the ICA model are the following:

(i) the latent source signals are assumed statistically independent and at most one Gaussian, and

(ii) the mixing process is assumed stationary and linear but unknown.

ICA, based on higher-order statistics, transforms the ill-posed PCA problem into a well-posed one. The ICA decomposition is unique up to IC amplitude (scale), IC polarity (sign), and IC ranking (order) [26]. Technically, applying the ICA model amounts to the selection of an estimation principle (objective function) plus an optimization algorithm. Typical objective functions consist in maximization or minimization of

(i) high-order statistical moments (e.g., kurtosis),

(ii) maximum likelihood (ML),

(iii) mutual information (MI), or

(iv) negentropy.

For the ICA model, statistical properties (e.g., consistency, asymptotic variance, robustness) depend on the estimation principle. Algorithmic properties (e.g., convergence speed, memory requirements, numerical stability) depend on the selection of the optimization algorithm. ICA is based on the concept of independence between probability distributions which, in turn, relies on information theory. Entropy is a widely employed criterion for statistical independence. Terms like information and entropy are richly evocative with multiple meanings in everyday usage; information theory captures only some of the many facets of the notion of information (Figure 4).

Figure 4: PCA and ICA compared for Gaussian and non-Gaussian data.
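The contrast between PCA and ICA on non-Gaussian data can be sketched numerically. The example below assumes two independent, uniformly distributed sources and a made-up mixing matrix `A`; it whitens the mixtures with PCA (covariance eigenvectors) and then searches for the rotation that maximizes the absolute excess kurtosis of the outputs. The grid search is a deliberately simple stand-in for an optimization algorithm and is not the estimation procedure used in the article; all function names are hypothetical.

```python
import numpy as np

def pca_whiten(X):
    """Center X (N x T), project onto covariance eigenvectors, scale to unit variance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc))   # second-order statistics only
    W_pca = eigvecs.T                                # rows are principal directions
    return (W_pca @ Xc) / np.sqrt(eigvals)[:, None]

def excess_kurtosis(y):
    """Fourth standardized moment minus 3 (zero for Gaussian data)."""
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

def rotation(theta):
    return np.array([[np.cos(theta), np.sin(theta)],
                     [-np.sin(theta), np.cos(theta)]])

def ica_rotation_2d(Z, n_angles=360):
    """Grid-search the rotation of whitened 2-D data that maximizes total |kurtosis|."""
    angles = np.linspace(0.0, np.pi / 2, n_angles, endpoint=False)
    scores = []
    for theta in angles:
        Y = rotation(theta) @ Z
        scores.append(abs(excess_kurtosis(Y[0])) + abs(excess_kurtosis(Y[1])))
    return rotation(angles[int(np.argmax(scores))]) @ Z

rng = np.random.default_rng(0)
T = 20000
S = rng.uniform(-1.0, 1.0, size=(2, T))   # independent non-Gaussian sources (assumed)
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                # hypothetical mixing matrix
X = A @ S                                  # observed mixtures x(t) = A s(t)

Z = pca_whiten(X)          # decorrelated, but generally still a mixture of the sources
Y = ica_rotation_2d(Z)     # estimated independent components

# Absolute correlation of each estimated component with each true source
C = np.abs(np.corrcoef(np.vstack([Y, S]))[:2, 2:])
print(np.round(C, 3))
```

Each estimated component should correlate strongly with exactly one source, up to the sign and ordering ambiguities of ICA. For Gaussian sources the kurtosis criterion would be flat, so the rotation would remain undetermined, in line with the requirement that at most one source be Gaussian.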

Clustering analysis: Searching for meaningful patterns in data, such as biological ones, has been a permanent endeavor best typified by the taxonomy that arranged species into groups based on their similarities and differences [33]. Clustering is an important DM method for discovering knowledge in multidimensional data.

Clustering analysis, alternatively called automatic classification, numerical taxonomy, or typological analysis, amounts to grouping, segmenting, or partitioning a set of objects into subsets (clusters), maximizing their degree of similarity within each cluster and maximizing their degree of dissimilarity across distinct clusters. Clusters may be

(i) disjoint vs. overlapping,

(ii) deterministic vs. probabilistic, and

(iii) flat vs. hierarchical.

Hierarchical clustering. Agglomerative methods (bottom-up) merge the objects (observations) into successively larger clusters up to a single cluster grouping all objects.
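The bottom-up merging described above can be sketched in a few lines. The example assumes one-dimensional observations and single linkage (distance between the closest members of two clusters); both choices, as well as the function name `agglomerative_single_linkage`, are illustrative assumptions rather than the article's method.

```python
def agglomerative_single_linkage(points):
    """Merge 1-D points bottom-up; return the merge history as (cluster_a, cluster_b, distance)."""
    clusters = [[p] for p in points]          # start with one cluster per object
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history

history = agglomerative_single_linkage([1.0, 1.5, 5.0, 5.2, 9.0])
for left, right, dist in history:
    print(left, "+", right, "at distance", round(dist, 3))
```

The merge history is what a dendrogram would display. Cutting it at a chosen distance yields a flat partition, and other linkage rules (complete, average) change only the line that computes `d`.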