Appendix e-1: Details of analysis
Psychometric properties of psychosocial inventories
Reliability measures for psychosocial inventories were estimated using Center A subjects. Cronbach’s alpha coefficient was used to assess internal consistency reliability. To estimate test-retest reliabilities, ten epilepsy and ten PNES subjects retook the questionnaire seven days after initial completion, and Spearman’s correlation coefficient was calculated for subscale and total scores between the initial and seventh-day administrations.
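For concreteness, a minimal sketch of the two reliability estimates follows; the data layout and variable names are assumptions for illustration, not the study’s actual code.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def cronbach_alpha(items_df: pd.DataFrame) -> float:
    """Internal consistency for one subscale (one column per item):
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items_df.shape[1]
    item_vars = items_df.var(axis=0, ddof=1)
    total_var = items_df.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def test_retest(scores_day0: np.ndarray, scores_day7: np.ndarray) -> float:
    """Test-retest reliability: Spearman correlation between the initial and
    seventh-day scores of the retest subjects."""
    rho, _ = spearmanr(scores_day0, scores_day7)
    return rho
```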
Table e-6 shows internal consistency and test-retest reliabilities for inventories used in this study. Psychosocial inventory subscales were internally consistent (Cronbach’s alpha > 0.70), except for QOLIE-31 overall quality of life (0.46), MHLC doctors (0.37), and MHLC other people (0.62). Inventories demonstrated satisfactory test-retest reliability with all correlation coefficients greater than 0.82. The CASE-epilepsy scale maintained significant inter-subscale correlations (0.53, 0.56, 0.57) that were comparable to those of the original CASE-cancer scale 18.
Hybrid classifier
Data Modification
Five Center A non-PNES subjects reported ranges (e.g., 5–10 per month) rather than a single value for monthly seizure frequency. For these subjects, the midpoint of the range was assigned as the single value. Two additional Center A non-PNES subjects reported ranges (e.g., 15–20 years old) for the ‘age of seizure onset’ item; the midpoint of the range was likewise used in these two cases.
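A small sketch of this range-to-midpoint recoding; the regular expression and the tolerated dash characters are assumptions, not the authors’ procedure.

```python
import re

def midpoint(value: str) -> float:
    """Convert a reported range such as '5-10' to its midpoint (7.5);
    single values pass through unchanged."""
    parts = [float(x) for x in re.split(r"\s*[-\u2013\u2014]\s*", str(value)) if x]
    return sum(parts) / len(parts)

assert midpoint("5-10") == 7.5    # monthly seizure frequency
assert midpoint("15-20") == 17.5  # age of seizure onset
assert midpoint("12") == 12.0     # single value unchanged
```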
Variable Reduction
Using decile, quartile, and binary variables as features, we classify subjects into two groups defined by the outcome data. The features (or predictors) are the inputs to the classifier, and the classifier output is a binary value that assigns each subject to one of the two groups.
Because the inputs of the classifier are represented as decile, quartile, or binary variables, the inputs and outputs of the classifiers are all categorical data to which numerical values have been assigned for representation purposes only. For example, gender is represented using the binary values one and zero. This assignment is purely arbitrary and the values have no mathematical significance. Care must therefore be taken in how the data are treated when applying different mathematical and statistical techniques. For this reason, we decided not to prune the data using principal components analysis (PCA) or similar methods that rely on the algebraic properties of (continuous) numerical values. Instead, we used instantaneous logistic regression to reduce the number of predictors.
The approach is to fit a logistic model to all 22 successfully screened variables (p-value < 0.20 in univariate logistic regression testing) simultaneously and investigate the influence of each predictor on the outcome. An independent variable (predictor) whose regression coefficient does not differ significantly from zero (p-value > 0.05) can be removed from the regression model; if the p-value for a coefficient differing from zero is < 0.05, the variable contributes significantly to the prediction of the outcome variable. A logistic regression coefficient gives the change in the predicted log odds of having the characteristic of interest for a one-unit change in the corresponding independent variable. When the independent variables are dichotomous (e.g., gender), their influence on the dependent variable can be compared directly through their regression coefficients. Variables with p-value < 0.05 in this regression constitute the reduced predictor set.
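A sketch of this reduction step, assuming the 22 screened variables sit in a pandas DataFrame X with binary outcome y; statsmodels is used here for illustration and was not necessarily the software employed in the study.

```python
import statsmodels.api as sm

def reduce_predictors(X, y, alpha=0.05):
    """Fit one multivariable logistic model to all screened predictors
    simultaneously and keep those whose coefficients differ
    significantly from zero."""
    model = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
    pvals = model.pvalues.drop("const")   # ignore the intercept
    return list(pvals[pvals < alpha].index)
```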
Instantaneous logistic regression reduced the 22 successfully screened variables to ten predictors: Age of seizure onset, Monthly seizure frequency, Fibromyalgia, Nutritional practices, Efficacy to understand, Limiting behavior, Practical support seeking behavior, Chance locus of control, Other people locus of control, and Depression. Of note, Anxiety and Depression scores were highly correlated, as were the Fibromyalgia and Chronic pain items. Therefore, from each pair we retained the variable that more accurately predicted PNES (Depression and Fibromyalgia).
Classification methods
Different classification techniques were applied to this dataset, as described below. We used 75% of Center A data for training and classified the remaining 25% for Center A validation. Subsequently, all Center B patients were classified for Center B validation. To remove the effect of training-set selection at each step, we selected the training data randomly, stratifying to maintain the overall proportion of PNES diagnoses (35%) in the training and Center A validation sets. We classified the dataset once using all 22 screened predictors as input variables and then, for comparison, using the reduced set of ten predictors given by the instantaneous logistic regression analysis. The classification accuracy of the reduced model was compared with that of the full model to confirm no loss of information. An overview of the different classifiers follows.
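A minimal sketch of the stratified 75/25 split, with placeholder data standing in for the Center A features and diagnoses; all names and the use of scikit-learn are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_a = rng.integers(0, 4, size=(200, 22))  # placeholder quartile-coded features
y_a = rng.random(200) < 0.35              # placeholder diagnoses, ~35% PNES

X_train, X_val, y_train, y_val = train_test_split(
    X_a, y_a,
    test_size=0.25,   # 25% of Center A held out for validation
    stratify=y_a,     # preserve the 35% PNES proportion in both partitions
    random_state=0,   # one realization of the random training-set selection
)
# Center B subjects are never used in training; the trained classifiers
# are simply applied to them for Center B validation.
```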
1- Perceptron Neural Network (PNN)
A perceptron neuron uses the hard-limit transfer function: a = hardlim(Wp + b)
where p (R×1) is the input vector, W (1×R) is the weight matrix, b (scalar) is the bias, a (scalar) is the output, which takes values in the (binary) set {0, 1}, and “hardlim” is the hard-limiter function. The PNN is a very simple classifier and often works effectively when the inputs are to be grouped into two distinct classes. The discriminant function is linear, so the classifier does not perform well if the data are not linearly separable. However, when the data are linearly separable, the optimization problem used to select the weights of the PNN has a global solution, which makes the PNN a powerful modeling tool for such applications. For our application, we used a PNN with the number of input neurons equal to the number of input features (predictors) and one output neuron that generates binary {0, 1} values corresponding to the two classification groups.
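A compact sketch of a single-output perceptron with the hard-limit transfer function and the classical perceptron weight update; this is a generic implementation, not the study’s code.

```python
import numpy as np

def hardlim(n):
    """Hard-limit transfer function: 1 if net input >= 0, else 0."""
    return (n >= 0).astype(int)

def train_perceptron(P, t, epochs=100):
    """P: (samples x R) input matrix, t: (samples,) binary targets {0, 1}."""
    W = np.zeros(P.shape[1])  # weight vector (1 x R)
    b = 0.0                   # scalar bias
    for _ in range(epochs):
        for p, target in zip(P, t):
            a = hardlim(W @ p + b)  # a = hardlim(Wp + b)
            e = target - a          # classification error drives the update
            W += e * p
            b += e
    return W, b
```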
2- Backpropagation Neural Network (BNN)
A backpropagation network is one of the most widely used neural network architectures and consists of three main layers: the input, hidden, and output layers. The training process finds the optimal weight matrix relating the inputs to the outputs. Training is often difficult because the complicated topology of the parameter space and performance landscape can produce local minima, although computational techniques have been developed to address this problem. One of the strengths of the BNN is its ability to estimate nonlinear mappings between input and output datasets. It is well known that BNNs are universal approximators of nonlinear input-output mappings; however, determining the correct network architecture requires an exploratory computational approach. In this study we used one hidden-layer neuron for every three inputs and one neuron in the output layer. For example, with 22 predictors there are seven neurons in the hidden layer and one output neuron.
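A sketch of this sizing rule, using scikit-learn’s MLPClassifier as a stand-in backpropagation network; the original software is not specified, so treat this as illustrative only.

```python
from sklearn.neural_network import MLPClassifier

def make_bnn(n_predictors: int) -> MLPClassifier:
    """One hidden neuron per three inputs, one output neuron."""
    n_hidden = n_predictors // 3  # 22 predictors -> 7 hidden neurons
    return MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=2000)

bnn = make_bnn(22)  # full predictor set; use make_bnn(10) for the reduced set
```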
3- Learning Vector Quantization (LVQ)
LVQ belongs to the class of self-organizing networks. Such networks are capable of detecting and learning regularities and correlations in the input data and can adapt their future responses to the inputs accordingly. An LVQ network has two layers: the competitive layer and the linear layer. The neurons in the competitive layer of the network learn to recognize groups of similar input vectors. Self-organizing maps learn to recognize groups of similar input vectors in such a way that neurons that are physically near to each other in the network respond to similar input vectors. The competitive transfer function accepts a net input vector and returns a neuron output of 0 for all neurons except for the winner, which is the neuron associated with the most positive element of the net input. If all biases are zero, then the neuron whose weight vector is closest to the input vector has the least negative net input and, therefore, wins the competition and has an output equal to one. The linear layer then transforms the classes determined by the competitive layer into the target classifications defined by the user. For example, if we select 22 neurons in the input layer and four neurons in the hidden (competitive) layer, the input features are grouped into four classes in the competitive layer based on the self-organizing map. There is no gold standard for choosing the number of neurons in the competitive layer; a common approach is to use prior knowledge of the structure of the input features to choose the number of classes, and most often a suitable network architecture is determined by trial and error in the training phase.
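A minimal LVQ1-style sketch: competitive-layer prototypes win by nearness to the input and are pulled toward inputs of their own class and pushed away from others. The prototype count per class here is an arbitrary assumption, consistent with the trial-and-error note above.

```python
import numpy as np

def train_lvq1(X, y, prototypes_per_class=2, lr=0.1, epochs=50, seed=0):
    """X: (samples x R) inputs, y: class labels. Returns prototypes and labels."""
    rng = np.random.default_rng(seed)
    protos, labels = [], []
    # Initialize each prototype at a random training example of its class.
    for c in np.unique(y):
        idx = rng.choice(np.where(y == c)[0], prototypes_per_class, replace=False)
        protos.append(X[idx].astype(float))
        labels += [c] * prototypes_per_class
    protos, labels = np.vstack(protos), np.array(labels)
    for _ in range(epochs):
        for x, target in zip(X, y):
            w = np.argmin(((protos - x) ** 2).sum(axis=1))  # winning neuron
            sign = 1.0 if labels[w] == target else -1.0
            protos[w] += sign * lr * (x - protos[w])        # LVQ1 update
    return protos, labels
```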
4- Bayesian Classifier
Assume we have N features f1,…,fN and a class variable C. A Bayesian classifier learns the conditional probability of each attribute fi given the label C from the training data. Classification is then done by applying Bayes’ rule to compute the probability of C given the particular instantiation of f1,…,fN, under the assumption that all attributes fi are conditionally independent given the value of the label C. The performance of this classifier is generally very good even though this independence assumption is not satisfied in most applications of the method. Unlike the neural network classifiers, no predefined network structure is required.
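The conditional-independence assumption described here is that of a naive Bayes classifier; a minimal sketch with scikit-learn’s CategoricalNB on placeholder categorical data, shown as one plausible implementation rather than the study’s.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

rng = np.random.default_rng(1)
X = rng.integers(0, 4, size=(150, 10))  # placeholder quartile-coded predictors
y = rng.integers(0, 2, size=150)        # placeholder binary diagnosis

clf = CategoricalNB()                   # Laplace smoothing (alpha=1.0) by default
clf.fit(X, y)                           # estimates P(C) and each P(f_i | C)
posterior = clf.predict_proba(X[:5])    # P(C | f_1, ..., f_N) via Bayes' rule
pred = clf.predict(X[:5])               # most probable class per subject
```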
Hybrid Classifier
As mentioned, we classify the dataset for each outcome in two stages. In the first stage we use the entire set of 22 predictors as inputs to each of the four classifiers; in the second we use only the reduced set of ten predictors selected by instantaneous logistic regression. After training, the outputs of all four classification methods are linearly combined (hybrid classifier) using the logit transformation, which restricts the output to between zero and one, and tested on the Center A validation set (the remaining 25% of Center A subjects). We used maximum likelihood estimation to optimize the logit function coefficients. Hybrid classifier outputs ≥0.5 correspond to a PNES diagnosis, while values <0.5 indicate the absence of PNES. The hybrid classifier was retested after sequentially eliminating each of its four classification methods. Eliminating the BNN did not significantly change hybrid classifier performance, so the BNN was permanently removed from the hybrid classifier.
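A sketch of the combination step as described: the component-classifier outputs are linearly combined with maximum-likelihood coefficients, and the inverse-logit (sigmoid) restricts the hybrid output to (0, 1). This is equivalent to stacking a logistic model on the classifier outputs; names and the use of statsmodels are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

def fit_hybrid(component_outputs, y):
    """component_outputs: (samples x n_classifiers) matrix of component
    outputs on the training set. Logit fitting = maximum likelihood
    estimation of the combination coefficients."""
    return sm.Logit(y, sm.add_constant(component_outputs)).fit(disp=0)

def hybrid_predict(model, component_outputs, threshold=0.5):
    """Sigmoid of the linear combination lies in (0, 1);
    values >= 0.5 correspond to a PNES diagnosis."""
    p = model.predict(sm.add_constant(component_outputs))
    return (p >= threshold).astype(int)
```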
The reduced model (Figure e-1) outperformed the full model for all classification techniques, except for the BNN, where classification accuracy was unchanged. This suggests that instantaneous logistic regression can effectively identify the most influential (categorical) predictors of an outcome, and that classification using the reduced predictor set may outperform classification using the entire set.
Advantages of Artificial Neural Networks Versus Traditional Statistical Methods
For our study, ANNs offered three major advantages over traditional statistical methods for classification (e.g., logistic regression, discriminant analysis). First, ANNs make no assumptions about the distribution, dispersion, or linearity of data25,27. In traditional statistical methodology, variables that violate these assumptions are recoded or excluded, potentially discarding predictor information. Second, ANNs are designed to handle inherently complex nonlinear interactions between demographic, clinical, seizure-related, and psychosocial predictor variables25,27, whereas traditional statistical methods for classification assume that the strength and direction of a predictor remain constant for all subjects in a given sample. For instance, high seizure frequency may be a strong indicator of PNES in women with high self-efficacy and healthy nutritional practices, yet predict the absence of PNES in women with low self-efficacy and unhealthy nutritional practices. Finally, ANNs are designed to minimize classification error, whereas traditional methods such as logistic regression maximize a likelihood function. Because our goal was to construct a sensitive and specific PNES screening instrument, minimizing classification error is more relevant to our study than maximizing a likelihood function.
Logistic regression
Model building
Logistic regression model building followed the methods described by Hosmer and Lemeshow 24. Predictors with an associated p-value < 0.20 in univariate logistic regression (i.e., successfully screened predictors) were entered into a backwards stepwise multivariable logistic regression procedure. Categorical and ordinal variables were recoded using design variables in order to estimate an independent odds ratio for each category relative to a designated reference category with assigned odds ratio = 1.00. Continuous variables were examined for ‘linearity in the logit’ using three methods: 1) univariable smoothed scatterplot on the logit scale, 2) quartile-based design variable analysis, and 3) fractional polynomial analysis. Based on the results of these three methods, a continuous variable could be converted to a categorical variable, transformed into one or more powers of the variable (e.g., squared, cubed, square root), or left as is.
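A sketch of the quartile-based design-variable recoding, with the first quartile serving as the reference category (odds ratio fixed at 1.00); the function and column names are hypothetical.

```python
import pandas as pd

def quartile_design_vars(series: pd.Series, name: str) -> pd.DataFrame:
    """Recode a continuous score into three indicator (design) variables,
    one per upper quartile, with the first quartile as reference."""
    quartiles = pd.qcut(series, 4, labels=[f"{name}_q{i}" for i in range(1, 5)])
    dummies = pd.get_dummies(quartiles)
    return dummies.drop(columns=f"{name}_q1")  # q1 = reference category

# e.g., quartile_design_vars(df["qolie_overall"], "qolie")  # hypothetical column
```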
The final multivariable model incorporating converted or transformed variables was compared with an identical model that used the original continuous forms of the variables. If there was no significant improvement in model fit, then the original continuous forms of the variables were used in place of the converted/transformed variable forms. Model fit was assessed using the Hosmer and Lemeshow goodness-of-fit test. Two-way interaction terms were entered into the final model one at a time and retained if significant at alpha = 0.25 level.
Logistic regression results
Sixteen of the 22 successfully screened predictors (p<0.20 in univariate logistic regression) were continuous variables. On examination with the three ‘linearity in the logit’ methods, four variables were converted into categorical form and two were transformed into powers of their original continuous form. QOLIE-31 overall quality of life and BRIQ limiting behavior subscale scores were categorized into quartiles, whereas PAI social detachment subscale and Zung depression scores were dichotomized at their median values. Fractional polynomial analysis indicated the HPLP nutrition subscale score raised to the power of -0.5 and the BRIQ all-or-none behavior subscale score raised to the powers of -0.5 and -2 as potentially significant contributors to the model.
After stepwise elimination the multivariable model retained eight variables: age of seizure onset, HPLP nutrition subscale, CASE-epilepsy understanding subscale, BRIQ limiting behavior and practical support seeking subscales, MHLC chance and other people subscales, and Zung depression score. Replacing the HPLP nutrition, BRIQ limiting behavior, and Zung depression variables with their alternate forms did not change model performance (p=0.990, paired t-test). The Hosmer and Lemeshow goodness-of-fit test indicated adequate fit of the model to the data (p=0.760). No interaction terms attained significance at the alpha = 0.25 level. Table e-5 shows results for prediction of PNES diagnosis.