Models in WEKA

NAME

weka.classifiers.bayes.AODE

SYNOPSIS

AODE achieves highly accurate classification by averaging over all of a small space of alternative naive-Bayes-like models that have weaker (and hence less detrimental) independence assumptions than naive Bayes. The resulting algorithm is computationally efficient while remaining highly accurate on many learning tasks.

For more information, see

G. Webb, J. Boughton & Z. Wang (2004). Not So Naive Bayes. To be published in Machine Learning.

G. Webb, J. Boughton & Z. Wang (2002). Averaged One-Dependence Estimators: Preliminary Results. AI2002 Data Mining Workshop, Canberra.
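
As a quick illustration, here is a minimal sketch of building AODE through the WEKA Java API (the file name weather.nominal.arff is a placeholder; AODE requires nominal attributes, so numeric data must be discretized first):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import weka.classifiers.bayes.AODE;
    import weka.core.Instances;

    public class AODEExample {
        public static void main(String[] args) throws Exception {
            // Load a dataset whose attributes are all nominal (placeholder file name).
            Instances data = new Instances(
                new BufferedReader(new FileReader("weather.nominal.arff")));
            data.setClassIndex(data.numAttributes() - 1);

            AODE aode = new AODE();
            aode.buildClassifier(data);

            // Predict the class of the first instance.
            double prediction = aode.classifyInstance(data.instance(0));
            System.out.println(data.classAttribute().value((int) prediction));
        }
    }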

OPTIONS

debug -- If set to true, classifier may output additional info to the console.


NAME

weka.classifiers.bayes.BayesNet

SYNOPSIS

Bayes Network learning using various search algorithms and quality measures.

OPTIONS

BIFFile -- Set the name of a file in BIF XML format. A Bayes network learned from data can be compared with the Bayes network represented by the BIF file. Statistics calculated include, among others, the number of missing and extra arcs.

debug -- If set to true, classifier may output additional info to the console.

estimator -- Select Estimator algorithm for finding the conditional probability tables of the Bayes Network.

searchAlgorithm -- Select method used for searching network structures.

useADTree -- When ADTree (the data structure for increasing speed on counts, not to be confused with the classifier of the same name) is used, learning time typically goes down. However, because ADTrees are memory intensive, memory problems may occur. Switching this option off makes the structure learning algorithms slower, but lets them run with less memory. By default, ADTrees are used.
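
A minimal sketch of configuring these options programmatically (assuming the K2 search algorithm and SimpleEstimator classes, whose package paths vary between WEKA versions; 'data' is a nominal Instances object as in the AODE example):

    import weka.classifiers.bayes.BayesNet;
    import weka.classifiers.bayes.net.estimate.SimpleEstimator;
    import weka.classifiers.bayes.net.search.local.K2;

    BayesNet bn = new BayesNet();
    bn.setUseADTree(true);                   // faster counting, more memory
    bn.setSearchAlgorithm(new K2());         // method for searching network structures
    bn.setEstimator(new SimpleEstimator());  // fills in the conditional probability tables
    bn.buildClassifier(data);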


NAME

weka.classifiers.bayes.ComplementNaiveBayes

SYNOPSIS

Class for building and using a Complement class Naive Bayes classifier. For more information see:

J. Rennie, L. Shih, J. Teevan & D. Karger (2003). Tackling the Poor Assumptions of Naive Bayes Text Classifiers. ICML-2003.

P.S.: TF, IDF and length normalization transforms, as described in the paper, can be performed through weka.filters.unsupervised.attribute.StringToWordVector.
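
A sketch of that pipeline (hedged: setTFTransform and setIDFTransform are assumed to be the StringToWordVector switches for the paper's transforms; 'data' is assumed to hold a string attribute plus a nominal class):

    import weka.classifiers.bayes.ComplementNaiveBayes;
    import weka.core.Instances;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.StringToWordVector;

    data.setClassIndex(0);       // assume the class is the first attribute
    StringToWordVector swv = new StringToWordVector();
    swv.setTFTransform(true);    // log(1 + tf) term-frequency transform
    swv.setIDFTransform(true);   // inverse document frequency weighting
    swv.setInputFormat(data);
    Instances vectors = Filter.useFilter(data, swv);

    ComplementNaiveBayes cnb = new ComplementNaiveBayes();
    cnb.setNormalizeWordWeights(true);  // per-class word weight normalization
    cnb.buildClassifier(vectors);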

OPTIONS

debug -- If set to true, classifier may output additional info to the console.

normalizeWordWeights -- Normalizes the word weights for each class.

smoothingParameter -- Sets the smoothing parameter to avoid zero WordGivenClass probabilities (default=1.0).


NAME

weka.classifiers.bayes.NaiveBayes

SYNOPSIS

Class for a Naive Bayes classifier using estimator classes. Numeric estimator precision values are chosen based on analysis of the training data. For this reason, the classifier is not an UpdateableClassifier (which in typical usage is initialized with zero training instances) -- if you need the UpdateableClassifier functionality, use the NaiveBayesUpdateable classifier. The NaiveBayesUpdateable classifier will use a default precision of 0.1 for numeric attributes when buildClassifier is called with zero training instances.

For more information on Naive Bayes classifiers, see

George H. John and Pat Langley (1995). Estimating Continuous Distributions in Bayesian Classifiers. Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence. pp. 338-345. Morgan Kaufmann, San Mateo.

OPTIONS

debug -- If set to true, classifier may output additional info to the console.

useKernelEstimator -- Use a kernel estimator for numeric attributes rather than a normal distribution.

useSupervisedDiscretization -- Use supervised discretization to convert numeric attributes to nominal ones.
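
For example, switching the numeric-attribute model from a single normal distribution to a kernel density estimate is a one-line change (a sketch; 'data' as in the AODE example):

    import weka.classifiers.bayes.NaiveBayes;

    NaiveBayes nb = new NaiveBayes();
    nb.setUseKernelEstimator(true);  // kernel density instead of a normal distribution
    // Alternatively: nb.setUseSupervisedDiscretization(true);
    nb.buildClassifier(data);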


NAME

weka.classifiers.bayes.NaiveBayesMultinomial

SYNOPSIS

Class for building and using a multinomial Naive Bayes classifier. For more information see:

Andrew McCallum and Kamal Nigam (1998). A Comparison of Event Models for Naive Bayes Text Classification.

OPTIONS

debug -- If set to true, classifier may output additional info to the console.


NAME

weka.classifiers.bayes.NaiveBayesSimple

SYNOPSIS

Class for building and using a simple Naive Bayes classifier. Numeric attributes are modelled by a normal distribution. For more information, see

Richard Duda and Peter Hart (1973). Pattern Classification and Scene Analysis. Wiley, New York.

OPTIONS

debug -- If set to true, classifier may output additional info to the console.


NAME

weka.classifiers.bayes.NaiveBayesUpdateable

SYNOPSIS

Class for a Naive Bayes classifier using estimator classes. This is the updateable version of NaiveBayes. This classifier will use a default precision of 0.1 for numeric attributes when buildClassifier is called with zero training instances.

For more information on Naive Bayes classifiers, see

George H. John and Pat Langley (1995). Estimating Continuous Distributions in Bayesian Classifiers. Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence. pp. 338-345. Morgan Kaufmann, San Mateo.
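
The updateable API allows training one instance at a time; the sketch below builds the classifier from the dataset header alone and then streams instances in (buildClassifier and updateClassifier are the standard WEKA UpdateableClassifier methods; 'data' as in the AODE example):

    import weka.classifiers.bayes.NaiveBayesUpdateable;
    import weka.core.Instances;

    Instances header = new Instances(data, 0);  // structure only, zero instances
    NaiveBayesUpdateable nb = new NaiveBayesUpdateable();
    nb.buildClassifier(header);
    for (int i = 0; i < data.numInstances(); i++) {
        nb.updateClassifier(data.instance(i));  // incremental update
    }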

OPTIONS

debug -- If set to true, classifier may output additional info to the console.

useKernelEstimator -- Use a kernel estimator for numeric attributes rather than a normal distribution.

useSupervisedDiscretization -- Use supervised discretization to convert numeric attributes to nominal ones.


NAME

weka.classifiers.functions.LeastMedSq

SYNOPSIS

Implements a least median squared linear regression, utilising the existing WEKA LinearRegression class to form predictions. Least squares regression functions are generated from random subsamples of the data. The least squares regression with the lowest median squared error is chosen as the final model.

The basis of the algorithm is

Peter J. Rousseeuw and Annick M. Leroy (1987). Robust Regression and Outlier Detection. Wiley, New York.
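
The selection criterion itself is simple; this sketch (illustrative only, not WEKA's internal code) shows how the median squared residual of one candidate fit could be computed:

    import java.util.Arrays;

    // Median of the squared residuals for one candidate regression function.
    static double medianSquaredError(double[] actual, double[] predicted) {
        double[] sq = new double[actual.length];
        for (int i = 0; i < actual.length; i++) {
            double r = actual[i] - predicted[i];
            sq[i] = r * r;
        }
        Arrays.sort(sq);
        int n = sq.length;
        return (n % 2 == 1) ? sq[n / 2] : (sq[n / 2 - 1] + sq[n / 2]) / 2.0;
    }

The candidate fit with the smallest such value over all random subsamples becomes the final model.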

OPTIONS

debug -- If set to true, classifier may output additional info to the console.

randomSeed -- Set the seed for selecting random subsamples of the training data.

sampleSize -- Set the size of the random samples used to generate the least squares regression functions.


NAME

weka.classifiers.functions.LinearRegression

SYNOPSIS

Class for using linear regression for prediction. Uses the Akaike criterion for model selection, and is able to deal with weighted instances.

OPTIONS

attributeSelectionMethod -- Set the method used to select attributes for use in the linear regression. Available methods are: no attribute selection, attribute selection using M5's method (step through the attributes removing the one with the smallest standardised coefficient until no improvement is observed in the estimate of the error given by the Akaike information criterion), and a greedy selection using the Akaike information metric.

debug -- Outputs debug information to the console.

eliminateColinearAttributes -- Eliminate colinear attributes.

ridge -- The value of the Ridge parameter.
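
A sketch of configuring the attribute selection method programmatically (assuming the SELECTION_M5 constant and TAGS_SELECTION tag set exposed by this class; 'data' has a numeric class):

    import weka.classifiers.functions.LinearRegression;
    import weka.core.SelectedTag;

    LinearRegression lr = new LinearRegression();
    lr.setAttributeSelectionMethod(new SelectedTag(
        LinearRegression.SELECTION_M5, LinearRegression.TAGS_SELECTION));
    lr.setEliminateColinearAttributes(true);
    lr.setRidge(1.0e-8);       // the ridge parameter
    lr.buildClassifier(data);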


NAME

weka.classifiers.functions.Logistic

SYNOPSIS

Class for building and using a multinomial logistic regression model with a ridge estimator.

There are some modifications, however, compared to the paper of le Cessie and van Houwelingen (1992):

If there are k classes for n instances with m attributes, the parameter matrix B to be calculated will be an m*(k-1) matrix.

The probability for class j (other than the last class) is

Pj(Xi) = exp(Xi * Bj) / (1 + sum[l=1..(k-1)] exp(Xi * Bl))

The last class has probability

1 - sum[j=1..(k-1)] Pj(Xi) = 1 / (1 + sum[l=1..(k-1)] exp(Xi * Bl))

The (negative) multinomial log-likelihood is thus:

L = -sum[i=1..n] { sum[j=1..(k-1)] (Yij * ln(Pj(Xi)))
    + (1 - sum[j=1..(k-1)] Yij) * ln(1 - sum[j=1..(k-1)] Pj(Xi)) }
    + ridge * (B^2)

In order to find the matrix B for which L is minimised, a quasi-Newton method is used to search for the optimal values of the m*(k-1) variables. Note that before the optimization procedure is used, the matrix B is 'squeezed' into an m*(k-1) vector. For details of the optimization procedure, see the weka.core.Optimization class.
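
The probability computation above is easy to state in code; the following sketch (illustrative only, not WEKA's internal implementation) evaluates Pj(Xi) for all k classes given the m*(k-1) coefficient matrix B:

    // Returns the k class probabilities for one instance xi (length m).
    static double[] classProbs(double[] xi, double[][] B) {
        int k = B[0].length + 1;  // B has k-1 columns
        double[] p = new double[k];
        double denom = 1.0;
        for (int j = 0; j < k - 1; j++) {
            double dot = 0.0;
            for (int a = 0; a < xi.length; a++) {
                dot += xi[a] * B[a][j];  // Xi * Bj
            }
            p[j] = Math.exp(dot);
            denom += p[j];
        }
        for (int j = 0; j < k - 1; j++) {
            p[j] /= denom;           // Pj(Xi)
        }
        p[k - 1] = 1.0 / denom;      // probability of the last class
        return p;
    }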

Although the original logistic regression does not deal with instance weights, the algorithm has been modified slightly to handle them.

For more information see:

le Cessie, S. and van Houwelingen, J.C. (1992). Ridge Estimators in Logistic Regression. Applied Statistics, Vol. 41, No. 1, pp. 191-201.

Note: Missing values are replaced using a ReplaceMissingValuesFilter, and nominal attributes are transformed into numeric attributes using a NominalToBinaryFilter.

OPTIONS

debug -- Output debug information to the console.

maxIts -- Maximum number of iterations to perform.

ridge -- Set the Ridge value in the log-likelihood.


NAME

weka.classifiers.functions.MultilayerPerceptron

SYNOPSIS

This neural network uses backpropagation to train.

OPTIONS

GUI -- Brings up a GUI. This allows the neural network to be paused and altered during training.

* To add a node, left click (this node will be automatically selected; ensure that no other nodes are selected).

* To select a node, left click on it either while no other node is selected or while holding down the control key (this toggles the node between selected and not selected).

* To connect a node, first have the start node(s) selected, then click either the end node or an empty space (this will create a new node that is connected to the selected nodes). The selection status of nodes stays the same after the connection. (Note that these are directed connections; a connection between two nodes will not be established more than once, and certain connections that are deemed invalid will not be made.)

* To remove a connection, select one of the connected nodes and then right click the other node (it does not matter whether the node is the start or the end; the connection will be removed).

* To remove a node, right click it while no other nodes (including it) are selected. (This will also remove all connections to it.)

* To deselect a node, either left click it while holding down control, or right click on empty space.

* The raw inputs are provided from the labels on the left.

* The red nodes are the hidden-layer nodes.

* The orange nodes are the output nodes.

* The labels on the right show the class the output node represents. Note that with a numeric class the output node will automatically be made into an unthresholded linear unit.

Alterations to the neural network can only be done while the network is not running. This also applies to the learning rate and other fields on the control panel.

* You can accept the network as being finished at any time.

* The network is automatically paused at the beginning.

* There is a running indication of what epoch the network is up to and what the (rough) error for that epoch was (or for the validation set, if that is being used). Note that this error value is based on a network that changes as the value is computed. Also, whether the class is normalized will affect the error reported for numeric classes.

* Once the network is done it will pause again and either wait to be accepted or be trained further.

Note that if the GUI is not set, the network will not require any interaction.

autoBuild -- Adds and connects up hidden layers in the network.

debug -- If set to true, classifier may output additional info to the console.

decay -- This will cause the learning rate to decrease. It divides the starting learning rate by the epoch number to determine what the current learning rate should be. This may help to stop the network from diverging from the target output, as well as improve general performance. Note that the decaying learning rate will not be shown in the GUI, only the original learning rate. If the learning rate is changed in the GUI, this is treated as the starting learning rate.

hiddenLayers -- This defines the hidden layers of the neural network. It is a comma-separated list of positive whole numbers, one for each hidden layer. To have no hidden layers, put a single 0 here. This will only be used if autoBuild is set. There are also wildcard values: 'a' = (attribs + classes) / 2, 'i' = attribs, 'o' = classes, 't' = attribs + classes. See the configuration sketch after these options.

learningRate -- The amount the weights are updated.

momentum -- Momentum applied to the weights during updating.

nominalToBinaryFilter -- This will preprocess the instances with the filter. This could help improve performance if there are nominal attributes in the data.

normalizeAttributes -- This will normalize the attributes. It could help improve performance of the network, and is not reliant on the class being numeric. It will also normalize nominal attributes (after they have been run through the nominal-to-binary filter, if that is in use) so that the nominal values are between -1 and 1.

normalizeNumericClass -- This will normalize the class if it is numeric. It could help improve performance of the network; it normalizes the class to be between -1 and 1. Note that this is only done internally; the output will be scaled back to the original range.

randomSeed -- Seed used to initialise the random number generator. Random numbers are used for setting the initial weights of the connections between nodes, and also for shuffling the training data.

reset -- This will allow the network to reset with a lower learning rate. If the network diverges from the answer, this will automatically reset the network with a lower learning rate and begin training again. This option is only available if the GUI is not set. Note that if the network diverges but isn't allowed to reset, it will fail the training process and return an error message.

trainingTime -- The number of epochs to train through. If the validation set size is non-zero, training may terminate early.

validationSetSize -- The percentage size of the validation set. (Training will continue until the error on the validation set has been observed to be consistently getting worse, or until the training time is reached.) If this is set to zero, no validation set will be used and the network will train for the specified number of epochs.

validationThreshold -- Used to terminate validation testing. The value here dictates how many times in a row the validation set error can get worse before training is terminated.
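
A sketch of a typical non-interactive configuration (the setter names mirror the options above; 'data' as in the AODE example):

    import weka.classifiers.functions.MultilayerPerceptron;

    MultilayerPerceptron mlp = new MultilayerPerceptron();
    mlp.setGUI(false);             // no interaction required
    mlp.setHiddenLayers("a");      // wildcard: (attribs + classes) / 2 hidden nodes
    mlp.setLearningRate(0.3);
    mlp.setMomentum(0.2);
    mlp.setTrainingTime(500);      // epochs
    mlp.setValidationSetSize(10);  // hold out 10% for early stopping
    mlp.buildClassifier(data);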


NAME

weka.classifiers.functions.PaceRegression

SYNOPSIS

Class for building pace regression linear models and using them for prediction.

Under regularity conditions, pace regression is provably optimal when the number of coefficients tends to infinity. It consists of a group of estimators that are either overall optimal or optimal under certain conditions.

The current work on pace regression theory, and therefore also this implementation, does not handle:

- missing values

- non-binary nominal attributes

- the case where n - k is small, where n is the number of instances and k is the number of coefficients (the threshold used in this implementation is 20)

For more information see:

Wang, Y. (2000). A new approach to fitting linear models in high dimensional spaces. PhD Thesis. Department of Computer Science, University of Waikato, New Zealand.

Wang, Y. and Witten, I. H. (2002). Modeling for optimal probability prediction. Proceedings of ICML'2002. Sydney.

OPTIONS

debug -- Output debug information to the console.

estimator -- The estimator to use:

- eb -- Empirical Bayes estimator for normal mixture (default)

- nested -- Optimal nested model selector for normal mixture

- subset -- Optimal subset selector for normal mixture

- pace2 -- PACE2 for Chi-square mixture

- pace4 -- PACE4 for Chi-square mixture

- pace6 -- PACE6 for Chi-square mixture

- ols -- Ordinary least squares estimator

- aic -- AIC estimator

- bic -- BIC estimator

- ric -- RIC estimator

- olsc -- Ordinary least squares subset selector with a threshold

threshold -- Threshold for the olsc estimator.
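
A sketch of selecting one of the estimators above via command-line style options (an assumption: -E is taken here to be the estimator flag; consult the output of listOptions() for the exact flag in your WEKA version; 'data' must have a numeric class and no missing values):

    import weka.classifiers.functions.PaceRegression;
    import weka.core.Utils;

    PaceRegression pace = new PaceRegression();
    pace.setOptions(Utils.splitOptions("-E eb"));  // hypothetical flag: empirical Bayes
    pace.buildClassifier(data);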


NAME

weka.classifiers.functions.RBFNetwork

SYNOPSIS

Class that implements a normalized Gaussian radial basis function network. It uses the k-means clustering algorithm to provide the basis functions and learns either a logistic regression (discrete class problems) or linear regression (numeric class problems) on top of that. Symmetric multivariate Gaussians are fit to the data from each cluster. If the class is nominal, it uses the given number of clusters per class. It standardizes all numeric attributes to zero mean and unit variance.
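
A sketch of tuning the two main knobs, the number of clusters (basis functions) and the ridge parameter for the regression on top (setter names assumed from the class's numClusters and ridge properties; 'data' as in the AODE example):

    import weka.classifiers.functions.RBFNetwork;

    RBFNetwork rbf = new RBFNetwork();
    rbf.setNumClusters(4);   // k for the k-means derived basis functions
    rbf.setRidge(1.0e-8);    // ridge for the logistic/linear regression on top
    rbf.buildClassifier(data);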