1. How are decision trees used for classification? Explain the decision tree induction algorithm for classification. [Dec-14/Jan 2015] [10 marks]

To illustrate how classification with a decision tree works, consider a simpler version of the vertebrate classification problem described in the previous section. Instead of classifying the vertebrates into five distinct groups of species, we assign them to two categories: mammals and non-mammals.

The tree has three types of nodes:

- A root node that has no incoming edges and zero or more outgoing edges.

- Internal nodes, each of which has exactly one incoming edge and two or more outgoing edges.

- Leaf or terminal nodes, each of which has exactly one incoming edge and no outgoing edges.
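The three node types above can be seen in action in a small sketch of classifying with such a tree, assuming the two-category (mammal vs. non-mammal) version of the vertebrate problem; the attribute names and test order are illustrative, not from a specific dataset.

```python
# Classify a vertebrate with a hand-built decision tree.
# Root node: tests body temperature (no incoming edges).
# Internal node: tests method of reproduction (one incoming, two outgoing edges).
# Leaf nodes: return a class label (one incoming, no outgoing edges).

def classify(record):
    if record["body_temperature"] == "cold-blooded":
        return "Non-mammal"          # leaf node
    if record["gives_birth"] == "yes":
        return "Mammal"              # leaf node
    return "Non-mammal"              # leaf node

print(classify({"body_temperature": "warm-blooded", "gives_birth": "yes"}))  # Mammal
print(classify({"body_temperature": "cold-blooded", "gives_birth": "no"}))   # Non-mammal
```

Each test record follows a path from the root, through at most one internal node, to a leaf, which supplies the predicted class.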

2. How to improve the accuracy of classification? Explain. [Dec-14/Jan 2015] [5 marks]

Some tricks can be used for increasing classification accuracy; here we focus on ensemble methods. An ensemble for classification is a composite model, made up of a combination of classifiers. The individual classifiers vote, and a class label prediction is returned by the ensemble based on the collection of votes. Ensembles tend to be more accurate than their component classifiers. Bagging, boosting, and random forests are popular ensemble methods. Traditional learning models assume that the data classes are well distributed; in many real-world data domains, however, the data are class-imbalanced.
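The voting scheme described above can be sketched in a few lines; the three component classifiers here are hypothetical stand-ins, not trained models.

```python
from collections import Counter

# Majority-vote ensemble: each component classifier maps a record to a
# class label; the ensemble returns the most common vote.
def ensemble_predict(classifiers, record):
    votes = [clf(record) for clf in classifiers]   # collect one vote per classifier
    return Counter(votes).most_common(1)[0][0]     # majority wins

# Three made-up component classifiers for illustration.
clf1 = lambda r: "yes" if r["age"] < 30 else "no"
clf2 = lambda r: "yes" if r["student"] else "no"
clf3 = lambda r: "no"

print(ensemble_predict([clf1, clf2, clf3], {"age": 25, "student": True}))  # yes
```

Bagging and boosting differ in how the component classifiers are trained (resampling vs. reweighting), but both combine predictions by voting in essentially this way.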

3. Explain the importance of evaluation criteria for classification methods. [Dec-14/Jan 2015] [8 marks]

The input data for a classification task is a collection of records. Each record, also known as an instance or example, is characterized by a tuple (x, y), where x is the attribute set and y is a special attribute, designated as the class label. A sample data set used for classifying vertebrates assigns each record to one of the following categories: mammal, bird, fish, reptile, or amphibian. The attribute set includes properties of a vertebrate such as its body temperature, skin cover, method of reproduction, ability to fly, and ability to live in water. The attribute set can also contain continuous features. The class label, on the other hand, must be a discrete attribute. This is a key characteristic that distinguishes classification from regression, a predictive modelling task in which y is a continuous attribute.
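The (x, y) representation above can be made concrete with two sample vertebrate records; the attribute values are made up for illustration.

```python
# Each record is a tuple (x, y): x is the attribute set (here a dict mixing
# categorical features; it could also hold continuous ones), y is the
# discrete class label.
records = [
    ({"body_temperature": "warm-blooded", "skin_cover": "hair",
      "gives_birth": "yes", "can_fly": "no", "lives_in_water": "no"}, "mammal"),
    ({"body_temperature": "cold-blooded", "skin_cover": "scales",
      "gives_birth": "no", "can_fly": "no", "lives_in_water": "yes"}, "fish"),
]

for x, y in records:
    print(x["skin_cover"], "->", y)
```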

4. Explain (a) Continuous, (b) Discrete, and (c) Asymmetric attributes with examples. [June/July 2014] [10 marks]

A discrete attribute has a finite or countably infinite set of values. Such attributes can be categorical or numeric, and are often represented by integer values, e.g. zip codes, counts.

A continuous attribute is one whose values are real numbers. Continuous attributes are typically represented as floating-point variables, e.g. temperature, height.

For asymmetric attributes, only presence (a non-zero attribute value) is regarded as important. For example, consider a data set where each object is a student and each attribute records whether or not the student took a particular course at a university. For a specific student, an attribute has a value of 1 if the student took the course and a value of 0 otherwise. Because each student takes only a small fraction of all the available courses, most of the values in such a data set would be 0; therefore it is more meaningful and more efficient to focus on the non-zero values.
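The student-course example suggests a sparse representation that keeps only the non-zero (present) attributes; the course names below are hypothetical.

```python
# Dense row over all course attributes: mostly zeros for any one student.
dense = {"CS101": 1, "CS201": 0, "MATH101": 1, "PHY101": 0, "CHEM101": 0}

# Asymmetric attributes: record only presences (value 1), discard the zeros.
sparse = {course for course, taken in dense.items() if taken == 1}

print(sorted(sparse))  # ['CS101', 'MATH101']
```

The sparse set carries the same usable information as the dense row while storing only the handful of courses actually taken.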

5. Explain Hunt's algorithm and illustrate its working. [June/July 2015] [10 marks] [June/July 2014] [10 marks]

Data cleaning: This refers to the preprocessing of data in order to remove or reduce noise (by applying smoothing techniques, for example) and the treatment of missing values (e.g., by replacing a missing value with the most commonly occurring value for that attribute, or with the most probable value based on statistics). Although most classification algorithms have some mechanisms for handling noisy or missing data, this step can help reduce confusion during learning.
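The missing-value treatment mentioned above (replace with the most commonly occurring value) can be sketched as follows; the attribute values are made up for illustration.

```python
from collections import Counter

# Replace missing values (None) in one attribute's column with the mode,
# i.e. the most commonly occurring non-missing value.
def fill_with_mode(values):
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

print(fill_with_mode(["yes", None, "yes", "no", None]))
# ['yes', 'yes', 'yes', 'no', 'yes']
```

Replacing with the most probable value based on statistics (e.g. conditioned on the other attributes) is a more refined variant of the same idea.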

Relevance analysis: Many of the attributes in the data may be redundant. Correlation analysis can be used to identify whether any two given attributes are statistically related. For example, a strong correlation between attributes A1 and A2 would suggest that one of the two could be removed from further analysis. A database may also contain irrelevant attributes. Attribute subset selection can be used in these cases to find a reduced set of attributes such that the resulting probability distribution of the data classes is as close as possible to the original distribution obtained using all attributes. Hence, relevance analysis, in the form of correlation analysis and attribute subset selection, can be used to detect attributes that do not contribute to the classification or prediction task. Including such attributes may otherwise slow down, and possibly mislead, the learning step.

6. What is a rule-based classifier? Explain how a rule-based classifier works. [Dec-14/Jan 2015] [10 marks] [Dec-13/Jan 14] [7 marks]

Using IF-THEN Rules for Classification

Rules are a good way of representing information or bits of knowledge. A rule-based classifier uses a set of IF-THEN rules for classification. An IF-THEN rule is an expression of the form IF condition THEN conclusion.

An example is rule R1,

R1: IF age = youth AND student = yes THEN buys computer = yes.

The “IF”-part (or left-hand side) of a rule is known as the rule antecedent or precondition. The “THEN”-part (or right-hand side) is the rule consequent. In the rule antecedent, the condition consists of one or more attribute tests (such as age = youth and student = yes) that are logically ANDed. The rule’s consequent contains a class prediction (in this case, we are predicting whether a customer will buy a computer). R1 can also be written as

R1: (age = youth) ∧ (student = yes) ⇒ (buys computer = yes).

If the condition (that is, all of the attribute tests) in a rule antecedent holds true for a given tuple, we say that the rule antecedent is satisfied (or simply, that the rule is satisfied) and that the rule covers the tuple.

A rule R can be assessed by its coverage and accuracy. Given a tuple X from a class-labeled data set D, let n_covers be the number of tuples covered by R, n_correct be the number of tuples correctly classified by R, and |D| be the number of tuples in D. We can define the coverage and accuracy of R as

coverage(R) = n_covers / |D|
accuracy(R) = n_correct / n_covers

That is, a rule’s coverage is the percentage of tuples that are covered by the rule (i.e., whose attribute values hold true for the rule’s antecedent). For a rule’s accuracy, we look at the tuples that it covers and see what percentage of them the rule can correctly classify.
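Coverage and accuracy can be computed directly for rule R1; the four tuples in D below are made up for illustration.

```python
# D: class-labeled tuples (age, student, buys_computer) — hypothetical data.
D = [
    ("youth",  "yes", "yes"),
    ("youth",  "yes", "no"),
    ("youth",  "no",  "no"),
    ("senior", "yes", "yes"),
]

# R1: IF age = youth AND student = yes THEN buys_computer = yes
covered = [t for t in D if t[0] == "youth" and t[1] == "yes"]  # antecedent holds
correct = [t for t in covered if t[2] == "yes"]                # consequent also holds

print(len(covered) / len(D))        # coverage  = 2/4 = 0.5
print(len(correct) / len(covered))  # accuracy  = 1/2 = 0.5
```

R1 covers two of the four tuples (coverage 50%), and correctly classifies one of those two (accuracy 50%).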

7. Write the algorithm for k-nearest neighbour classification. [June/July 2015] [Dec-13/Jan 14] [3 marks]
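The k-nearest-neighbour algorithm asked for above can be sketched as follows: rank the training points by distance to the test point, keep the k closest, and take a majority vote of their labels. The training points, Euclidean distance, and k = 3 are illustrative choices, not the only possible ones.

```python
from collections import Counter
import math

def knn_predict(train, test_point, k=3):
    # train: list of (feature_vector, label) pairs.
    # 1. Sort training points by Euclidean distance to the test point.
    nearest = sorted(train, key=lambda p: math.dist(p[0], test_point))
    # 2. Take the labels of the k closest neighbours.
    labels = [label for _, label in nearest[:k]]
    # 3. Majority vote decides the predicted class.
    return Counter(labels).most_common(1)[0][0]

train = [([1, 1], "A"), ([1, 2], "A"), ([5, 5], "B"), ([6, 5], "B"), ([2, 1], "A")]
print(knn_predict(train, [1.5, 1.5], k=3))  # A
```

k-NN is a lazy learner: there is no training step beyond storing the examples, and all work happens at prediction time.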

Data clustering is under vigorous development. Contributing areas of research include data mining, statistics, machine learning, spatial database technology, biology, and marketing. Owing to the huge amounts of data collected in databases, cluster analysis has recently become a highly active topic in data mining research. As a branch of statistics, cluster analysis has been extensively studied for many years, focusing mainly on distance-based cluster analysis. Cluster analysis tools based on k-means, k-medoids, and several other methods have also been built into many statistical analysis software packages or systems, such as S-Plus, SPSS, and SAS. In machine learning, clustering is an example of unsupervised learning.

Unlike classification, clustering and unsupervised learning do not rely on predefined classes and class-labeled training examples. For this reason, clustering is a form of learning by observation, rather than learning by examples. In data mining, efforts have focused on finding methods for efficient and effective cluster analysis in large databases. Active themes of research focus on the scalability of clustering methods and the effectiveness of methods for clustering complex shapes.