CIS 732 Midterm Exam, Fall 2005    Name: ______

CIS 732

Machine Learning and Pattern Recognition

Fall, 2005

Midterm Exam (In-Class, Open Notes)

Instructions and Notes

  • You have 50 minutes for this exam. Budget your time carefully.
  • No calculators or computing devices are needed or permitted on this exam.
  • Your answers on short answer and essay problems shall be graded for originality as well as for accuracy.
  • You should have a total of 7 pages; write your name on each page.
  • Use only the front side of pages for your answers; you may add additional pages if needed.
  • Circle exactly one answer for each true/false and multiple choice question.
  • Show your work on problems and proofs.
  • In the interest of fairness to all students, no questions concerning problems shall be answered during the test. If you believe there is ambiguity in any question, state your assumptions.
  • There are a total of 150 possible points in this exam and 15 points of extra credit.

Instructor Use Only

Problem Number / Points Earned / Points Possible
1 / ____ / 40
2 / ____ / 40
3 / ____ / 40
4 / ____ / 30
Extra / ____ / 15
Total / ____ / 150 + 15
  1. True/False (10 questions, 4 points each)

Circle exactly one answer per problem.

a) T F    Pre-pruning in decision tree induction is an overfitting avoidance technique.

b) T F    An efficiently PAC learnable concept can be learned in time polynomial in the size of the concept c, the number of variables (attributes) n, the ratio bound 1/ε, and the confidence bound 1/δ.

c) T F    Rote learning of a concept is equivalent to learning a version space with S consisting of the conjunction of all positive examples and G consisting of the conjunction of all negative examples.

d) T F    The entropy H of a Boolean variable is a concave function of the probability of its being true.

e) T F    A 2-layer feedforward ANN can learn any continuous function to arbitrarily small error, given enough hidden units and training time.

f) T F    Every propositional function can be represented exactly by some feedforward artificial neural network with at most two layers of units.

g) T F    m-of-n concepts over Boolean variables are not linearly separable.

h) T F    If uniform priors are assumed over hypotheses, MAP estimation reduces to Naïve Bayes estimation.

i) T F    A polytree is a Bayesian network that has no undirected loops.

j) T F    All singly-connected networks are polytrees.

  2. Definitions (4 questions, 10 points each)

Give your definition in a complete sentence.

a) Instance space –

b) Representation (restriction) bias –

c) Information gain –

d) Maximum A Posteriori class value (distinguish from MAP hypothesis) –

  3. Decision Tree Induction (40 points total). Show all work.

Jedi Master Mace Windu has been concerned about the number of apprentices who have been turning to the Dark Side of the Force lately, so he has asked one of them to run ID3 on the temple database.

The input variables are:

  1. Race
  2. Brought to Temple as infant?
  3. Apprenticed to Knight or Master?

Apprentice (index) / Race / Infant / Master / Went-Dark (output)
1 / Human / No / Knight / No
2 / Human / Yes / Knight / Yes
3 / Human / No / Master / No
4 / Human / Yes / Knight / Yes
5 / Human / Yes / Master / Yes
6 / Twi’lek / Yes / Master / No
7 / Twi’lek / Yes / Knight / No
8 / Twi’lek / No / Master / Yes
9 / Twi’lek / Yes / Knight / No
10 / Cerean / Yes / Master / No
11 / Cerean / Yes / Knight / Yes
12 / Cerean / Yes / Knight / Yes
13 / Cerean / No / Master / No
14 / Cerean / Yes / Master / No

a) (8 points) What is H(Went-Dark | Infant = Yes)? Show your work, but you may leave expressions unreduced.

b) (8 points) What is H(Went-Dark | Race = Human)?

c) (10 points) What attribute would the decision tree-building algorithm choose to use as the root of the tree (assume no pruning)?

d) (14 points) What attributes would the decision tree-building algorithm choose to use in the second layer of the tree (assume no pruning)?
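
Reference for parts a) through d): conditional entropy is H(Y | X) = sum over values x of P(X = x) * H(Y | X = x), and ID3 chooses the attribute maximizing Gain(Y, X) = H(Y) - H(Y | X). The following is a minimal Python sketch of that computation; the function names and toy data are illustrative assumptions, not the temple table above.

    from collections import Counter
    from math import log2

    def entropy(labels):
        # H(Y) in bits for a list of class labels.
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def conditional_entropy(pairs):
        # H(Y | X) = sum over x of P(X = x) * H(Y | X = x), for (x, y) pairs.
        n = len(pairs)
        groups = {}
        for x, y in pairs:
            groups.setdefault(x, []).append(y)
        return sum((len(ys) / n) * entropy(ys) for ys in groups.values())

    def information_gain(pairs):
        # Gain(Y, X) = H(Y) - H(Y | X); ID3 splits on the attribute maximizing this.
        return entropy([y for _, y in pairs]) - conditional_entropy(pairs)

    # Hypothetical toy data (attribute value, class label); not the table above.
    toy = [("Yes", "+"), ("Yes", "+"), ("Yes", "-"), ("No", "-")]
    print(conditional_entropy(toy))   # 0.688...
    print(information_gain(toy))      # 0.311...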

  4. Version Spaces (30 points total). Show your work.

(Based on Problem 2.4, Mitchell) Consider the instance space consisting of integer points in the x, y plane and the set of hypotheses H consisting of rectangles. More precisely, hypotheses are of the form a ≤ x ≤ b, c ≤ y ≤ d, where a, b, c, and d can be any integers.

Consider the version space with respect to the set of positive (+) and negative (-) examples shown below.

a) (15 points) What is the S boundary of the version space in this case? Write out the hypotheses and draw them on the diagram.

b) (15 points) What is the G boundary of the version space in this case? Write out the hypotheses and draw them on the diagram.
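
A note on the general technique (the points below are hypothetical, since the exam's diagram is not reproduced in this text): for axis-aligned rectangle hypotheses, S is the single tightest rectangle covering all positive examples, while G consists of the maximally general rectangles that still exclude every negative example. A minimal Python sketch of the S computation:

    def s_boundary(positives):
        # Tightest rectangle a <= x <= b, c <= y <= d covering every positive
        # example: the maximally specific hypothesis S for this hypothesis space.
        xs = [x for x, _ in positives]
        ys = [y for _, y in positives]
        return (min(xs), max(xs), min(ys), max(ys))

    def covers(h, point):
        # True if hypothesis h = (a, b, c, d) classifies the point as positive.
        a, b, c, d = h
        x, y = point
        return a <= x <= b and c <= y <= d

    # Hypothetical positive examples; the actual points are on the exam diagram.
    pos = [(3, 4), (5, 6), (4, 8)]
    a, b, c, d = s_boundary(pos)
    print(f"S: {a} <= x <= {b}, {c} <= y <= {d}")   # S: 3 <= x <= 5, 4 <= y <= 8

The G boundary additionally depends on the negative examples: each member of G is obtained by expanding the rectangle's sides as far as possible without covering a negative point.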

Extra Credit

(10 points) Explain what inductive bias is captured by a Naïve Bayes network. Is this a representation (restriction) or search (preference) bias?

(5 points) Briefly (in 1-2 sentences) state your most unclear point about supervised inductive learning.
