Last Quiz: Artificial Intelligence, CSE 5290, Spring 2017. 62 points

It is to your advantage to compute the final value (if asked); however, what I care about is the formulation, which must be there, not just the final result.

Q1-4 carry 5 pts each.

Q1. A manager tracks her employees' sleep patterns (whether they get enough sleep or not). She observes every day whether each of them sleeps during work and whether they have red eyes. Her knowledge base is as follows: (1) the prior probability of enough sleep is 0.7; (2) the probability of enough sleep on a night is 0.8 given enough sleep on the previous night, but 0.3 if not; (3) the probability of red eyes is 0.2 given enough sleep that night, but 0.7 if not; and (4) the probability of sleeping in the office is 0.1 given enough sleep, but 0.3 if not.

Draw a Dynamic Bayesian Network (DBN) for this problem for two time slices, t and t+1, and write the CPTs. Give each CPT as a full table with entries for all values of the variables; the abbreviated tables used in the book for Boolean variables (listing only one value) are not acceptable.

From 15.13: The DBN has three variables per time slice: St, whether the employee gets enough sleep that night; Rt, whether they have red eyes at work; and Ct, whether they sleep at work. St is a parent of St+1, Rt, and Ct. The CPTs are given by

P(s0) = 0.7

P(st+1|st) = 0.8

P(st+1|¬st) = 0.3

P(rt|st) = 0.2

P(rt|¬st) = 0.7

P(ct|st) = 0.1

P(ct|¬st) = 0.3

To reformulate as an HMM with a single observation node, simply combine the two Boolean observation variables "red eyes" and "sleeping at work" into a single 4-valued variable, multiplying together the emission probabilities. (Probability tables omitted.)
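A minimal sketch (not part of the book's solution) of this combined 4-valued observation: since Rt and Ct are conditionally independent given St, each joint emission probability is just the product of the two CPT entries above. The Python names are illustrative only.

# Sketch: combine the two Boolean evidence variables (red eyes R_t, sleeping
# at work C_t) into one 4-valued HMM observation.  Because R_t and C_t are
# conditionally independent given S_t, each joint emission probability is the
# product of the two CPT entries above.

p_red   = {True: 0.2, False: 0.7}   # P(r_t | s_t), P(r_t | not s_t)
p_sleep = {True: 0.1, False: 0.3}   # P(c_t | s_t), P(c_t | not s_t)

def emission(enough_sleep):
    """Return P(observation | S_t) over the 4 joint values (red_eyes, sleeps_at_work)."""
    table = {}
    for red in (True, False):
        for sleeps in (True, False):
            pr = p_red[enough_sleep] if red else 1 - p_red[enough_sleep]
            pc = p_sleep[enough_sleep] if sleeps else 1 - p_sleep[enough_sleep]
            table[(red, sleeps)] = pr * pc
    return table

print(emission(True))    # e.g. P(red, sleeps | s_t) = 0.2 * 0.1 = 0.02
print(emission(False))   # e.g. P(red, sleeps | not s_t) = 0.7 * 0.3 = 0.21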

Q2. Two independent sensors report counts (S1 and S2) of client visits in a store; the actual number of visits is C. There is a small probability e that a sensor's count is off by up to one client. Each sensor can also become faulty occasionally (events E1 and E2) with a much smaller probability f, in which case its count may undercount by three or more (or, if C < 3, a faulty sensor may even return 0).

a. Draw two different Bayesian networks to formulate the same problem.

b. Show the conditional probability tables for each of your networks (assume S1 and S2 take values in {0, 1, 2, 3} and C in {1, 2, 3}, both fixed sets). You may leave the table entries empty.

Key: The book has two correct networks. Note that each node carries one table, which may be multidimensional depending on the number of arcs into the node. The answer will be a fairly large number of CPTs (empty entries are OK), one for each non-root (descendant) node in each of the two networks, plus priors for the root nodes.
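As an illustration only (a sketch, not the required answer), here is one plausible version of one of the two networks, with arcs C -> S1 <- E1 and C -> S2 <- E2, and empty CPTs whose shapes follow the incoming arcs. The numpy representation and variable names are my own choices, not from the key.

import numpy as np

# Sketch of one plausible network for Q2: C -> S1 <- E1  and  C -> S2 <- E2.
# Each node carries one CPT; its dimensionality follows the number of
# incoming arcs.  Entries are left at zero ("empty tables are OK").

n_C, n_S, n_E = 3, 4, 2           # |C| = |{1,2,3}|, |S1| = |S2| = |{0,1,2,3}|, E1/E2 Boolean

P_C  = np.zeros(n_C)              # prior P(C): no parents, 3 entries
P_E1 = np.zeros(n_E)              # prior P(E1): sensor-1 fault, 2 entries
P_E2 = np.zeros(n_E)              # prior P(E2): sensor-2 fault, 2 entries
P_S1 = np.zeros((n_C, n_E, n_S))  # P(S1 | C, E1): 3 x 2 x 4 = 24 entries
P_S2 = np.zeros((n_C, n_E, n_S))  # P(S2 | C, E2): 3 x 2 x 4 = 24 entries

total = sum(t.size for t in (P_C, P_E1, P_E2, P_S1, P_S2))
print(total)                      # 3 + 2 + 2 + 24 + 24 = 55 entries in this version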

Q3. A slot machine has three independent wheels, each producing one of the four letters A, B, C, and D with equal probability. The slot machine has the following payout scheme for a bet of 1 coin. A “?” sign below indicates that it may have any of the four letters.

D/D/D pays 5 coins

C/?/? pays 1 coin,

C/C/? pays 2 coins

C/C/C pays 3 coins

B/B/B pays 20 coins

A/A/A pays 15 coins

anything else does not return any coin.

Q3a. For each coin played, what is the expected coin return or payoff?

Q3b. Compute the probability that playing the slot machine only once will result in any win.

ANS: 13.10. Exchanged Bar->B, Bell->A, Lemon->D, and Cherry->C.


a. To compute the expected payback for the machine, we determine the probability for each winning outcome, multiply it by the amount that would be won in that instance, and sum over all possible winning combinations. Since each symbol is equally likely, the first four cases have probability (1/4)^3 = 1/64.

However, when computing the winning probabilities for C (cherry, in the book's labeling), we must count only the highest-paying case, so we subtract the probability of the dominating winning cases from each subsequent case (e.g., in the case of two C's, we subtract off the probability of getting three C's):

C/C/?  3/64  = (1/4)^2 − 1/64
C/?/?  12/64 = (1/4)^1 − 3/64 − 1/64

The expectation is therefore 20·(1/64) + 15·(1/64) + 5·(1/64) + 3·(1/64) + 2·(3/64) + 1·(12/64) = 61/64.

Thus, the expected payback is 61/64 ≈ 95% (less than 1, as we would expect of a slot machine that actually generates revenue for its owner).

b. We can tally up the probabilities computed in the previous part to get 1/64 + 1/64 + 1/64 + 1/64 + 3/64 + 12/64 = 19/64.

Alternatively, we can observe that we win if either all symbols are the same (denote this event S), or the first symbol is C (denote this event C). Then, applying the inclusion-exclusion identity for disjunction:

P(S ∨ C) = P(S) + P(C) − P(S ∧ C) = (1/4)^2 + 1/4 − 1/64 = 19/64.
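A short sketch (not part of the book's solution) that checks both answers by brute-force enumeration of the 4^3 = 64 equally likely outcomes, applying the payout rules above with the higher-paying C patterns taking precedence.

from itertools import product
from fractions import Fraction

# Sketch: enumerate all 4^3 = 64 equally likely wheel outcomes and apply the
# payout rules above, letting the higher-paying C patterns take precedence.

def payout(w):
    if w == ('B', 'B', 'B'): return 20
    if w == ('A', 'A', 'A'): return 15
    if w == ('D', 'D', 'D'): return 5
    if w == ('C', 'C', 'C'): return 3
    if w[0] == 'C' and w[1] == 'C': return 2
    if w[0] == 'C': return 1
    return 0

outcomes = list(product('ABCD', repeat=3))
expected = sum(Fraction(payout(w), 64) for w in outcomes)
p_win    = Fraction(sum(payout(w) > 0 for w in outcomes), 64)
print(expected, p_win)   # 61/64 and 19/64, matching the hand computation above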

Q4. Consider the following data set of three binary attributes (A1 through A3) and one binary output y.

Example / A1 / A2 / A3 / Output y
X1 / 0 / 0 / 1 / 0
X2 / 1 / 0 / 1 / 0
X3 / 0 / 1 / 0 / 0
X4 / 1 / 1 / 1 / 1
X5 / 0 / 1 / 1 / 1

Use the decision tree generation algorithm to learn a decision tree for the data. Show computations to determine attribute split at each node.

ANS: Based on 18.6, with attributes exchanged: A1->A3, A2->A2, A3->A1.

18.6: Note that to compute each split we need to compute Remainder(Ai) for each attribute Ai and select the attribute that provides the minimal remaining information, since the existing information prior to the split is the same regardless of which attribute we split on.

Computations for the first split: the "remainders" for A1, A2, and A3 are, respectively,

Remainder(A1) = (2/5)(−1/2 log(1/2) − 1/2 log(1/2)) + (3/5)(−1/3 log(1/3) − 2/3 log(2/3)) ≈ 0.951
Remainder(A2) = (3/5)(−2/3 log(2/3) − 1/3 log(1/3)) + (2/5)(−0 − 2/2 log(2/2)) ≈ 0.551
Remainder(A3) = (4/5)(−2/4 log(2/4) − 2/4 log(2/4)) + (1/5)(−0 − 1/1 log(1/1)) = 0.800

Choose A2 for the first split, since it minimizes the remaining information needed to classify all examples. Note that all examples with A2 = 0 are correctly classified as y = 0, so we only need to consider the three remaining examples (x3, x4, x5) for which A2 = 1.

After splitting on A2, we compute the remaining information for the other two attributes on the three remaining examples (x3, x4, x5) that have A2 = 1. The remainders for A1 and A3 are

Remainder(A1) = (1/3)(−1/1 log(1/1) − 0) + (2/3)(−1/2 log(1/2) − 1/2 log(1/2)) ≈ 0.667
Remainder(A3) = (2/3)(−2/2 log(2/2) − 0) + (1/3)(−0 − 1/1 log(1/1)) = 0

So we select attribute A3 for the second split, which correctly classifies all remaining examples (A3 = 0 gives y = 0, and A3 = 1 gives y = 1).
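A small sketch (for checking only, not part of the key) that recomputes the remainders above directly from the five examples; the helper names are arbitrary.

from math import log2

# Sketch: recompute Remainder(A_i) for the five-example data set above and
# pick the attribute with the smallest remainder at each split.

data = [  # (A1, A2, A3, y)
    (0, 0, 1, 0), (1, 0, 1, 0), (0, 1, 0, 0), (1, 1, 1, 1), (0, 1, 1, 1)]

def entropy(examples):
    p = sum(y for *_, y in examples) / len(examples)
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

def remainder(examples, i):
    n = len(examples)
    return sum(len(sub) / n * entropy(sub)
               for v in (0, 1)
               for sub in [[e for e in examples if e[i] == v]] if sub)

print([round(remainder(data, i), 3) for i in range(3)])       # [0.951, 0.551, 0.8]
a2_true = [e for e in data if e[1] == 1]                      # x3, x4, x5
print(round(remainder(a2_true, 0), 3), remainder(a2_true, 2)) # 0.667 0.0 -> split on A3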

Q5. Answer the following questions briefly. Each question carries 2 points. [22]

Question / Answer
For the given finite discrete S1 and S2 values in Question 2b above, what is the number of entries in a Full Joint Distribution table? Explain your answer briefly. / Product of the domain sizes of C, S1, S2, E1, and E2: 3 x 4 x 4 x 2 x 2 = 192 entries; one entry is redundant because the probabilities must sum to 1 (191 independent entries).
What is the number of entries in the better (most efficient) of your two Bayesian networks? / CPT sizes as in Fig. 14.22, p. 562: 1x4x3 + 1x4x3 = 24.
What is meant by a “leak node” in a Bayesian network? / A node representing “miscellaneous causes” that are not modeled explicitly (p. 519).
Given the full joint distribution over Cavity, Toothache, and Catch [Q13.8]:

             toothache              ~toothache
             catch      ~catch      catch      ~catch
cavity       0.108      0.012       0.072      0.008
~cavity      0.016      0.064       0.144      0.576

Calculate the following (note: upper case denotes the vector over all values of the variable):
a. P(~Cavity). (P(Cavity) or P(~cavity), either answer would be OK.) / This asks for the vector of probability values of the random variable Cavity, which has two values. First add up the entries with Cavity = true: 0.108 + 0.012 + 0.072 + 0.008 = 0.2. Then P(Cavity) = <0.2, 0.8>.
Continuing from above:
b. P(Toothache | Cavity = false) / ANS b. P(Toothache | Cavity = false) = <(0.016 + 0.064)/0.8, (0.144 + 0.576)/0.8> = <0.1, 0.9>. (A quick verification sketch follows this table.)
Map the Dynamic Bayesian inference mechanisms on the left below to the corresponding times on the right:
Filtering       Past
Prediction      Present
Smoothing       Future
/ Filtering -> Present, Prediction -> Future, Smoothing -> Past
With five binary attributes determining a binary target decision (e.g., going to a restaurant or not), how many decision trees may be generated (without eliminating any attribute from the tree)? / Ans: 2^(2^5) = 2^32, one tree (Boolean function) for each possible assignment of outputs to the 2^5 rows of the truth table.
Write the full form of the acronym PAC learning. / Probably Approximately Correct learning: with high probability (probably), any hypothesis consistent with a sufficiently large training set has low error (is approximately correct).
A biased coin comes up heads 90% of the time. Show its entropy computation. / −(0.9 log2 0.9 + 0.1 log2 0.1) ≈ 0.47 bits
Write your group number and briefly describe your self-study topic, including your results.
Write Asimov's three laws of Robotics. / 1. A robot may not injure a human being or, through inaction, allow a human to come to harm. 2. A robot must obey orders given by humans, except where such orders would conflict with the first law. 3. A robot must protect its own existence as long as doing so does not conflict with the first or second law.
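For completeness, a small sketch (not part of the key) that verifies the two joint-table answers above: P(Cavity) = <0.2, 0.8> and P(Toothache | Cavity = false) = <0.1, 0.9>.

# Sketch: verify the two joint-table answers above directly from the full
# joint distribution over (Cavity, Toothache, Catch).

joint = {  # (cavity, toothache, catch) -> probability
    (1, 1, 1): 0.108, (1, 1, 0): 0.012, (1, 0, 1): 0.072, (1, 0, 0): 0.008,
    (0, 1, 1): 0.016, (0, 1, 0): 0.064, (0, 0, 1): 0.144, (0, 0, 0): 0.576}

p_cavity = sum(p for (cav, _, _), p in joint.items() if cav == 1)
print(p_cavity, 1 - p_cavity)        # approx. 0.2 and 0.8  ->  P(Cavity) = <0.2, 0.8>

p_not_cavity = 1 - p_cavity
p_tooth_given_not_cavity = [
    sum(p for (cav, t, _), p in joint.items() if cav == 0 and t == tv) / p_not_cavity
    for tv in (1, 0)]
print(p_tooth_given_not_cavity)      # approx. [0.1, 0.9]  ->  <0.1, 0.9>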