CMPS 273, Fall 2006 Syllabus

CMPS 470, Machine Learning Notes

Introduction Machine Learning (Chapter 1)

· Terms

o Knowledge – (from Wikipedia) Knowledge is defined (Oxford English Dictionary) variously as (i) expertise, and skills acquired by a person through experience or education; the theoretical or practical understanding of a subject, (ii) what is known in a particular field or in total; facts and information or (iii) awareness or familiarity gained by experience of a fact or situation. Philosophical debates in general start with Plato's formulation of knowledge as "justified true belief". There is however no single agreed definition of knowledge presently, nor any prospect of one, and there remain numerous competing theories.

§ Organized/clustered/data and its related functions that one uses to relate the data items in a way that can model the original data or predict events or unseen data based on the observed data.

o Learning – (from Wikipedia) Learning is the acquisition and development of memories and behaviors, including skills, knowledge, understanding, values, and wisdom. It is the product of experience and the goal of education. Learning ranges from simple forms of learning such as habituation and classical conditioning seen in many animal species, to more complex activities such as play, seen only in relatively intelligent animals.[1][2]

§ The ability to improve performance through experience. Formal definition is on page 2.

§ A computer program is said to learn from experience E with respect to tasks T and performance measure P, if its performance at tasks T, as measured by P, improves with experience E.

o Understanding – (from Wikipedia) Understanding is a psychological process related to an abstract or physical object, such as, person, situation, or message whereby one is able to think about it and use concepts to deal adequately with that object.

§ An understanding is the limit of a conceptualization. To understand something is to have conceptualized it to a given measure.

§ Examples:

· One understands the weather if one is able to predict and to give an explanation of some of its features, etc.

· A psychiatrist understands another person's anxieties if he/she knows that person's anxieties, their causes, and can give useful advice on how to cope with the anxiety.

· A person understands a command if he/she knows who gave it, what is expected by the issuer, and whether the command is legitimate.

· One understands a reasoning, an argument, or a language if one can consciously reproduce the information content conveyed by the message.

· One understands a mathematical concept if one can solve problems using it, especially problems that are not similar to what one has seen before.

§ Defining the cause and effect relationships between different situations or steps required in achieving ones goal. (empirical definition)

o Reasoning – The use of understanding to extend ones knowledge. This can be thought of as extension of knowledge based on principles previously derived from learning. It could be said that reasoning could extend the knowledge base without further physical exploration of the solution space. (empirical definition)

o Discussion – From these definitions of knowledge, learning, understanding, and reasoning, what if any empirical statements can we make?

§ From these definitions it appears that knowledge is a marriage of data items and the functions/actions associated with them in order to achieve some goal.

§ Knowledge is the product of learning.

§ One can learn how to react to a situation, or how to be succeed without understanding.

§ Understanding can occur without learning, but to do so would require an intuitive “leap”. The understanding would then need to be confirmed by experiment.

§ Reasoning requires understanding.

· Approach to problem solving (chapter 1)

§ Examples of learning problems:

· Checkers:

o T – playing checkers

o P – percent of games won

o E – playing games against itself.

· Handwriting recognition:

o T – recognition and classification of handwritten words within images of text.

o P – percent of words classified correctly

o E – a data base of handwritten words paired with their classifications.

· Robot driving a car:

o T – driving a car down the road

o P – average distance before the human operator has to correct the robot driver

o E – a sequence of images (sensor inputs) paired with the correct actuator commands (steering movements, throttle and brake positions) as done by a human driver.

· Typical Machine Learning tasks fall into these two categories:

o Control – What is the correct action based on some sensor inputs? Some examples:

§ Given the current temperature in the house should the ac or the heater be turned on in order to reach the desired temperature range?

§ Given a cat robot, equipped with two microphones for ears, whose job is to catch the squeaking mouse robot, how does it associate sensor readings with actuator commands that propel closer to the mouse?

o Classification – What am I currently observing? Some examples:

§ I am looking at an animal. Is it a cat, a rat, a dog, a hog, an eel, or a giraffe?

§ Given an image that I want to compress using lossy vector compression, what is the ideal set of pixel patterns that can be used to reproduce the image?

· Quiz 1

Concept Learning (chapter 2)

· Concept Learning – In Concept Learning we our goal is develop a function that infers a Boolean value based on some training values. On page 21 there is a table of some training values (examples) that illustrate what we are trying to do.

Example

Sky

Air Temp

Humidity

Wind

Water

Forecast

/ Enjoy Sport
1 / Sunny / Warm / Normal / Strong / Warm / Same / Yes
2 / Sunny / Warm / High / Strong / Warm / Same / Yes
3 / Rainy / Cold / High / Strong / Warm / Change / No
4 / Sunny / Warm / High / Strong / Cool / Change / Yes

Given these examples, can we determine a pattern that will help us predict whether the person is going to go out and do their sport? In the book example this is data collected from a fellow named Aldo that does water sports.

From inspection it appears that the determining factors, from these examples, is if it is raining and cold – we determine this because on day 3, he does not do his sport, and the only attributes that are inconsistent with the other days are when it is rainy and when it is cold.

Notation:

· ? indicates that any value is okay in this position for a positive outcome.

· <Sunny, Warm, ?, ?, Strong, ?,> The question marks indicate that any value will satisfy that attribute. So Humidity, Wind, and the Forecast are irrelevant when determining if Aldo is going to do his water sport.

· <?, ?, ?, ?, ?, ?> for the above example would mean that Aldo enjoys his water sport no matter what the conditions.

· Ø indicates that no values are acceptable in this position.

· <Sunny, Warm, Ø, Ø, Strong, Ø > In this case the Ø indicates that value is acceptable here is these positions.

o < Ø, Ø, Ø, Ø, Ø, Ø> for the above example indicates that there are no conditions in which Aldo would enjoy his water sport.

· X is the set of training values.

· x is a member of X.

· c is the function that we are learning. It is called a target concept. It has a Boolean value.

o c(x) = 1 for a positive example in X.

o c(x) = 0 for a negative value in X.

o c(x1) = 1 for the above example.

o c(x3) = 0 for the above example.

· <x, c(x)> is the example and its target concept value.

o < (sunny, warm, high, strong, warm, same), 1> is how we represent the values from row 2 in the table for our example above.

· The learner want to estimate c.

· H is the set of all hypotheses that the learner can consider for c.

· Each h in H is a function that is defined across X.

o h(x) = c(x) for all x in X

§ <?, cold, high, ?, ?, ? > = no

· How many choices are there for X?

o Our attributes are: <sky, air, humidity, wind, water, forecast>

§ 3 choices for sky

§ 2 for each of the rest.

§ Number of possible permutations = 3 * 2 * 2 * 2 * 2 * 2

§ 96 distinct values

§ When you add in the ? and Ø operators, we end up with

· Number of permutations = 5 * 4 * 4 * 4 * 4 * 4 = 5120

Find-S algorithm

The Find-S algorithm find the maximally specific hypothesis that satifes a set of training values. To do so it works its way from the most specific hypothesis and gradually becomes more general.

Example

Sky

Air Temp

Humidity

Wind

Water

Forecast

Find-S algorithm

Set h to most specific hypothesis in H
For each positive training instance x in X
For each attribute constraint ai in h
If the constraint is satisfied by the corresponding attribute in x
Then do nothing
Else replace ai in h by a more general constraint (?) that is satisfied by x
Output hypothesis h

Find-S for the enjoy sport example.

h0 = < Ø, Ø, Ø, Ø, Ø, Ø>
h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
h2 = <Sunny, Warm, ?, Strong, Warm, Same>
h3 = <Sunny, Warm, ?, Strong, Warm, Same>
Note: Since day 3 was negative we did not use it.
h4 = <Sunny, Warm, ?, String, ?, ?>

Note: We did not use the positive values.

Assignment: Do Find-S algorithm starting with example 4 and working backwards to example 1. Repeat using examples in the following order 1, 3, 4, 2 and 2, 1, 3, 4. Show your work.

List then Eliminate

Terms:

· Version Space – all h’s within H

List then Eliminate algorithm

Initialize Version Space with all h
For each training example <x, c(x)>
Remove any h from Version Space for which h(x) != c(x)

This algorithm is brute force in every way.

· It requires enumeration of all h’s in H.

· They are all searched for each x in X.

Candidate Elimination Algorithm

Terms:

· Satisfy – x is said to satisfy h when h(x) = 1

o Only positive examples in X

· Consistent – x is consistent with h when h(x) = c(x)

o All examples in X

· Version Space – set of all hypotheses consistent with the observed training data.

Candidate Elimination Algorithm

Initialize G to most general hypothesis
Initialize S to the most specific hypothesis
For each example d, do
If d is a positive example
Remove from G all g not consistent with d
For each hypothesis s in S the is not consistent with d
Remove s from S
Add to S all minimal generalizations h of s such that
H is constant with d and some member of G is more general that h.
Remove from S any hypothesis that is more general than another hypothesis in S.
If d is a negative example:
Remove from S any hypothesis inconsistent with d.
For each hypothesis g in G that is not consistent with d
Remove g from G
Add to G all minimal specializations h of g such that
H is consistent with d, and some member of S is more specific than h.
Remove from G any hypothesis that is less general than another hypothesis in G.

So, we work from specific to general using the positive examples, and general to specific using the negative examples.

Example

Sky

Air Temp

Humidity

Wind

Water

Forecast

Here is the above example worked starting with example 4 and going backwards.

S’s

S0 = < Ø, Ø, Ø, Ø, Ø, Ø>
S3, S4 = <Sunny, Warm, High, Strong, Cool, Change>
S2 = <Sunny, Warm, High, Strong, ?, ?>
S1 = <Sunny, Warm, ?, Strong, ?, ? >

G’s

G0, G4 = <?, ?, ?, ?, ?, ?>
G3 = <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, Cool, ?>
G2 = <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>
G1 = <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>

From G1 and S1 we get

<Sunny, ?, ?, ?, ?, ?> Masked with <Sunny, Warm, ?, Strong, ?, ? >

results in:

<Sunny, Warm, ?, ?, ?, ?>, <Sunny, ?, ?, Strong, ?, ?>

and

<?, Warm, ?, ?, ?, ?> Masked with <Sunny, Warm, ?, Strong, ?, ? >

results in:

<Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>

combining we get:

<Sunny, Warm, ?, ?, ?, ?>, <Sunny, ?, ?, Strong, ?, ?>, <?, Warm, ?, Strong, ?, ?>