Teaching Logic and Logic Programming in Undergraduate AI Course

Paper number YYY

Abstract

Almost every course on artificial intelligence discusses logic as a knowledge representation language. It also introduces a programming language at the beginning so that some of the algorithms discussed later in the course are implemented. Prolog is one of the popular languages used in AI courses. Some textbooks introduce logic after search algorithms and constraint satisfaction while other textbooks introduce logic right in the beginning. In this paper, we argue that it is better to teach logic very early in the course, in particular, if the programming language chosen is Prolog. It provides a basis for introducing declarative programming and simplifies the transition problems one faces in moving from procedural and object-oriented programming to declarative programming. Our empirical results suggest that teaching logic before Prolog enhances the retention of Prolog and declarative programming concepts taught.

Keywords: Artificial Intelligence, Prolog, Logic, Computer Science Education.

1Introduction

In their ACM Turing award lecture, Newell and Simon [14] argued that a physical symbol system has the necessary and sufficient means for general intelligent action – any system that exhibits general intelligence will prove upon analysis to be a physical symbol system and any physical symbol system of sufficient size can be organized further to exhibit general intelligence. This hypothesis (popularly known as physical symbol system hypothesis) placed knowledge representation, reasoning and search at the heart of artificial intelligence research. Almost every AI course covers knowledge representation and reasoning as well as search. However, the order in which they are covered differ from course to course and book to book. While Russell and Norvig [17], by far the most popular AI textbook, covers search first, other textbooks like Poole et al [15] and Luger [13] cover knowledge representation and reasoning first.

It is also customary for an AI course to introduce a programming language suitable for implementing intelligent systems so that students can implement a toy system exhibiting some sort of intelligence and play with various AI algorithms. While Prolog is popular in Europe, Asia and Australia, LISP is popular in North America. Generally, propositional and first-order logics are taught as part of knowledge representation and reasoning in AI courses throughout the world. In view of the strong relation between first-order logic and Prolog, we decided to go for Prolog[1] in our course. In this paper, we argue that covering logic, Prolog and search in this order has pedagogical advantages and students are well prepared for advanced topics and implementation of AI algorithms. We spend the first half of the course on introduction to AI (5%), logics and logic (Prolog) programming (25%) and search (20%) before covering topics like expert systems (10%), structural representations (8%), uncertainty (12%), learning (12%) and natural language understanding (8%).

2First-Order-Logic

AI course can build its logic coverage based on the introductory material covered in discrete structures course and cover the following topics:

  • translation of English sentences into first-order logic,
  • relation between logical consequence and proofs,
  • completeness and soundness of inference rules,
  • clause normal form and resolution, and
  • proof by contradiction.

Students come to AI course generally after a good training in imperative and object-oriented programming (from the 3 introductory courses recommended bythe ACM/IEEE computer curricula, CC2001 [1]). In view of this baggage, they struggle to adapt their thinking to declarative programming. Introduction of declarative programming as a continuation of discussion on logic eases this transfer to some extent. Translation between English sentences and first-order logic formulae is the first step in this direction.

In teaching translations between English and first-order logic, the rules of thumb in using various logical operators can be introduced. Functional completeness can be discussed by showing that any propositional formula can be equivalently stated using just negation and disjunction or negation and conjunction. Similarly, any first-order logic formula can be equivalently stated using only one of the two quantifiers, i.e., every first-order logic formula has an equivalent formula without existential quantifiers in it, and another formula without universal quantifiers in it.

The notions of validity, satisfiability, unsatisfiability, logical consequence are better to be introduced first in propositional setting using truth table and then extended to first-order logic using interpretations and models. We may then introduce inference rules, the notion of proof and the important concepts of soundness and completeness.

  • A set R of inference rules is sound if a proof of a formula F from a set  of formulae using inference rules in R implies that F is a logical consequence of , for any F and .
  • A set R of inference rules is complete if there exists a proof of F from  using inference rules in R whenever F is a logical consequence of .

If the inference engine of an intelligent system only uses sound inference rules, its conclusions can always be trusted. If its set of inference rules is complete, one can hope that every useful fact can be arrived at by the system. Since all the inference rules introduced as part of proof techniques (in discrete structures course) are sound, students generally have difficulty in appreciating the value of soundness and completeness. The rule of abduction widely used in day to day life (e.g., medical diagnosis) and AI systems is not a sound inference rule and can be used to illustrate the importance of soundness. The notion of completeness is best illustrated by showing that modus ponens is sound but not complete using the following example from Russell and Norvig [17].

It clearly follows from the following statements that one becomes rich.

  • If you go for PhD, you will get a good job. P  G
  • If you do not go for PhD, you will start earning early. P  E
  • If you get a good job, you will become rich. G  R
  • If you start earning early, you will become rich. E  R

However, we cannot prove it using modus ponens, while it can be proved using resolution.

The facts that resolution is both sound and complete over formulae in clause normal form and every propositional sentence has an equivalent formula in clause normal form make it an ideal candidate for implementation in an intelligent system. Further, every first-order logic formula can be transformed into a clause normal form (CNF) using Skolemization so that resolution can be applied to prove that a formula F is a logical consequence of a set  of formulae using proof by contradiction. At this point, the unification operation can be naturally introduced, completing a reasonable coverage of logic and deductive reasoning.

3Prolog Programming

The above coverage of logic –clause normal forms, unification, resolution and proof by contradiction– sets the stage for Prolog programming. At this point, it is rather easy to motivate logic programming, starting with the views expressed by Kowalski in his IFIP paper [11], where he advocated the use of predicate logic as a programming language and explaining the birth of Prolog (Programming in Logic) from the works of Colmerauer et al [6].

3.1Facts and rules

A Prolog program consists of a set of facts and rules. These facts and rules may contain variables, which are bounded variables with universal quantification. Having seen the procedure for clause normal form conversion, students do not have difficulty in noting the implicit information that the variables in Prolog programs are universally quantified[2]. Prolog computation is nothing but a proof by contradiction. To verify that atom A follows from the facts and rules in a Prolog program P, we ask a query ?- A. The Prolog interpreter attempts to derive an empty clause from the clauses (facts and rules) in P and A using resolution. If it succeeds in deriving an empty clause (proving that addition of A to P leads to contradiction), the Prolog interpreter gives the answer ‘yes’ and may give a substitution  (if needed) such that A is a logical consequence of P.

The following simple program drives home the implicit universal quantification of variables.

man(socrates).

mortal(X) :- man(X).

We can ask the Prolog interpreter whether Socrates is mortal by posing the query: ?- mortal(socrates). The answer ‘yes’ given by Prolog can be explained as the application of modus ponens and universal instantiation inference rules. In fact, execution of this query mimics the following argument[3] generally used as one of the first examples of proofs in first-order logic.

Socrates is a man.

All men are mortal.

Therefore, Socrates is mortal.

The above program also serves as a good example to explain Prolog’s syntactic convention that variables start with capitals and constants start with smalls (the name Socrates is coded as constant ‘socrates’ in the program).

The following program can be used to illustrate two important aspects of logic programming, namely, backtracking and generation of solutions – Prolog programs not only verify truth of a sentence (e.g., mortal(socrates)) like proofs covered in logic lessons do, but can also compute values of variables that make a sentence or formula true.

likes(pat, chinese). likes(pat, thai).likes(pat, italian).

likes(sue, italian).likes(sue, chinese).

invite(X, Y, F) :- likes(X, F), likes(Y, F).

The facts specify what food is liked by whom. The rule specifies that we may invite friends X and Y and prepare food F if both X and Y like it. Prolog interpreter generates two answers Food=chinese and Food=italian for the query ?-invite(pat, sue, Food). The generation of these answers in that particular order can be explained by drawing a proof tree in a systematic fashion using unification and resolution and stating that Prolog interpreter searches this proof tree following the depth-first search strategy generally covered in data structures course. In other words, Prolog execution may be seen as a proof by search [7].

3.2Programming as translation

The following exercise of implementing family relations illustrates programming as translation and the declarative nature of Prolog programming.

parent(pat, pam). parent(pat, jim).

parent(pat, tom).parent(mat, bob).

parent(mat, kim). parent(pam, kim).

parent(pam, bob). parent(tom, liz).

parent(kim, bev).

male(pat). female(pam).

male(tom). female(kim).

male(jim). female(liz).

male(bob). female(bev).

male(mat).

The above facts give all the information needed for defining various family relations. We define sibling, aunt, cousin, grand-father and ancestor relations to illustrate the derivation of Prolog programs from English definitions given by students.

A simple definition of the word sibling is: two people are siblings if they share a common parent.[4] It may be written a bit more formally as: X and Y are siblings if someone (say, Z) is a parent of both X and Y, i.e., Z is a parent of X and Z is a parent of Y. This English statement can be translated into Prolog as the following clause.

sibling(X, Y) :- parent(Z, X), parent(Z, Y).

Running the Prolog program obtained by adding this clause to the above facts, it can be shown that sibling(pam, pam) is true as per the above definition, which may not be the intention. Students can see where the problem is and suggest changing the clause by adding a condition that X and Y are different, obtaining

sibling(X, Y) :- parent(Z, X),

parent(Z, Y), X \== Y.

The oxford dictionary defines the word aunt as a sister of one’s parent. It may be translated as

aunt(X, Y) :- parent(Z, Y), sister(X, Z).

Since sister is essentially a female sibling, the above clause can be rewritten as

aunt(X, Y) :- parent(Z, Y), sibling(X, Z),

female(X).

Another solution is to keep the original clause and add a clause sister(X, Y) :- sibling(X, Y), female(X) defining the relation sister.

The oxford dictionary defines the word cousin as a child of one’s aunt or uncle. It may be rewritten as: two people are cousins if they are children of siblings, giving the clause

cousin(X, Y) :- parent(Z1, X), parent(Z2, Y),

sibling(Z1, Z2).

Grandfather is father of one’s parent. It may be translated as

g_dad(X, Y) :- parent(Z, Y), father(X, Z).

It may be further rewritten as

g_dad(X, Y) :- parent(Z, Y), parent(X, Z),

male(X).

The following example demonstrates the use of Prolog (and derivation of Prolog programs from general knowledge and information) in implementing games.[5]

A player wins in Tic-Tac-Toe by winning a row, column or diagonal. This information may be coded as

wins(P) :- wins_row(P).

wins(P) :- wins_coumnl(P).

wins(P) :- wins_diagonal(P).

A player wins a row if all the three cells in that row are marked by his symbol.

wins_row(P):- cell(R, 1, P), cell(R, 2, P),

cell(R, 3, P).

Similarly, the code for wins_col(P) is

wins_column(P):- cell(1, C, P), cell(2, C, P),

cell(3, C, P).

The two ways of winning by diagonals may be coded as

wins_diagonal(P):- cell(1, 1, P), cell(2, 2, P),

cell(3, 3, P).

wins_diagonal(P):- cell(1, 3, P), cell(2, 2, P),

cell(3, 1, P).

This example can be used to drive home the point that declarative programming helps in writing programs in a very natural fashion if one can get away from the habits of procedural and object-oriented programming. The students generally come to AI course after a good training in Java and C++ programming. The introduction of Prolog (immediately) after discussing logic helps in simplifying the transition problems one faces in moving from one programming paradigm to another.

3.3Programming Style

This subsection discusses programming styles that go well with Prolog programming, starting with how we can relate recursive programs to recursive definitions and induction proofs.

3.3.1Recursive programs

Continuing the family relations, the ancestor relation may be defined as: X is an ancestor of Y if (a) X is a parent of Y – base case, or (b) X is a parent of an ancestor of Y – inductive case. This is reflected by the following Prolog clauses.

ancestor(X, Y) :- parent(X, Y).

ancestor(X, Y) :- parent(X, Z),

ancestor(Z, Y).

Recursive definitions are usually illustrated with examples from Piano arithmetic, like, 0 + Y = Yand s(X) + Y = s(X+Y). The following Prolog program reflects this recursion.

add(0, Y, Y).

add(s(X), Y, s(Z)) :- add(X, Y, Z).

Lists and binary trees are widely used in many Prolog books to illustrate recursive programming in view of the inherent recursive nature of these data structures. The program for append predicate has exactly the same structure as the above addition program.

app([ ], Ys, Ys).

app([X|Xs], Ys, [X|Zs]) :- app(Xs, Ys, Zs).

The following member and count programs for binary trees illustrate the case analysis and divide-and-conquer strategy so common in Prolog programs with recursion.

An element X occurs in a binary tree T if it is (a) the root node or (b) in the left subtree or (c) in the right subtree.

t_member(X, tree(X, L, R)).

t_member(X, tree(Y, L, R)) :-

t_member(X, L).

t_member(X, tree(Y, L, R)) :-

t_member(X, R).

The number of nodes in an empty tree is zero and the number of nodes in tree(Y, L, R) is the sum of the number of nodes in L and R plus one.

count(void, 0).– void is the empty tree.

count(tree(Y, L, R), N) :-

count(L, Nl), count(R, Nr), N is Nl+Nr+1.

The above examples essentially demonstrate the point made in [8] that recursion is natural and simpler to teach in declarative programming paradigm.

3.3.2Reversibility of Prolog programs

Reversibility is one of the nice properties of logic programs – a program may be used for multiple purposes. For example, the standard append program can be used to find the concatenation of two lists as well as to find all possible ways of breaking a given list – ?-app([a,b], [1,2,3], Z) returns the concatenation of [a,b] and [1,2,3] as the answer Z=[a,b,1,2,3], while ?-app(X,Y,[1,2,3,4]) returns the five different ways of splitting [1,2,3,4]. The following example illustrates how this reversibility helps in writing programs in a natural way.

The predicate pos(E, K, L) is true if and only if E occurs as the Kth element of L. In other words, pos(E, K, L) is true when L can be seen as parts: element E, elements to the left of E (say, list L1), and elements to the right of E (say, list L2) such that the length of L1 is K-1. Taking advantage of Prolog’s list notation, we may say that L has two parts: L1 and [E|L2]. In terms of the semantics of append, L is the result of appending L1 and [E|L2], giving the following program[6].

pos(E, K, L) :- app(L1, [E|L2], L),

length(L1, K1), K is K1+1.

length([ ], 0).

length([X|Xs], S) :- length(Xs, S1), S is S1+1.

This program not only checks whether E occurs as the Kth element of L or not, but also gives all positions in which E occurs, exhibiting reversibility. For example, the query ?-pos(a, K, [a,b,c,a,d,a]) has three answers K=1, K=4 and K=6.

3.3.3Declarative and Procedural Semantics – tail recursion

The Prolog clause

A :- B, C

has two meanings: (a) A is true if B and C are true – declarative meaning, and (b) we may prove A by proving B and C – procedural meaning. The declarative meaning is closely related to semantics of logic, while procedural meaning is tightly related to the implementation decisions made in Prolog interpreter, like matching (unifying) clauses in the order they are written and executing the atoms (subgoals) in the body from left to right. By the commutative law of conjunction in logic, the clauses A :- B, C and A :- C, B have the same declarative meaning, however their procedural meaning can differ. This point can be best illustrated by running two modified programs of ancestor

(A2)ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).

ancestor(X, Y) :- parent(X, Y).

and

(A3)ancestor(X, Y) :- parent(X, Y).

ancestor(X,Y) :- ancestor(Z, Y), parent(Z, X).

Program (A2) works as well as the original program given in subsection 3.3.1 for all possible queries, except that the order in which solutions are given (for queries with multiple answers) is different, while program (A3) loops for queries like ?- ancestor (X, pat) and ?- ancestor (jim, pat) – essentially because parent(W, pat) does not succeed for any value of W.

This example nicely brings the advantages of tail recursion to the attention of students. It is advisable to prefer tail recursion whenever the arguments of the recursive atom are not subterms of the arguments of the head. We say it is preferable (not mandatory) to use tail recursion as there are programs (e.g., multiplication and the program for count given above) for which tail recursion is not appropriate.

To summarize this subsection, changing the order of atoms and clauses does not change the declarative meaning while it may change the procedural meaning – the order in which solutions are generated changes and program may loop.

4Empirical results

The section presents the empirical results of our experiment conducted over 5 semesters involving 3 teachers and 15 sections (little over 400 students). To ensure that the results of our analysis are not affected by the personal characteristics of teachers and/or students, each teacher taught the course using both the sequences of coverage. Our section sizes are capped to a maximum of 30 students and students are given freedom to choose the section to register for.