Chapter 11 . Trees

chapter 11 . Trees:

Properties of Trees

Binary Trees

Vector Implementation

Dynamic Memory Implementation

Application: "Guess the Animal" Game

Operator Precedence Parsing

Tree Traversals

Postorder Tree Traversal Iterator

Preorder Tree Traversal Iterator

Inorder Tree Traversal Iterator

Level-Order Tree Traversal Iterator

Binary Tree Representation of General Trees

Properties of Trees

Trees in general (and binary trees in particular) are a common way to store data.

Trees are often used in everyday life to represent information in a hierarchical fashion. Common uses of trees include:

Organization charts

Pairings in a tournament

Family trees

Family trees account for much of the terminology used in the description of trees (parent, child, and so on).

A family tree shows the descendants of a particular individual:

An ancestor tree shows the ancestors of an individual:

In general, a tree consists of nodes, which are connected by directed arcs. A tree has a single root, which is shown at the top of the tree.

Each node may point to other nodes (its children); the node itself is considered to be the parent of these children. The descendants of a node are its children, the children of those nodes, and so on.

Nodes that do not have children are called leaf nodes, or leaves. A node that does have children is called an interior node.

In a tree, there must be a single unique path from the root to any particular node. This rule prevents children from having more than one parent. It also prevents a tree from having circular paths.

The height of a tree is the length of the longest path from the root to any leaf.

Each node in a tree can itself be considered to be the root of a tree. The node, along with its descendants, is said to be a subtree of the original tree. The following figure shows a tree and one of its subtrees:

Trees are often used to describe the structure of a computer program. A parse tree is a tree that shows the structure of a program (or a portion of a program).

Parse trees are constructed from a grammar. A grammar is a set of rules that describe the syntax of a programming language. Here is a portion of the grammar for Java:

<statement> ::= <select-statement> | <expr>

<select-statement> ::= if ( <expr> ) <statement> else <statement>

<expr> ::= <relation-expr> | <assign-expr> | identifier

<relational-expr> ::= <expr> < <expr>

<assign-expr> ::= <expr> = <expr>

In a parse tree, each leaf node represents a token. Interior nodes represent syntactic categories, such as <statement> or <expr>. When given a program to compile, a compiler might create a parse tree from the tokens in the program.

The following is a parse tree for the statement

if (a < b) max = b else max = a

A parse tree can be condensed to a different form known as an expression tree (or abstract syntax tree). In such a tree, leaves represent operands and interior nodes represent operators. The expression A + (B + C) * D would have the following expression tree:

A tree can be represented in Java by node objects connected by pointers. The following class could be used to represent individual nodes. In the following declaration, a a node will have a left and right child as well as a parent. However, there are trees that can be implemented so that a node can have any number of children or only a left and right child with no parent.

public class treeNode<T> {

private T value;

private treeNode<TleftChild;

private treeNode<T> rightChild;

private treeNode<T> parent;

}

public class BinaryTree<T>

{

protected final max int MAX_NODES = 100;

protected ArrayList<treeNode<T> tree;

….

}

Binary Trees

A tree in which each node has at most two children, a left child and a right child, is said to be a binary tree.

Binary trees are more frequently encountered in algorithms than general trees.

Binary trees have a number of important properties. One is the relationship between the number of nodes in a tree and the height of the tree.

In a full binary tree, every node has two children and the distance from the root to any leaf is the same.

Theorem: A full binary tree of height n will have 2n leaves.

Theorem: The number of nodes in a full binary tree of height n is 2n+1 - 1.

Both theorems can be proved by induction on n.

A complete binary tree is similar to a full binary tree, except that some of the rightmost leaves may be missing on the bottom level of the tree:

A complete binary tree of height h has between 2h and 2h+1 - 1 nodes.

A complete binary tree containing n nodes must have a height greater than or equal to

and less than

Consequently, the longest path from root to leaf in a complete binary tree is O(log n).

Theorem: A binary tree containing n nodes must have at least one path from root to leaf of length

Theorem: In a complete binary tree containing n nodes, the longest path from root to leaf traverses no more than

nodes.

A height-balanced binary tree has the property that, for each node, the difference between the heights of its left subtree and its right subtree is no larger than 1. Also known as an AVL tree (named after the discoverers Adelson-Velskii and Landis).

A complete binary tree is height-balanced, so the largest number of nodes in a balanced binary tree of height n is 2n+1 - 1.

Computing the smallest number of nodes in a height-balanced tree is a little harder. In general, for a tree of height n, the smallest number of nodes is found by connecting the smallest trees of heights n - 1 and n - 2.

Let Mn represent the minimum number of nodes for a height-balanced tree of height n. The value of Mn is expressed by the following equations:

M0 = 1

M1 = 2

Mn+1 = Mn-1 + Mn + 1

This sequence is similar to the Fibonacci number sequence, whose elements are bounded by 2n. More precisely, it can be shown that

From this approximation, it follows that

This means that the longest path in a height-balanced binary tree with n nodes is no more than 44% longer than the log n minimum for a tree with n nodes. As a result, algorithms that run in time proportional to the length of the longest path in a tree will be O(log n) when used with height-balanced trees.

Array Based Implementation

A complete binary tree can be easily stored in a vector. Storing a tree as a vector is much more efficient than storing it as nodes connected by pointers.

When a tree is stored in vector form, the root of the tree occupies element 0 of the vector. In general, the children of the node in element i of the vector are stored in elements 2 x i + 1 and 2 x i + 2

For example, the tree

would be represented by the following vector:

The parent of the node at position i (for i > 0) will be stored at position (i - 1)/2.

Using a vector to implement a tree is advisable only when the tree is complete. Otherwise, many of the vector's elements may be unused.

For example, a tree consisting of four nodes, in which each node is a right child of the previous node, would be represented by the following vector:

Dynamic Memory Implementation

General binary trees are usually stored using nodes, with each node containing a value plus pointers to the node's left and right children (and possibly a pointer to the node's parent).

The node template of Chapter 17 can be used to build a binary tree. Although several node functions were implemented in Chapter 17, the copy and release functions were not.

The copy function returns a duplicate of an entire tree rooted at a particular node. The parent for the new tree is supplied as an argument to the function. The function does not alter the original tree. What it does is copy from the original tree a new tree given a particular node for the new parent (as the parameter). And the function recursively calls itself for child subtrees.

This function returns a copy of the tree rooted at node:

public treeNode<T> node<T>::copy(treeNode<T> newparent)

{

treeNode<T> nodenode = new treeNode<T>(value, newparent, 0, 0);

if (leftChild)

newnode.leftChild = leftChild.copy(newnode);

if (rightChild)

newnode.rightChild = rightChild.copy(newnode);

return newnode;

}

The release function frees the memory associated with the children of a node. This function is optional since Java provides an automatic garbage collector.

public void treeNode<T>::release()

{

if (leftChild) {

leftChild.release();

delete leftChild;

leftChild = 0;

}

if (rightChild) {

rightChild.release();

delete rightChild;

rightChild = 0;

}

Application: "Guess the Animal" Game

A "guess the animal" game can be constructed around a binary tree. The user first thinks of an animal. The program then asks the user a series of yes/no questions, attempting to determine which animal the user has chosen.

Does it live in water?

Does it bark?

I know. Is it a cat?

yes

I won!

Try again?

The program stores its data as a binary tree in which interior nodes represent questions and leaves represent animals.

The tree originally consists of a single node, containing the string "cat". When the program fails to guess the user's animal, it asks the user to enter a question that can be used to distinguish the animal from other animals.

Does it live in water?

yes

I know. Is it a fish?

What is the animal you had in mind?

a duck

What is a yes/no question that I can use to tell a fish from a duck?

Does it have webbed feet?

For a duck is the answer yes or no?

yes

Try again?

The animalGame function uses the root variable to point to the root of the binary tree. current points to the current node (the one containing the question currently being asked).

publicvoid animalGame()

{

treeNode<string> root = new treeNode<string>("cat", 0, 0, 0);

treeNode<string> *current = root;

System.out.println("Let's play guess the animal.");

while (current != 0) {

if (current.leftChild != 0) {

System.out.println(“ current.value “);

if (answer())

current = current.leftChild;

else

current = current.rightChild;

} else {

System.out.println("I know. Is it a " + current.value + " ?");

if (answer())

System.out.println( "I won.");

else {

learnNewAnimal(current);

}

System.out.println("Try again?");

if (answer())

current = root;

else

return;

}

The answer function reads the answer to a yes/no question.

public bool answer()

{

while (1) {

string ans;

ans = SimpleIO.readLine();

if ((ans[0] == 'y') || (ans[0] == 'Y'))

return true;

else if ((ans[0] == 'n') || (ans[0] == 'N'))

return false;

System.out.println("please answer yes or no." );

}

When the program fails to guess the user's animal, it calls the learnNewAnimal function. This function asks the user to enter the new animal and the distinguishing question. The current node, which must have been a leaf, is replaced by a node containing the question. The new animal and the animal formerly stored in the current node are stored in children of the question node.

publicvoid learnNewAnimal(treeNode<string> current)

{

string currentAnimal = current.value;

System.out.println("what is your animal?" );

string newAnimal;

newAnimal = SimpleIO.readLine()

System.out.println("What is a yes/no question that I can use to tell a " +

current.value + " from a " + newAnimal " ?");

string newQuestion;

treeNode<string> node1 = new treeNode<string>(newAnimal, current, 0, 0);

treeNode<string> node2 = new treenode<string>(currentAnimal, current, 0, 0);

newQuestion = SimpleIO.readLine();

System.out.println("For a " + newAnimal + " is the answer yes or no?" );

if (answer()) {

current.leftChild = node1;

current.rightChild = node2;

} else {

Current.leftChild = node2;

Current.rightChild = node1;

}

Current.value = newQuestion;

}

Operator Precedence Parsing

Chapter 16 described an algorithm to convert an infix expression to postfix form. The technique used in that algorithm, known as operator precedence parsing, can also be used to convert an expression into an expression tree.

The new algorithm will rely on the operators type which are:

Operators  leftparen, plus, minus, times, divide, modulo;

To simplify matters, expressions will be restricted to binary operators, one-letter identifiers, and parentheses.

A new class, expressionInformation, will be used for the information stored in a node. The type member stores the type of the node. If the node represents an identifier, the name member contains the identifier else the field is unused.

class expressionInformation {

// data members

public operators type;

public string name;

// constructors

expressionInformation(char n): type(identifier), name(string(1, n)) {}

expressionInformation(operators op): type(op) {}

};

The expression parsing algorithm uses two stacks:

The operator stack, containing operators whose operands have not yet been completely processed.

The operand stack, containing expression trees that have already been processed.

How it works:

When an operand is encountered in the input expression, it is converted to a node and pushed onto the operand stack.

When an operator is encountered in the input expression, it is compared with the top item on the operator stack.

If it has lower precedence than the top operator, the top operator is popped.
The operands for this operator are popped from the operand stack, and the operator is made the root of a tree in which the operands are children.
The new tree is then pushed onto the operand stack.

At the end of the input expression, any remaining operators on the operator stack are popped and combined with the remaining operands on the operand stack.

The final expression tree will be the only element left on the operand stack.

The following figure shows how the two stacks are used by the algorithm:

The doBinary function is called when an operator is popped from the operator stack. The function pops two elements from the operand stack and combines them with the operator into a single tree. The tree is then pushed onto the operand stack.

void doBinary(operators theOp, stack<node<expressionInformation>*> operandStack)

{

node<expressionInformation> right = operandStack.top();

operandStack.pop();

node<expressionInformation> left = operandStack.top();

operandStack.pop();

node<expressionInformation> newNode =

new node<expressionInformation>(theOp, 0, left, right);

left.parent = newNode;

right.parent = newNode;

assert(newNode != 0);

operandStack.push(newNode);

}

The processOp function is called when an operator is encountered in the input string. It compares the operator's precedence with that of the top item on the operator stack.

void processOp(operators theOp, stack<operators> operatorStack,

stack<node<expressionInformation&operandStack)

{

while ((!operatorStack.empty()) &

(theOp < operatorStack.top())) {

doBinary(operatorStack.top(), operandStack);

operatorStack.pop();

}

operatorStack.push(theOp);

}

This algorithm has the same flaws as the one in for stacks, because + and - are treated as having different precedence, as are * and /. Also, the algorithm fails to take associativity into account.

The parse function implements the entire parsing process, taking the input string as its argument and returning a pointer to an expression tree.

node<expressionInformation> parse(string& inputString)

{

stack<operators> operatorStack;

stack<node<expressionInformation>*> operandStack;

int i = 0;

while (i < inputString.length()) {

if (isLetterOrDigit(inputString[i])) {

node<expressionInformation> newNode =

new node<expressionInformation>

(expressionInformation(inputString[i++]), 0, 0, 0);

operandStack.push(newNode);

} else

switch (inputString[i++]) {

case '(':

operatorStack.push(leftparen);

break;

case ')':

while (operatorStack.top() != leftparen) {

dobinary(operatorStack.top(), operandStack);

operatorStack.pop();

}

operandStack.pop();

break;

case '+':

processOp(plus, operatorStack, operandStack);

break;

case '-':

processOp(minus, operatorStack, operandStack);

break;

case '*':

processOp(times, operatorStack, operandStack);

break;

case '/':

processOp(divide, operatorStack, operandStack);

break;

}

while (!operatorStack.empty()) {

doBinary(operatorStack.top(), operandStack);

operatorStack.pop();

}

return operandStack.top();

}

Tree Traversals

It is often necessary to visit all nodes in a binary tree ("traverse" the tree). Traversing a tree is more complicated than traversing the elements of most data structures. For one thing, there is no single "correct" order in which to visit the nodes.

Most traversal algorithms consist of three steps:

Process a node.

Recursively visit and process the node's left subtree.

Recursively visit and process the node's right subtree.

By varying the order in which these steps are done, six different traversals are possible. Since traversals are almost always done from left to right, only three of these traversals are commonly used:

Preorder: Visit node (value) first, then left child, then right child.

Inorder: Visit left child first, then node (value), then right child.

Postorder: Visit left child first, then right child, then node (value).

The other 3 possibilities are:

Visit node (value) first, then right child, then left child.
Visit right child first, then node (value), then left child.
Visit right child first, then left child, then node (value).

When an expression tree is traversed in one of these ways, with the contents of each node printed when the node is visited, the result is either a prefix expression, an infix expression, or a postfix expression.

Consider the result of traversing the tree on the next page:

Preorder (prefix) traversal: + A * + B C D

Inorder (infix) traversal: A + B + C * D

Postorder (postfix) traversal: A B C + D * +

It is easy to write recursive functions that perform each type of traversal:

void preorder(node<T> current)

{

if (current) {

process(current.value);

preorder(current.leftChild);

preorder(current.rightChild);

}

void inorder(node<T> current)

{

if (current) {

inorder(current.leftChild);

process(current.value);

inorder(current.rightChild);

}

void postorder(node<T> current)

{

if (current) {

postorder(current.leftChild);

postorder(current.rightChild);

process(current.value);

}

Except for the recursion, the amount of work performed for each node is constant. Therefore, the entire traversal can be done in O(n) steps, where n is the number of nodes.

Level-Order Tree Traversal Iterator

Another traversal method, less common than the others, is the level-order traversal. This type of traversal visits all nodes at level 1 (the root), then all nodes at level 2, and so on: