UNIT – IV
NON LINEAR DATA STRUCTURES
INTRODUCTION TO LINEAR AND NON LINEAR DATA STRUCTURES
- Data structures are categorized into two classes: linear and non-linear.
- In a linear data structure, the member elements form a sequence. Such linear structures can be represented in memory using one of two basic strategies:
- By having the linear relationship between the elements represented by means of sequential memory locations. These linear structures are called arrays.
- By having the relationship between the elements represented by means of pointers. These structures are called linked lists.
Arrays are useful when the number of elements to be stored is fixed. Operations like traversal, searching and sorting can easily be performed on arrays. On the other hand, linked lists are useful when the number of data items in the collection is likely to change.
There are also various non-linear structures, such as trees and graphs, and various operations can be performed on these data structures, such as:
Traversal: One of the most important operations, which involves visiting each element in the list.
Searching: Finding any element with a given value, or the record with a given key.
Insertion: Adding a new element to the list.
Deletion: Removing an element from the list.
Sorting: Arranging the elements in some order
Merging: Combining two lists into a single list.
1. TREE STRUCTURES
- NEED FOR NON-LINEAR STRUCTURES
- TREE ADT
1.1 Preliminaries
Definitions
- A tree
- Is a collection of nodes.
- The collection can be empty.
- If not empty, a tree consists of a distinguished node r, called the root, and zero or more nonempty subtrees T1, T2, ...., Tk, each of whose roots is connected by a directed edge from r.
Figure 4.1 Generic tree (P.106)
Figure 4.2 A tree (P.106)
- Parent
- Children
- Leaves: Nodes with no children
- Siblings: Nodes with same parent
- Path
- Length: Is the number of edges on the path.
- Depth of a node ni: the length of the unique path from the root to ni.
- Height of a node ni: the length of the longest path from ni to a leaf. (All leaves are at height 0.)
- The height of a tree is equal to the height of the root.
- The depth of a tree is equal to the depth of the deepest leaf.
–A node may have an arbitrary number of children (including zero).
- Node A above has 6 children: B, C, D, E, F, G.
–Nodes with no children are called leaves.
- B, C, H, I, P, Q, K, L, M, N are leaves in the tree above.
–Nodes with the same parent are called siblings.
- K, L, M are siblings since F is parent of all of them.
1.2 Implementation of Trees
- Each node consists of its data and a pointer to each child of the node.
- Since the number of children per node is not known in advance, keep the children of each node in a linked list of tree nodes.
typedef struct tree_node *tree_ptr;
struct tree_node
{
    element_type element;        /* data stored in the node */
    tree_ptr first_child;        /* leftmost child, or NULL */
    tree_ptr next_sibling;       /* next child of this node's parent, or NULL */
};
Figure 4.3 Node declarations for trees
Figure 4.4 First child/next sibling representation of the tree shown in Figure 4.2(P.107)
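A minimal sketch of how this representation is walked (assuming element_type is int; print_tree and the three-node example are illustrative, not from the text): each node's children are reached by following first_child once and then next_sibling repeatedly.

#include <stdio.h>

typedef int element_type;               /* assumed for this sketch */
typedef struct tree_node *tree_ptr;
struct tree_node
{
    element_type element;
    tree_ptr first_child;               /* leftmost child */
    tree_ptr next_sibling;              /* next child of the same parent */
};

/* Preorder walk: process a node, then each of its children in turn. */
void print_tree( tree_ptr t, int depth )
{
    tree_ptr c;
    if( t == NULL )
        return;
    printf( "%*s%d\n", 2 * depth, "", t->element );   /* indent by depth */
    for( c = t->first_child; c != NULL; c = c->next_sibling )
        print_tree( c, depth + 1 );
}

int main( void )
{
    struct tree_node a = { 1, NULL, NULL };
    struct tree_node b = { 2, NULL, NULL };
    struct tree_node c = { 3, NULL, NULL };
    a.first_child = &b;                 /* node 1 has children 2 and 3 */
    b.next_sibling = &c;
    print_tree( &a, 0 );
    return 0;
}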
- TREE TRAVERSALS WITH AN APPLICATION
- There are many applications for trees. One popular use is the directory structure in many common operating systems, including UNIX, VAX/VMS, and DOS.
Figure 4.5 A typical directory in the UNIX file system (P.92)
- This is a very popular hierarchical system.
- Two files in different directories can share the same name.
Fig. 4.6 (P.92): files that are at depth di will have their names indented by di tabs
Traversal strategies
- Preorder traversal
–Work at a node is performed before its children are processed.
- Postorder traversal
–Work at a node is performed after its children are processed.
- Inorder traversal (for binary trees)
–For each node:
First the left subtree is processed, then the work at the node is performed, and then the right subtree is processed.
- Preorder traversal
–The work at a node is performed before (pre) its children are processed.
–The running time is O(N) because the total amount of work is constant per node.
void list_directory( Directory_or_file D )
{
    list_dir( D, 0 );
}

void list_dir( Directory_or_file D, unsigned int depth )
{
    if( D is a legitimate entry )
    {
        print_name( depth, D );
        if( D is a directory )
            for each child, c, of D
                list_dir( c, depth + 1 );
    }
}
Figure 4.6 Routine to list a directory in a hierarchical file system
- Postorder traversal
–The work at a node is performed after (post) its children are evaluated.
Fig. 4.8 (P.94): the numbers in parentheses represent the number of disk blocks taken up by each file.
- Since the directories are themselves files, they have sizes too.
unsigned int
size_directory( Directory_or_file D )
{
    unsigned int total_size;

    total_size = 0;
    if( D is a legitimate entry )
    {
        total_size = file_size( D );
        if( D is a directory )
            for each child, c, of D
                total_size += size_directory( c );
    }
    return( total_size );
}
Figure 4.9 Routine to calculate the size of a directory (P.94)
- If D is not a directory, then size_directory merely returns the number of blocks used by D.
Fig. 4.10 (P.95): trace of the function in Fig. 4.9
LEFT CHILD RIGHT SIBLING DATA STRUCTURES FOR GENERAL TREES
2. BINARY TREES ADT
- A binary tree is a tree in which no node can have more than two children.
Fig. 4.11 Generic binary tree (P.112)
- A binary tree consists of a root and two subtrees, TL and TR, both of which could possibly be empty.
- The depth of an average binary tree is considerably smaller than N.
- The average depth, taken over all binary trees, is O(√N).
- The average depth for a binary search tree is O(log N).
- But the depth can be as large as N - 1 in the worst case. (Fig. 4.12 P.112)
2.1 Implementation
Fig. 4.13 (P.97)
- Because a binary tree has at most two children, we can keep direct pointers to them.
- A node is a structure consisting of the Key information plus two pointers (Left and Right) to other nodes.
typedef struct tree_node *tree_ptr;
struct tree_node
{
    element_type element;
    tree_ptr left;
    tree_ptr right;
};
typedef tree_ptr TREE;
Figure 4.13 Binary tree node declarations
2.2 EXPRESSION TREES
Figure 4.14 Expression tree for (a + b * c) + ((d * e + f) * g)
There are three notations for a mathematical expression:
1) INFIX NOTATION: (a + (b × c)) + (((d × e) + f) × g)
2) POSTFIX NOTATION: a b c × + d e × f + g × +
3) PREFIX NOTATION: + + a × b c × + × d e f g
- The leaves are operands such as constants or variables.
- The other nodes contain operators.
- An expression tree T can be evaluated by applying the operator at the root to the values obtained by recursively evaluating the left and right subtrees.
- Inorder Traversal
–(left subtree, node, right subtree) strategy.
–Produce infix expression
- Postorder traversal
–(left subtree, right subtree, node) strategy.
–Produce postfix expression
- Preorder traversal
–(node, left subtree, right subtree) strategy
–Produce prefix expression
- To convert a postfix expression into an expression tree
- Read the expression one symbol at a time.
- If the symbol is an operand, create a one-node tree and push a pointer to it onto a stack.
- If the symbol is an operator, pop pointers to two trees T1 and T2 from the stack and form a new tree whose root is the operator and whose left and right children point to T2 and T1 respectively. A pointer to this new tree is then pushed onto the stack.
- Example:
Convert a b + c d e + * * into an expression tree.
Diagrams (P.98-100)
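A minimal sketch of this stack-based construction, assuming single-character operands and operators (expr_node, build_expression_tree and print_infix are illustrative names, not from the text). The inorder print at the end also shows how an inorder traversal produces the infix form.

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

typedef struct expr_node *expr_ptr;
struct expr_node
{
    char element;                        /* operand letter or operator symbol */
    expr_ptr left;
    expr_ptr right;
};

expr_ptr make_node( char e, expr_ptr l, expr_ptr r )
{
    expr_ptr n = malloc( sizeof( *n ) );
    n->element = e;
    n->left = l;
    n->right = r;
    return n;
}

/* Build an expression tree from a postfix string such as "ab+cde+**". */
expr_ptr build_expression_tree( const char *postfix )
{
    expr_ptr stack[64];                  /* small fixed-size stack for the sketch */
    int top = -1;

    for( ; *postfix != '\0'; postfix++ )
    {
        if( isalnum( (unsigned char)*postfix ) )      /* operand: push a one-node tree */
            stack[++top] = make_node( *postfix, NULL, NULL );
        else                                          /* operator: pop T1, then T2 */
        {
            expr_ptr t1 = stack[top--];
            expr_ptr t2 = stack[top--];
            stack[++top] = make_node( *postfix, t2, t1 );  /* left = T2, right = T1 */
        }
    }
    return stack[top];                   /* pointer to the final tree */
}

/* Inorder traversal with parentheses produces an infix form of the expression. */
void print_infix( expr_ptr t )
{
    if( t == NULL )
        return;
    if( t->left != NULL )
        printf( "(" );
    print_infix( t->left );
    printf( "%c", t->element );
    print_infix( t->right );
    if( t->right != NULL )
        printf( ")" );
}

int main( void )
{
    print_infix( build_expression_tree( "ab+cde+**" ) );  /* prints ((a+b)*(c*(d+e))) */
    printf( "\n" );
    return 0;
}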
2.3 APPLICATION OF TREES
- Tree Applications
- There are many applications of trees in Computer Science and Engineering.
–Organization of filenames in a Unix File System
–Indexing large files in Database Management Systems
–Compiler Design.
–Routing Table representation for fast lookups of IP addresses
–Search Trees for fast access to stored items
3. BINARY SEARCH TREES ADT
- For every node, X, in the tree, the values of all the keys in its left subtree are smaller than the key value in X, and the values of all the keys in its right subtree are larger than the key value in X.
- Because the average depth of a binary search tree is O(log N), we generally do not need to worry about running out of stack space.
Figure 4.15 Two binary trees (only the left tree is a search tree) (P.117)
Implementation
typedef struct tree_node *tree_ptr;
struct tree_node
{
    element_type element;
    tree_ptr left;
    tree_ptr right;
};
typedef tree_ptr SEARCH_TREE;
Figure 4.16 Binary search tree declarations (P.118)
- MakeEmpty
SEARCH_TREE
make_null( void )
{
    return NULL;
}
Figure 4.17 Routine to make an empty tree(P.118)
- Find
–Return a pointer to the node in tree T that has key X, or NULL if there is no such node.
–The test for an empty tree must be performed first, since otherwise the indirection would be on a NULL pointer.
–The use of tail recursion is justifiable here because the simplicity of algorithmic expression compensates for the decrease in speed, and the amount of stack space used is expected to be only O(log N).
tree_ptr
find( element_type x, SEARCH_TREE T )
{
    if( T == NULL )
        return NULL;
    if( x < T->element )
        return( find( x, T->left ) );
    else
    if( x > T->element )
        return( find( x, T->right ) );
    else
        return T;
}
Figure 4.18 Find operation for binary search trees(P.119)
- FindMin
–Return the position of the smallest elements in the tree.
–Start at the root and go left as long as there is a left child. The stopping point is the smallest element.
tree_ptr
find_min( SEARCH_TREE T )
{
    if( T == NULL )
        return NULL;
    else
    if( T->left == NULL )
        return( T );
    else
        return( find_min( T->left ) );
}
Figure 4.19 Recursive implementation of find_min for binary search trees (P.119)
- FindMax
–Return the position of the largest elements in the tree.
tree_ptr
find_max( SEARCH_TREE T )
{
    if( T != NULL )
        while( T->right != NULL )
            T = T->right;
    return T;
}
Figure 4.20 Nonrecursive implementation of find_max for binary search trees (P.104)
- Insert
- Fig. 4.22 (P.105)
- Fig. 4.21 (P.104)
- To insert X into tree T:
- If X is found, do nothing.
- Otherwise, insert X at the last spot on the path traversed.
- Duplicates can be handled by:
–keeping an extra field in the node record indicating the frequency of occurrence or
–keeping all of the structures that have the same key in an auxiliary data structure, such as a list or another search tree.
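A minimal sketch of the insertion routine in the style of the declarations in Figure 4.16 (assuming <stdlib.h> for malloc and that element_type supports < and >; duplicates are simply ignored here):

SEARCH_TREE
insert( element_type x, SEARCH_TREE T )
{
    if( T == NULL )
    {
        /* Fell off the tree: X belongs at the last spot on the path traversed. */
        T = malloc( sizeof( struct tree_node ) );
        T->element = x;
        T->left = T->right = NULL;
    }
    else
    if( x < T->element )
        T->left = insert( x, T->left );      /* insert into the left subtree */
    else
    if( x > T->element )
        T->right = insert( x, T->right );    /* insert into the right subtree */
    /* else X is already in the tree: do nothing */
    return T;
}

As usual for this recursive style, the caller reassigns the root: T = insert( x, T );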
- Delete
- Fig. 4.25 (P.107)
- Fig. 4.23 - 4.24 (P.106)
- If node is a leaf, it can be deleted immediately.
- If the node has one child, the node can be deleted after its parent adjusts a pointer to bypass the node.
- If the node has 2 children, replace the data of this node with the smallest data of the right subtree and recursively delete that node.
- Because the smallest node in the right subtree cannot have a left child, the second Delete is an easy one.
- Lazy deletion
–When an element is to be deleted, it is left in the tree and only marked as being deleted.
–If a deleted key is reinserted, the overhead of allocating a new cell is avoided.
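A minimal sketch of the (non-lazy) deletion strategy described above, again assuming the declarations of Figure 4.16, the find_min of Figure 4.19, and <stdlib.h>; a key that is not present leaves the tree unchanged:

SEARCH_TREE
delete( element_type x, SEARCH_TREE T )
{
    tree_ptr tmp;

    if( T == NULL )
        return NULL;                              /* X not found: nothing to do */
    if( x < T->element )
        T->left = delete( x, T->left );
    else
    if( x > T->element )
        T->right = delete( x, T->right );
    else
    if( T->left != NULL && T->right != NULL )     /* two children */
    {
        /* Replace with the smallest key in the right subtree, then delete that key. */
        T->element = find_min( T->right )->element;
        T->right = delete( T->element, T->right );
    }
    else                                          /* zero or one child */
    {
        tmp = T;
        T = ( T->left != NULL ) ? T->left : T->right;
        free( tmp );
    }
    return T;
}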
6. TREE TRAVERSALS (REVISITED)
- Inorder traversal
–To process the left subtree first, then perform processing at the current node, and finally process the right subtree.
–The running time is O(N) because constant work is performed at every node.
–Fig. 4.56 (P.132)
–(To list all the keys in sorted order)
- Postorder traversal
–Process both subtrees first, before processing the node itself.
–Example: to compute the height of a node.
Fig. 4.57 (P.133)
–The running time is O(N) because constant work is performed at each node.
- Preorder traversal
- The node is processed before the children.
- Example: to label each node with its depth.
–Level-order traversal
- All nodes at depth D are processed before any node at depth D+1.
It differs from the other traversals in that it is not done recursively; a queue is used instead of the implicit stack of recursion.
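A minimal sketch of level-order traversal with a simple array-based queue (assuming the binary tree declarations of Figure 4.13, that element_type prints with %d, and an illustrative bound MAX_NODES on the number of nodes; requires <stdio.h>):

#define MAX_NODES 100                     /* assumed upper bound for this sketch */

void print_level_order( tree_ptr root )
{
    tree_ptr queue[MAX_NODES];
    int front = 0, rear = 0;

    if( root == NULL )
        return;
    queue[rear++] = root;                 /* enqueue the root (depth 0) */
    while( front < rear )
    {
        tree_ptr t = queue[front++];      /* dequeue the next node */
        printf( "%d ", t->element );      /* process it */
        if( t->left != NULL )             /* children go to the back of the queue, */
            queue[rear++] = t->left;      /* so depth D finishes before depth D+1  */
        if( t->right != NULL )
            queue[rear++] = t->right;
    }
}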
4. AVL TREES
- Is a binary search tree with a balance condition.
- Condition:
Every node must have left and right subtrees of the same height.
- Only perfectly balanced trees of 2^k - 1 nodes would satisfy this condition.
- Although this guarantees trees of small depth, the balance condition is too rigid.
- AVL tree
- Is identical to a binary search tree, except that for every node in the tree, the height of the left and right subtrees can differ by at most 1.
Fig. 4.29 (P.111)
- Height information is kept for each node.
- Example:
An AVL tree of height 9 with the fewest nodes (143).
Fig. 4.30 (P.112)
- The minimum number of nodes, S(h), in an AVL tree of height h is given by:
S(h) = S(h-1) + S(h-2) + 1
S(0) = 1
S(1) = 2
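For example, working the recurrence forward gives S(2) = 4, S(3) = 7, S(4) = 12, S(5) = 20, S(6) = 33, S(7) = 54, S(8) = 88 and S(9) = 88 + 54 + 1 = 143, which matches the AVL tree of height 9 with the fewest nodes (143) mentioned above.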
- Balance condition violation
Let the node that must be rebalanced be α.
Since any node has at most 2 children, and a height imbalance requires that α's two subtrees' heights differ by 2, it is easy to see that a violation might occur in 4 cases:
- An insertion into the left subtree of the left child of α. (L-L)
- An insertion into the right subtree of the left child of α. (L-R)
- An insertion into the left subtree of the right child of α. (R-L)
- An insertion into the right subtree of the right child of α. (R-R)
Cases 1 and 4 are fixed by a single rotation of the tree.
Cases 2 and 3 are fixed by a double rotation of the tree.
4.1 Single Rotation
- Single rotation that fixes case 1
Fig. 4.31 (P.113)
- Node k2 violates the AVL balance property because its left subtree is 2 levels deeper than its right subtree.
- In the original tree k2 > k1, so k2 becomes the right child of k1 in the new tree.
- Subtree Y, which holds items that are between k1 and k2 in the original tree, can be placed as k2's left child in the new tree.
- The new height of the entire subtree is exactly the same as the height of the original subtree prior to the insertion that caused X to grow.
- Example
Fig. 4.32 (P.113)
(insert 6 into the original AVL tree on the left)
- Single rotation that fixes case 4
Fig. 4.33 (P.114)
- Example
- Insert the keys 3, 2, 1 and then 4 through 7 in sequential order into an initially empty AVL tree.
Diagrams (P.114-115)
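A minimal sketch of both single rotations, using an illustrative avl_node that stores its height (this is a common textbook formulation, not the balance-bit pseudocode given later in this section):

typedef struct avl_node *avl_ptr;
struct avl_node
{
    int element;
    avl_ptr left;
    avl_ptr right;
    int height;                  /* height of the subtree rooted at this node */
};

int height( avl_ptr t )
{
    return ( t == NULL ) ? -1 : t->height;
}

int max( int a, int b )
{
    return ( a > b ) ? a : b;
}

/* Case 1 (L-L): rotate k2 with its left child k1; subtree Y becomes k2's left child. */
avl_ptr single_rotate_with_left( avl_ptr k2 )
{
    avl_ptr k1 = k2->left;
    k2->left = k1->right;        /* subtree Y */
    k1->right = k2;
    k2->height = max( height( k2->left ), height( k2->right ) ) + 1;
    k1->height = max( height( k1->left ), k2->height ) + 1;
    return k1;                   /* k1 is the new root of this subtree */
}

/* Case 4 (R-R): the mirror image. */
avl_ptr single_rotate_with_right( avl_ptr k1 )
{
    avl_ptr k2 = k1->right;
    k1->right = k2->left;
    k2->left = k1;
    k1->height = max( height( k1->left ), height( k1->right ) ) + 1;
    k2->height = max( height( k2->right ), k1->height ) + 1;
    return k2;
}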
4.2 Double Rotation
- Single rotation does not work for cases 2 or 3.
Fig. 4.34 (P.116)
- The fact that subtree Y in Figure 4.34 has had an item inserted into it guarantees that it is nonempty. Thus, we may assume that it has a root and two subtrees.
- Consequently, the tree may be viewed as four subtrees connected by three nodes. (Left part of Fig. 4.35, P.116)
- Exactly one of tree B or C is two levels deeper than D (unless all are empty).
- Left-right double rotation to fix case 2
Fig. 4.35 (P.116)
- Rotate between α's child and grandchild, and then between α and its new child.
- It restores the height to what it was before the insertion.
- Right-left double rotation to fix case 3.
Fig. 4.36 (P.116)
- Example:
To continue the example in 4.1 by inserting the keys 10 through 16 in reverse order, followed by 8 and then 9.
Diagrams in P.117 - 119
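Continuing the sketch above, each double rotation can be written as two single rotations (again illustrative names, assuming the single_rotate_with_left/right helpers from the previous sketch):

/* Case 2 (L-R): rotate k3's left child with its right child, then k3 with its new left child. */
avl_ptr double_rotate_with_left( avl_ptr k3 )
{
    k3->left = single_rotate_with_right( k3->left );
    return single_rotate_with_left( k3 );
}

/* Case 3 (R-L): the mirror image. */
avl_ptr double_rotate_with_right( avl_ptr k1 )
{
    k1->right = single_rotate_with_left( k1->right );
    return single_rotate_with_right( k1 );
}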
- To insert a new node with key X into an AVL tree T:
- Recursively insert X into the appropriate subtree of T (call it TLR).
- If the height of TLR does not change, then we are done.
- Otherwise, if a height imbalance appears in T, do the appropriate single or double rotation depending on X and the keys in T and TLR, and update the height information.
- Storage of height information as:
–The difference in height (i.e. +1, 0, -1): requires only 2 bits and avoids repetitive calculation, but there is some loss of clarity, and the coding is more complicated than if the height were stored at each node.
–Absolute heights.
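As a small illustrative comparison of the two choices (field names assumed, not from the text):

/* Choice 1: store the balance only; fits in 2 bits per node. */
struct avl_node_bal
{
    int element;
    struct avl_node_bal *left, *right;
    signed char balance;                 /* height(left) - height(right): -1, 0 or +1 */
};

/* Choice 2: store the absolute height, as in the rotation sketches above. */
struct avl_node_abs
{
    int element;
    struct avl_node_abs *left, *right;
    int height;                          /* height of the subtree rooted at this node */
};

The insertion and deletion pseudocode below takes the 2-bit route in spirit, recording in each node which side, if either, is the "longer" one.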
INSERTION OF A NODE IN AVL TREE
Insertion can be done by finding an appropriate place for the node to be inserted. But this can disturb the balance of the tree if the difference in height of the sub-trees with respect to a node comes to exceed one. If the insertion is done as a child of a non-leaf node, it will not affect the balance, as the height doesn't increase. But if the insertion is done as a child of a leaf node, it can bring a real disturbance in the balance of the tree.
This depends on whether the node is inserted into the left sub-tree or the right sub-tree, which in turn changes the balance factor. If the node is inserted into the sub-tree of smaller height, there will be no effect. If the height of both the left and right sub-trees is the same, then insertion into either of them doesn't affect the balance of the AVL tree. But if it is inserted into the sub-tree of larger height, the balance will be disturbed.
To rebalance the tree, the nodes need to be properly adjusted. So, after insertion of a new node, the tree is traversed starting from the new node to the node where the balance has been disturbed, and the nodes are adjusted in such a way that the balance is regained.
ALGORITHM FOR INSERTION IN AVL TREE
int avl_insert(node *treep, value_t target)
{
    /* insert the target into the tree, returning 1 on success
     * or 0 if it already existed */
    node tree = *treep;
    node *path_top = treep;        /* deepest unbalanced node on the search path */

    while (tree && target != tree->value)
    {
        direction next_step = (target > tree->value);
        if (!Balanced(tree))
            path_top = treep;
        treep = &tree->next[next_step];
        tree = *treep;
    }
    if (tree)
        return 0;                  /* already present */

    tree = malloc(sizeof(*tree));
    tree->next[0] = tree->next[1] = NULL;
    tree->longer = NEITHER;
    tree->value = target;
    *treep = tree;
    avl_rebalance(path_top, target);
    return 1;
}
ALGORITHM FOR REBALANCING IN INSERTION
void avl_rebalance_path(node path, value_t target)
{
    /* Each node in path is currently balanced. Until we find target,
     * mark each node as longer in the direction of target, because we
     * know we have inserted target there. */
    while (path && target != path->value) {
        direction next_step = (target > path->value);
        path->longer = next_step;
        path = path->next[next_step];
    }
}
void avl_rebalance(node *path_top, value_t target)
{
    node path = *path_top;
    direction first, second, third;

    if (Balanced(path)) {
        avl_rebalance_path(path, target);
        return;
    }
    first = (target > path->value);
    if (path->longer != first) {
        /* took the shorter path */
        path->longer = NEITHER;
        avl_rebalance_path(path->next[first], target);
        return;
    }
    /* took the longer path, need to rotate */
    second = (target > path->next[first]->value);
    if (first == second) {
        /* just a two-point rotate */
        path = avl_rotate_2(path_top, first);
        avl_rebalance_path(path, target);
        return;
    }
    /* The fine details of the 3-point rotate depend on the third step.
     * However there may not be a third step, if the third point of the
     * rotation is the newly inserted point. In that case we record the
     * third step as NEITHER. */
    path = path->next[first]->next[second];
    if (target == path->value)
        third = NEITHER;
    else
        third = (target > path->value);
    path = avl_rotate_3(path_top, first, third);
    avl_rebalance_path(path, target);
}
DELETION
A node in an AVL tree is deleted just as it is deleted in a binary search tree. The only difference is that we have to do rebalancing, which is done similarly to the rebalancing after insertion of a node in an AVL tree. The algorithm for deletion and rebalancing is given below:
ALGORITHM FOR DELETION IN AVL TREE
int avl_delete(node *treep, value_t target)
{
    /* delete the target from the tree, returning 1 on success
     * or 0 if it wasn't found */
    node tree = *treep;
    node *path_top = treep;        /* deepest node that may need rebalancing */
    node *targetp = NULL;
    direction dir;

    while (tree) {
        dir = (target > tree->value);
        if (target == tree->value)
            targetp = treep;
        if (tree->next[dir] == NULL)
            break;
        if (tree->longer == NEITHER ||
            (tree->longer == 1-dir && tree->next[1-dir]->longer == NEITHER))
            path_top = treep;
        treep = &tree->next[dir];
        tree = *treep;
    }
    if (targetp == NULL)
        return 0;
    targetp = avl_rebalance_del(path_top, target, targetp);
    avl_swap_del(targetp, treep, dir);
    return 1;
}
ALGORITHM FOR REBALANCING IN DELETION IN AVL TREE
node *avl_rebalance_del(node *treep, value_t target, node *targetp)
{
    node targetn = *targetp;

    while (1) {
        node tree = *treep;
        direction dir = (target > tree->value);

        if (tree->next[dir] == NULL)
            break;
        if (Balanced(tree))
            tree->longer = 1-dir;
        else if (tree->longer == dir)
            tree->longer = NEITHER;
        else {
            /* a rotation is needed, and targetp might change */
            if (tree->next[1-dir]->longer == dir)
                avl_rotate_3_shrink(treep, dir);
            else
                avl_rotate_2_shrink(treep, dir);
            if (tree == targetn)
                targetp = &(*treep)->next[dir];
        }
        treep = &tree->next[dir];
    }
    return targetp;
}