UNIT V- ALGORITHM DESIGN TECHNIQUES
1. Introduction to Algorithm Design and Analysis
The efficiency of an algorithm depends on the following factors:
• Time: the amount of time taken to execute the algorithm. An algorithm that takes less time to execute is the better one.
• Space: the amount of memory required to store the algorithm and the amount of memory required to store the input for that algorithm.
• Simplicity: the algorithm should not contain complex instructions; such instructions should be broken down into a number of simpler ones.
• Generality: the algorithm should be general. It should be implementable in any language and should not be tied to one specific input or output.
Properties of an algorithm
1) Finiteness: an algorithm terminates after a finite number of steps.
2) Definiteness: each step of the algorithm is unambiguous. This means that the action specified by a step cannot be interpreted in multiple ways and can be performed without any confusion.
3) Input: an algorithm accepts zero or more inputs.
4) Output: it produces at least one output.
5) Effectiveness: it consists of basic instructions that are realizable. This means that each instruction can be carried out exactly and in a finite amount of time.
6) Non-ambiguity: the steps of the algorithm must not conflict with one another.
7) Range of input: before designing an algorithm, decide what type of input will be given and what output is required.
8) Multiplicity: different algorithms can be written to solve the same problem.
9) Speed: apply suitable ideas to speed up the execution time.
2. Analysis of algorithm
Analysis: computing the efficiency of an algorithm. When computing the efficiency of an algorithm, consider the following two factors.
_ Space Complexity
_ Time Complexity
(i) Space Complexity: the amount of memory required to store the algorithm and the amount of memory required to store the inputs for this algorithm.
(ii) Time Complexity: the amount of time required to run the algorithm.
How to measure the running time?
• Find the basic operation of the algorithm. (The inner most loop or operation is called basic
operation)
• Compute the time required to execute the basic operation.
• Compute how many times the basic operation is executed. The time complexity is then calculated by the following formula:
T(n) = Cop × C(n)
where
Cop is a constant (the amount of time required to execute the basic operation once), and
C(n) is the number of times the basic operation is executed.
3. Asymptotic Notations
(i) Big–O
(ii) Big–Omega
(iii) Big–Theta
(i) Big–O (or) O( ) Notation
A function t(n) is said to be in O(g(n)), denoted t(n) ∈ O(g(n)), if t(n) is bounded above by some constant multiple of g(n) for all large n, i.e., if there exist some positive constant c and some nonnegative integer n0 such that t(n) ≤ c·g(n) for all n ≥ n0.
Example: 100n + 5 ∈ O(n^2)
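As a worked check of this example (our own derivation, using the definition above):

```latex
100n + 5 \;\le\; 100n + 5n \;=\; 105n \;\le\; 105\,n^2 \qquad \text{for all } n \ge 1,
```

so t(n) ≤ c·g(n) holds with c = 105 and n0 = 1, confirming that 100n + 5 ∈ O(n^2).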
(ii) Big–Omega (or) Ω( ) Notation
A function t(n) is said to be in Ω(g(n)), denoted t(n) ∈ Ω(g(n)), if t(n) is bounded below by some constant multiple of g(n) for all large n, i.e., if there exist some positive constant c and some nonnegative integer n0 such that t(n) ≥ c·g(n) for all n ≥ n0.
Example: n^3 ∈ Ω(n^2)
(iii) Big–Theta (or) θ( ) Notation
A function t(n) is said to be in θ(g(n)), denoted t(n) ∈ θ(g(n)), if t(n) is bounded both above and
below by some constant multiple of g(n) for all large n, i.e., if there exist some positive constants
c1 and c2 and some nonnegative integer n0 such that
c2·g(n) ≤ t(n) ≤ c1·g(n) for all n ≥ n0.
Example: n(n-1)/2 ∈ θ(n^2)
4. Greedy Method
The greedy method is an approach for finding optimized solutions to a given problem. The solution is constructed through a sequence of steps, each step expanding the partially constructed solution so far, until a complete solution to the problem is reached.
Applications of greedy method
- A Simple Scheduling Problem
- Huffman Codes
- Approximate Bin Packing
(i)A Simple Scheduling Problem
Scheduling refers to the way processes are assigned to run on the available CPUs, since there are typically many more processes running than there are available CPUs. This assignment is carried out by software known as a scheduler and dispatcher. As an example, suppose we have the four jobs and associated running times shown in the figure.
One possible schedule is shown in the figure. Because j1 finishes in 15 (time units), j2 in 23, j3 in 26, and j4 in 36, the average completion time is 25. A better schedule, which yields a mean completion time of 17.75, is arranged by shortest job first. We can show that scheduling the shortest job first will always yield an optimal schedule.
The Multiprocessor Case
We can extend this problem to the case of several processors. Again we have jobs j1, j2, . . . , jn, with associated running times t1, t2, . . . , tn, and a number P of processors. We will assume without loss of generality that the jobs are ordered, shortest running time first. As an example, suppose P = 3, and the jobs are as shown in Figure.
(ii)Huffman Codes
A second application of greedy algorithms is file compression. Suppose we have a file that contains only the characters a, e, i, s, t, plus blank spaces and newlines. Suppose further that the file has ten a's, fifteen e's, twelve i's, three s's, four t's, thirteen blanks, and one newline. As the table in the figure shows, this file requires 174 bits to represent, since there are 58 characters and each character requires three bits.
Character Code Frequency Total Bits
------
a 000 10 30
e 001 15 45
i 010 12 36
s 011 3 9
t 100 4 12
space 101 13 39
newline 110 1 3
------
Total 174
In real life, files can be quite large, so it is worth asking whether it would be possible to provide a better code and reduce the total number of bits required. The binary code that represents the alphabet can be represented by the binary tree shown in the figure.
The tree in the figure has data only at the leaves. The representation of each character can be found by starting at the root and recording the path, using a 0 to indicate the left branch and a 1 to indicate the right branch. For instance, s is reached by going left, then right, and finally right; this is encoded as 011. A better code than the one given in the figure can be obtained by noticing that the newline is an only child. By placing the newline symbol one level higher, at its parent, we obtain a new tree with a cost of 173, but this is still far from optimal. The figure shows the optimal tree for our sample alphabet; as can be seen there, this code uses only 146 bits.
Huffman's Algorithm
Huffman's algorithm can be described as follows: we maintain a forest of trees. The weight of a tree is equal to the sum of the frequencies of its leaves. C − 1 times, select the two trees, T1 and T2, of smallest weight, breaking ties arbitrarily, and form a new tree with subtrees T1 and T2. At the beginning of the algorithm, there are C single-node trees, one for each character. At the end of the algorithm there is one tree, and this is the optimal Huffman coding tree.
(iii) Approximate Bin Packing
The problem is to pack a set of items into the fewest number of bins, given that each bin has unit capacity. As an example, Figure 10.20 shows an optimal packing for an item list with sizes 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8.
There are two versions of the bin packing problem. The first version is on-line bin packing, in which each item must be placed in a bin before the next item can be processed. The second version is the off-line bin packing problem, in which we do not need to do anything until all the input has been read.
On-line Algorithms
-> Next Fit
-> First Fit
-> Best Fit
Next Fit Algorithm
When processing any item, we check to see whether it fits in the same bin as the last item. If it does, it is placed there; otherwise, a new bin is created. This algorithm is incredibly simple to implement and runs in linear time.
First Fit Algorithm
The first fit strategy is to scan the bins in order and place the new item in the first bin that is large enough to hold it. Thus, a new bin is created only when the results of previous placements have left no other alternative. A simple implementation of first fit would process each item by scanning down the list of bins sequentially; this takes O(n^2) time.
Best Fit Algorithm
Instead of placing a new item in the first spot that is found, it is placed in the tightest spot among all bins.
5.Divide and conquer
The divide-and-conquer strategy solves a problem by:
1. Breaking it into sub problems that are themselves smaller instances of the same type of problem
2. Recursively solving these sub problems
3. Appropriately combining their answers
Running Time of Divide and Conquer Algorithms
T(n) = a·T(n/b) + f(n)
where T(n) is the running time of the given algorithm,
a is the number of sub problems,
b is the factor by which the sub problem size shrinks (each sub problem has size n/b), and
f(n) is the time to combine the solutions of the sub problems.
Examples
Closest-Points Problem
Quick sort
(i) Closest-Points Problem
Given n points in a metric space, find a pair of points with the smallest distance between them. The problem can be solved in O(n log n) time using the recursive divide and conquer approach, as follows:
- Sort points along the x-coordinate
- Split the set of points into two equal-sized subsets by a vertical line x = xmid
- Solve the problem recursively in the left and right subsets. This will give the left-side and right-side minimal distances dLmin and dRmin respectively.
- Find the minimal distance dLRmin among the pairs of points in which one point lies to the left of the dividing vertical line and the other lies to the right.
The final answer is the minimum among dLmin, dRmin, and dLRmin.
(ii)Quick sort
Quicksort is a divide and conquer algorithm. Quicksort first divides a large list into two smaller sub-lists: the low elements and the high elements. Quicksort can then recursively sort the sub-lists.
DIVIDE: The array A is partitioned (rearranged) into two nonempty subarrays A[p . . q] and A[q+1 . . r]. The index q is computed as a part of this partitioning procedure.
CONQUER: The two subarrays A[p . . q] and A[q+1 . . r] are sorted by recursive calls to quicksort.
COMBINE: Since the subarrays are sorted in place, no work is needed to combine them: the entire array A is now sorted.
The steps are:
- Pick an element, called a pivot, from the list.
- Reorder the list so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.
- Recursively sort the sub-list of lesser elements and the sub-list of greater elements.
The base case of the recursion is lists of size zero or one, which never need to be sorted.
6. Dynamic Programming
Dynamic programming is a method for efficiently solving a broad range of search and optimization
problems which exhibit the characteristics of overlapping sub problems and optimal substructure.
Applications of Dynamic Programming
•Using a Table Instead of Recursion
•Ordering Matrix Multiplications
•Optimal Binary Search Tree
•All-Pairs Shortest Path
(i) Using a Table Instead of Recursion
The natural recursive program to compute the Fibonacci numbers is very inefficient. Recall that the program shown in Figure 10.40 has a running time T(n) that satisfies T(n) ≥ T(n - 1) + T(n - 2). Since T(n) satisfies the same recurrence relation as the Fibonacci numbers and has the same initial conditions, T(n) grows at least as fast as the Fibonacci numbers themselves, which is exponentially.
Recursive Algorithm for Fibonacci
unsigned int fib( unsigned int n )
{
    /* base case: fib(0) = fib(1) = 1 */
    if( n <= 1 )
        return 1;
    else
        /* exponential time: the same subproblems are recomputed repeatedly */
        return( fib( n-1 ) + fib( n-2 ) );
}
Non Recursive Algorithm for Fibonacci
unsigned int fibonacci( unsigned int n )
{
    unsigned int i, last, next_to_last, answer;

    if( n <= 1 )
        return 1;
    last = next_to_last = 1;            /* fib(0) and fib(1) */
    for( i = 2; i <= n; i++ )
    {
        answer = last + next_to_last;   /* fib(i) = fib(i-1) + fib(i-2) */
        next_to_last = last;
        last = answer;
    }
    return answer;                      /* linear time, constant space */
}
(ii) Ordering Matrix Multiplications
Suppose we are given four matrices, A, B, C, and D, of dimensions A = 50 X 10, B = 10 X 40, C = 40 X 30, and D = 30 X 5. Although matrix multiplication is not commutative, it is associative, which means that the matrix product ABCD can be parenthesized, and thus evaluated, in any order. The obvious way to multiply two matrices of dimensions p X q and q X r, respectively, uses pqr scalar multiplications.
In the case of four matrices, it is simple to solve the problem by exhaustive search, since there are only five ways to order the multiplications. We evaluate each case below:
(A((BC)D)): Evaluating BC requires 10 X 40 X 30 = 12,000 multiplications. Evaluating (BC)D requires the 12,000 multiplications to compute BC, plus an additional 10 X 30 X 5 = 1,500 multiplications, for a total of 13,500. Evaluating (A((BC)D)) requires 13,500 multiplications for (BC)D, plus an additional 50 X 10 X 5 = 2,500 multiplications, for a grand total of 16,000 multiplications.
(A(B(CD))): Evaluating CD requires 40 X 30 X 5 = 6,000 multiplications. Evaluating B(CD) requires 6,000 multiplications to compute CD, plus an additional 10 X 40 X 5 = 2,000 multiplications, for a total of 8,000. Evaluating (A(B(CD))) requires 8,000 multiplications for B(CD), plus an additional 50 X 10 X 5 = 2,500 multiplications, for a grand total of 10,500 multiplications.
((AB)(CD)): Evaluating CD requires 40 X 30 X 5 = 6,000 multiplications. Evaluating AB requires 50 X 10 X 40 = 20,000 multiplications. Evaluating ((AB)(CD)) requires 6,000 multiplications for CD, 20,000 multiplications for AB, plus an additional 50 X 40 X 5 = 10,000 multiplications, for a grand total of 36,000 multiplications.
(((AB)C)D): Evaluating AB requires 50 X 10 X 40 = 20,000 multiplications. Evaluating (AB)C requires the 20,000 multiplications to compute AB, plus an additional 50 X 40 X 30 = 60,000 multiplications, for a total of 80,000. Evaluating (((AB)C)D) requires 80,000 multiplications for (AB)C, plus an additional 50 X 30 X 5 = 7,500 multiplications, for a grand total of 87,500 multiplications.
((A(BC))D): Evaluating BC requires 10 X 40 X 30 = 12,000 multiplications. Evaluating A(BC) requires the 12,000 multiplications to compute BC, plus an additional 50 X 10 X 30 = 15,000 multiplications, for a total of 27,000. Evaluating ((A(BC))D) requires 27,000 multiplications for A(BC), plus an additional 50 X 30 X 5 = 7,500 multiplications, for a grand total of 34,500 multiplications.
If we define MLeft,Right to be the number of multiplications required in an optimal ordering,
then, if Left < Right,
MLeft,Right = min over Left ≤ i < Right of { MLeft,i + Mi+1,Right + cLeft-1 · ci · cRight },
where ck denotes the number of columns of the k-th matrix (and c0 the number of rows of the first): splitting the product after matrix i costs the optimal cost of each half plus the cost of multiplying the resulting cLeft-1 X ci and ci X cRight matrices.
(iii) Optimal Binary Search Tree
•A binary search tree is a tree where the key values are stored in the internal nodes, the external nodes (leaves) are null nodes, and the keys are ordered lexicographically. I.e. for each internal node all the keys in the left sub tree are less than the keys in the node, and all the keys in the right sub tree are greater.
•The Catalan number c(n) = C(2n, n)/(n+1) gives the number of structurally different binary search trees on n nodes.
Example: with 4 nodes, we can construct 14 different binary search trees.
Keys          :  3    7    9    12
Probabilities : 0.1  0.2  0.4  0.3
Suppose for the tree (b),
the average key comparisons =0.2(1)+0.1(2)+0.3(2)+0.4(3)=2.2
The optimal binary search tree is then identified by calculating the average number of key comparisons for each possible tree structure and selecting the tree with the minimum value.
Thus, we can write a formula for the cost CLeft,Right of an optimal binary search tree. If Left > Right, then the cost of the tree is 0; this is the NULL case, which we always have for binary search trees. Otherwise, the root costs pi. The left subtree has a cost of CLeft,i-1 relative to its root, and the right subtree has a cost of Ci+1,Right relative to its root. Since both subtrees sit one level deeper than the root, every key in the range contributes its probability one extra time. This gives the formula
CLeft,Right = min over Left ≤ i ≤ Right of { CLeft,i-1 + Ci+1,Right + Σ pj }, where the sum runs over j = Left to Right.
(iv)All pairs shortest path problem
The all-pairs shortest path problem is the determination of the shortest graph distances between every
pair of vertices in a given graph. The problem can be solved using repeated applications of Dijkstra's algorithm, or all at once using the Floyd-Warshall algorithm. The latter algorithm also works in the case of a weighted graph where the edges have negative weights (provided there are no negative cycles). The matrix of all distances between pairs of vertices is called the graph distance matrix, or sometimes the all-pairs shortest path matrix.
Floyd's Algorithm
Floyd's algorithm takes as input the cost matrix C[v,w], where
• C[v,w] = ∞ if (v,w) is not in E. It returns as output
• a distance matrix D[v,w] containing the cost of the lowest cost path from v to w; initially D[v,w] = C[v,w];
• a path matrix P, where P[v,w] holds the intermediate vertex k on the least cost path between v
and w that led to the cost stored in D[v,w].
Floyd's algorithm computes the sequence of matrices D0, D1, …, D|V|. The distances in Di represent
paths whose intermediate vertices come from Vi. Since Vi+1 = Vi ∪ {vi+1}, we can obtain the distances in Di+1 from those in Di by considering only the paths that pass through vi+1. For every pair of vertices (v,w), we compare the distance Di(v,w) (which represents the shortest path from v to w that does not pass through vi+1) with the sum Di(v,vi+1) + Di(vi+1,w) (which represents the shortest path from v to w that does pass through vi+1).
7.Backtracking
Definition: Backtracking is a process of finding the solution to a problem by searching the solution space (the set of all feasible solutions) of the given problem.
Backtracking is a process where steps are taken towards the final solution and the details are recorded. If these steps do not lead to a solution, some or all of them may have to be retraced and the relevant details discarded. In these circumstances it is often necessary to search through a large number of possible situations in search of feasible solutions.
Example
The Turnpike Reconstruction Problem
Games (Eight Queen problem)
(i)The Turnpike Reconstruction Problem
Suppose we are given n points, p1, p2, . . . , pn, located on the x-axis, where xi is the x coordinate of pi. Let us further assume that x1 = 0 and the points are given from left to right. These n points determine n(n - 1)/2 (not necessarily unique) distances d1, d2, . . . between every pair of points, of the form | xi - xj |. It is clear that if we are given the set of points, it is easy to construct the set of distances in O(n^2) time. The turnpike reconstruction problem is to reconstruct a point set from the distances. Let D be the set of distances, and assume that | D | = m = n(n - 1)/2. As an example, suppose that
D = {1, 2, 2, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 10}
Since | D | = 15, we know that n = 6. We start the algorithm by setting x1 = 0. Clearly, x6 = 10, since 10 is the largest element in D. We remove 10 from D. The points that we have placed and the remaining distances are as shown in the following figure.
The largest remaining distance is 8, which means that either x2 = 2 or x5 = 8. By symmetry, we can conclude that the choice is unimportant, since either both choices lead to a solution (which are mirror images of each other), or neither does, so we can set x5 = 8 without affecting the solution. We then remove the distances x6 - x5 = 2 and x5 - x1 = 8 from D, obtaining