Task Allocation Method for Fault Tolerance in DCS in Case of Node Failure

Deepika Soni Jatinder Singh Saini

M.Tech. Student Asst. Professor

Department of CSE/IT Department of CSE/IT

BBSBEC/PTU, India BBSBEC/PTU, India

Email: Email:

ABSTRACT:

Distributed systems aim for good performance and high system utilization. They raise issues such as fault tolerance, scalability, and openness. In this paper we work on improving the fault tolerance of a mobile distributed system so that it needs less execution time than other systems when a node moves from its position. Distributed systems decrease the load on the central authority, which is the node that assigns work to the other mobile systems. The proposed method is intended to improve network throughput, reduce execution time, and reduce energy consumption. The main job of a job scheduling system is to manage the computing power of clients, servers, and supercomputers efficiently so as to increase job throughput and system utilization. Several issues of heterogeneous distributed computing systems are discussed briefly in this paper. The aim of the paper is to concentrate on fault tolerance and to recover from faults with less processing time. The proposed algorithm assigns tasks to other nodes only when the master node shifts from its actual position. The biggest problem in this architecture is task scheduling: if one node fails, the task allocated to it by the master node is not completed and a fault occurs. In this work, we present a technique that improves the fault tolerance of the system and increases the performance and throughput of the distributed computing system.

Keywords:

Distributed systems, Task allocation, job scheduling and scalability

1. INTRODUCTION

A distributed system is a collection of autonomous computers that appears to its users as a single unified system. It is either homogeneous or heterogeneous. In a homogeneous distributed computing system the nodes have the same hardware and software characteristics; in a heterogeneous one they differ, and any node may fail permanently. Because a distributed computing system is diverse in nature, its nodes have distinct hardware and software attributes, and the components of an application likewise have varied hardware and software needs. The computers of a distributed system are physically connected with each other by local networks. A distributed computing system uses a network of multiple computers, each performing part of an overall task, in order to obtain results much faster than a single computer could. A computer program that runs on a distributed computing system is called a distributed program, and the process of writing such programs is called distributed programming.

First, every computational entity has its own local memory, and the entities communicate with each other through message passing. Second, the system should tolerate failures of individual computers. The structure of the system and its links may vary during the execution of the program, and each computer knows only its own input. Resource sharing is a convenient way to use any hardware, software, or data anywhere in the system. Resources in a distributed computing system, unlike those in a centralized one, are physically encapsulated within one of the computers and can be accessed from the others only through communication. Openness is concerned with the extensibility and improvability of distributed computing systems: new components are integrated with existing ones so that the added functionality becomes accessible from the distributed computing system as a whole.

Fig. 1.1: Distributed Computing Systems

1.1 Design Issues of Distributed System: A distributed computing system must be designed to provide all the advantages of a computing system to its end users. Some of the related design issues are as follows:

1. Flexibility: A heterogeneous distributed computing system must be flexible enough that users can make changes and improvements easily.

2. Scalability: A system must be designed so that it copes easily with growing needs and requirements. It should avoid centralized algorithms and centralized entities, and it should perform as much work as possible at the client workstations.

3. Security: For users to trust the system and depend on it completely, its resources must be protected against breakdown and against unauthorized access by any user or intermediary. Maintaining security is harder in a distributed computing system than in a centralized one because there is no single point of control and because data is communicated over insecure networks.

4. Fault Tolerance: The system must be resistant to faults, so that a fault that occurs does not degrade its performance.

2. REVIEW OF LITERATURE

In [1], the authors addressed the distributed tracking control problem for multi-agent systems with heterogeneous uncertainties and a leader whose control input may be nonzero and is not available to any follower. Based on the states of neighboring agents, both continuous static and dynamic controllers are designed to ensure uniform ultimate boundedness of the tracking error for each follower. A necessary condition for the existence of this set of controllers is that every agent is static.

In [2], the authors presented agent technology as a promising way to improve the flexibility and performance of applications. Mobile agent systems should let applications be customized by deploying an application and its components dynamically, and should additionally protect agents from harmful and malicious hosts and protect hosts from a malicious network. According to the authors, none of the existing mobile agent systems meets all of these requirements. The architecture proposed in that paper is a small, simple example of a system that satisfies the requirements needed to address these issues, and it can provide a highly secure and trustworthy architecture on which users can rely while fitting the features of existing systems.

In [3], the authors proposed a fault-tolerance service for ad hoc networks that first applies a group modeling (clustering) technique and then provides fault tolerance by replication that is based on prediction. The clustering algorithm forms groups based on the number of neighbors and the energy level of the nodes. After clustering, the network has a two-level hierarchical structure, with a head for every group and a super leader for the whole network.
The groups formed are open, dynamic, disjoint, and explicit, and they allow point-to-point communication within the group. The fault-tolerance service is organized into four sub-services (cluster formation, decision making, replication by prediction, and continuity), which make effective management of the network possible and combine all the functions needed for good data availability. Their contribution takes into account the properties of mobile terminals, with the goal of minimizing data loss and reducing the energy consumption of their critical resources. As future work, they planned to implement the complete fault-tolerance service in an ad hoc network simulator such as GloMoSim, to improve the clustering algorithm by taking the nodes into account, and to couple the simulator with different routing protocols for ad hoc networks.

In [4], the authors discussed the task allocation problem in heterogeneous distributed systems, in which a number of tasks must be allocated to multiple processors for execution. The paper focuses on task allocation with the aim of increasing system reliability, and it gives an algorithm for solving this problem. To evaluate the algorithm, several parameters were considered, such as the number of tasks, the number of processors, and the density of task interaction in the application. The experimental results showed the advantage of the proposed algorithm over conventional algorithms.

In [5], the authors showed that in distributed computing systems (DCSs) the task allocation strategy is an important factor in reducing system cost. To exploit the capabilities of a DCS for efficient parallelism, the tasks of a parallel program must be assigned properly to the processors available in the system. To address this problem, strategies are needed that produce the best possible solution. The paper therefore focuses on task allocation in distributed systems such that the system cost is minimized.

In [6], the authors presented an efficient solution to the dynamic allocation problem. Starting from a definition of the phases of a modular program, they suggested a model based on dynamic programming. In this model there are five phases, and each phase has an equal number of tasks. The optimal allocation is obtained from the phase-wise optimal costs.

In [7], the authors presented a Connection Fault-Tolerant (CFT) model for the mobile environment that considers two main communication scenarios: first, when mobile hosts (MHs) are connected to the fixed network through a Mobile Station Subsystem (MSS), and second, when MHs cannot connect to the fixed network. The CFT model decreases the blocking time of resources at the fixed devices, provides faster recovery from connection failures caused by the mobility of mobile devices, and maximizes the number of mobile transactions.

In [8], the authors presented a new fault-tolerance technique with features suited to large-scale, ad hoc, hierarchical mobile agent organizations. It detects failures with low failure-free overhead: every domain manager sends a heartbeat message to its immediate higher-level manager. It also defines the actions to be performed on failure detection and those to be performed continuously on agent creation, agent migration, and agent termination.

In [9], the authors presented a fault-tolerance service that contains a group modeling algorithm for dynamic networks and then implements fault tolerance by replication that is based on judgment. The algorithm forms groups according to the number of neighbors and the energy level of the nodes. After clustering, the network has a two-level hierarchical structure, with a monitor for every group and a super leader for the whole network. The groups formed are open, disjoint, and explicit, and they establish point-to-point communication within the group.

In [10], the authors observed that distributed systems are still at an early stage of development, with many questions unanswered and little agreement among workers in the field about how things should be done. Many experimental systems use the client-server model with some form of remote procedure call as the communication base, but some systems are built on the connection model. Comparatively little work has been done on distributed naming, protection, and resource management, other than building straightforward name servers and process servers.

In [11], the authors explained that fault tolerance is an important aspect of large-scale computational grid systems, where geographically distributed nodes cooperate to complete a task. To obtain a high level of reliability and availability, the grid environment must be fault tolerant. Because job execution is strongly affected by failures, fault-tolerance services are essential for meeting QoS requirements in distributed computing. Commonly used techniques for providing fault tolerance are job checkpointing and data replication. Both methods reduce the amount of work lost because of changes in the system, but they introduce a significant runtime overhead.
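
As an illustration of the heartbeat-based failure detection described for [8], the following Python sketch shows one simple way a higher-level manager could suspect a silent sender. It is a minimal sketch, not the method of [8]: the class name HeartbeatMonitor, the fixed timeout, and the use of explicit timestamps are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat time seen from each sender (hypothetical helper)."""

    timeout: float  # assumed: seconds of silence after which a sender is suspected
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, sender: str, now: float) -> None:
        """Store the arrival time of a heartbeat message from `sender`."""
        self.last_seen[sender] = now

    def suspected(self, now: float) -> List[str]:
        """Return the senders whose last heartbeat is older than the timeout."""
        return sorted(
            sender
            for sender, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

A manager would call record whenever a heartbeat arrives and poll suspected periodically; a sender that stays silent longer than the timeout is reported so that recovery actions can start.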

3. FAULT TOLERANCE IN DISTRIBUTED COMPUTING SYSTEM

A distributed computing system is a heterogeneous computational system in which the underlying software and hardware infrastructure provides consistent, dependable, and inexpensive access to high-end computation. A faulty system can cause serious damage. A task running on a real-time distributed system should be feasible and dependable [12]. Real-time systems such as grids and robotics depend heavily on meeting deadlines. Any fault in a real-time system can cause the system to collapse if it is not detected and recovered from in time. Fault tolerance is an important technique that is commonly added to maintain reliability in these systems. Hardware fault tolerance can be achieved by adding extra components such as processors, resources, and communication links. Software fault tolerance deals with faults such as faulty messages that enter the system. Distributed computing differs from a traditional distributed system [13]. Fault tolerance is important in distributed computing because the nodes are geographically dispersed across different domains throughout the World Wide Web. The most difficult task in a distributed computing system is designing its fault tolerance so as to verify that all of its reliability requirements are met [14]. Fault tolerance can be achieved in two ways:

Recovery

Redundancy
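
The difference between the two approaches can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: a task is modeled as a function that takes a node name, and the NodeFailure exception and both function names are hypothetical. Recovery re-runs the task on another node after a failure has occurred, whereas redundancy runs the task on several replicas up front so that a single failure is masked.

```python
from collections import Counter
from typing import Callable, Sequence


class NodeFailure(Exception):
    """Hypothetical error raised when a node cannot finish a task."""


def run_with_recovery(task: Callable[[str], int], nodes: Sequence[str]) -> int:
    """Recovery: try one node at a time and fall back to the next on failure."""
    for node in nodes:
        try:
            return task(node)
        except NodeFailure:
            continue  # this node failed, recover by re-running elsewhere
    raise NodeFailure("task failed on every node")


def run_with_redundancy(task: Callable[[str], int], replicas: Sequence[str]) -> int:
    """Redundancy: run on every replica and return the most common result."""
    results = []
    for node in replicas:
        try:
            results.append(task(node))
        except NodeFailure:
            pass  # a failed replica is masked by the others
    if not results:
        raise NodeFailure("every replica failed")
    return Counter(results).most_common(1)[0][0]
```

Recovery costs extra time only when a failure happens, while redundancy pays the cost of the extra executions on every run.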

A good fault-tolerant design requires a careful study of the causes of failures and of the system's response to failures. Such a study should be carried out in detail before the design starts and has to remain part of the design process [15]. Planning to avoid failures is most important. A designer must examine the overall scenario and then decide which failures must be tolerated to achieve the desired level of dependability. To optimize fault tolerance, it is very important to estimate the actual failure rate for each possible failure. Because a distributed computing system is diverse in nature, its nodes have distinct hardware and software attributes, and the components of an application likewise have varied hardware and software needs. In [16], the authors first find the candidate nodes that satisfy the requirements of each task. They then use different load-sharing policies to deal with node failures and to increase the reliability of the DCS.
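
The two steps mentioned above, estimating a failure rate and finding candidate nodes for a task, can be sketched in Python as follows. This is a minimal sketch and not the policy of [16]: the Node and Task fields, the failures-per-hour estimate, and the least-loaded tie-breaking rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    cpu_cores: int
    memory_mb: int
    load: int               # tasks currently assigned (assumed load measure)
    failures: int           # failures observed so far
    observed_hours: float   # length of the observation window

    def failure_rate(self) -> float:
        """Estimated failures per hour; 0.0 if nothing has been observed yet."""
        if self.observed_hours <= 0:
            return 0.0
        return self.failures / self.observed_hours


@dataclass
class Task:
    name: str
    min_cores: int
    min_memory_mb: int


def candidate_nodes(task: Task, nodes: List[Node]) -> List[Node]:
    """Keep only the nodes whose hardware satisfies the task requirements."""
    return [
        n for n in nodes
        if n.cpu_cores >= task.min_cores and n.memory_mb >= task.min_memory_mb
    ]


def pick_node(task: Task, nodes: List[Node]) -> Optional[Node]:
    """Pick the least-loaded candidate, breaking ties by lower failure rate."""
    candidates = candidate_nodes(task, nodes)
    if not candidates:
        return None
    return min(candidates, key=lambda n: (n.load, n.failure_rate()))
```

Other load-sharing policies would change only the key used in pick_node; the candidate filter stays the same.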

4. PROPOSED METHODOLOGY

Because the scale of today's distributed computing systems has grown rapidly, it is very important to develop effective job handlers. As the number of users of distributed computing systems and their networks grows, along with the complexity of their services, system administrators try to ensure the quality of service every user needs by making effective use of system resources. To achieve this aim, accurate, real-time, and effective management and monitoring techniques are essential for a distributed system. However, as the infrastructure of distributed systems grows, a large amount of information is produced by a huge number of managed nodes, so the complexity of routine network monitoring becomes excessively high. Mobile-agent-based monitoring techniques have therefore been developed to look after these large-scale, dynamic distributed systems quickly and effectively. The proposed algorithm assigns tasks to other nodes only when the master node moves from its original position. The major problem in this architecture is task scheduling: if one of the slave nodes stops working or fails, the task allocated by the master node to the candidate nodes is not completed and a fault occurs. In this work, we use a weight-based technique that improves the fault tolerance of the system and increases its performance.
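
To make the idea of weight-based reallocation concrete, the following Python sketch moves the tasks of a failed slave node to the surviving nodes with the lowest weight. It is an illustration under stated assumptions, not the exact algorithm evaluated in Section 5: the weight formula, the parameter alpha, the load factor, and the names Worker, weight, and reallocate are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Worker:
    name: str
    failure_rate: float  # assumed: estimated failures per hour
    exec_time: float     # assumed: expected seconds per task
    alive: bool = True
    tasks: List[str] = field(default_factory=list)


def weight(worker: Worker, alpha: float = 0.5) -> float:
    """Hypothetical weight mixing failure rate and execution time; lower is better."""
    return alpha * worker.failure_rate + (1 - alpha) * worker.exec_time


def reallocate(failed: Worker, workers: List[Worker], alpha: float = 0.5) -> Dict[str, str]:
    """Mark `failed` as down and move each of its tasks to a live worker."""
    failed.alive = False
    live = [w for w in workers if w.alive]
    if not live:
        raise RuntimeError("no live worker left to take over the tasks")
    moved: Dict[str, str] = {}
    for task in failed.tasks:
        # Scale the weight by the current load so tasks spread over the workers.
        target = min(live, key=lambda w: weight(w, alpha) * (1 + len(w.tasks)))
        target.tasks.append(task)
        moved[task] = target.name
    failed.tasks = []
    return moved
```

The returned mapping records which node took over each task, which the master node could use to update its allocation table.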

Fig. 4.1: Flowchart of methodology

5. EXPERIMENTAL RESULTS

The entire proposed work has been implemented on the NS2 platform.

Figure 5.1: Energy Graph

As shown in Figure 5.1, the red line shows the energy consumed in the existing scenario, in which the fault is raised, and the green line shows the energy consumed by the enhanced algorithm, in which the fault is recovered. The energy consumption of the enhanced technique is lower than that of the existing algorithm.

Figure 5.2: Packet Overhead

As shown in Figure 5.2, the red line shows the packet overhead in the existing scenario, in which the fault is raised, and the green line shows the packet overhead of the enhanced algorithm, in which the fault is recovered. The packet overhead is lower in the enhanced scenario than in the existing one.

Figure 5.3: Delay Graph

As shown in Figure 5.3, the red line shows the delay in the existing scenario, in which the fault is raised, and the green line shows the delay of the enhanced algorithm, in which the fault is recovered. The delay in the enhanced scenario is lower than in the existing scenario.

6. CONCLUSION

A mobile distributed computing network is a dense collection of mobile entities joined by wireless links, without any fixed supporting infrastructure. No central authority is present in this network topology, so mobile nodes are frequently disconnected from the network. For these reasons, the chance of errors occurring in a mobile distributed computing network is high. The load on the system is distributed uniformly among all the mobile nodes so as to improve the efficiency of the network and to decrease the execution time. When the load is not uniformly distributed among the mobile nodes, the error rate rises sharply. A fault-tolerance approach is needed to decrease the number of errors in a distributed mobile network. Tasks are allocated among the mobile nodes with the help of a task allocation model. In this work, a novel weight-based technique has been proposed that reduces the fault detection time in the network and reduces the resources consumed in executing the allocated tasks. The proposed algorithm is based on the failure rate, the minimum execution time, and the time taken by the master node for fault recovery, together with concurrent execution of processes. This technique reduces processing time and energy consumption.

REFERENCES

[1] Zhongkui Li and Zhisheng Duan, 2013. “Distributed Tracking Control of Multi-Agent Systems with Heterogeneous Uncertainties”, 10th IEEE International Conference on Control and Automation (ICCA), Hangzhou, China, June 12-14.

[2] Sreedevi R.N, Geeta U.N, U.P. Kulkarni, A.R. Yardi, 2009. “Enhancing Mobile Agent Applications with Security and Fault Tolerant Capabilities”, IEEE International Advance Computing Conference (IACC 2009), Patiala, India, 6-7 March.