Institute for Software-Integrated Systems

Technical Report

TR#:ISIS-09-106

Title:A State Transfer Framework for Object Oriented Fault-Tolerance

Authors:Friedhelm Wolf, Jaiganesh Balasubramanian, Aniruddha Gokhale, Douglas C. Schmidt

Copyright (C) ISIS/Vanderbilt University, 2009

1Introduction

FLARe [1] is used as lightweight fault-tolerance middlewarein DRE systems. This report discusses the designof Components with HEterogeneous State Synchronization

(CHESS). FLARes mechanism for state transferand synchronization.

Passive replication schemes as used in FLARe dependon backup replicas that can take over processing quicklywhen a failure occurs. This includes deployment of backupinstances of the same application and then failover when anerror is detected. Since most replicas have a unique internalstate, they need to be synchronized frequently. State canchange through client invocations or other system events ortimed signals.

This makes it necessary to synchronize the internal statesof replicas representing the same logical application. Replicashave to be ready to respond to requests any time andthis requires that their internal state is up-to-date. are in thecorrect state to respond to the next request in case of a failure.CHESS is a framework that is responsible for state disseminationof replica objects. The design of CHESS aimsat making state synchronization as transparent to the developer

as possible while providing enough flexibility to accountfor varying types of internal state.

The remainder of this report is structured as follows:Section 2 states the problems that exist in defining a generalframework for state transfer. Section 3 describes design

of CHESS by presenting design challenges and howthe solutions that are applied in CHESS to overcome them.Section 4gives a summary of the resulting systems and its

benefits.

2Problem Statement

Providing a generic mechanism for state replication is achallenging task due to the wide range of differences in howapplication state can look like.

As CHESS focuses on passive replicaion schemes replicasneed to exchange information about their state topreserve consistency. CHESS uses the common checkpointingapproach to do so: all relevant state informationof an application is gathered and captured in form of a socalled snapshot (i.e. structured data or memory dumps).

Designing a generic mechanism requires trade-offsbased on different characteristics of internal applicationstate. This includes different approaches for timing, such astime triggered approaches that periodically take new snapshotsor event triggered approaches take snapshots that initiatesnapshots based on notifications or messages. Depending

on the replication style snapshots are directly distributedto all replicas through dedicated communication mechanismslike multicast messages (warm passive) or they are

stored in a central repository and transferred to the replicaonly prior to a fail-over (cold passive).

These are just examples for different ways to disseminatestate. To evaluate CHESS in a structured and comprehensiveway, we now introduce a simple taxonomy for state

characteristics. The following list categorizes state acrossdifferent dimensions that we will use in the solution sectionto evaluate the different solution aspects.

  1. The Location of state in relation to the componentimplementation is a crucial aspect and limitation forgeneric state replication approaches. The most commoncase is state that is internal to the application, beingcaptured in local variables, members of classes thatimplement the component or component attributes.However in complex DRE systems it is possible thatcomponents access system resources or middlewareinfrastructures (e.g. a database persistency layer)which is external. A special case of external state isshared state where several components use a systemresource (e.g. shared memory) together. Simply includingexternal and especially shared state into thesnapshot would lead to duplicates and merging conflicts in the replicas and needs to be given careful consideration.
  1. The Size of the internal application state can varygreatly. On the one side of the spectrum there arestateless applications that have no state to be preservedfrom invocation to invocation. Other componentskeep state information that is comparativelysmall (e.g. configuration values or counters). In otherapplication domains state data includes large amountsof data (e.g. received streaming data, multimedia content,in-memory databases).
  1. Complexity and Distribution are two tightly coupledproperties of application state information. Distributionmeans that the application can contain very differenttypes of state that is not stored within a single datastructure but rather is distributed throughout the applicationstructure. The greater the degree of distributionthe harder and more time consuming it is to create asnapshot or to restore state from a snapshot. This alsoapplies for complexity: On the one hand there are verysimple data structures like basic types that are veryeasily copied to or extracted from a snapshot. As thecomplexity increases for sequential containers like errorsor lists of items, these operations get more timeconsuming. Associative containers and structures witharbitrary member data types and big hierarchical depthhave even higher performance costs for snapshot creation.
  1. Dynamics of Changes: Not only the form of state differsgreatly from application to application, but alsothe frequency by which state is altered and needs to becheck-pointed. Some applications alter and store theirstate only once at initialization. Other applicationsundergo many state changes in their lifetime. Thesechanges can occur due to external input or internalmechanisms like time-triggered events. Many applicationschange their state based on incoming requests.Depending on the system characteristics this can happenvery rarely (e.g. in applications only used formaintenance) or with a high rate of invocations in therange of microseconds (e.g. for streaming of satellitetelemetry data). A generic replication mechanism likeCHESS, therefore needs to offer the flexibility to specifythe timing characteristics for state synchronization.

3Solution

The design of CHESS is presented by describing howchess addresses the challenges of (1) providing a commoninterface for exchanging diverse state snapshots, (2) satisfyingvarying timing requirements and (3) supporting differentprotocols for state dissemination. Each part consists of astatement summarizing the challenge, a description of howCHESS addresses this challenge and an evaluation how thesolution of CHESS affects the different dimensions of statecharacteristics.

3.1Providing a Common Interface for Exchanging Diverse State Snapshots

Challenge: As described earlier the structure and complexityof state snapshots vary greatly and are tightly coupledto the application implementation. It is therefore impossible

to design an interface through which state snapshotsare passed as strongly typed parameters. First generationdistributed systems tended to solve this problem by

passing simple byte streams and leaving the time consumingtask of marshaling, demarshaling, type checking andalignment adaptions to the application developer.

Solution: Pass state snapshot as CORBA Any. Toachieve platform and language neutrality for the state extractionmechanism and integration the necessary interfacesare declared in CORBAs interface definition language (IDL). IDL defines a special basic type any that allowsdynamic insertion of any data type, while still preservingtype-safety. For this purpose a type code field withinthe any is maintained and code for type checking, marshalling and demarshaling is generated by the IDL compiler.

This allows to separate different obligations in the processof state distribution: The application itself has to performthe insertion operation of its internal state into an anyobject and also the extraction operation to retrieve new stateinstances from an any value. The middleware can thendistribute the Any value transparently without knowledgeabout the internal structure of the snapshot. CORBA Anyscan only contain data defined in IDL.The application developeris responsible for declaring of an IDL data type thatrepresents the complete state, so that it can be inserted intoan any data-type.

Figure 1 shows the obligations of an application to makeits internal state available to the state synchronizationmechanism.An application has to implement these methods to

interact with the state synchronization mechanism. If theframework needs to extract state from an application thatis a primary replica, it will call get_state(). All backupreplicaswill receive state updates through the set_state()method.

Evaluation: This approaches’ main strength is that itaddresses the dimension of complexity and distributionthrough separation of concerns. It shields the generic disseminationmechanism from the internal structure of the applicationstate but also supports the application developer byusing the CORBA Any data type that provides extractionand insertion operators and therefore simplifies the gatheringand composition of a state snapshot.

The dimension ofsize has a strong influence on the performance of this approach:Transmitting any data has a certain overhead sincetype information has to be embedded on the sender side andextracted on the receiver side. CHESS does not interferewith the dimension of state location. By making it the responsibilityof the application developer to serialize state,he has the freedom to choose whether and how to integrateexternal and shared state into the snapshot.

The dimensionof the dynamics of changes is mainly addressed by otherparts of CHESS. However the time for serialization of asnapshots affects the overall timing behavior of state dissemination.

Figure 2: State transmission sequence

3.2Satisfying varying Timing Requirements

Challenge: Applications may have very different requirementsfor when snapshots shall be distributed fromthe primary replica to backup replicas. There are two maintypes of timing behavior: (1) cyclic timing where state is updatedbased on a given time interval and (2) acyclic timingwhere specific events like a client request trigger the statesynchronization. Middleware mechanisms could automaticallydetermine when to disseminate state for cyclic timingbehavior and then use the get_state() and set_state()methods as callback methods to automate the process.However since the timing cannot be predicted in the acycliccase it needs active involvement of applications to disseminatestate at the right time. Combining both cases into ageneral framework mechanism is needed to ease the burdenof the application developer without restricting timingschemes.

Solution: Separation of concerns between triggeringstate synchronization and state retrieval allows to treatboth cases in a uniform way. This approach includes severalsteps of interaction between an application and a statetransfer agent which is a part of the CHESS infrastructure.Each process containing CHESS object replicas also hosts

a state transfer agent that is responsible for all replicationrelated functionality and therefore removes this obligationfrom the application developer.

The sequence of interactions as described in figure 2 providesa mechanism for flexible and generic state dissemination.

  1. Registration of application objects with the state transferagent through a unique application id allows themanager to retrieve state from the application whenneeded. The registration needs to be done during theactivation of the object.
  2. The state transfer agent exposes the interface methodstate_changed (in string id) that allows the applicationobject to indicate a change of its internalstate. This then triggers state synchronization. The idparameter is needed by the agent to identify the objectwithin a process.
  3. It is the agents responsibility to react on the notificationabout a state change and retrieve the state fromthe object that issued the notification. This is done bycalling back the get_state() method described earlier.
  4. As the final step the state transfer agent will then distributestate to backup replicas in form of a CORBAAny instance using a suitable communication mechanism.

Evaluation: This solution mainly addresses the dimensionof dynamics of changes. CHESS makes triggeringof state synchronization the responsibility of the applicationdeveloper. The trade-off for this approach is additionaleffort for the developer to issue the change notificationswhenever they are necessary. On the other hand thisgives great flexibility in controlling which application statechanges really require state synchronization. This allowsfor the most efficient usage of resources, since updates areonly performed if they are necessary. Through the separationof concerns between state change notification and theactual execution of the state dissemination the effort for thedeveloper is greatly reduced. CHESS shields the replica implementationfromthe actual distribution of snapshot data tobackup replicas.

3.3Support for Different Protocols for State Synchronization

Challenge: There is no one-size-fits-all communicationmechanism to disseminate state. Depending on size andtiming requirements and the scheme of state dissemination,different communicationmechanisms are needed to provideoptimal performance. Small snapshots of applications withhigh reliability requirements need to be transferred throughsynchronous peer-to-peer protocols with error correctioncapabilities. Larger snapshots, especially when transmittedto a large number of replicas need efficient protocols likegroup communication protocols and multicast messages. Insystems with cold passive semantics where replicas onlyneed to update their state in a failure case a central persistentstorage solution for state storage and retrieval is moreadequate. Directly encoding the type of communicationmechanism into the applications’ implementation results ina tight coupling between business logic and transportmechanismand therefore complicates development and adaptionof the application.

Solution: Applying the Strategy pattern. CHESS usesthe strategy pattern [2, pp.315f] to allow for a a flexiblechoice of the used protocol at run-time. The state disseminationmechanism is represented by an object interface thatprovides unified access to all variants of state dissemination.This pattern can be applied to shield the application developer

from the concrete protocol for state dissemination andintegrate it into the state transfer agent. On replica registrationthe application can set a policy to determine whichmechanism will be used by the agent. The agent then willinstantiate the appropriate concrete strategy object instanceand associate it with the application to use with every disseminationof state information.

Figure 3 shows how the strategy pattern was appliedin CHESS to support two different communication mechanisms.Namely synchronous CORBA calls and multicastcommunication based on OMGs Data Distribution Service(DDS). The design of CHESS easily allows to extendthe framework with additional communication protocols,e.g. message-based mechanisms or database storage. Theabstract strategy interface benefits from the earlier designdecision to use the CORBA Any data type to represent snapshots.This reduces the complexity of the interfacemethods.However it also creates the necessity to extract the data fromthe any object and transform it into the appropriate form ineach concrete strategy class. One example for this is shownin case of DDS communication in figure 3.

The design above allows for choosing a different communicationmechanism for each replica within the process.At registration time the state transfer agent will create the

appropriate concrete strategy based on a registration parameter.When the application later notifies the agent aboutstate changes, it will pass the state to the appropriate objectusing the ReplicationStrategy interface.

Figure 3: The strategy pattern applied to state synchronization

Evaluation: The flexible mechanism for heterogeneousprotocols within CHESS addresses the dimensions of size,complexity and change dynamics. It allows to transparentlyapply protocols suited for particular state characteristics.This flexibility enables trade-offs between the following aspects:

  1. Short delivery times need to be ensured for applicationswith high update rates where the dimension ofchange dynamics is important. However with growingsize and complexity of state snapshots it is harder toprovide short delivery times. Connection oriented protocolsare well suited for fast delivery of small amountsof data.
  2. High network throughput is necessary for snapshotswith large sizes. However timely delivery can sufferfrom protocols that maximize throughput. Groupcommunication mechanisms are well suited for sendinglarge state to several receivers.
  3. Reliable delivery is needed in systems were state consistencyhas to be guaranteed under all crcumstances.This usually is done through error correction codes andretransmission of lost packets. Therefore trade-offshave to be made between efficient and reliable deliveryprotocols.The strategy pattern allows to make these trade-offs ona per object basis and therefore accounts for heterogeneousenvironments and systems with highly diverse state characteristicsper application object.

4Summary

The CHESS framework eases the burden of the applicationdeveloper through appropriate mechanisms and smartdesign decisions. It automates registration of object instanceswithin the local process space, the initialization anduse of concrete transport protocols, connections managementbetween replica objects and the actual disseminationof state. At the same time CHESS accounts for the greatvariations of state characteristics by leaving flexibility inserialization of state, timing of state synchronization andchoice of the appropriate protocol.

References

[1] J. Balasubramanian, S. Tambe, C. Lu, A. Gokhale, C. Gill,and D. C. Schmidt.

Adaptive Failover for Real-time Middlewarewith Passive Replication. In Proceedings of the 15thReal-time and Embedded Applications Symposium (RTAS),San Francisco, CA, Apr. 2009.

[2] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. DesignPatterns: Elements of

Reusable Object-Oriented Software.Addison-Wesley, Reading, MA, 1995.

[3] Pascal Felber and Priya Narasimhan. Experiences, Approachesand Challenges in

building Fault-tolerant CORBASystems. Computers, IEEE Transactions on, 54(5):497–511,May 2004.

- 1 -