Runtime Performance Modeling and Measurement of Adaptive Distributed Object Applications
John Zinky, Joseph Loyall, Richard Shapiro
BBN Technologies, 10 Moulton Street, Cambridge, MA USA
{jzinky, jloyall, rshapiro}@bbn.com
Abstract. Distributed applications that can adapt at runtime to changing quality of service (QoS) require a model of the expected QoS and of the possible application adaptations. QoS models in turn require runtime measurements, both in-band and out-of-band, from across the application’s components. As the comprehensiveness of the model increases, so does the quality of adaptation. But eventually the increasing comprehensiveness becomes too complex for the QoS Designer to deal with effectively. In addition, performance models of any complexity are expensive to create and maintain at runtime. The QoS Designer therefore needs a set of distributed-QoS tools to assist in the construction of models, the handling of quality vs. complexity tradeoffs, and the efficient maintenance of models at runtime. This paper describes the support that the Quality Objects (QuO) middleware provides for developing a performance model; for collecting and organizing run-time measurements of a system, both in-band and out-of-band; and for maintaining the model efficiently at runtime.
1 Introduction
Many distributed object application domains, such as military, financial, and health care, have stringent quality of service (QoS) requirements. Hosting these applications on wide-area networks, with their unpredictable and changing resource availability, and in embedded environments, with their constrained resources, requires applications to be aware of the resources in their environment and able to adapt in multiple ways to changes in resource availability. To meet these two requirements, an adaptation strategy must be developed that chooses an application behavior for any given environmental situation. The design of this strategy is the job of the QoS Designer, a role which is distinct from, and complementary to, that of the Application Designer who develops the functional algorithmic code of the application.
A performance model predicts how the application will behave given the usage patterns, underlying resources, QoS requirements, and adaptation strategy. The appropriate level of detail in the model depends on the available adaptations and resource parameters, and on the price the application is willing to pay for adaptivity. A more detailed model provides finer-grained control over the application's QoS behavior, but also requires more effort to construct at design time and to keep current at runtime. The examples in this paper illustrate the tradeoffs between complex, fine-grained models and simpler but coarser ones.
This paper describes the support that QuO middleware provides for collecting run-time measurements of the system, for developing a performance model of the application and its environment, and for maintaining the performance model at runtime. Maintaining this model and its data is an important part of the QoS Designer’s work in creating an adaptive strategy. One of the major recent advances in the QuO middleware is the development of the Resource Status Service (RSS), a distributed service for measuring, aggregating, and disseminating resource information in a distributed system. In this paper, we describe the RSS and its use in creating and maintaining performance models of runtime systems.
Figure 1 shows how QuO runtime supports both in-band measurements and out-of-band expectations of QoS parameters. In-band measurements are inserted directly into the function call tree. This provides actual end-to-end QoS measurements of remote function calls using specific resources. Out-of-band measurements monitor the system resources and try to infer expected QoS. Integrating these two kinds of measurements is at the heart of any adaptation strategy.
The paper is organized as follows. Section 2 provides a brief overview of the QuO middleware; readers already familiar with QuO may skip it. Section 3 introduces an example distributed object application, an image server system developed using QuO, which is part of our open-source software toolkit and serves as a running example throughout the rest of the paper. Section 4 describes QuO’s support for gathering in-band measurements. Section 5 describes QuO’s support for out-of-band measurements, including the RSS. Section 6 describes QuO’s support for creating efficient runtime models. Finally, Section 7 describes how to calibrate the performance models by combining in-band performance measurements with resource capacity measurements. Each section includes examples based on the image server application.
2 The QuO Framework for Adaptive Applications
The Quality Objects (QuO) framework is an extension to traditional distributed object middleware, such as CORBA and RMI, which manages the functional interactions between objects. In ideal circumstances, CORBA and RMI can give the illusion that remote objects are local. Where resources are limited and QoS requirements are stringent, this illusion is impossible to maintain. In the traditional approach, the algorithms for managing adaptation to constrained resources are entangled with the application’s functional algorithms, resulting in overly complicated code that is difficult to maintain and extend. QuO provides support for programming QoS measurement, control, and adaptation in the middleware layer, separating the system-specific and adaptive code from the functional code of the client and object implementations. In this way, QuO supports reuse of adaptive code and eases the application programmer’s burden of programming system issues.
As illustrated in Figure 2, a QuO application extends the traditional distributed object computing (DOC) model with the following components:
QuO contracts summarize an application’s current operating mode, expected resource utilization, rules for transition among operating states, and means for notifying applications of changes in QoS or in system status. Contract specifications are written in a high-level specification language called CDL and preprocessed into Java or C++ by the QuO code generator.
System condition objects (Sysconds) provide interfaces to system resources, mechanisms, and managers. They provide high-level reusable interfaces to measure, manipulate, and control lower-level real-time control and measurement capabilities. They export values that describe facets of system status, such as the current memory utilization or the priority of a running thread, and provide interfaces to control system characteristics, such as modifying the processor clock rate or scheduling priorities (a code sketch of such an interface follows this list).
QoS-aware Delegates are adaptive components that modify the system’s runtime behavior along the paths of method calls and returns. QuO delegates are implemented as wrappers on method stubs or skeletons, thereby inserting behavior between the client and server. Delegates are written in a high-level aspect language called ASL and converted into Java or C++ by the QuO code generator.
Qoskets pull together contracts, delegates, and system conditions into reusable components that are independent of any specific functional interfaces. Combining a functional interface with a Qosket makes a new object that implements the functional interface and manages some QoS aspect.
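To make the syscond concept concrete, the following minimal Java sketch shows the kind of value interface a system condition object might export. The interface and method names here are illustrative assumptions, not QuO’s actual API.

    // Illustrative sketch only: names are assumptions, not QuO's actual API.
    // A syscond exports one facet of system status as a value that contracts
    // can read; a settable variant can be fed by an out-of-band monitor.

    /** Read-only view of one measured system condition. */
    public interface SystemCondition {
        String name();            // e.g., "serverCpuLoad" or "linkBandwidth"
        double value();           // most recent observation, in native units
        long   timestampMillis(); // when value() was observed
    }

    /** A settable condition, updated by a monitor thread or manager. */
    public interface SettableSystemCondition extends SystemCondition {
        void setValue(double newValue);
    }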
In summary, QuO includes high-level specification languages, a code generator, a runtime kernel, libraries of reusable qoskets and system condition objects, and QoS property managers. These components are described in detail in other papers [8, 14, 15, 17], as is the application of QuO to properties such as security [18] and dependability [1]. In this paper we concentrate on QuO’s support for developing a performance model of an application and its environment, and for maintaining the performance model at runtime.
3 An Example: Data Dissemination in a Wide-Area Network
As a running example in this paper, we will use an image server application that we have developed using the QuO adaptive middleware and which serves as the basis for many of our example and experimental applications. It consists of a remote data server maintaining a database of images and a client requesting images from the remote server. The data server has the capability of producing versions of the images of different sizes and of different quality as illustrated in Figure 3. The image server exposes interfaces enabling the client to request pictures that are “big” or “small”, “processed” or “unprocessed.” Big pictures use more CPU resources to display and more bandwidth to transmit than small pictures do. Processed pictures use more CPU resources on the server side to improve the image quality.
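For concreteness, the server’s remote interface might look like the following Java/RMI-style sketch. The method names match those used in the rest of the paper; the parameter and return types are illustrative assumptions.

    // Illustrative sketch of the image server's remote interface. Method
    // names follow the paper; signatures and types are assumptions.
    import java.rmi.Remote;
    import java.rmi.RemoteException;

    public interface ImageServer extends Remote {
        byte[] readBigProcessed(String imageId)     throws RemoteException; // best quality; most CPU and bandwidth
        byte[] readBigUnprocessed(String imageId)   throws RemoteException; // saves server CPU
        byte[] readSmallProcessed(String imageId)   throws RemoteException; // saves bandwidth
        byte[] readSmallUnprocessed(String imageId) throws RemoteException; // cheapest along both dimensions
    }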
The challenge for the QoS designer is to program into the application an adaptive tradeoff between timeliness and quality. The user wants the best picture, but is not willing to wait very long. Better pictures (bigger and processed) take longer to deliver because they consume more resources: bandwidth for bigger images, server CPU for processed ones. The application needs to measure the timeliness of image delivery, and when round-trip image delivery and processing slows, the application needs to gather enough information to determine whether the source of the slowdown is network or CPU degradation, and adapt accordingly.
As a basic example of adaptation, we use a qosket, called Bottleneck, which partitions the operating environment along the dimensions of bandwidth and CPU resources. Bottleneck’s contract has four regions with high and low server CPU along one dimension and high and low network bandwidth along the other. The Bottleneck qosket also includes system condition objects for determining the status of the runtime environment, used by the contract to determine the high and low regions. The qosket encapsulates all the behavior needed to measure the relevant system resources and to determine whether the constrained resource is the network, the CPU, or both. Note that the qosket is completely independent of the application and is therefore reusable.
When the QoS designer combines the Bottleneck qosket with the image server application, he specifies a binding of the contract regions to method calls on the remote object using QuO’s Adaptation Specification Language, ASL [12]. The QuO code generator creates a delegate that calls the appropriate server methods based on the Bottleneck contract region. While the functional application continues to call the original remote read method, the delegate transparently substitutes calls to other methods, depending on the state of the resources. When there are no resource bottlenecks, readBigProcessed is used because it gives the best picture. When both CPU and bandwidth resources are scarce, readSmallUnprocessed is used because it reduces the time to process and transmit the picture. Likewise, readSmallProcessed is used when the network is the only bottleneck, and readBigUnprocessed is used when the CPU is the only bottleneck. This is a simple strategy for trading off timeliness and quality with respect to the system constraints.
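In Java terms, the generated delegate’s dispatch logic amounts to a switch on the current contract region. The following hand-written approximation conveys the idea; in QuO the delegate is generated from ASL, and the Region and BottleneckContract types here are illustrative assumptions.

    // Hand-written approximation of a generated QuO delegate. The client
    // keeps calling read(); the delegate substitutes the server method
    // that best fits the current contract region.
    enum Region { NORMAL, LOW_BANDWIDTH, LOW_CPU, LOW_BOTH }

    interface BottleneckContract { Region currentRegion(); } // assumed handle

    class ImageServerDelegate {
        private final ImageServer remote;
        private final BottleneckContract contract;

        ImageServerDelegate(ImageServer remote, BottleneckContract contract) {
            this.remote = remote;
            this.contract = contract;
        }

        byte[] read(String imageId) throws java.rmi.RemoteException {
            switch (contract.currentRegion()) {
                case NORMAL:        return remote.readBigProcessed(imageId);
                case LOW_BANDWIDTH: return remote.readSmallProcessed(imageId);
                case LOW_CPU:       return remote.readBigUnprocessed(imageId);
                default:            return remote.readSmallUnprocessed(imageId); // LOW_BOTH
            }
        }
    }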
The image server application is typical of several of our experimental and transitioned applications, differing in the images they provide, the system in which they are hosted, and the adaptation choices they offer to maintain QoS. For example, in the avionics example described in [9], the image server delivers virtual target folders, which contain map images with notations and other information. The client and server are embedded in separate aircraft and communicate through a wireless link. Because of the extremely constrained bandwidth of the wireless link and the large size of the data images, the server offers the choice to break each image into smaller tiles, which are delivered separately and reassembled by the client, and to choose the quality of each tile. Higher quality tiles use more bandwidth and CPU.
In the dependable server application described in [10], the image server provides two interfaces, one that authenticates requests and services them with a secure server and another that does not authenticate requests. In this application, the client has the option to tradeoff security for speed, since the authenticating server requires extra data (and thus uses more bandwidth) for the authentication and more time and CPU to validate the identity of the requester.
The QoS designer can devise many schemes to resolve the tradeoff between picture quality and call latency. The cleverness and appropriateness of any specific adaptation scheme are irrelevant to this discussion. What is important is the kinds of adaptation schemes that are possible and how well QuO’s mechanisms support them. In the following sections, we show how additional system information, made available by measurement and modeling, can be used to help refine the adaptation.
4 In-band Instrumentation
The basic idea of in-band instrumentation is to insert measurement points along the call path from the client to the server and back. This instrumentation gathers measurements along the method call and return as illustrated in Figure 4, measuring quantities such as the number of calls, round-trip latency, the time spent in the network, or the effective capacity of some underlying resource. The problem with adding instrumentation in traditional applications is that the code has to be placed in many places along the path and the results gathered together for processing and dissemination. Adding instrumentation code breaks the normal boundaries and interfaces defined by the functional decomposition of the distributed system. Special support is needed to add this code and use its results without creating a tangled mess.
CORBA provides some support for inserting instrumentation into the data path between the client and server. CORBA interceptors [11] allow requests and replies to be intercepted at several points during the transmission of a remote call. CORBA Pluggable Protocols [7] allow new transport protocols to be used instead of the default IIOP protocol over TCP. The new transport protocols can add instrumentation to the messages to measure QoS properties and can provide control over network resources using RSVP [19] or Diffserv [5].
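As a rough illustration of the interceptor approach, the sketch below measures per-call latency with a standard CORBA portable interceptor. This is not QuO’s actual instrumentation code; in addition, production code would correlate request and reply through PICurrent slots, whereas the ThreadLocal used here for brevity works only for synchronous calls on the invoking thread.

    // Minimal latency measurement with a CORBA Portable Interceptor.
    // Not QuO's actual code; for illustration only.
    import org.omg.CORBA.LocalObject;
    import org.omg.PortableInterceptor.ClientRequestInfo;
    import org.omg.PortableInterceptor.ClientRequestInterceptor;
    import org.omg.PortableInterceptor.ForwardRequest;

    public class LatencyInterceptor extends LocalObject
            implements ClientRequestInterceptor {

        // Simplification: real code should use PICurrent slots instead.
        private static final ThreadLocal<Long> START = new ThreadLocal<Long>();

        public String name() { return "LatencyInterceptor"; }
        public void destroy() {}

        public void send_request(ClientRequestInfo ri) throws ForwardRequest {
            START.set(System.nanoTime()); // request is leaving the stub
        }
        public void send_poll(ClientRequestInfo ri) {}

        public void receive_reply(ClientRequestInfo ri) {
            Long t0 = START.get();
            if (t0 != null) {
                long micros = (System.nanoTime() - t0) / 1000;
                System.out.println(ri.operation() + " took " + micros + " us");
            }
        }
        public void receive_exception(ClientRequestInfo ri) throws ForwardRequest {}
        public void receive_other(ClientRequestInfo ri) throws ForwardRequest {}
    }

The interceptor would be registered with the ORB at startup through an ORBInitializer.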
Other distributed object middleware, such as Java RMI, do not have CORBA’s open implementation, so instrumentation must be added above or below the equivalent of the ORB. For instrumentation below the ORB, QuO uses a Gateway Shell [14], which can intercept method calls and manage their QoS as request and reply messages are transmitted over the network resources.
QuO supports above-the-ORB instrumentation for both CORBA and RMI. QuO’s ASL language and code generator use aspect-oriented programming (AOP) techniques to weave code inside methods for both the client-side and server-side delegates. Instrumentation code often needs to be added to all methods in an interface, e.g., adding a timer call before and after each remote method call. Native languages, such as Java and C++, do not readily support code that cross-cuts many methods, although Java’s class reflection [16] could be used to query an object for a list of all its methods and construct an instrumentation delegate at runtime. QuO’s approach to instrumenting many methods is similar to that of other aspect-oriented languages, such as AspectJ. However, unlike AspectJ, QuO supports both Java and C++, and supports weaving code across distributed systems.
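The reflection-based alternative mentioned above can be sketched with java.lang.reflect.Proxy: a runtime instrumentation delegate that times every method of an interface. QuO’s generated delegates achieve the same cross-cutting effect at code-generation time via ASL; the names below are illustrative.

    // Runtime instrumentation delegate built with Java dynamic proxies.
    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;

    public final class TimingDelegate {
        @SuppressWarnings("unchecked")
        public static <T> T wrap(final Class<T> iface, final T target) {
            return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface },
                new InvocationHandler() {
                    public Object invoke(Object proxy, Method m, Object[] args)
                            throws Throwable {
                        long t0 = System.nanoTime(); // timer before the call
                        try {
                            return m.invoke(target, args);
                        } catch (InvocationTargetException e) {
                            throw e.getCause(); // rethrow the real exception
                        } finally {             // timer after the call
                            long micros = (System.nanoTime() - t0) / 1000;
                            System.out.println(m.getName() + ": " + micros + " us");
                        }
                    }
                });
        }
    }

    // Usage: ImageServer timed = TimingDelegate.wrap(ImageServer.class, server);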
4.1 Example: Client-Side Latency Measurement
Suppose that the QoS designer needs to keep the latency of a call in the image server application below a threshold, so that delivery of images is smooth. If the latency is too high, the read method could downshift to a remote method that uses fewer resources, e.g., from readBigProcessed to readBigUnprocessed to readSmallUnprocessed. The QoS designer can create a contract that implements the downshift as a state machine, but needs access to a measure of the latency to trigger the downshift.
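Such a downshift state machine, written directly in Java rather than in CDL, might look like the sketch below. The levels, thresholds, and hysteresis band are illustrative assumptions.

    // Illustrative downshift state machine (in QuO this would be a CDL
    // contract). Levels are ordered best-first; the machine downshifts
    // when average latency exceeds maxLatencyMs and upshifts when it
    // falls below minLatencyMs (hysteresis avoids oscillation).
    enum Level { BIG_PROCESSED, BIG_UNPROCESSED, SMALL_UNPROCESSED }

    class DownshiftContract {
        private Level level = Level.BIG_PROCESSED;
        private final double maxLatencyMs, minLatencyMs;

        DownshiftContract(double maxLatencyMs, double minLatencyMs) {
            this.maxLatencyMs = maxLatencyMs;
            this.minLatencyMs = minLatencyMs;
        }

        Level evaluate(double avgLatencyMs) {
            Level[] all = Level.values();
            if (avgLatencyMs > maxLatencyMs && level.ordinal() < all.length - 1) {
                level = all[level.ordinal() + 1];   // downshift one step
            } else if (avgLatencyMs < minLatencyMs && level.ordinal() > 0) {
                level = all[level.ordinal() - 1];   // upshift one step
            }
            return level;
        }
    }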
QuO’s contract evaluation mechanism includes support for measuring method latency. Usually, a QuO contract is evaluated before and after a remote method is called. QuO provides a Probe Syscond class, illustrated in Figure 2, which catches the contract evaluation signal and measures the method latency. Different kinds of Probe Sysconds can process the raw latencies into statistical metrics, such as the average latency over the last ten calls.
When the contract downshifts to a new region, the latency is expected to go down, but the averaging mechanism still remembers old values. To avoid adapting on the basis of stale values, the new behavior must be locked in until the statistics converge to a meaningful value. QuO contracts support locking the contract into the current region until the statistics have stabilized.
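A probe with a sliding-window average and such a lock-in test might be sketched as follows; the class and method names are illustrative, not QuO’s Probe Syscond API.

    // Latency probe with a sliding-window average. After a region
    // transition, reset() discards stale samples and isStable() stays
    // false until the window refills, letting the contract hold its
    // current region until the statistic is meaningful again.
    class LatencyProbe {
        private final long[] window;   // last N latencies (microseconds)
        private int count = 0, next = 0;

        LatencyProbe(int windowSize) { window = new long[windowSize]; }

        synchronized void record(long latencyMicros) {
            window[next] = latencyMicros;
            next = (next + 1) % window.length;
            if (count < window.length) count++;
        }

        /** Average over samples recorded since the last reset. */
        synchronized double average() {
            long sum = 0;
            for (int i = 0; i < count; i++) sum += window[i];
            return count == 0 ? 0.0 : (double) sum / count;
        }

        /** True once the window has refilled after a transition. */
        synchronized boolean isStable() { return count == window.length; }

        /** Called on a region transition to forget old samples. */
        synchronized void reset() { count = 0; next = 0; }
    }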
4.2 Example: Correlated Server and Client Latency
Suppose the QoS designer needs to determine which resource is the bottleneck. In the previous example, the downshift behavior arbitrarily chose to reduce the server load (adapting to readBigUnprocessed). But the cause of the latency problem could just as well have been the network. The client side can measure end-to-end QoS characteristics, such as the overall latency or the sizes of requests and replies, but cannot determine which sub-components are contributing to the latency. To determine the relative contribution of sub-components, timers must be set and read as the call enters and leaves each component (identified by the yellow arrows in Figure 4), and the measurements compared to differentiate between components. The approach is to pass a trace record from the client to the server and back, so that the client can determine the amount of time spent using different resources, such as network bandwidth and server-side CPU.
The ASL for adding a trace record is more complicated than for adding simple behavior calls, because it needs to add an extra parameter to the interface to carry the trace record between the client and server. Since the interface is changed on the client side, the server must change its interface correspondingly to accept the new trace record parameter. The consequence is that the remote object now has two interfaces: the normal interface and one with the instrumentation parameter. QuO includes a reusable qosket that manages adding the trace record and processing the results to obtain the relative network and server latency.
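A trace record for this decomposition might be sketched as follows; the field names are assumptions. Note that the server’s timestamps come from a different clock than the client’s, but because they enter the arithmetic only as a difference, the clock offset between the two machines cancels out.

    // Trace record carried as an extra parameter on the instrumented
    // interface. Client fields use the client's clock, server fields the
    // server's; only differences of same-clock values are computed.
    final class TraceRecord implements java.io.Serializable {
        long clientSendNanos;  // client-side delegate, before the call
        long serverRecvNanos;  // server-side delegate, on entry
        long serverSendNanos;  // server-side delegate, on exit
        long clientRecvNanos;  // client-side delegate, after the return

        /** Time spent in the server, by the server's own clock. */
        long serverMicros() {
            return (serverSendNanos - serverRecvNanos) / 1000;
        }

        /** Round-trip time spent in the network and the ORBs. */
        long networkMicros() {
            return ((clientRecvNanos - clientSendNanos)
                    - (serverSendNanos - serverRecvNanos)) / 1000;
        }
    }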