SAHARA – Service Architecture for Heterogeneous Access, Resources, and Applications

Web page: sahara.cs.berkeley.edu

Participating UC Berkeley Faculty:

Randy Katz (CS – )

We are engaged in a multi-year research program--the SAHARA Project--whose goal is to create end-to-end services with desirable and predictable properties, such as performance and reliability, when they are provisioned from multiple independent service providers. We seek to develop an architecture for future services that supports the dynamic confederation of sometimes-collaborating, sometimes-competing service providers.

Our primary investigations have thus far focused on interdomain and inter-ISP routing using BGP as a basic but essential "reachability service", voice over IP as a load- and latency-sensitive service, and unified messaging as a reliable yet asynchronous wide-area service spanning access networks. Generically, each critically depends for its performance and reliability on choosing good places to deploy processing and storage in the wide area, and on selecting good network paths to connect service instances to their clients.

These initial investigations have yielded a reference model for our service architecture. Composed services fall into either the connectivity plane or the application plane. The former is built on the best-effort service provided by an IP network, enhancing basic reachability to provide the abstraction of an "end-to-end network with desirable properties". It is further divided into enhanced links and enhanced paths. Enhanced "links" are actually multi-hop connectivity under the control of a single service provider, such as a route through an ISP or a peering point between ISPs. They attain desirable properties through behavioral monitoring, which detects protocol errors and misconfigured or faulty components (e.g., routers and interconnect) supporting the "link," and through intra-cloud and cloud-boundary measurement and resource allocation services that achieve soft performance "guarantees." Enhanced paths are sequences of enhanced links that provide desirable properties on an end-to-end path between end-points; a path may span multiple service provider domains. The choice among alternative paths is made adaptively, via path-oriented resource allocation services that operate across providers, and desirable properties such as performance and enhanced availability are achieved by load balancing across provider resources at this layer. Enhanced links are lighter weight than MPLS paths and require no special routing-layer support, while enhanced paths carry the concept beyond MPLS to span multiple administrative domains.
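To make the adaptive path choice concrete, the following is a minimal sketch (not the project's implementation): enhanced links carry measurements from their provider's monitoring services, candidate enhanced paths are enumerated as loop-free link sequences across domains, and the path with the best latency/loss score is chosen. The `EnhancedLink` fields and the scoring formula are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EnhancedLink:
    provider: str       # administrative domain operating this "link"
    src: str
    dst: str
    latency_ms: float   # from intra-cloud / cloud-boundary measurement (assumed)
    loss: float         # measured loss rate in [0, 1) (assumed)

def candidate_paths(links, src, dst, seen=()):
    """Enumerate loop-free enhanced paths (sequences of links) from src to dst."""
    for link in links:
        if link.src != src or link.dst in seen:
            continue
        if link.dst == dst:
            yield [link]
        else:
            for rest in candidate_paths(links, link.dst, dst, seen + (link.src,)):
                yield [link] + rest

def best_path(links, src, dst):
    """Adaptively pick the path scoring best under current measurements."""
    def score(path):
        latency = sum(l.latency_ms for l in path)
        delivery = 1.0
        for l in path:
            delivery *= (1.0 - l.loss)
        return latency / delivery   # illustrative: penalize lossy paths
    return min(candidate_paths(links, src, dst), key=score, default=None)
```

In this sketch, re-running `best_path` as measurements change gives the adaptive, multi-provider selection described above without any routing-layer support.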

Above the connectivity plane is the application plane, which is used to build end-user applications. Its layers of middleware and composition sit above the connectivity plane's end-to-end network. Middleware services reside at various places in the network and are composed by establishing enhanced paths between service instances, allowing one service to stream its outputs into the inputs of the service with which it is composed. Middleware services enable applications and thus are not directly visible to end-users. Examples include content distribution networks (CDNs) and data translation services (transcoders, language translators, and the like) that achieve interworking among a variety of networks, access devices, and applications.
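The streaming composition of middleware services can be sketched as a generator pipeline, where each stage consumes the previous stage's output stream much as an enhanced path connects one service instance's output to the next one's input. The stage names below (`transcoder`, `annotator`) are hypothetical stand-ins, not actual SAHARA services.

```python
def transcoder(chunks):
    """Hypothetical middleware stage: adapt content for the access device."""
    for chunk in chunks:
        yield chunk.lower()        # stand-in for real transcoding work

def annotator(chunks):
    """Hypothetical downstream stage composed after the transcoder."""
    for chunk in chunks:
        yield f"[{chunk}]"         # stand-in for e.g. language translation

def compose(*stages):
    """Chain stages so each streams its output into the next, the way
    enhanced paths connect composed service instances."""
    def pipeline(source):
        for stage in stages:
            source = stage(source)
        return source
    return pipeline
```

Because the stages are lazy generators, data flows through the composed service chunk by chunk rather than being materialized at each hop.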

A new focus for our group is interdomain, inter-service-provider storage access ("storage wide-area networking") as a composed service within the SAHARA framework. Our particular challenge application is datacenter disaster response, which we define as the low-latency establishment of high-speed connectivity to copy huge data volumes to a remote site or set of sites as quickly as possible. In addition to the obvious requirements for high bandwidth and low latency from the underlying network transport, such an application demands the rapid identification of locations where storage resources are to be found among storage service providers in the wide-area network, and an understanding of the geographical and topological diversity of those resources. That understanding must give high confidence both that the storage provider(s) will escape the disaster and that non-interfering paths can be found to exploit parallelism in the data copying operation.

We are in the process of defining the disaster response application in more detail, and intend to design and prototype the relevant underlying application and network services as proofs of concept. These will span enhanced connectivity (e.g., fast methods to identify parallel and orthogonal network paths between the client site and the storage service instances in the network) and resource management techniques (e.g., selection of candidate storage service providers based not only on storage availability but also on dynamically determined end-to-end network bandwidth).

We are interested in investigating how active network components can support such applications. For example, a client organization may have a pre-existing trust relationship with a particular storage service provider, yet that provider may not be the best choice to receive the disaster copy given current bandwidth and latency conditions. An alternative provider might then be selected if the data to be stored there can be encrypted on the fly using local processing resources. Other examples include automatically striping copy flows across multiple storage service provider instances to reduce overall latency, and the controlled introduction of redundancy, such as RAID-style parity, into such wide-area copies.
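The striping-with-parity idea can be illustrated with a minimal RAID-4-style sketch: split the data into k stripes destined for k providers, plus one XOR parity stripe, so the copy survives the loss of any single stripe. This is a toy illustration of the redundancy scheme, not a proposed wire format.

```python
def stripe_with_parity(data: bytes, k: int):
    """Split data into k equal stripes plus one XOR parity stripe,
    each of which could be copied to a different storage provider."""
    stripe_len = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(k * stripe_len, b"\x00")    # zero-pad the final stripe
    stripes = [padded[i * stripe_len:(i + 1) * stripe_len] for i in range(k)]
    parity = bytearray(stripe_len)
    for s in stripes:
        for i, b in enumerate(s):
            parity[i] ^= b
    return stripes, bytes(parity)

def recover_stripe(stripes, parity, missing):
    """Reconstruct the stripe at index `missing` by XOR-ing the parity
    stripe with all surviving stripes."""
    out = bytearray(parity)
    for idx, s in enumerate(stripes):
        if idx == missing:
            continue
        for i, b in enumerate(s):
            out[i] ^= b
    return bytes(out)
```

The trade-off mirrors the text: striping cuts per-provider transfer time by roughly a factor of k, while one extra parity flow buys tolerance of a single provider or path failure.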

A university network-services research community is starting to emerge and is jointly developing a wide-area services testbed called PlanetLab. PlanetLab is a global overlay network spanning well over 100 sites exhibiting a rich diversity of link behavior, including some sites located at major routing and co-location centers of the Internet. While most of these sites will consist of modest resources, several will exhibit substantial processing and storage resources. For example, as part of the Millennium Project, Berkeley has a multi-terabyte data storage facility interconnected via gigabit Ethernet with over 1000 Intel processors organized as a cluster distributed across the campus. We will use PlanetLab as the target environment for our prototypes, to quantify the true overheads in orthogonal path discovery and in sustaining high end-to-end bandwidth between edge networks and storage service providers.