Internet Draft    draft-irtf-ipnrg-arch-01.txt    April 2002
IPN Research Group / V. Cerf
INTERNET-DRAFT / Worldcom/Jet Propulsion Laboratory
<draft-irtf-ipnrg-arch-01.txt> / S. Burleigh
April 2002 / A. Hooke
Expires October 2002 / L. Torgerson
NASA/Jet Propulsion Laboratory
R. Durst
K. Scott
The MITRE Corporation
K. Fall
Intel Corporation
E. Travis
Global Science and Technology
H. Weiss
SPARTA, Inc.
Delay-Tolerant Network Architecture:
The Evolving Interplanetary Internet
draft-irtf-ipnrg-arch-01.txt
Status of this Memo
This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
The list of Internet-Draft Shadow Directories can be accessed at
Abstract
This document describes an architecture for delay-tolerant networks, and is a generalization of the architecture designed for the Interplanetary Internet: a communication system to provide Internet-like services across interplanetary distances in support of deep space exploration. This generalization addresses networks whose operational characteristics make conventional networking approaches unworkable or impractical. We define a message-based overlay that exists above the transport layer of the networks on which it is hosted. The document presents an architectural overview followed by discussions of services, topology, routing, security, reliability, and state management. The document concludes with a discussion of end-to-end information exchange, including an example.
Table of Contents
Status of this Memo
Abstract
Table of Contents
Copyright Notice
Acknowledgments
Foreword
1 Introduction
2 Why an Architecture for Delay-Tolerant Networking?
2.1 Constraints Posed by Extreme Environments
2.2 Problems with Internet Protocols and Applications
2.3 DTN Principles of Design
3 DTN Architectural Overview
3.1 The DTN is Based on Message Switching
3.2 DTN Classes of Service Mimic Postal Mail-like Operation
3.3 Regions and Region Identifiers
3.4 Tuples
3.5 Late Binding
3.6 The "Bundle Layer" Terminates Local Transport Protocols and Operates End-to-End
3.7 Routing
3.8 Bundle Layer Reliability and Custodianship
3.9 Security
3.10 Time Synchronization
4 Service Considerations: Application Instances and Bundles
4.1 Networking Style Issues
4.2 Bundle Lifetime
4.3 Classes of Service
4.4 Delivery Options
4.5 Naming and Addressing
5 Topological Considerations: Node, Regions, and Gateways
5.1 Node
5.2 Regions
5.3 Gateways
5.4 Discussion
6 Routing considerations
6.1 Types of Routes
6.2 Store-and-forward operation
6.3 Contact-oriented routing
6.4 Routing protocols
7 Security considerations
7.1 User-oriented security services
7.2 Infrastructure security services
8 Reliability Considerations
8.1 Custodial Operation
8.2 End-to-end Retransmission
8.3 Congestion Control at the Bundle Layer
9 State Management Considerations
9.1 Application Interface State
9.2 Bundle retransmission state
9.3 Bundle routing state
9.4 Transmission queue state
9.5 Receive queue state
9.6 Network management state
10 Convergence Layer Considerations for Use of Underlying Protocols
11 Bundle Header Information
12 An Example Bundle Transfer
12.1 Rules for forming tuples in the example network
12.2 Example Network Topology at the Region Level
12.3 DTN Gateway routing
12.4 Systems participating in example bundle data transfer
12.5 End-to-end Transfer
12.6 Error Conditions at the Bundle Layer
13 Summary
14 References
15 Security Considerations
16 Authors' Addresses
Copyright Notice
Copyright (C) The Internet Society (2002). All Rights Reserved.
Acknowledgments
John Wroclawski, David Mills, Greg Miller, James P. G. Sterbenz, Joe Touch, Steven Low, Lloyd Wood, Robert Braden, Deborah Estrin, and Craig Partridge all contributed useful thoughts and criticisms to this document. We are grateful for their time and participation.
This work was performed under DOD Contract DAA-B07-00-CC201, DARPA AO H912; JPL Task Plan No. 80-5045, DARPA AO H870; and NASA Contract NAS7-1407.
Foreword
"The term 'sonata' can be applied to a large work in three movements or to the three sections of a single movement: exposition, development, recapitulation. They resonate with the three acts of a typical movie or play, the three main parts of a myth or legend (life, death, rebirth), the cycle of days (daylight, night, then daylight again) and of the seasons in temperate climates (nice weather, then winter, then it gets nice again), the original Star Wars trilogy, the Indiana Jones trilogy, etc., etc. Basically, in any human experience we start off brightly, full of energy and hope; then we experience reality and become thoughtful and reflective, darker and slower; then Something Happens and hope is reborn, our energy returns, and we reach that happy synthesis of Innocence and Experience that is maturity. Rondo, andante, scherzo...."
-- Scott Burleigh
Release Notes
draft-irtf-ipnrg-arch-00.txt, May 2001: Rondo.
Original Issue.
draft-irtf-ipnrg-arch-01.txt, April 2002: Andante.
Restructured document to distinguish between architecture for delay-tolerant networking and the *application* of that architecture to a number of different environments, potentially including interplanetary internetworking, military tactical networking, and sensor internetworking.
Refined DTN classes of service and delivery options. Introduced a "reply-to" address to have information directed to a third party rather than the source.
Further defined the topological elements of delay tolerant networks.
Elaborated routing and reliability considerations.
Significant work in defining the model for securing the delay tolerant network infrastructure, based on signed admission control credentials.
Added section discussing state information.
Updated bundle header information and end-to-end data transfer example.
1 Introduction
This document describes an architecture for delay-tolerant networks. We present this as a generalization and extension of our ongoing work on the Interplanetary Internet. We have come to realize that a number of different environments share essential characteristics for which the architecture presented herein is appropriate.
The particular approach we employ is that of a message-based overlay that exists above the transport layers of the networks on which it is hosted.
2 Why an Architecture for Delay-Tolerant Networking?
The existing TCP/IP-based internet, while fabulously successful in many environments, does not suit all environments. The ability of the "TCP/IP suite" to provide service depends on a number of important assumptions:
that an end-to-end path between source and destination exists for the duration of the communication session;
(for reliable communication) that the maximum round-trip time over that path is not excessive and not highly variable from packet to packet; and
that the end-to-end loss is relatively small.
A class of "challenged networks," which may violate one or more of these assumptions, is becoming important and may not be well served by the current end-to-end TCP/IP model. These networks typically serve environments in which it is either impractical or impossible to configure a communication environment that supports the assumptions on which the TCP/IP suite depends.
This section examines some of the constraints posed by extreme environments that may result in "challenged networks," and considers the problems that these environments might cause for common Internet protocols and applications. Finally, we derive some "principles of design" for the DTN.
2.1 Constraints Posed by Extreme Environments
The kinds of extreme environments that this DTN architecture addresses constrain both the communication system and the applications using that communication system. This section describes some of the characteristics that make environments "extreme." Clearly, not all environments that would be considered "extreme" exhibit each of these characteristics.
2.1.1 Long and/or Variable Delays
Delay affects networks in at least two different ways. First, interactivity is directly affected by delay. For example, telephone conversations over a terrestrial telephone line are markedly different from telephone conversations that are routed over a geostationary satellite hop. Second, variability in delay can affect applications and protocols. Consider a stream of packets carrying voice data over a network. If the delay in the network varies greatly from one packet to the next, a buffer is required to avoid the perception of pauses in the reconstructed voice track. If the application is hosted on a network with greater delay variability than the application was designed for, the stream will again have pauses, possibly of sufficient magnitude to make the voice data unintelligible.
Delay comprises primarily three components: propagation delay through the medium; queuing delay within relay points, source, and destination; and clocking delays associated with transmitting an atomic unit of data onto the medium.
Propagation delays can be long due to speed-of-light delays to cross long distances (e.g., deep space). Alternatively, propagation delays could be long due to the propagation medium (e.g., acoustic/underwater). How long is long? Round trip delays to Mars range between about 16 and 40 minutes at the speed of light.
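As a rough illustration of how speed-of-light delay scales with distance, the following Python sketch computes round-trip light time for a given separation. The function name and the 2.5 AU example distance are assumptions chosen for illustration; they are not values taken from this document.

```python
# Illustrative sketch: round-trip propagation delay at the speed of light.
SPEED_OF_LIGHT_M_PER_S = 299_792_458     # exact, by definition of the metre
ASTRONOMICAL_UNIT_M = 149_597_870_700    # exact, by IAU definition


def round_trip_light_time_minutes(distance_au: float) -> float:
    """Return the round-trip light time, in minutes, over distance_au."""
    one_way_seconds = distance_au * ASTRONOMICAL_UNIT_M / SPEED_OF_LIGHT_M_PER_S
    return 2 * one_way_seconds / 60


# Assumed example input: a separation of 2.5 AU (hypothetical, for illustration).
print(f"{round_trip_light_time_minutes(2.5):.1f} minutes round trip")
```

At such distances a single request/response exchange takes tens of minutes before any queuing or clocking delay is added.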
Queuing delays are affected by traffic and service rate. In extreme environments, the service rate of a node may be low due to power limitations requiring a slow processor or a long wait while the unit is quiescent. Arrival distributions may cause significant variability in delays, particularly for heavy-tailed distributions.
Clocking delays accrue each time a packet is transmitted if the packet was fully received prior to transmission. ("Cut-through" routing, in which the first bits/bytes of a data unit are operated upon before the last bits/bytes have been received, is a fairly rare operation.) For many types of data, the latency advantages of cut-through routing are more than offset by the reluctance of network designers to forward corrupt data (although some types of data are useful even if partially corrupted). For data that are not forwarded before being fully received (and error checked), clocking delays accumulate at each hop in the network. For relatively slow, multi-hop networks, this can result in high per-packet delays, particularly if packet sizes are large. Large packets can, of course, also increase queuing delays for other packets, increasing both the round trip times and the variability of delay.
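The accumulation of clocking delay across store-and-forward hops can be sketched as follows. This is a minimal illustration, assuming each hop fully receives the packet before forwarding it; the function name, packet size, and link rates are hypothetical.

```python
def cumulative_clocking_delay(packet_bytes, link_rates_bps):
    """Sum per-hop transmission (clocking) times for one packet.

    Assumes every hop receives the whole packet before forwarding it,
    so the time to clock the packet onto each link is paid once per hop.
    """
    bits = 8 * packet_bytes
    return sum(bits / rate for rate in link_rates_bps)


# Hypothetical example: a 1500-byte packet over three 9600 bit/s links.
delay_seconds = cumulative_clocking_delay(1500, [9600, 9600, 9600])
print(f"{delay_seconds:.2f} s of clocking delay")
```

Doubling the packet size or the hop count doubles the result, which is why large packets on slow multi-hop paths produce high per-packet delays.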
We assume that processing delay, another contributor to overall delay, is comparatively low, and is therefore not discussed here.
Variability of delay can have a significant effect on communication in addition to absolute delay. Many protocols attempt to provide reliability through retransmission (ARQ) mechanisms. Timers are a critical element of most retransmission mechanisms, and establishing a retransmission time out value is an important aspect of the mechanism. Some protocols, particularly those at the link layer, have the ability to eliminate all components of round trip time other than propagation and clocking delays from their timings, and are able to produce retransmission time out estimates that are quite close to the actual round trip time. When queuing delay must be considered and the timing does not have direct knowledge of the delays at all of the queues, estimating a reasonable (not too high, not too low) retransmission time out value becomes more complex. A reasonable retransmission time out value is necessary to prevent unnecessarily retransmitting data (in the case of a timer that is set too low), and to prevent excessive delays in retransmitting lost data (in the case of a timer that is set too high).
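To make the effect of delay variability on timers concrete, here is a minimal sketch of a smoothed retransmission timeout estimator in the style used by TCP (smoothed round trip time plus four times the smoothed deviation). It is offered only as an illustration of the general problem, not as a mechanism specified by this architecture. The class name is hypothetical, and clock granularity and minimum/maximum timeout bounds are deliberately omitted.

```python
class RtoEstimator:
    """TCP-style retransmission timeout estimator (illustrative sketch)."""

    ALPHA = 1 / 8   # gain for the smoothed round trip time
    BETA = 1 / 4    # gain for the smoothed round trip time deviation
    K = 4           # deviation multiplier

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def update(self, sample):
        """Fold in one round trip time sample and return the new timeout."""
        if self.srtt is None:
            # First measurement initializes both state variables.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # Deviation is updated first, using the previous smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - sample)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return self.srtt + self.K * self.rttvar


# Two sample streams with the same mean (2 s) but different variability.
steady, variable = RtoEstimator(), RtoEstimator()
for s, v in zip([2, 2, 2, 2], [1, 3, 1, 3]):
    steady_rto, variable_rto = steady.update(s), variable.update(v)
print(steady_rto, variable_rto)
```

The variable stream ends with a noticeably larger timeout than the steady one, even though both average the same round trip time: the estimator must pad the timer to avoid needless retransmission, at the cost of slower recovery from real loss.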
2.1.2 Frequent Partitioning
Network partitioning typically adds to data loss. If losses become too severe, applications typically fail. In addition to contributing to loss, network partitioning adds significantly to overall delay if nodes are configured to store data until the outbound link is restored. This behavior can also contribute to misordered data arriving at the destination, which can cause poor performance in some protocols.
2.1.3 Data rate asymmetry
Data rate asymmetry is the ratio of forward-path minimum capacity to reverse-path minimum capacity. Data rate asymmetry adversely affects protocols using ARQ for reliable delivery by altering the ACK or NACK return path. This effect has been studied extensively with TCP, and is discussed below. At a higher layer, request/response applications that have not been appropriately tuned to balance the amount of request versus response traffic can time out waiting for data to travel across the lower-capacity link. In the most extreme case, the asymmetry is infinite because there is no reverse path at all (as is the case with communications to submarines, for example), and using ARQ for reliable delivery is not possible. In these cases, forward error correction and periodic retransmission are typically used. See [BLMR98] for an example of how such techniques are used in the Internet.
Data rate asymmetry can arise due to routing path asymmetry, different forward/reverse path link technologies, or intentional engineering tradeoffs. Moderate asymmetries in the wired Internet are found most commonly in asymmetric access technologies such as CATV or ADSL-based subscriber lines. The largest asymmetries for wireless Internet access appear to be prevalent in in-flight Internet access systems. The AirTV system, for example, being proposed for 2004 and beyond, could provide a ground-to-air speed of up to 45 Mb/s using DVB satellite technology with a comparatively anemic air-to-ground bandwidth of 128 kb/s or less using INMARSAT [ARINC]. In the case of deep space communications, downlink data rate is intentionally engineered to be much greater than uplink data rate. This tradeoff is not surprising given the goal of most science missions: the capacity demands of downlinking high-resolution images and telemetry from a spacecraft dwarf the capacity required to uplink short commands to that spacecraft.
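As a small worked example of the ratio defined above, the following sketch computes the asymmetry for the in-flight access figures quoted in the text (45 Mb/s forward, 128 kb/s reverse). The function name is hypothetical, and treating a missing reverse path as an infinite ratio is an assumption made for illustration.

```python
def asymmetry_ratio(forward_bps, reverse_bps):
    """Ratio of forward-path capacity to reverse-path capacity."""
    if reverse_bps == 0:
        # No reverse path at all: ARQ-style acknowledgment is impossible.
        return float("inf")
    return forward_bps / reverse_bps


print(asymmetry_ratio(45e6, 128e3))   # in-flight access example from the text
print(asymmetry_ratio(45e6, 0))       # one-way link, no return channel
```

With these figures the forward path carries roughly 350 times the capacity of the return path, so acknowledgment traffic must be kept correspondingly sparse.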
2.1.4 Packet Loss and Errors
Consider the following probability of success metric:
Pr = [(1-pe)^(8*m)]^i
This metric is the probability of successful end-to-end delivery of a particular message of size m bytes across i links, assuming an independent and identically distributed (IID) stationary loss process with constant bit error rate (BER) pe on every link. (In this equation, the caret -- ^ -- indicates exponentiation.) This metric, as expressed, does not account for congestive loss but does account for message size and number of cascaded links given a fixed BER. For packet networks, most links employ some form of error detection, implying that any bit error results in an end-to-end packet loss. As the formula shows, the end-to-end probability of successful delivery decays exponentially with path hop count. Any congestive loss would only worsen the performance, but some of these networks are engineered with admission control to effectively eliminate in-network congestion.
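The metric can be evaluated numerically with a short sketch. It reads the metric as the probability that all 8*m bits of the message survive each of the i links, that is, (1-pe) raised to the power 8*m*i. The function name and the example message size and bit error rate are assumptions chosen for illustration.

```python
def delivery_probability(message_bytes, hops, bit_error_rate):
    """Probability that every bit of the message survives every hop.

    Assumes independent bit errors at a constant rate on all links and
    no congestive loss, as in the surrounding discussion.
    """
    bits_exposed = 8 * message_bytes * hops
    return (1 - bit_error_rate) ** bits_exposed


# Hypothetical example: 1000-byte messages at a bit error rate of 1e-5.
for hops in (1, 5, 10, 20):
    p = delivery_probability(1000, hops, 1e-5)
    print(f"{hops:2d} hops: {p:.3f}")
```

Each additional hop multiplies the success probability by the same single-link factor, which is the exponential decay with hop count described above.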
For reliable transfer, excessive errors require repair using either error correction or retransmission. In the case of end-to-end retransmission, the path can be so lossy as to render end-to-end retransmission effectively useless. Consider the per-hop probability of packet loss pp = 1 - Pr, with Pr evaluated for a single link (i = 1). Given the assumptions listed above, the expected number of retransmissions required before successful delivery is given by: (1-(1-pp)^i)/(1-pp)^i. Consider a per-hop loss probability of 0.3. In this case, 4 hops require about 3 retransmissions, 10 hops require about 34, and 20 hops require about 1250 retransmissions. If a hop-by-hop retransmission scheme were used instead, the total (network-wide) number of retransmissions is given by i*pp/(1-pp). For the same loss probability of 0.3, the number of retransmissions for 4, 10, and 20 hops would be about 2, 5, and 9 (rounded up), respectively. Thus, for very lossy environments, an end-to-end retransmission strategy will not provide satisfactory performance.
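The two retransmission formulas can be compared directly with a short sketch. Here p is taken to be the per-hop packet loss probability, and the function names are hypothetical.

```python
def end_to_end_retransmissions(p, hops):
    """Expected retransmissions when only the source retransmits.

    A transmission succeeds only if it survives every hop, so the
    per-attempt success probability is (1 - p) ** hops.
    """
    success = (1 - p) ** hops
    return (1 - success) / success


def hop_by_hop_retransmissions(p, hops):
    """Expected total retransmissions when each hop repairs its own losses."""
    return hops * p / (1 - p)


for hops in (4, 10, 20):
    e2e = end_to_end_retransmissions(0.3, hops)
    hbh = hop_by_hop_retransmissions(0.3, hops)
    print(f"{hops:2d} hops: end-to-end {e2e:8.1f}, hop-by-hop {hbh:4.1f}")
```

The end-to-end count grows exponentially with hop count while the hop-by-hop count grows only linearly, which is the quantitative basis for preferring hop-by-hop repair on very lossy paths.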
2.1.5 Interoperating Among Differently-Challenged Networks
A complicating factor in building internetworks composed of networks operating in multiple different extreme environments is that there is no guarantee of support for any common communication technology. In many instances, at the time of deployment, only the barest minimum of capabilities can be fielded to support the given mission. If these networks are successful, they evolve. In many cases, this evolution leads to an expansion of scope and span for the network. Often this results in a requirement to interoperate with another network in an environment that presents different challenges, which have been addressed by dramatically different solutions.
The ability to operate over many different types of networks that have been fielded to support extreme environments is both a challenge and a constraint for this Delay Tolerant Networking Architecture.
2.1.6 Examples of some extreme environments
Several types of extreme environments currently exist, with more appearing almost daily. If one considers the wired Internet, with multi-gigabit backbones, almost no corruption-related data losses and relatively short round trip times, then many different types of environments seem extreme.
This architecture was originally conceived to support the Interplanetary Internet, which exhibits round trip delays on the order of tens of minutes and intermittent connectivity that can result in disconnection for weeks.
Military tactical networks, in which data rates are low, error rates are high, and link interruptions are frequent, can likely derive great benefit from application of this architecture.
Similarly, sensor networks deployed in oceanic environments exhibit long delays, significant data loss, and the potential for long link outages.
An individual seeking a networking solution for a particularly intriguing and challenging environment recently approached us to determine whether the DTN architecture was suited to it. A group of widely dispersed communities of people in remote areas is not well served by wired, fixed wireless, or satellite Internet service. They do, however, frequently travel on snowmobiles from community to community, and congregate at work sites and larger towns. To effectively serve this environment, an architecture is required that can embrace intermittent, probabilistic connectivity, store-and-forward operation, and high and variable delays. (The architecture described in this document meets these needs and more.)
2.2 Problems with Internet Protocols and Applications
This section discusses some of the issues associated with attempting to serve challenged networks with the Internet suite of protocols.
2.2.1 Internet "Core" Protocols
The performance characteristics of challenged networks combine to confound the efficient operation of the core Internet protocols. By the "core" Internet protocols we mean IP, TCP, UDP, BGP, common IGPs (RIP, OSPF, or EIGRP), and DNS. These protocols span the services of end-to-end datagram delivery, reliable two-party stream delivery, regional (aggregated) routing path discovery with policy, intra-domain path selection, and distributed support for name resolution. Although some of these protocols are technically "application" protocols from a layering point of view, we treat them together here as core protocols for the purposes of discussion.