TSMP – Time Synchronized Mesh Protocol
Introduction
In addition to the wide variety of research applications of wireless sensor networks (WSN), there is now a growing commercial application space. Commercial requirements of the networking stack tend to be straightforward to state, although often difficult to achieve. Simply, the requirement is reliable, secure delivery of packets at low power in a timely manner. A survey of research applications shows that this is often the case in that domain as well [glaser].
Diligent attention to these applications requirements led to the development of TSMP, the Time Synchronized Mesh Protocol. The success of this approach is evident in its recent adoption as the basis for the Wireless HART protocol, and for the ISA SP100 protocol, both intended for application in industrial automation. Industrial automation applications are extremely performance sensitive, and TSMP’s adoption in this regime came only after extensive evaluation in hundreds of real-world deployments of periods of a several years.
Approach
Reliability in wireless systems is such a challenging problem that it is embedded in the public consciousness with advertising quotes such as “can you hear me now?”. Unfortunately, most wireless sensor motes cannot walk outdoors or move into a different room if they don’t have enough “bars”.
Reliability in most WSN applications means the fraction of data transmitted (typically measured in units of packets) that gets where it needs to go in the network within a given latency requirement. Requirements on latency vary widely, from milliseconds for Factory Automation, to days or months for some environmental monitoring applications. The longer end of this range generally collapses toward lower latency once it becomes clear that it is possible to have latencies of no more than tens of seconds for no higher cost than longer latencies. Similarly, not all applications require 99.9% reliability, but most would like it if they felt that it could be had at little or no cost above what other protocols provide.
To maximize reliability, TSMP uses frequency diversity, time diversity, and spatial diversity. To minimize power consumption, motes are time synchronized and most bandwidth is dedicated, to minimize the power spent on idle listening, and the power wasted on packet collisions. To provide different levels of quality of service (QoS), TSMP radio resources and packet flow are organized into independent graphs with something analogous to MPLS and IPv6 flow labels.
Frequency Diversity
802.15.4 radios provide 16 different channels in the 2.4GHz band. These channels use direct sequence spread spectrum with a channel bandwidth of 2MHz. This provides some diversity against narrowband interference, but unfortunately 2MHz is just not enough of a spread to avoid deep fades due to multipath interference [Werb]. By adding channel hopping in addition to DSSS, TSMP realizes several advantages over single-channel protocols:
· Reliability
· Longer effective range
· More BW available
Time synchronization
All motes in a TSMP network share the same sense of time, accurate to well under one millisecond. This capability supports several application requirements:
· Reliability Reliability requires channel hopping. Channel hopping requires time synchronization.
· Low power It is well known that having a shared sense of time between the two sides of a radio link allows them both to reduce their “on” time.
· Sensor time-stamps There are several approaches to providing time stamps to user data even if the motes themselves are not synchronized. Having time-synchronized motes makes it trivial.
Graph routing – path redundancy, QoS
There are often different classes of data flowing in WSNs. Examples include application data, network performance, and configuration data. Users may have different priorities for application data, which may be lower for regular reporting and higher for alarm data. These flows are typically going in more than one direction. Most data and alarms are likely to flow to one or more gateways, whereas configuration information is likely to flow from gateways into the network. TSMP provides the flexibility to treat all of these flows as equivalent, or to provide dedicated resources to some or all of them. Quality of service (QoS) on individual flows is directly related to power consumption.
Mechanics
The basic operation of TSMP will be described here. Details of implementation and performance will be given in the following sections. Unless otherwise indicated, this section will assume 2.4GHz 802.15.4 radio PHY.
Time slots
The format of a timeslot is shown in figure xxx. Mote A is transmitting to mote B, with an optional initial clear channel assessment. Assuming that the motes are already synchronized (discussed below), then B knows when to expect the first bit of the preamble. With some model for worst-case clock skew between B’s clock and A’s clock, B turns on its receiver Tg seconds before the expected arrival of the first bit from A.
Figure 1 Packet timing with a worst-case late transmitter. Block lengths are not to scale. Red indicates transmission, blue is reception, green is idle listening. For a perfectly synchronized transmitter and receiver, every block except “RX startup” and one “Tg” shift left by Tg. If B does not hear a preamble, then it turns off its receiver after the initial 2Tg idle listen.
Assuming a valid preamble and start symbol are detected, the receiver verifies the 2 byte CRC and then 4 byte DLL MIC. If both of these are valid, then an ACK packet is created, MICed and CRCed, and sent. The original transmitter may turn its radio off between the end of its last transmitted bit, and the earliest expected arrival of the ACK. The guard time for the ACK arrival can be quite small, since the two motes are tightly synchronized by the arrival of the packet.
For an 802.15.4 radio the appropriate time slot is approximately 10ms.
If a receiver listens for Tg before the expected arrival of the first bit of the preamble, and Tg seconds after the expected arrival time there is still no evidence of the beginning of the preamble (verifying this may require the receiver to stay on slightly longer than 2Tg), then the receiver turns off. Either no packet was sent, or the signal was lost due to external or multi-path interference.
Charge consumption of basic operations
There are only a handful of things that can happen in a time slot: a packet is transmitted, a packet is received, or the receiver listens and hears nothing. Each of these has a characteristic profile of current vs. time. The integral of this curve gives the total charge consumed during each operation: QTX, QRX, Qlisten . The amount of charge consumed will depend on the length of the packet transmitted, and if it is acknowledged. The amount of charge consumed will also be a function of the underlying hardware, and the efficiency of the software that runs on top of it.
Link layer acks
Conventional networking wisdom holds that end-to-end acknowledgments are required to achieve reliability. In latency-constrained applications, however, end-to-end acknowledgements often impose too much overhead, and many latency-sensitive internet applications do not use them [VoIP?, video?].
At the same time, if data reliability is critical, then packets must not be deleted until the recipient has acknowledged successful receipt of them. Link-layer ACKs provide a compromise appropriate for WSN where latency-constrained reliable delivery is needed.
If the received packet has a valid CRC and DLL MIC, an ACK is transmitted. The ACK may be positive or negative. Negative ACKs are generated if the receiving mote’s queue is full, or it is unable to process the packet for other reasons. No ACK of any type is sent if the CRC or MIC tests fail. In general, the transmitter will only delete the transmitted packet from its queue on reception of a positive ACK. Negative ACKs are useful for distinguishing network congestion from PHY errors. Both positive and negative ACKs may contain time correction information.
ACKs need to be cryptographically secure, so that malicious radios can not jam reception of a packet and then transmit a valid ACK to the sender.
ACKs are short packets, and would be expected to have lower PER than longer packets. As the properties of the channel are relatively stationary over millisecond timeframes, the PER of link layer ACK is much lower than the PER of a packet sent at a random time, even after accounting for the difference in packet lengths. This is fortunate, since the loss of a positive ACK generates a duplicate packet, since both sides of the link will now have a copy of the transmitted packet and continue to try to deliver it.
Time Synchronization
Using regular communication to compensate for imprecise clocks, it is possible to maintain a shared sense of time across a multi-hop network. In a TSMP network, this time sense must be accurate to better than the receive guard time, Tg, or the mote will lose connection with the network.
Motes transmit the first bit of their packet as close to the ideal start time as possible in their time frame. The receiving mote measures the time of arrival of this first bit in its own time frame. The accuracy of this sending and receiving timing is generally better than a few tens of microseconds on most mote hardware, and can be made substantially better with attention to detail [Shrivastava? Time synch]. With dedicated hardware, the error in this estimate can be pushed down into nanoseconds, allowing for time-of-flight ranging [Lanzisera]. At this point the receiver knows the difference between its time frame and the transmitters time frame. This information is then sent back to the transmitter in the ACK, either explicitly as a field in the message, or implicitly with transmission time. As a result, timing information travels both ways on every acknowledged packet. In any given packet exchange, either the transmitter or the receiver may adopt the clock of the other, or they may both ignore the timing information, or they may both adjust their clocks.
There are many approaches to using the pair-wise exchange of timing information to maintain network synchronization. Perhaps the simplest is to have time propagate from a single time-master, typically the gateway, to all other motes in the network. All motes talking directly to the time-master simply take the time adjustment from any packet exchange with the master as absolute truth, and set their clock to the time-master clock. This may result in a change to their RTC of at most Tg seconds. Similarly, when these “first hop” motes talk to their children, the children adopt the parents’ clock. This establishes the concept of a “time parent”, which may be different than a routing parent. Note that this approach is only optimal in the sense that it is simple, and it is proven to work.
Clock skew and drift
Almost all motes keep track of time by counting the oscillations of a quartz crystal tuning fork, nominally at 32,768 cycles per second. These crystals and their driving electronics burn less than 1 uW in well-designed systems [cite]. Slight differences in manufacturing lead to variations in the nominal frequency of roughly 10 ppm at constant temperature, and material properties lead to variation in frequency of over 100ppm over the industrial temperature range. Temperature-compensated crystal oscillators display better than 10ppm tolerance over temperature and aging, with only slightly higher current consumption [DS3231].
In addition to hardware compensation of the real time clock, it is straightforward for motes to use intelligent software as well. A mote that finds that it is always “slow” might naturally speed up its software clock.
Whatever the time keeping hardware, any two motes will have a bound on the difference e in the rate at which their clocks tick. If two motes are synchronized at time t0, with a worst-case synchronization error magnitude dsync then the worst-case error in their shared sense of time is Dtmax = e (t-t0) + dsync
To maintain connectivity, a mote must keep Dtmax less than Tg, so this places an upper bound on the time that can elapse between synchronizations
Tsync = (t-t0)max < (Tg - dsync )/e
For example, with a +/-1ms guard time and 50ms synchronization error, and a +/-10ppm clock accuracy over temperature and aging,
Tsync < ( 0.95ms) / (20ppm) = 48s
To remain synchronized, a mote must get a time synchronization update at least once every 48 seconds.
Synchronization traffic – keepalives and beacons
Time propagation via beaconing is popular in wireless systems. However, in most cases it is unnecessary and wasteful of energy.
In a network with regular data reporting to a time-master gateway, motes close to the gateway will typically see traffic much more often than once every Tsync seconds. These motes will maintain a very accurate copy of the gateway’s clock with no additional time-maintenance traffic.
On the other hand, motes in an alarm-only network, or motes near the edge of a data reporting network may not see regular traffic as frequently as they need to get synchronized.
Link
A link between two or more motes consists of a repeating time slot and channel offset. A link may be dedicated, in which case there is only one transmitter and one receiver at this time slot and offset. Slotted Aloha is possible if some or all motes in the network share a single link.
Both the transmitting and the receiving side of the link need to know if the packet transmitted is unicast, and an ACK is expected. A link with more than one receiver listening may be used for both unicast and multicast packets. The transmitter must know if the link it has is shared with other possible transmitter. In this case, the transmitter must assume that lost transmissions are due to collisions, and use an appropriate backoff algorithm.