ACN2005 Term Project

Improving VoIP Quality

by Introducing a TCP-Friendly Protocol

Burt C.F. Lien (連矩鋒)

CSIE Department, National Taiwan University

Abstract

The most notorious weakness of VoIP is its voice quality during bursts of unmanaged Internet traffic. Lost packets can produce uncomfortable or distorted voice at the receiver. Most VoIP traffic uses RTP (Real-time Transport Protocol), which is layered on top of UDP, and it is the nature of a connectionless protocol that the sender transmits as much as it can with no knowledge of network congestion. In this work, I present an aggressive but simple method that augments RTP with TCP-friendly behavior by using the extension bit in its header. The method is intended to improve the quality of VoIP traffic under network congestion while remaining compatible with VoIP traffic that is not TCP-friendly-aware.

Introduction

Making real-time traffic adapt to network congestion sounds appealing. For VoIP, however, unlike video with its variable-bit-rate codec standards, not every communication peer can switch between bit rates (some support only G.729/20ms, for example), which raises an interoperability concern.

Within unmanaged networks, bursty traffic causes packet loss, which impairs the received voice quality because the voice decoder does not get enough sample data. The result is poor voice communication quality.

TCP-friendly is used here as a generic term for a congestion-aware protocol, as opposed to plain UDP, which is unreliable and unaware of congestion. In this work, I combine the voice decoder's linear interpolation with a TCP-friendly mechanism residing between the RTP and UDP layers to alleviate this issue. The idea is not to slow down transmission from RTP itself, because VoIP is real-time traffic. Instead, the mechanism reduces the transmission rate by dropping selected packets from its internal network sending queue (the RTP queue).

For a voice decoder, linear interpolation is the simplest and most basic way to recover lost frames. Interpolation naturally predicts better when the samples of the lost frame's close neighbors are received correctly. So if we intentionally drop single, isolated packets within a continuous voice stream, rather than letting the network drop packets in bursts, the accuracy of the decoder's prediction may improve significantly.
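To illustrate why intact neighbors matter, the following is a minimal Python sketch of sample-level linear interpolation across one lost frame. The function name and the idea of interpolating raw samples are assumptions made for illustration; a real decoder's concealment logic is codec-specific and is not described in this work.

```python
def interpolate_lost_frame(prev_frame, next_frame, frame_len):
    """Fill one lost frame with a straight line between the last sample
    received before the gap and the first sample received after it.

    Hypothetical helper: it only works when both neighbors arrived intact,
    which is the situation the proposed isolated drops try to preserve.
    """
    start = prev_frame[-1]
    end = next_frame[0]
    intervals = frame_len + 1  # number of steps from start to end across the gap
    return [start + (end - start) * i / intervals for i in range(1, frame_len + 1)]


if __name__ == "__main__":
    previous = [0.0, 1.0, 2.0]
    following = [6.0, 7.0, 8.0]
    print(interpolate_lost_frame(previous, following, 3))  # [3.0, 4.0, 5.0]
```

If two or more consecutive frames were lost, the same straight line would have to span a much longer gap, which is why the estimate degrades under burst loss.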

Based on the above assumption, I design an aggressive method that autonomously drops packets from the RTP sending queue (for example, 1 out of 5 packets if the packet loss rate in the network is 20%) upon receiving a congestion signal from the TCP-friendly protocol (the modified RTP extension). This provides a relatively simple and useful way to improve VoIP quality while preserving interoperability with incumbent protocol stacks.

VoIP Protocol Stacks

Most current VoIP protocol stacks (voice part) look like the following diagram. To preserve interoperability with ordinary VoIP protocol layers, we do not add a new protocol layer, which would complicate the problem. Instead, I propose to add a TCP-friendly mechanism inside the RTP header, as detailed in the next section.

RTP Extension

The format of the RTP header is as follows:

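 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver|P|X|  CC   |M|     PT      |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             SSRC                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       CSRC [0..15] ...                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+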

X, Extension. 1 bit. If set, the fixed header is followed by exactly one header extension.

According to the RTP header definition, setting the X bit lets an application extend RTP with user-specific or proprietary information. In this work, the X bit is used to append one extra 32-bit word, which carries the congestion-control field while preserving interoperability with other VoIP protocol stacks.
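As an illustration of where the X bit sits, here is a minimal Python sketch that unpacks the 12-byte fixed RTP header (before any CSRC entries) using the field widths of the standard RTP header. The function name and the dictionary keys are hypothetical and chosen only for this example.

```python
import struct


def parse_rtp_fixed_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header that precedes the CSRC list."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least 12 header bytes")
    # Network byte order: two single bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # 2 bits
        "padding": (b0 >> 5) & 1,      # 1 bit
        "extension": (b0 >> 4) & 1,    # 1 bit, the X bit used in this work
        "csrc_count": b0 & 0x0F,       # 4 bits
        "marker": b1 >> 7,             # 1 bit
        "payload_type": b1 & 0x7F,     # 7 bits
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


if __name__ == "__main__":
    # Version 2, X bit set, payload type 0, sequence number 1.
    sample = struct.pack("!BBHII", 0x90, 0x00, 1, 160, 0x1234)
    print(parse_rtp_fixed_header(sample))
```

A peer that does not understand the extension can still read every field above, which is the interoperability property the proposal relies on.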

TCP-friendly Implementation

1.  Congestion Detection

According to the RTP protocol definition, real-time data is transmitted with a sequence number on every packet, so the receiver can detect a lost packet whenever the sequence numbers are discontinuous. The proportion of lost packets in a given period of time indicates the severity of network congestion. Based on this information, we can start or stop the underlying TCP-friendly mechanism.
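The gap-based detection described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: packets are examined in arrival order, duplicates and late (reordered) packets are simply ignored, and the function names are hypothetical.

```python
SEQ_MODULO = 1 << 16  # RTP sequence numbers are 16 bits and wrap around


def count_lost(sequence_numbers):
    """Return (lost, expected) for received sequence numbers in arrival order."""
    if not sequence_numbers:
        return 0, 0
    lost = 0
    expected = 1
    previous = sequence_numbers[0]
    for seq in sequence_numbers[1:]:
        gap = (seq - previous) % SEQ_MODULO  # modular distance handles wraparound
        if gap == 0 or gap > SEQ_MODULO // 2:
            continue  # duplicate or late packet: ignored in this sketch
        lost += gap - 1
        expected += gap
        previous = seq
    return lost, expected


def loss_rate_percent(sequence_numbers):
    """Packet loss rate (PLR) in percent over the given packets."""
    lost, expected = count_lost(sequence_numbers)
    return 0.0 if expected == 0 else 100.0 * lost / expected


if __name__ == "__main__":
    # Packet 0 is missing between 65535 and 1 (wraparound case).
    print(loss_rate_percent([65534, 65535, 1, 2]))  # 20.0
```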

2.  Packet loss rate

In this work, the initial packet loss rate threshold is defined as 2%, and it increases in steps of 2%. Once the loss rate exceeds the threshold, regulation begins by sending congestion information to the opposite peer during the call. Loss rates larger than 30% are not considered here because the voice quality would be beyond recovery; in that case the VoIP application itself should disconnect the call with an RTCP (RTP Control Protocol) BYE packet.
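The thresholds above can be summarized in a small Python sketch. Reading the 2% scale as a quantization of the measured loss rate into 2% levels is an assumption of this example, as are the function names and the returned labels.

```python
START_THRESHOLD = 2    # percent: regulation starts here
GIVE_UP_THRESHOLD = 30  # percent: above this the call should be torn down
STEP = 2               # percent: assumed granularity of the severity levels


def classify_loss(plr_percent):
    """Map a measured loss rate to the action described in the text."""
    if plr_percent > GIVE_UP_THRESHOLD:
        return "disconnect"  # the application sends RTCP BYE
    if plr_percent >= START_THRESHOLD:
        return "regulate"    # report congestion to the peer
    return "normal"


def quantize_loss(plr_percent):
    """Round a loss rate down to the nearest 2% level (assumed interpretation)."""
    return int(plr_percent // STEP) * STEP


if __name__ == "__main__":
    for rate in (1.0, 2.0, 7.5, 30.0, 31.0):
        print(rate, classify_loss(rate), quantize_loss(rate))
```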

3.  Additive increase and decrease

When the packet loss rate in the network reaches 2%, the mechanism is launched and drops specific packets from the outgoing queue.

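Let PLR be the packet loss rate in the network, in percent. The following algorithm is applied to packets in the RTP outgoing queue. One packet out of every ROUND(100 / PLR) is dropped, which matches the earlier example of 1 out of 5 packets at a 20% loss rate:

IF PLR >= 2
    DropInterval = ROUND(100 / PLR)
    IF ( SequenceNumber % DropInterval == 0 )
        DropThePacket
    ELSE
        SendThePacketAsUsual
    ENDIF
ELSE
    SendThePacketAsUsual
ENDIF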
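A minimal Python sketch of the sender-side queue regulation follows. It assumes the drop spacing is chosen to match the example in the introduction (one packet out of five at a 20% loss rate), so the interval round(100 / PLR) is an assumption of this sketch, and the function name is hypothetical.

```python
def should_drop(sequence_number, plr_percent):
    """Decide whether the sender drops this packet from its outgoing queue.

    Assumption: one packet is dropped out of every round(100 / PLR), so that
    dropped packets are isolated and their neighbors are still delivered.
    """
    if plr_percent < 2:
        return False  # below the threshold: regulation is disabled
    interval = max(2, round(100 / plr_percent))  # never drop adjacent packets
    return sequence_number % interval == 0


if __name__ == "__main__":
    dropped = [seq for seq in range(20) if should_drop(seq, 20)]
    print(dropped)  # [0, 5, 10, 15]
```

Because the interval is at least 2, no two consecutive sequence numbers are dropped, which keeps both neighbors of every dropped packet available for interpolation.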

4.  Recovery

Once the PLR (packet loss rate) falls below 2, RTP traffic regulation is disabled.

5.  Protocol Implementation:

I define the 32-bit RTP extension data as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       protocol initials       | hd  | severity of congestion  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Protocol Initials (16 bits):

Sixteen consecutive 0 bits, used to recognize the start of this RTP extension.

Protocol Header (3 bits):

Fixed at “101” (binary), so that peers implementing the TCP-friendly protocol designed here can recognize each other.

Severity of Congestion (13 bits):

A 13-bit unsigned integer that represents the loss rate detected at the receiver side.
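The three fields above can be packed into and recovered from one 32-bit word as in the following Python sketch. Encoding the severity as an integer percentage and the function names are assumptions of this example; network byte order is used because RTP fields are carried in network byte order.

```python
import struct

HEADER_TAG = 0b101              # 3-bit protocol header from the text
MAX_SEVERITY = (1 << 13) - 1    # largest value that fits in 13 bits


def pack_extension(plr_percent: int) -> bytes:
    """Build the 32-bit word: 16 zero bits, the 3-bit tag, 13-bit severity."""
    if not 0 <= plr_percent <= MAX_SEVERITY:
        raise ValueError("severity does not fit in 13 bits")
    word = (0 << 16) | (HEADER_TAG << 13) | plr_percent
    return struct.pack("!I", word)


def unpack_extension(data: bytes):
    """Return the severity, or None if the word is not this extension."""
    (word,) = struct.unpack("!I", data[:4])
    initials = word >> 16
    tag = (word >> 13) & 0b111
    if initials != 0 or tag != HEADER_TAG:
        return None
    return word & MAX_SEVERITY


if __name__ == "__main__":
    encoded = pack_extension(20)
    print(encoded.hex())              # 0000a014
    print(unpack_extension(encoded))  # 20
```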

Receiver side algorithm:

The receiver side maintains a sliding window that records the packet loss rate (PLR) over a specific recent period (for example, 30 seconds).

IF PLR >= 2
    Set RTP header X-bit to “1”
    Extension word = “0000 0000 0000 0000” + “101” + <PLR value> (13 bits long)
ELSE
    DoNothing
ENDIF
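The receiver-side bookkeeping can be sketched in Python with a time-based sliding window. The class name, the method names, and the choice to round the PLR to a whole percentage are assumptions of this illustration; the 30-second window and the 2% threshold come from the text.

```python
from collections import deque


class LossWindow:
    """Track received and lost packets over the most recent time window."""

    def __init__(self, window_seconds=30.0):
        self.window_seconds = window_seconds
        self.events = deque()  # (arrival_time, packets_lost_before_this_one)

    def record(self, now, lost_before=0):
        """Record one received packet and the gap detected just before it."""
        self.events.append((now, lost_before))
        while self.events and self.events[0][0] < now - self.window_seconds:
            self.events.popleft()  # forget packets older than the window

    def plr_percent(self):
        received = len(self.events)
        lost = sum(lost_before for _, lost_before in self.events)
        total = received + lost
        return 0 if total == 0 else round(100 * lost / total)


def feedback_word(plr_percent):
    """Return the 32-bit extension word, or None when no feedback is needed."""
    if plr_percent < 2:
        return None  # X bit stays 0
    return (0b101 << 13) | plr_percent  # upper 16 bits remain zero


if __name__ == "__main__":
    window = LossWindow()
    for t in range(8):
        window.record(float(t))
    window.record(8.0, lost_before=2)  # two packets went missing
    print(window.plr_percent(), feedback_word(window.plr_percent()))
```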

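Sender side algorithm:

When the sender side receives an RTP packet with the X-bit set to “1”, it parses the extension word. Upon detecting the TCP-friendly header “101”, it reads the PLR value and starts the outgoing queue regulation process, dropping one packet out of every ROUND(100 / PLR) so that the dropped packets stay isolated:

IF PLR >= 2
    DropInterval = ROUND(100 / PLR)
    IF ( SequenceNumber % DropInterval == 0 )
        DropThePacket
    ELSE
        DoNothing (SendThePacketAsUsual)
    ENDIF
ELSE
    DoNothing (SendThePacketAsUsual)
ENDIF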

Software tools:

The TCP-friendly protocol introduced here will be implemented on the open RTP protocol stack (oRTP, http://www.linphone.org/ortp/) and the Windows UDP/IP stack. We choose G.711/20ms (64 kbps for the voice payload) as the only voice codec in this test (Speex: a free codec, http://www.speex.org/).

Experimentation

Test configuration is as follows:

(Rectangles represent VoIP communication peers; circles represent networks in the experiment.)

< test set A >

< test set B >

The bandwidth simulator is in charge of simulating various network bandwidths during the experiments. Counting the overhead of the other protocol headers besides the voice payload itself, the minimum bandwidth needed to carry a clear G.711/20ms VoIP call is:

160 bytes (voice payload) + 16 bytes (RTP) + 4 bytes (RTP extension) + 8 bytes (UDP) + 20 bytes (IP) + 21 bytes (MAC) = 229 bytes per 20 ms

229 bytes × 8 bits / 20 ms = 91.6 kbps
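The arithmetic can be checked with a short Python sketch. The per-layer byte counts are taken as given in the text; the function and variable names are hypothetical.

```python
# Per-packet sizes in bytes, exactly as listed in the text.
PACKET_BYTES = {
    "voice payload": 160,
    "RTP": 16,
    "RTP extension": 4,
    "UDP": 8,
    "IP": 20,
    "MAC": 21,
}


def min_bandwidth_kbps(sizes, packet_interval_ms=20):
    """Bits sent per millisecond, which is numerically equal to kbps."""
    total_bytes = sum(sizes.values())
    return total_bytes * 8 / packet_interval_ms


if __name__ == "__main__":
    required = min_bandwidth_kbps(PACKET_BYTES)
    print(sum(PACKET_BYTES.values()), "bytes per packet")
    print(required, "kbps required")
    # Test bandwidths that fall below the requirement:
    print([bw for bw in (100, 80, 60, 40) if bw < required])
```

Of the four bandwidths used in the experiment, only 100 kbps is above this requirement, so the 80, 60, and 40 kbps settings all force packet loss.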

With this bandwidth figure, we configure four bandwidths in the bandwidth simulator, 100 kbps, 80 kbps, 60 kbps, and 40 kbps, for both test sets A and B. We want to measure how VoIP voice quality is affected when the bandwidth falls below the minimum threshold, and how effective the proposed TCP-friendly protocol is.

I plan to conduct a human perception experiment to verify the effectiveness of this work. Two sets of VoIP equipment (A and B) will be set up for the tests: set A with ordinary VoIP features, and set B equipped with the TCP-friendly protocol designed in this work.

Twenty people (10 men and 10 women), unaware of the difference between the sets, will be chosen to judge the voice and vote for the better quality of the two sets.

Results:

U1~U10: men; U11~U20: women.

Set A (normal), and set B (TCP-friendly aware) are provided:

User / U1 / U2 / U3 / U4 / U5 / U6 / U7 / U8 / U9 / U10
1
2
3
4
User / U11 / U12 / U13 / U14 / U15 / U16 / U17 / U18 / U19 / U20
1
2
3
4

Test results: 1) 100 kbps, 2) 80 kbps, 3) 60 kbps, 4) 40 kbps

Future Work

Due to lack of time, the implementation of this work has not yet been done. If I have a similar project or course next semester, I will try to finish the mechanism described in this work and carry out the human perception experiments described above.

Summary

The mechanism proposed in this work is intended to mitigate network congestion by dropping certain packets before sending, which reduces the bandwidth consumed. Nonetheless, as with any back-off mechanism, misbehaving users may consume more bandwidth than well-behaved ones, and we currently have no way to penalize such misbehavior.

Therefore, the TCP-friendly protocol proposed here should be promoted widely to gain the largest benefits.
