[MS-RTPME]:
Real-Time Transport Protocol (RTP/RTCP):
Microsoft Extensions

Intellectual Property Rights Notice for Open Specifications Documentation

§  Technical Documentation. Microsoft publishes Open Specifications documentation for protocols, file formats, languages, standards as well as overviews of the interaction among each of these technologies.

§  Copyrights. This documentation is covered by Microsoft copyrights. Regardless of any other terms that are contained in the terms of use for the Microsoft website that hosts this documentation, you may make copies of it in order to develop implementations of the technologies described in the Open Specifications and may distribute portions of it in your implementations using these technologies or your documentation as necessary to properly document the implementation. You may also distribute in your implementation, with or without modification, any schema, IDL’s, or code samples that are included in the documentation. This permission also applies to any documents that are referenced in the Open Specifications.

§  No Trade Secrets. Microsoft does not claim any trade secret rights in this documentation.

§  Patents. Microsoft has patents that may cover your implementations of the technologies described in the Open Specifications. Neither this notice nor Microsoft's delivery of the documentation grants any licenses under those or any other Microsoft patents. However, a given Open Specification may be covered by Microsoft Open Specification Promise or the Community Promise. If you would prefer a written license, or if the technologies described in the Open Specifications are not covered by the Open Specifications Promise or Community Promise, as applicable, patent licenses are available by contacting .

§  Trademarks. The names of companies and products contained in this documentation may be covered by trademarks or similar intellectual property rights. This notice does not grant any licenses under those rights. For a list of Microsoft trademarks, visit www.microsoft.com/trademarks.

§  Fictitious Names. The example companies, organizations, products, domain names, email addresses, logos, people, places, and events depicted in this documentation are fictitious. No association with any real company, organization, product, domain name, email address, logo, person, place, or event is intended or should be inferred.

Reservation of Rights. All other rights are reserved, and this notice does not grant any rights other than specifically described above, whether by implication, estoppel, or otherwise.

Tools. The Open Specifications do not require the use of Microsoft programming tools or programming environments in order for you to develop an implementation. If you have access to Microsoft programming tools and environments you are free to take advantage of them. Certain Open Specifications are intended for use in conjunction with publicly available standard specifications and network programming art, and assume that the reader either is familiar with the aforementioned material or has immediate access to it.

Revision Summary

Date / Revision History / Revision Class / Comments
04/08/2008 / 0.1 / / Initial Availability.
06/20/2008 / 1.0 / Major / Updated and revised the technical content.
07/25/2008 / 1.0.1 / Editorial / Revised and edited the technical content.
08/29/2008 / 1.0.2 / Editorial / Revised and edited the technical content.
10/24/2008 / 1.0.3 / Editorial / Revised and edited the technical content.
12/05/2008 / 1.1 / Minor / Updated the technical content.
01/16/2009 / 1.2 / Minor / Updated the technical content.
02/27/2009 / 1.3 / Minor / Updated the technical content.
04/10/2009 / 1.3.1 / Editorial / Revised and edited the technical content.
05/22/2009 / 1.3.2 / Editorial / Revised and edited the technical content.
07/02/2009 / 1.3.3 / Editorial / Revised and edited the technical content.
08/14/2009 / 1.3.4 / Editorial / Revised and edited the technical content.
09/25/2009 / 1.4 / Minor / Updated the technical content.
11/06/2009 / 1.4.1 / Editorial / Revised and edited the technical content.
12/18/2009 / 1.4.2 / Editorial / Revised and edited the technical content.
01/29/2010 / 1.4.3 / Editorial / Revised and edited the technical content.
03/12/2010 / 1.4.4 / Editorial / Revised and edited the technical content.
04/23/2010 / 1.4.5 / Editorial / Revised and edited the technical content.
06/04/2010 / 1.4.6 / Editorial / Revised and edited the technical content.
07/16/2010 / 1.4.6 / No change / No changes to the meaning, language, or formatting of the technical content.
08/27/2010 / 1.4.6 / No change / No changes to the meaning, language, or formatting of the technical content.
10/08/2010 / 1.4.6 / No change / No changes to the meaning, language, or formatting of the technical content.
11/19/2010 / 1.5 / Minor / Clarified the meaning of the technical content.
01/07/2011 / 1.5 / No change / No changes to the meaning, language, or formatting of the technical content.
02/11/2011 / 1.5 / No change / No changes to the meaning, language, or formatting of the technical content.
03/25/2011 / 1.5 / No change / No changes to the meaning, language, or formatting of the technical content.
05/06/2011 / 1.5 / No change / No changes to the meaning, language, or formatting of the technical content.
06/17/2011 / 1.6 / Minor / Clarified the meaning of the technical content.
09/23/2011 / 1.6 / No change / No changes to the meaning, language, or formatting of the technical content.
12/16/2011 / 1.6 / No change / No changes to the meaning, language, or formatting of the technical content.
03/30/2012 / 1.6 / No change / No changes to the meaning, language, or formatting of the technical content.
07/12/2012 / 1.6 / No change / No changes to the meaning, language, or formatting of the technical content.
10/25/2012 / 1.6 / No change / No changes to the meaning, language, or formatting of the technical content.
01/31/2013 / 1.6 / No change / No changes to the meaning, language, or formatting of the technical content.
08/08/2013 / 1.6 / No change / No changes to the meaning, language, or formatting of the technical content.

[MS-RTPME] — v20130722

Real-Time Transport Protocol (RTP/RTCP): Microsoft Extensions

Copyright © 2013 Microsoft Corporation.

Release: Monday, July 22, 2013

Contents

1 Introduction

1.1 Glossary

1.2 References

1.2.1 Normative References

1.2.2 Informative References

1.3 Overview

1.4 Relationship to Other Protocols

1.5 Prerequisites/Preconditions

1.6 Applicability Statement

1.7 Versioning and Capability Negotiation

1.8 Vendor-Extensible Fields

1.9 Standards Assignments

2 Messages

2.1 Transport

2.1.1 Confidentiality

2.2 Message Syntax

2.2.1 RTP Packets

2.2.2 RTCP Compound Packets

2.2.3 RTCP Probe Packet

2.2.4 RTCP Packet Pair

2.2.5 RTCP Sender Report (SR)

2.2.6 RTCP SDES

2.2.7 RTCP Profile-Specific Extension

2.2.7.1 RTCP Profile-Specific Extension for Estimated Bandwidth

3 Protocol Details

3.1 RTP Details

3.1.1 Abstract Data Model

3.1.2 Timers

3.1.3 Initialization

3.1.4 Higher-Layer Triggered Events

3.1.5 Message Processing Events and Sequencing Rules

3.1.6 Timer Events

3.1.7 Other Local Events

3.2 RTCP Details

3.2.1 Abstract Data Model

3.2.2 Timers

3.2.3 Initialization

3.2.4 Higher-Layer Triggered Events

3.2.5 Message Processing Events and Sequencing Rules

3.2.6 Timer Events

3.2.7 Other Local Events

4 Protocol Examples

4.1 SSRC Change Throttling

4.2 Bandwidth Estimation

4.3 Key Derivation

5 Security

5.1 Security Considerations for Implementers

5.2 Index of Security Parameters

6 Appendix A: Product Behavior

7 Change Tracking

8 Index

1 Introduction

This document specifies the Real-Time Transport Protocol (RTP/RTCP) Microsoft Extensions (RTPME), a set of extensions to the base Real-Time Transport Protocol (RTP) specified in [RFC3550]. RTP is a set of network transport functions suitable for applications transmitting real-time data, such as audio and video, across multimedia endpoints.

Sections 1.8, 2, and 3 of this specification are normative and can contain the terms MAY, SHOULD, MUST, MUST NOT, and SHOULD NOT as defined in RFC 2119. Sections 1.5 and 1.9 are also normative but cannot contain those terms. All other sections and examples in this specification are informative.

1.1 Glossary

The following terms are defined in [MS-GLOS]:

authentication
base64
cipher
datagram
encryption
network address translation (NAT)
Unicode string
User Datagram Protocol (UDP)
UTF-8

The following terms are specific to this document:

cipher block chaining (CBC): A DES mode of operation that chains blocks of cipher text as specified in [FIPS46-3].

codec: Short for encoder/decoder. An algorithm used to convert media between digital formats, especially between raw media data (for example, audio or video) and a format that is more suitable for a particular purpose (for example, reducing size in the context of RTP). The conversion from raw data is regarded as the encoding step, and the conversion back to raw data is regarded as the decoding step.

conference: An RTP session involving multiple participants.

connectionless protocol: A transport protocol by means of which endpoints communicate without a prior connection arrangement, and in which each packet is treated independently as a datagram. Examples of this type of protocol include Internet Protocol (IP) and User Datagram Protocol (UDP).

connection-oriented transport protocol: A transport protocol by means of which endpoints communicate after first establishing a connection, and in which each packet is treated according to the connection state. An example of this type of protocol is Transmission Control Protocol (TCP).

contributing source (CSRC): A source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer. The mixer inserts a list of the synchronization source (SSRC) identifiers of the sources that contributed to the generation of a particular packet into the RTP header of that packet. This list is called the CSRC list. An example application is audio conferencing where a mixer indicates all the talkers whose speech was combined to produce the outgoing packet, allowing the receiver to indicate the current talker, even though all the audio packets contain the same SSRC identifier (that of the mixer). See [RFC3550] section 3.

Data Encryption Standard (DES): An encryption standard that specifies a FIPS approved cryptographic algorithm as specified in [FIPS46-3].

Dual Tone Multiple Frequency (DTMF): The signaling system used in telephony systems, in which each digit is associated with two specific frequencies. Most commonly associated with telephone touch-tone keypads.

forward error correction (FEC): A mechanism in which a sender uses redundancy to enable a receiver to recover from packet loss.

jitter: Variation in the network delay that is perceived by the receiver for each packet.
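The following sketch is informative only. It illustrates one common way to quantify jitter, the interarrival jitter estimator described in [RFC3550] section 6.4.1, in which the change in relative transit time between consecutive packets is smoothed with a gain of 1/16. The function name `update_jitter` and its parameters are assumptions made for this illustration and are not defined by this specification.

```python
def update_jitter(jitter, prev_transit, arrival, rtp_timestamp):
    """One step of the RFC 3550 section 6.4.1 interarrival jitter estimator.

    All values are expressed in RTP timestamp units. `prev_transit` is None
    for the first packet of a stream (an assumption of this sketch).
    Returns the updated (jitter, transit) pair.
    """
    # Relative transit time: arrival time minus the packet's RTP timestamp.
    transit = arrival - rtp_timestamp
    if prev_transit is None:
        # The first packet has no predecessor to compare against.
        return jitter, transit
    # D(i-1, i): difference in relative transit time of two packets.
    d = abs(transit - prev_transit)
    # J(i) = J(i-1) + (|D(i-1, i)| - J(i-1)) / 16
    jitter += (d - jitter) / 16.0
    return jitter, transit


if __name__ == "__main__":
    j, t = 0.0, None
    # Hypothetical (arrival, RTP timestamp) pairs for three packets.
    for arrival, ts in [(1000, 0), (1170, 160), (1320, 320)]:
        j, t = update_jitter(j, t, arrival, ts)
    print(j)
```

Because the estimator is a running average, a single late packet raises the reported value only gradually.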

message digest algorithm 5 (MD5): A cryptographic hash function that generates 128 bits of hash value as specified in [RFC1321].

mixer: An intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets in some manner and then forwards a new RTP packet. Because the timing among multiple input sources will not generally be synchronized, the mixer will make timing adjustments among the streams and generate its own timing for the combined stream. Thus, all data packets originating from a mixer will be identified as having the mixer as their synchronization source. See [RFC3550] section 3.

multimedia session: A set of concurrent RTP sessions among a common group of participants. For example, a video conference (which is a multimedia session) may contain an audio RTP session and a video RTP session. See [RFC3550] section 3.

non-RTP means: Protocols and mechanisms that may be needed in addition to RTP to provide a usable service. In particular, for multimedia conferences, a control protocol may distribute multicast addresses and keys for encryption, negotiate the encryption algorithm to be used, and define dynamic mappings between RTP payload type values and the payload formats they represent for formats that do not have a predefined payload type value. Examples of such protocols include the Session Initiation Protocol (SIP) ([RFC3261]), ITU Recommendation H.323, and applications using SDP ([RFC2327]), such as RTSP ([RFC2326]). For simple applications, electronic mail or a conference database may also be used. See [RFC3550] section 3.

packetization time (P-time): For audio, the amount (in milliseconds) of audio data that is sent in a single Real-Time Transport Protocol (RTP) packet.
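As an informative illustration of how packetization time relates to packet contents, the following sketch computes the number of audio samples, and the resulting payload size, carried in one packet. The helper names and the example values (an 8000 Hz clock, a P-time of 20 milliseconds, one byte per sample as in G.711) are assumptions chosen for this example.

```python
def samples_per_packet(clock_rate_hz, ptime_ms):
    """Number of audio samples covered by one packet of the given P-time."""
    return clock_rate_hz * ptime_ms // 1000


def payload_bytes(clock_rate_hz, ptime_ms, bytes_per_sample):
    """Payload size for a fixed-rate codec with a whole number of bytes per sample."""
    return samples_per_packet(clock_rate_hz, ptime_ms) * bytes_per_sample


if __name__ == "__main__":
    # 8000 Hz audio with a 20 ms P-time: 160 samples per packet.
    print(samples_per_packet(8000, 20))
    # At one byte per sample, the payload is 160 bytes.
    print(payload_bytes(8000, 20, 1))
```

For a codec whose RTP clock rate equals its sampling rate, the sample count is also the amount by which the RTP timestamp advances from one packet to the next.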

participant: A user who is participating in a conference or peer-to-peer call. May also be used in reference to the object that is used to represent this participant in the implementation.

port: The "abstraction that transport protocols use to distinguish among multiple destinations within a given host computer. TCP/IP protocols identify ports using small positive integers." The transport selectors (TSEL) used by the OSI transport layer are equivalent to ports. RTP depends upon the lower-layer protocol to provide some mechanism such as ports to multiplex the RTP and RTCP packets of a session. See [RFC3550] section 3.

Real-Time Transport Protocol (RTP): A network protocol that provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio and video.

RTCP packet: A control packet consisting of a fixed header part similar to that of RTP packets, followed by structured elements that vary depending upon the RTCP packet type. Typically, multiple RTCP packets are sent together as a compound RTCP packet in a single packet of the underlying protocol; this is enabled by the length field in the fixed header of each RTCP packet. See [RFC3550] section 3.
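The role of the length field can be shown with a short, informative sketch that splits a compound RTCP packet into its individual RTCP packets. It assumes the base header layout of [RFC3550] section 6.4.1 (version in the top two bits of the first octet, packet type in the second octet, and a 16-bit length counted in 32-bit words minus one) and an unencrypted datagram. The function name `split_compound` is hypothetical.

```python
import struct


def split_compound(datagram):
    """Split a compound RTCP packet into (packet_type, raw_bytes) tuples."""
    packets = []
    offset = 0
    while offset + 4 <= len(datagram):
        first, packet_type, length_words = struct.unpack_from("!BBH", datagram, offset)
        if first >> 6 != 2:
            raise ValueError("unexpected RTCP version")
        # The length field counts 32-bit words minus one, header included.
        size = (length_words + 1) * 4
        if offset + size > len(datagram):
            raise ValueError("RTCP packet extends past the end of the datagram")
        packets.append((packet_type, datagram[offset:offset + size]))
        offset += size
    if offset != len(datagram):
        raise ValueError("trailing bytes after the last RTCP packet")
    return packets


if __name__ == "__main__":
    # A receiver report with no report blocks, followed by a BYE for one SSRC.
    rr = struct.pack("!BBHI", 0x80, 201, 1, 0x11223344)
    bye = struct.pack("!BBHI", 0x81, 203, 1, 0x11223344)
    for packet_type, raw in split_compound(rr + bye):
        print(packet_type, len(raw))
```

The sketch does not validate the ordering rules for compound packets and does not interpret the body of any packet type.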

RTP packet: A data packet consisting of the fixed RTP header, a possibly empty list of contributing sources, and the payload data. Some underlying protocols may require an encapsulation of the RTP packet to be defined. Typically one packet of the underlying protocol contains a single RTP packet, but several RTP packets can be contained if permitted by the encapsulation method. See [RFC3550] section 3.
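The following informative sketch parses the structure just described: the 12-octet fixed header, the CSRC list, and the payload. It follows the base layout in [RFC3550] section 5.1 only and does not reflect any extension defined elsewhere in this specification. The names `RtpPacket` and `parse_rtp` are assumptions introduced for this illustration.

```python
import struct
from typing import NamedTuple, Tuple


class RtpPacket(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]
    payload: bytes


def parse_rtp(data):
    """Parse one RTP packet laid out as in RFC 3550 section 5.1."""
    if len(data) < 12:
        raise ValueError("shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", data, 0)
    version = b0 >> 6
    if version != 2:
        raise ValueError("unexpected RTP version")
    has_padding = bool(b0 & 0x20)
    has_extension = bool(b0 & 0x10)
    csrc_count = b0 & 0x0F
    offset = 12 + 4 * csrc_count
    if len(data) < offset:
        raise ValueError("truncated CSRC list")
    csrcs = tuple(
        struct.unpack_from("!I", data, 12 + 4 * i)[0] for i in range(csrc_count)
    )
    if has_extension:
        if len(data) < offset + 4:
            raise ValueError("truncated header extension")
        # Extension header: 16-bit profile-defined value, 16-bit length in words.
        _profile, ext_words = struct.unpack_from("!HH", data, offset)
        offset += 4 + 4 * ext_words
    end = len(data)
    if has_padding:
        # The last octet holds the padding count, which includes itself.
        pad = data[-1]
        if pad == 0 or pad > end - offset:
            raise ValueError("invalid padding length")
        end -= pad
    if offset > end:
        raise ValueError("header extends past the end of the packet")
    return RtpPacket(version, bool(b1 & 0x80), b1 & 0x7F, seq, ts, ssrc,
                     csrcs, data[offset:end])


if __name__ == "__main__":
    raw = (struct.pack("!BBHII", 0x81, 0x80, 7, 160, 0xAABBCCDD)
           + struct.pack("!I", 0x01020304) + b"\xff" * 4)
    print(parse_rtp(raw))
```

The sketch assumes that one datagram carries exactly one RTP packet and that the payload is not encrypted.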

RTP payload: The data transported by RTP in a packet, for example audio samples or compressed video data. For more information, see [RFC3550] section 3.

RTP session: An association among a set of participants communicating with RTP. A participant may be involved in multiple RTP sessions at the same time. In a multimedia session, each medium is typically carried in a separate RTP session with its own RTCP packets unless the encoding itself multiplexes multiple media into a single data stream. A participant distinguishes multiple RTP sessions by reception of different sessions using different pairs of destination transport addresses, where a pair of transport addresses comprises one network address plus a pair of ports for RTP and RTCP. All participants in an RTP session may share a common destination transport address pair, as in the case of IP multicast, or the pairs may be different for each participant, as in the case of individual unicast network addresses and port pairs. In the unicast case, a participant may receive from all other participants in the session using the same pair of ports, or may use a distinct pair of ports for each. The distinguishing feature of an RTP session is that each maintains a full, separate space of SSRC identifiers. The set of participants included in one RTP session consists of those that can receive an SSRC identifier transmitted by any one of the participants either in RTP as the SSRC or a CSRC or in RTCP. For example, consider a three-party conference implemented using unicast UDP with each participant receiving from the other two on separate port pairs. If each participant sends RTCP feedback about data received from one other participant only back to that participant, the conference is composed of three separate point-to-point RTP sessions. If each participant provides RTCP feedback about its reception of one other participant to both of the other participants, the conference is composed of one multi-party RTP session. The latter case simulates the behavior that would occur with IP multicast communication among the three participants.
The RTP framework allows the variations defined here, but a particular control protocol or application design will usually impose constraints on these variations. See [RFC3550] section 3.