This topic describes the signaling process, the definition of the Real-Time Streaming (RTS) signaling protocol, and enhanced Session Description Protocol (SDP) negotiation.

Note: This topic is intended for developers who are familiar with the basics of Web Real-Time Communication (WebRTC).

Background information

TCP-based live streaming may have a latency of more than 3 seconds or even more than 6 seconds. To resolve this issue, ApsaraVideo Live provides the RTS feature. RTS is a value-added feature that uses User Datagram Protocol (UDP).

  • RTS provides easy-to-access live streaming services with a low latency of milliseconds, high definition, and smooth playback. RTS supports the playback of tens of millions of concurrent streams.
  • The design of RTS is open and standardized.

    In addition to the RTS SDKs that ApsaraVideo Live provides, you can use self-developed clients to push streams to or pull streams from CDN nodes by using a signaling method similar to that of WebRTC. ApsaraVideo Live provides CDN nodes worldwide and scheduling algorithms that help you manage and run large-scale live streaming services with low latency.


Prerequisites

  • ApsaraVideo Live is activated. For more information, see Activate ApsaraVideo Live.
  • The RTS feature is enabled. For more information, see Overview.
  • An HTTPS certificate is configured for your domain name. For more information, see Enable HTTPS.

Signaling process

  1. The client sends a request with an SDP offer.
    1. Create an RTCPeerConnection object on the client, specify whether to receive or send audio and video signals, and then create an SDP offer.
      // Specify whether to receive or send audio and video signals.
      { offerToReceiveVideo: true, offerToReceiveAudio: true }
    2. Send a stream pulling request from the client to ApsaraVideo Live by using the HTTPS POST method. The request body is a JSON string.
      For more information about the request parameters, see the "Definition of the RTS signaling protocol" section.
      • The version parameter specifies the version of the RTS signaling protocol. Set the value to 2.
      • The sdk_version parameter specifies the version of the RTS SDK. You can set the parameter as needed.
    3. Send the constructed request to the signaling URL of ApsaraVideo Live by using the POST method, and specify the source URL in the JSON-formatted request body. The signaling URL and the source URL are identical except for the protocol header.
      • Signaling URL: https://domain/app/streamname?auth=xxx
      • Source URL: artc://domain/app/streamname?auth=xxx
      POST /app/streamname?auth=xxx HTTP/1.1
      Host: domain
      Connection: keep-alive
      Content-Length: 2205
      Content-Type: application/json
  2. The server returns a response with an SDP answer.

    After the server of ApsaraVideo Live verifies the request, the server generates an SDP answer and returns a response that includes the information about the live streaming node to the client. For more information about the response parameters, see the "Definition of the RTS signaling protocol" section.

  3. The client initiates Interactive Connectivity Establishment (ICE).
    1. After the client receives the response with an SDP answer, set the answer as the remote session description of the RTCPeerConnection object.
      peerConnection.setRemoteDescription(new RTCSessionDescription(answer.jsep));
    2. Use the RTCPeerConnection object to initiate ICE and Datagram Transport Layer Security (DTLS) encryption. After the signaling channel is established, the client can pull streams from ApsaraVideo Live. This way, you can implement stream pulling and playback based on the standards of WebRTC.
  4. The client closes the connection.
    The client sends a DTLS alert message to initiate a disconnection and stop stream pushing or playback.
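
The relationship between the signaling URL and the source URL described in step 1 can be sketched as a small helper. This is only an illustration, not part of any RTS SDK: the function name toSourceUrl is hypothetical, and the sketch assumes that the two URLs differ only in the protocol header, as stated above.

```javascript
// Hypothetical helper (not part of any RTS SDK): derive the artc:// source URL
// from the HTTPS signaling URL by swapping only the protocol header.
function toSourceUrl(signalingUrl) {
  if (!/^https:\/\//i.test(signalingUrl)) {
    throw new Error("Expected a signaling URL that starts with https://");
  }
  return signalingUrl.replace(/^https:\/\//i, "artc://");
}

// Example with the placeholder URL used in this topic.
console.log(toSourceUrl("https://domain/app/streamname?auth=xxx"));
// artc://domain/app/streamname?auth=xxx
```

The host, path, and authentication query string are left untouched, so the same string can be reused for both the POST target and the url field of the request body.
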
Sample code for the HTML5 player
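// Create the peer connection and the local SDP offer.
// iceCandidateCallback and remoteStreamCallback are defined elsewhere in the page.
var peerConnection = new RTCPeerConnection();
peerConnection.onicecandidate = iceCandidateCallback;
peerConnection.ontrack = remoteStreamCallback;
peerConnection.createOffer({ offerToReceiveVideo: true, offerToReceiveAudio: true })
  .then(signaling_pull)
  .catch(function(error) { console.error("create offer failed: ", error); });

// Post the stream pulling request to the CDN live node.
function signaling_pull(offer_sdp) {
  console.log('local offer sdp', offer_sdp);

  peerConnection.setLocalDescription(offer_sdp).then(function() {
    // Get the signaling URL (https://...) of the stream to pull.
    var stream_url = $("#stream_url").val();
    console.log("stream url:" , stream_url);

    // Add sdk and protocol versions.
    var protocol_version = 2;
    var sdk_version = "0.0.1";

    $.ajax({url: stream_url, data: JSON.stringify({
          mode: "live",
          version: protocol_version,
          sdk_version: sdk_version,
          // The source URL differs from the signaling URL only in its protocol header.
          pull_streams: [{
              url: stream_url.replace(/^https:/, "artc:"),
              amsid: ["rts audio"],
              vmsid: ["rts video"]
          }],
          jsep: { type: "offer", sdp: offer_sdp.sdp }
      }),
      type: "post",
      contentType: "application/json",
      success: function(result) {
          // jQuery may already have parsed the JSON response.
          var signal = typeof result === "string" ? JSON.parse(result) : result;
          peerConnection.setRemoteDescription(new RTCSessionDescription(signal.jsep)).then(function() {
              console.log("get remote answer sdp: ", signal.jsep.sdp);
          });
      }
    });
  });
}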
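
The JSON body that the sample posts can also be assembled by a standalone helper, which makes it easier to test outside a browser. The following is a minimal sketch under stated assumptions: the function name buildPullRequest is hypothetical, the field names follow the request parameters defined in the "Definition of the RTS signaling protocol" section, and the MSID values are the example values used in this topic.

```javascript
// Hypothetical helper: assemble the JSON body of a stream pulling request.
// Field names follow the documented request parameters; "rts audio" and
// "rts video" are the example MSID values from this topic.
function buildPullRequest(sourceUrl, offerSdp, sdkVersion) {
  var body = {
    mode: "live",            // Required. Fixed value.
    version: 2,              // Required. Version of the signaling protocol.
    pull_streams: [
      { url: sourceUrl, amsid: ["rts audio"], vmsid: ["rts video"] }
    ],
    jsep: { type: "offer", sdp: offerSdp }
  };
  // sdk_version is optional, so add it only when the caller supplies one.
  if (sdkVersion) {
    body.sdk_version = sdkVersion;
  }
  return JSON.stringify(body);
}

// Usage with placeholder values.
console.log(buildPullRequest("artc://domain/app/streamname?auth=xxx", "v=0\r\n", "0.0.1"));
```
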

Definition of the RTS signaling protocol

  • Protocol channel: a short-lived connection based on HTTPS.
  • Protocol format: JSON.
  • Protocol description
    • Request parameters
      Parameter | Type | Required | Description | Value
      mode | string | Yes | The mode. Set the value to live. | live
      version | int | Yes | The version of the protocol. | 2
      push_stream | string | No | The pushing URL. | A string
      pull_streams | []object | No | The streams that you want to pull. You can pull multiple streams at a time. For more information about the nested parameters under the pull_streams parameter, see the "Nested parameters under the pull_streams parameter" table. | An array of objects
      sdk_version | string | No | The version of the RTS SDK. | A string
      jsep.type | string | Yes | The type of the SDP message. Set the value to offer. | offer
      jsep.sdp | string | Yes | The description of the SDP message. | N/A
      Table 1. Nested parameters under the pull_streams parameter
      Parameter | Type | Required | Description
      url | string | Yes | The source URL that starts with the artc:// header.
      amsid | []string | Yes | The media stream ID (MSID) of the audio stream that you want to pull. In this example, set the parameter to rts audio.
      vmsid | []string | Yes | The MSID of the video stream that you want to pull. In this example, set the parameter to rts video.
    • Response parameters
      Parameter | Type | Required | Description
      code | int | Yes | The HTTP status code. A value of 200 indicates that the request is successful. For more information about how to handle errors, see the "Error handling" section.
      trace_id | string | Yes | The globally unique ID (GUID) of the request. It is generated by Alibaba Cloud CDN. Save the request GUID. You can use the request GUID to troubleshoot issues.
      jsep.type | string | Yes | The type of the SDP message. The value can only be answer.
      jsep.sdp | string | Yes | The description of the SDP message that is generated when CDN nodes pull streams from the origin.
  • Examples
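    • Sample request
      {
        "mode": "live",
        "version": 2,
        "pull_streams": [
          {
            "url": "artc://domain/app/streamname?auth=xxx",
            "amsid": [
              "rts audio"
            ],
            "vmsid": [
              "rts video"
            ]
          }
        ],
        "jsep": {
          "type": "offer",
          "sdp": "v=0\r\no=- 6839248142876176651 2 IN IP4\r\ns=-\r\n Omitted content"
        }
      }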
    • Sample response
      {
        "code": 200,
        "trace_id": "xxx",
        "jsep": {
          "type": "answer",
          "sdp": "v=0\r\no=- 1591173291 2 IN IP4\r\n Omitted content"
        }
      }
  • Error handling
    If the stream pulling request is valid, the status code 200 is returned. Otherwise, handle the error based on the status code in the JSON-formatted response body. The following code shows the code and message parameters in the response:
      {
        "code": 200, // A value of 200 indicates that the request is successful. For more information about the HTTP status code 200 and other status codes, see the "Status codes" section.
        "message": "success" // The returned message.
      }
    Table 2. Response parameters
    Parameter | Type | Description
    code | int | The HTTP status code. For more information, see the "Status codes" section.
    message | string | The returned message.
    Table 3. Status codes
    Status code | Description
    200 | The status code returned because the request is successful.
    403 | The error code returned because the authentication failed.
    404 | The error code returned because a stream to be pulled does not exist.
    611 | The error code returned because the client is required to play the streams over TCP.
    302 | The error code returned because the client is required to send the request to a new address.
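
The status codes above can be mapped to client-side actions in one place. The following is a minimal sketch, not an official client implementation: the function name interpretSignalingResponse and the action strings are hypothetical, and only the status codes listed in this topic are covered.

```javascript
// Hypothetical helper: map the code parameter of a signaling response to a
// client-side action. The action names are illustrative assumptions; only
// the numeric codes come from the status code table in this topic.
function interpretSignalingResponse(response) {
  var message = response.message || "";
  switch (response.code) {
    case 200: // The request is successful: apply the SDP answer.
      return { ok: true, action: "apply_answer", message: message };
    case 302: // The request must be sent to a new address.
      return { ok: false, action: "resend_to_new_address", message: message };
    case 403: // The authentication failed.
      return { ok: false, action: "check_authentication", message: message };
    case 404: // A stream to be pulled does not exist.
      return { ok: false, action: "stream_not_found", message: message };
    case 611: // The client is required to play the streams over TCP.
      return { ok: false, action: "play_over_tcp", message: message };
    default:  // Any code that this topic does not describe.
      return { ok: false, action: "unknown_error", message: message };
  }
}

console.log(interpretSignalingResponse({ code: 200, message: "success" }).action);
// apply_answer
```

Keeping the mapping in a single function means the player logic only has to branch on the returned action, for example by switching to a TCP-based playback URL when the action is play_over_tcp.
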

Enhanced SDP negotiation

Messages are exchanged in SDP format during signaling. SDP negotiation is generally based on RFC 4566. RTS extends SDP with additional semantics to suit the characteristics of the live streaming industry: it supports more audio and video formats and more communications protocols. This resolves the limitations that WebRTC supports only the Opus format for audio and does not support B frames, and allows RTS to accommodate a growing number of streaming protocols.

  • Support audio in Advanced Audio Coding (AAC) formats

    RTS can transmit audio in various AAC formats over RTMP, including AAC-LC, AAC-HE, and AAC-HEv2. For more information about AAC formats, see ISO/IEC 14496-3. RTS transmits AAC audio by using the Low-overhead MPEG-4 Audio Transport Multiplex (LATM) container format. LATM carries the encoding information about audio in in-band or out-of-band mode, depending on whether the audio itself contains the encoding information. In-band transmission sends the encoding information with each audio frame. Out-of-band transmission sends the encoding information only once.

    The muxConfigPresent parameter of AudioMuxElement specifies whether the information in AudioSpecificConfig is transmitted in in-band or out-of-band mode, which makes LATM more flexible than Audio Data Transport Stream (ADTS). If the information in AudioSpecificConfig remains unchanged, the information in StreamMuxConfig can be transmitted once in advance in an SDP message. For more information about LATM, see Application-Bulletin_AAC-Transport-Format.

    During signaling, RTS parses the encoding information during audio stream pushing and returns the parsed information in the negotiation response.

    SDP offer:
    m=audio 9 UDP/RTP/AVPF 120 96 
    a=rtpmap:120 MP4A-LATM/44100/2  
    • Case 1: AAC-LC
      • AudioSpecificConfig = 0x1210
      • SDP answer:
        a=rtpmap:120 MP4A-LATM/44100/2
        a=fmtp:120 cpresent=0;profile-level-id=1;object=2;config=400024203fc0
    • Case 2: AAC-HE
      • AudioSpecificConfig = 0x2b920800
      • SDP answer:
        a=rtpmap:120 MP4A-LATM/44100/2 
        a=fmtp:120 cpresent=0;profile-level-id=1;object=2;config=4000572410003fc0;SBR-enabled=1
    • Case 3: AAC-HEv2
      • AudioSpecificConfig = 0xeb8a0800
      • SDP answer:
        a=rtpmap:120 MP4A-LATM/44100/2 
        a=fmtp:120 cpresent=0;object=2;profile-level-id=1;config=4001d71410003fc0;PS-enabled=1;SBR-enabled=1

    References: AAC / HE-AAC Parameters.

    If SBR-enabled=1 is added to the fmtp attribute of MP4A-LATM, the AAC format is AAC-HE. If both SBR-enabled=1 and PS-enabled=1 are added, the AAC format is AAC-HEv2. AAC evolved from AAC-LC to AAC-HE, which adds Spectral Band Replication (SBR), and then to AAC-HEv2, which also adds Parametric Stereo (PS). Therefore, the SBR field alone, or the SBR and PS fields together, are enough to indicate the AAC format. In addition, config=StreamMuxConfig is added to the fmtp attribute. StreamMuxConfig wraps the information in AudioSpecificConfig of the streams to be pushed and includes the detailed encoding parameters. The client can obtain the details as needed.
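
The rule above for telling the AAC formats apart can be sketched as a small parser for the fmtp line of the SDP answer. This is a minimal illustration, not the logic of any RTS SDK: the function names parseFmtp and classifyAacFormat are hypothetical, and the sketch assumes that an fmtp line without SBR-enabled=1 describes AAC-LC, as in Case 1.

```javascript
// Hypothetical helper: split an "a=fmtp:<pt> key=value;key=value" line into
// its payload type and a key/value map.
function parseFmtp(line) {
  var match = /^a=fmtp:(\d+)\s+(.*)$/.exec(line.trim());
  if (!match) {
    return null;
  }
  var params = {};
  match[2].split(";").forEach(function (pair) {
    var idx = pair.indexOf("=");
    if (idx > 0) {
      params[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
    }
  });
  return { payloadType: Number(match[1]), params: params };
}

// Hypothetical helper: infer the AAC format from the SBR-enabled and
// PS-enabled fields. Assumption: no SBR field means AAC-LC.
function classifyAacFormat(fmtpLine) {
  var fmtp = parseFmtp(fmtpLine);
  if (!fmtp) {
    return null;
  }
  var sbr = fmtp.params["SBR-enabled"] === "1";
  var ps = fmtp.params["PS-enabled"] === "1";
  if (sbr && ps) {
    return "AAC-HEv2";
  }
  return sbr ? "AAC-HE" : "AAC-LC";
}

// The fmtp line from Case 2 in this topic.
console.log(classifyAacFormat(
  "a=fmtp:120 cpresent=0;profile-level-id=1;object=2;config=4000572410003fc0;SBR-enabled=1"
));
// AAC-HE
```
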

  • Support videos in H.265 format

    RTS parses the encoding information of videos in H.264 or H.265 format during video stream pushing and returns the information about the videos in the SDP answer.

    Case: H.265
    • SDP offer:
      a=rtpmap:102 H265/90000
    • SDP answer:
      a=rtpmap:122 H265/90000
  • Support videos with B frames
    During signaling, the client can add a field to the SDP offer to specify whether it can decode videos with B frames.
    • For example, add BFrame-enabled=1 to the fmtp attribute of the video. A value of 1 indicates that the client can decode videos with B frames.
    • If the client does not support videos with B frames, RTS can transcode the source video streams to remove B frames.

    When videos with B frames are delivered, the RTP timestamp is set to the PTS (RTP timestamp = PTS). The client decodes frames in order of increasing sequence number.

    In addition, RTS can return a composition timestamp (CTS). This allows the client to calculate the decoding timestamp (DTS) based on the following formula: presentation timestamp (PTS) = DTS + CTS. If an SDP offer contains a=extmap:{$id} uri:webrtc:rtc:rtp-hdrext:video:CompositionTime, RTS adds a header extension whose identifier is {$id} to the first Real-time Transport Protocol (RTP) packet of each video frame. The value of the id variable is determined by the SDP offer that is sent by the client.

    RTS allows the client to determine whether to receive videos with B frames and whether CTS information is returned. Favoring general-purpose negotiation mechanisms of this kind has been a principle of RTS since it was designed.

    Figure: Partial content of the SDP offer
    Figure: Packet capture during stream pulling
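
The CTS negotiation described above can be sketched in two steps: find the extension identifier that the offer assigned to the CompositionTime extension, and derive the DTS from the PTS and CTS. This is a minimal sketch under stated assumptions: the function names are hypothetical, PTS and CTS are assumed to use the same clock units, and timestamp wraparound is not handled.

```javascript
// Hypothetical helper: return the {$id} that the SDP offer assigned to the
// CompositionTime header extension, or null if the offer does not request it.
// An optional "/direction" suffix after the id is tolerated.
function findCompositionTimeExtId(sdp) {
  var pattern = /^a=extmap:(\d+)(?:\/\w+)?\s+uri:webrtc:rtc:rtp-hdrext:video:CompositionTime\s*$/m;
  var match = pattern.exec(sdp.replace(/\r\n/g, "\n"));
  return match ? Number(match[1]) : null;
}

// PTS = DTS + CTS, therefore DTS = PTS - CTS.
// Assumption: both values are expressed in the same clock units.
function computeDts(pts, cts) {
  return pts - cts;
}

// Usage with an illustrative offer fragment; the id 5 is an arbitrary example.
var offerFragment =
  "m=video 9 UDP/RTP/AVPF 102\r\n" +
  "a=extmap:5 uri:webrtc:rtc:rtp-hdrext:video:CompositionTime\r\n";
console.log(findCompositionTimeExtId(offerFragment)); // 5
console.log(computeDts(9000, 3000));                  // 6000
```
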

Description of MSID

References: The Msid Mechanism.


This document defines a new SDP [RFC4566] media-level "msid" attribute. This new attribute allows endpoints to associate RTP streams that are described in different media descriptions with the same MediaStreams as defined in [W3C.WD-webrtc-20160531], and to carry an identifier for each MediaStreamTrack in its "appdata" field.

The value of the "msid" attribute consists of an identifier and an optional "appdata" field.

The name of the attribute is "msid".

The value of the attribute is specified by the following ABNF [RFC5234] grammar:

  • msid-value = msid-id [ SP msid-appdata ]
  • msid-id = 1*64token-char ; see RFC 4566
  • msid-appdata = 1*64token-char ; see RFC 4566

An example msid value for a group with the identifier "examplefoo" and application data "examplebar" might look like this:

msid:examplefoo examplebar

The identifier is a string of ASCII characters that are legal in a "token", consisting of between 1 and 64 characters.

Application data (msid-appdata) is carried on the same line as the identifier, separated from the identifier by a space.

The identifier (msid-id) uniquely identifies a group within the scope of an SDP description.
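
The ABNF grammar above can be turned into a small parser for msid values. The following is a minimal sketch, not a reference implementation: the function name parseMsid is hypothetical, the character class spells out token-char as defined in RFC 4566, and accepting an optional leading "a=" is a convenience assumption.

```javascript
// token-char from RFC 4566, limited to 1 to 64 characters as in the grammar.
var MSID_TOKEN = "[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2E\\x30-\\x39\\x41-\\x5A\\x5E-\\x7E]{1,64}";

// msid-value = msid-id [ SP msid-appdata ]
// Assumption: an optional "a=" prefix is accepted for convenience.
var MSID_PATTERN = new RegExp("^(?:a=)?msid:(" + MSID_TOKEN + ")(?: (" + MSID_TOKEN + "))?$");

// Hypothetical helper: return { id, appdata } or null if the line is invalid.
function parseMsid(line) {
  var match = MSID_PATTERN.exec(line.trim());
  if (!match) {
    return null;
  }
  return { id: match[1], appdata: match[2] === undefined ? null : match[2] };
}

// The example value from this topic.
console.log(parseMsid("msid:examplefoo examplebar"));
// { id: 'examplefoo', appdata: 'examplebar' }
```

Because the space character is not a token-char, the identifier and the application data are always separated by exactly one space, and a value with more than two fields is rejected.
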