Real-Time Streaming (RTS) is built on Web Real-Time Communication (WebRTC) signaling. RTS delivers low-latency live streaming by using the globally distributed Alibaba Cloud Content Delivery Network (CDN) nodes and the scheduling algorithms of Alibaba Cloud. This topic describes the specifications of the RTS signaling protocol and is intended for developers who are familiar with the basics of WebRTC.
Signaling process
The following figure shows the signaling process.

- The client sends a request with a Session Description Protocol (SDP) offer.
  - Create an RTCPeerConnection object on the client, specify whether to receive or send audio and video signals, and then create an SDP offer.
      // Enable audio and video (recvonly or sendonly).
      { offerToReceiveVideo: true, offerToReceiveAudio: true }
  - Send a stream pulling request from the client to ApsaraVideo Live by using the HTTPS POST method. The request body is a JSON string. For more information about the request parameters, see the Definition of the RTS signaling protocol section of this topic.
    Note
    - The version parameter specifies the version of the RTS signaling protocol. Set the value to 2.
    - The sdk_version parameter specifies the version of the RTS SDK. You can set the parameter as needed.
  - Send the constructed request to ApsaraVideo Live at the signaling URL by using the POST method. Specify the source URL in the JSON-formatted request body.
      POST /app/streamname?auth=xxx HTTP/1.1
      Host: domain
      Connection: keep-alive
      Content-Length: 2205
      Content-Type: application/json
    Note The signaling URL is the same as the source URL except for the protocol header. The following URLs provide examples:
    - Signaling URL: https://domain/app/streamname?auth=xxx
    - Source URL: artc://domain/app/streamname?auth=xxx
- The server returns a response with an SDP answer.
After the server of ApsaraVideo Live verifies the request, the server generates an SDP answer and returns a response that contains the information about the live streaming node to the client. For more information about the response parameters, see the Definition of the RTS signaling protocol section of this topic.
- The client initiates Interactive Connectivity Establishment (ICE).
  - After the client receives the response with the SDP answer, set the answer as the remote session description of the RTCPeerConnection object.
      peerConnection.setRemoteDescription(new RTCSessionDescription(answer.jsep));
  - Use the RTCPeerConnection object to initiate ICE and Datagram Transport Layer Security (DTLS) encryption. After the connection is established, the client can pull streams from ApsaraVideo Live. This way, you can implement stream pulling and playback based on the standards of WebRTC.
- The client initiates a disconnection.
The client sends a DTLS alert message that initiates a disconnection to stop stream ingest or playback.
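Because the signaling URL and the source URL differ only in the protocol header, one can be derived from the other. The following is a minimal sketch of that conversion. The function names toSignalingUrl and toSourceUrl are hypothetical and are not part of any RTS SDK.

```javascript
// Hypothetical helper: swap the artc:// protocol header of a source URL
// for https:// to obtain the signaling URL.
function toSignalingUrl(sourceUrl) {
  const prefix = "artc://";
  if (!sourceUrl.startsWith(prefix)) {
    throw new Error("Expected a source URL that starts with artc://");
  }
  return "https://" + sourceUrl.slice(prefix.length);
}

// Hypothetical helper: the reverse direction, from signaling URL to source URL.
function toSourceUrl(signalingUrl) {
  const prefix = "https://";
  if (!signalingUrl.startsWith(prefix)) {
    throw new Error("Expected a signaling URL that starts with https://");
  }
  return "artc://" + signalingUrl.slice(prefix.length);
}

// Usage with the example URLs from this topic.
console.log(toSignalingUrl("artc://domain/app/streamname?auth=xxx"));
console.log(toSourceUrl("https://domain/app/streamname?auth=xxx"));
```

The path and query string, including the auth parameter, are carried over unchanged in both directions.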
Sample code for the HTML5 player
// Create the peer connection and the local SDP offer.
peerConnection = new RTCPeerConnection();
peerConnection.onicecandidate = iceCandidateCallback;
peerConnection.ontrack = remoteStreamCallback;
peerConnection.createOffer({ offerToReceiveVideo: true, offerToReceiveAudio: true })
  .then(signaling_pull).catch(errorHandler);

// Post the stream pulling request to the CDN live streaming service.
function signaling_pull(offer_sdp) {
  console.log('local offer sdp', offer_sdp);
  peerConnection.setLocalDescription(offer_sdp).then(function() {
    // Get the URL that is used to pull the stream.
    var stream_url = $("#stream_url").val();
    console.log("stream url:", stream_url);
    // Add the SDK and protocol versions.
    var protocol_version = 2;
    var sdk_version = "0.0.1";
    $.ajax({
      url: stream_url,
      type: "post",
      contentType: "application/json",
      data: JSON.stringify({
        mode: "live",
        version: protocol_version,
        sdk_version: sdk_version,
        // The local offer that was passed to this function.
        jsep: offer_sdp
      }),
      success: function(result) {
        // jQuery may already have parsed the JSON response into an object.
        var signal = (typeof result === "string") ? JSON.parse(result) : result;
        peerConnection.setRemoteDescription(new RTCSessionDescription(signal.jsep)).then(function() {
          console.log("get remote answer sdp: ", signal.jsep.sdp);
        }).catch(errorHandler);
      }
    });
  }).catch(errorHandler);
}
Definition of the RTS signaling protocol
The RTS signaling protocol establishes a short-lived connection based on HTTPS. The protocol uses messages in the JSON format. This section describes the request, response, and error codes based on the RTS signaling protocol.
Sample request
Request:
{
  "version": 2,
  "sdk_version": "0.0.1",
  "mode": "live",
  "pull_streams": [
    {
      "url": "artc://demo.aliyundoc.com/liveApp****/liveStream****",
      "amsid": [
        "rts audio"
      ],
      "vmsid": [
        "rts video"
      ]
    }
  ],
  "jsep": {
    "type": "offer",
    "sdp": "v=0\r\no=- 6839248142876176651 2 IN IP4 127.0.0.1\r\ns=-\r\n Omitted content"
  }
}
Parameter | Type | Required | Description |
---|---|---|---|
mode | string | Yes | The mode of the stream. In this example, set the parameter to live. |
version | int | Yes | The version of the protocol. In this example, set the parameter to 2. |
push_stream | string | No | The ingest URL. |
pull_streams | []object | No | The streams that you want to pull. You can pull multiple streams at a time. For more information about the parameters nested under the pull_streams parameter, see the following table. |
sdk_version | string | No | The version of the SDK. |
jsep.type | string | Yes | The type of the SDP message. In this example, set the parameter to offer. |
jsep.sdp | string | Yes | The description of the SDP message. |
Parameter | Type | Required | Description |
---|---|---|---|
url | string | Yes | The source URL that starts with the artc:// header. |
amsid | []string | Yes | The media stream ID (MSID) of the audio stream that you want to pull. In this example, set the parameter to rts audio. |
vmsid | []string | Yes | The MSID of the video stream that you want to pull. In this example, set the parameter to rts video. |
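The request parameters in the preceding tables can be assembled as follows. This is a minimal sketch, not an official client: the function name buildPullRequest is hypothetical, and the amsid and vmsid values are the example values from this topic.

```javascript
// Hypothetical helper: build the JSON request body for a stream pulling request.
// sourceUrl is the artc:// source URL; offer is an object with type and sdp fields,
// such as the result of RTCPeerConnection.createOffer().
function buildPullRequest(sourceUrl, offer, sdkVersion) {
  if (typeof sourceUrl !== "string" || !sourceUrl.startsWith("artc://")) {
    throw new Error("url must start with artc://");
  }
  if (!offer || offer.type !== "offer" || typeof offer.sdp !== "string") {
    throw new Error("jsep must be an SDP offer with type and sdp");
  }
  const body = {
    version: 2, // Version of the RTS signaling protocol.
    mode: "live",
    pull_streams: [
      // Example MSIDs from this topic.
      { url: sourceUrl, amsid: ["rts audio"], vmsid: ["rts video"] }
    ],
    jsep: { type: offer.type, sdp: offer.sdp }
  };
  // sdk_version is optional, so include it only when the caller provides it.
  if (sdkVersion !== undefined) {
    body.sdk_version = sdkVersion;
  }
  return JSON.stringify(body);
}

// Usage with a truncated placeholder SDP.
const exampleBody = buildPullRequest(
  "artc://demo.aliyundoc.com/liveApp/liveStream",
  { type: "offer", sdp: "v=0\r\n" },
  "0.0.1"
);
console.log(exampleBody);
```

Validating the required fields before sending keeps malformed requests from reaching the signaling URL.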
Sample success response
Response:
{
  "trace_id": "2_1591173296_101.227.XX.XX_702080732320_dec327eb6eed0e0b07b349c8a565****",
  "code": 200,
  "jsep": {
    "type": "answer",
    "sdp": "v=0\r\no=- 1591173291 2 IN IP4 127.0.0.1\r\n Omitted content"
  }
}
Parameter | Type | Required | Description |
---|---|---|---|
code | int | Yes | The HTTP status code. If the request is successful, the code 200 is returned. For more information about error codes, see the following table. |
trace_id | string | Yes | The GUID of the request. The request GUID is generated by Alibaba Cloud CDN and can be used to troubleshoot issues. Record the request GUID so that you can provide it during troubleshooting. |
jsep.type | string | Yes | The type of the SDP message. In this example, the value answer is returned. |
jsep.sdp | string | Yes | The description of the SDP message that is generated when CDN nodes pull streams from the origin. |
Error code | Description |
---|---|
403 | The error code returned because you are not authorized to perform the operation. |
404 | The error code returned because a stream to be pulled does not exist. |
611 | The error code returned because the client is required to play the streams over TCP. |
302 | The error code returned because the client is required to send the request to a new address. |
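A client can branch on the code field of the response. The following is a minimal sketch of such a dispatcher. The function name classifySignalingResponse and the action labels are hypothetical, and the sketch does not assume where the new address for code 302 is carried, because this topic does not specify it.

```javascript
// Hypothetical helper: map a parsed signaling response to a client-side action.
function classifySignalingResponse(response) {
  // trace_id is kept on every result because it is useful for troubleshooting.
  const base = { traceId: response.trace_id };
  switch (response.code) {
    case 200:
      return Object.assign(base, { action: "apply_answer", jsep: response.jsep });
    case 302:
      return Object.assign(base, { action: "resend_to_new_address" });
    case 611:
      return Object.assign(base, { action: "play_over_tcp" });
    case 403:
      return Object.assign(base, { action: "fail", reason: "not authorized" });
    case 404:
      return Object.assign(base, { action: "fail", reason: "stream does not exist" });
    default:
      return Object.assign(base, { action: "fail", reason: "unexpected code " + response.code });
  }
}

// Usage with a trimmed success response.
const outcome = classifySignalingResponse({
  trace_id: "example-trace-id",
  code: 200,
  jsep: { type: "answer", sdp: "v=0\r\n" }
});
console.log(outcome.action);
```

Only the apply_answer result carries a jsep object, which the client would pass to setRemoteDescription.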
Enhanced SDP negotiation
Messages are exchanged in the SDP format during signaling. SDP negotiation generally follows RFC 4566. RTS extends the SDP semantics to suit the live streaming industry: it supports more audio and video formats and more communications protocols. This way, RTS resolves the issue that WebRTC supports only the Opus format for audio and does not support B frames, and can accommodate the growing number of streaming protocols.
Support audio in the AAC formats
RTS can transmit audio in various Advanced Audio Coding (AAC) formats over Real-Time Messaging Protocol (RTMP). The AAC formats include AAC-LC, HE-AACv1, and HE-AACv2. For more information about AAC formats, see ISO IEC 14496-3.
RTS can transmit audio in AAC formats by using the Low-overhead MPEG-4 Audio Transport Multiplex (LATM) container format. LATM determines whether the encoding information about audio is transmitted in in-band or out-of-band mode based on whether the audio contains the encoding information. In-band transmission sends the encoding information for each audio frame. Out-of-band transmission sends the encoding information only once. The muxconfigPresent parameter in an AudioMuxElement array specifies whether the information in AudioSpecificConfig is transmitted in in-band or out-of-band mode. Therefore, LATM is more flexible than Audio Data Transport Stream (ADTS). If the information in AudioSpecificConfig remains unchanged, the information in StreamMuxConfig can be first transmitted in an SDP message.
During signaling, RTS parses the encoding information during audio stream ingest and returns the parsed information in the negotiation response, as shown in the following code.
SDP offer | SDP answer (AAC-LC) | SDP answer (HE-AACv1) | SDP answer (HE-AACv2) |
---|---|---|---|
If SBR-enabled=1 is added to the fmtp attribute of MP4A-LATM, the AAC format is HE-AACv1. If both SBR-enabled=1 and PS-enabled=1 are added, the AAC format is HE-AACv2. Because the AAC formats evolved from AAC-LC to HE-AACv1 and then to HE-AACv2, the SBR and PS fields are used to indicate the AAC format. In addition, config=StreamMuxConfig is added to the fmtp attribute. StreamMuxConfig assembles the information in AudioSpecificConfig of the ingested streams and contains the parameters that describe the encoding in detail. The client can obtain the details as needed.

For more information, see AAC-LC / HE-AACv1 / HE-AACv2 Encoder Parameters.
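The SBR-enabled and PS-enabled rule can be sketched as a small parser for an fmtp line in the SDP answer. This is an illustration under stated assumptions: the function names are hypothetical, the payload type and config value in the usage example are placeholders, and treating the absence of both flags as AAC-LC is an assumption based on the evolution from AAC-LC to HE-AACv2 described above.

```javascript
// Hypothetical helper: parse "a=fmtp:<payload type> key=value;key=value" into an object.
function parseFmtp(line) {
  const match = /^a=fmtp:(\d+)\s+(.*)$/.exec(line.trim());
  if (!match) {
    return null;
  }
  const params = {};
  for (const part of match[2].split(";")) {
    const trimmed = part.trim();
    if (!trimmed) {
      continue;
    }
    const eq = trimmed.indexOf("=");
    if (eq < 0) {
      params[trimmed] = "";
    } else {
      params[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
    }
  }
  return { payloadType: Number(match[1]), params: params };
}

// Hypothetical helper: derive the AAC format from the SBR and PS flags.
// Assumption: no SBR flag means AAC-LC.
function aacFormat(params) {
  const sbr = params["SBR-enabled"] === "1";
  const ps = params["PS-enabled"] === "1";
  if (sbr && ps) {
    return "HE-AACv2";
  }
  if (sbr) {
    return "HE-AACv1";
  }
  return "AAC-LC";
}

// Usage: the payload type 120 and the config value are placeholders.
const fmtp = parseFmtp("a=fmtp:120 SBR-enabled=1;PS-enabled=1;config=40002620");
console.log(aacFormat(fmtp.params));
```

The config value is left as an opaque string here; decoding StreamMuxConfig is outside the scope of this sketch.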
Support videos in the H.265 format
RTS parses the encoding information of videos in the H.264 or H.265 format during video stream ingest and returns the information about the videos in the H.264 or H.265 format in the SDP answer.
Encoding format | SDP offer | SDP answer |
---|---|---|
H.265 | | |
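To check which video format an SDP answer carries, a client can read the rtpmap attributes. The following is a minimal sketch with hypothetical function names. It assumes that H.265 is advertised under the encoding name H265, as in RFC 7798. The name that RTS actually returns is not shown in this topic, so verify it against a real answer.

```javascript
// Hypothetical helper: list the codecs declared by "a=rtpmap:<pt> <name>/<clock rate>" lines.
function listCodecs(sdp) {
  const codecs = [];
  for (const line of sdp.split(/\r?\n/)) {
    const match = /^a=rtpmap:(\d+) ([^/]+)\/(\d+)/.exec(line);
    if (match) {
      codecs.push({
        payloadType: Number(match[1]),
        name: match[2],
        clockRate: Number(match[3])
      });
    }
  }
  return codecs;
}

// Hypothetical helper: case-insensitive check for an encoding name.
function answerUsesCodec(sdp, name) {
  return listCodecs(sdp).some(function (codec) {
    return codec.name.toUpperCase() === name.toUpperCase();
  });
}

// Usage with a made-up SDP fragment; payload type 100 is a placeholder.
const sampleAnswer = "v=0\r\nm=video 9 UDP/TLS/RTP/SAVPF 100\r\na=rtpmap:100 H265/90000\r\n";
console.log(answerUsesCodec(sampleAnswer, "H265"));
```

A player could use such a check to select a decoder before media starts to arrive.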
Support videos with B frames
During signaling, the client can add a field to the SDP offer to specify whether it can decode videos with B frames. If the client adds BFrame-enabled=1 to the fmtp attribute of the video, the client declares that it can decode videos with B frames. In this case, the RTP timestamp equals the PTS, and the client decodes the frames in order of increasing sequence number. If the client does not support videos with B frames, RTS can transcode the source video streams to remove B frames.
In addition, RTS can return a composition timestamp (CTS). This allows the client to calculate the decoding timestamp (DTS) from the presentation timestamp (PTS) based on the following formula: PTS = DTS + CTS. If an SDP offer contains a=extmap:{$id} uri:webrtc:rtc:rtp-hdrext:video:CompositionTime, RTS adds a header extension with the identifier {$id} to the first Real-time Transport Protocol (RTP) packet of each video frame. The value of the id variable is determined by the SDP offer that is sent by the client. The following figures show the partial content of the SDP offer and the packet capture during stream pulling:


RTS allows the client to determine whether to decode videos with B frames and whether to return CTS information. This ensures general capabilities in communications.
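The CTS mechanism can be sketched in two small steps: find the extension identifier that the offer assigned to the CompositionTime URI, and rearrange PTS = DTS + CTS to obtain the DTS. The function names are hypothetical. The sketch assumes that PTS and CTS are expressed in the same unit and does not handle RTP timestamp wraparound or the parsing of the header extension payload.

```javascript
const COMPOSITION_TIME_URI = "uri:webrtc:rtc:rtp-hdrext:video:CompositionTime";

// Hypothetical helper: return the extmap id declared for the CompositionTime
// extension in an SDP, or null if the extension is not declared.
function findCompositionTimeExtId(sdp) {
  for (const line of sdp.split(/\r?\n/)) {
    // An extmap line is "a=extmap:<id>[/<direction>] <URI>".
    const match = /^a=extmap:(\d+)(?:\/\S+)? (\S+)/.exec(line);
    if (match && match[2] === COMPOSITION_TIME_URI) {
      return Number(match[1]);
    }
  }
  return null;
}

// PTS = DTS + CTS, so DTS = PTS - CTS.
function dtsFromPts(pts, cts) {
  return pts - cts;
}

// Usage: the id 5 and the timestamp values are placeholders.
const offerFragment = "a=extmap:5 " + COMPOSITION_TIME_URI + "\r\n";
console.log(findCompositionTimeExtId(offerFragment));
console.log(dtsFromPts(9000, 3000));
```

For a frame without reordering, the CTS is 0 and the DTS equals the PTS.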
MSID mechanism
For more information about MSID, see The Msid Mechanism.