Reducing streaming latency has been a hot topic for a while. Historically, with HTTP based streaming, latencies of tens of seconds, and even in the order of one minute, are not unheard of. With traditional broadcast sitting in the realm of single-digit seconds and newer techniques such as WebRTC showing latency can go down to sub-second, the question is of course which approach to pick. In this article we’ll dive into the how of WebRTC and compare it with HTTP based streaming from the viewpoint of a streaming service. Let’s get started!
A quick recap on the 'why' of HTTP based streaming
We used to have RTMP, the Real Time Messaging Protocol. It was used for media delivery and gave you sub-second latencies. But when HTTP based streaming protocols (such as Adobe HDS, Microsoft Smooth Streaming, Apple’s HTTP Live Streaming and MPEG-DASH) became more widely available, they replaced nearly all RTMP delivery to end viewers. The cost? An extremely high latency, often 40 seconds or more, and that’s just the latency inherent to the protocol.
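As a rough illustration of where those tens of seconds come from, latency in segmented HTTP streaming is dominated by segment duration and how many segments the player buffers behind the live edge. The function and default values below are illustrative assumptions, not measurements:

```python
def segmented_latency(segment_s, segments_buffered, encode_s=1.0, delivery_s=0.5):
    """Rough end-to-end latency model for segmented HTTP streaming.

    Players traditionally start playback a few full segments behind the
    live edge, so segment duration dominates glass-to-glass latency.
    All parameters here are illustrative assumptions.
    """
    return encode_s + segments_buffered * segment_s + delivery_s

# Classic HLS defaults: ~10 s segments, 3 segments buffered.
print(segmented_latency(10.0, 3))  # → 31.5 (seconds)
```

Shorter segments cut latency but multiply the number of requests; add playlist polling and CDN propagation on top, and the 40-second figures of early deployments are easy to explain.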
So as an industry, why did we switch to these protocols? There are actually a number of reasons. (Some of you will without doubt think of the iOS App Store restrictions heavily promoting the usage of HLS, but that’s not the main one.) The main reason is scalability. HTTP based streaming is in essence a very simple approach: you chop a video up into separate files and serve them to everyone as if they were static assets. In this sense, it turns live streaming into a sequence of files being loaded in exactly the same way as images or scripts on a website. (For more insights, check our article on how HLS works.) In contrast, streaming protocols such as RTMP require active servers, keeping a connection open between the server and client for the duration of the session.
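To make that “sequence of files” concrete, here is a minimal, illustrative HLS media playlist (segment names are made up for the example). The player simply issues plain HTTP GETs for each listed segment and re-fetches the playlist as new segments appear:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:100

#EXTINF:6.0,
segment100.ts
#EXTINF:6.0,
segment101.ts
#EXTINF:6.0,
segment102.ts
```

Every request in this flow is an ordinary, cacheable HTTP request, which is exactly why off-the-shelf CDNs handle it so well.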
When the web was growing, it had to scale. In order to scale, easy-to-operate caching servers were installed, which could store commonly requested files and serve them out without the need for hefty calculations or keeping connections to clients open for long periods. This sparked the creation of large CDN providers, allowing anyone with a basic web server to start serving large audiences. By leveraging static files for streaming, scaling suddenly became easy. There is no need to install and coordinate large numbers of relatively expensive servers or run complex software; instead, “off the shelf” CDN services can be used, scaling to millions of viewers (of course it’s not THAT easy, but you get the gist of it).
For streaming we have seen something similar. As audiences grow, scale becomes more important. (Our State of the Industry guide will tell you what to look out for in 2021.) Add on top of that the new abilities HTTP based protocols bring compared to RTMP, such as Adaptive BitRate switching (ABR), which allows clients to better tune to the vastly different network conditions of all of those viewers, and we can mark some significant pros in the HTTP based streaming column. Today the number of viewers keeps growing, and as a result of these advantages most streaming delivery is done through HTTP based protocols.
If HTTP based streaming is so great, why have WebRTC?
WebRTC is an extremely powerful standard with great tools behind it and a lot of industry support. It was developed to bring real-time communication capabilities (RTC, get it?) to applications, allowing video, audio or any other type of data to be sent between peers. This is a significant difference compared with HTTP based streaming, which is client-server oriented.
With WebRTC, it suddenly became possible to easily build voice and video communication solutions between clients, even working from within your browser. Think about how “easy” it has become to use video chat when working from home. Odds are extremely high you will be using WebRTC when on a call. If I needed to build something similar, without a doubt WebRTC would be the first idea on the table.
In essence, WebRTC is a lot more advanced and a lot more complex than standard server-to-client HTTP based streaming. This makes sense: with WebRTC, connections are set up between clients, and in normal setups servers are only used to coordinate peers connecting to each other. All kinds of challenges pop up when trying to connect users potentially sitting behind a massive firewall, on all kinds of different devices and networks, needing to both send and receive data. Receiving data from a managed server is easy, but sending data to another client is a lot more complicated. WebRTC answers these challenges using a combination of different techniques. That makes it an ideal approach when directly connecting multiple clients is a must-have.
The capability to send data between peers also opens up new options, for example having one viewer send media data to another, relieving stress from the server. There are plenty of examples of media focussed “peer-CDNs” which make it even easier to scale media distribution (and reduce normal CDN costs by leveraging your viewers’ bandwidth instead of the CDN’s). In this article however, we’re not going to focus on that.
How can WebRTC be used for streaming?
Let’s focus on actually delivering video with WebRTC in a server-to-client approach. When the server acts as a peer itself, everything becomes easier. The server side of the equation can be controlled easily, which removes the need for TURN servers (relay proxies used when peers sit behind a restrictive firewall, something you control on your own servers anyway) and allows STUN servers (used to discover a public address and open a port) and signaling servers (used to exchange connection details about the other peers, in this case the servers acting as peers) to be collapsed into the actual streaming servers. On the other hand, WebRTC does still require active, stateful connections, which has an impact on scalability.
By deploying streaming servers as peers within a WebRTC connection, one can leverage WebRTC to do traditional server-to-client streaming. Because of the cost of active connections, this does mean streaming services using WebRTC will need a dedicated kind of CDN, one which doesn’t come cheap: you will either be paying a lot for large server instances, or you will require a lot of servers. As an example, if a server instance of a specific size can serve “X” viewers, one could set up a hierarchy in which X servers sit behind a first origin server, allowing X*X viewers to connect in total. This approach can be repeated, allowing for an arbitrary number of viewers in the end. The number of servers will however grow, depending on the size of X (which is tied to the size of the instance, and thus the operating cost).
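A back-of-the-envelope sketch of that relay hierarchy (the function below is purely illustrative; real capacity planning is far messier):

```python
def relay_tree(fanout, relay_tiers):
    """Servers needed and viewers reached in a simple WebRTC relay tree.

    fanout: outgoing connections one server instance can sustain ("X").
    relay_tiers: layers of relay servers behind the origin.
    Purely illustrative; assumes every instance has identical capacity.
    """
    servers = sum(fanout ** tier for tier in range(relay_tiers + 1))
    viewers = fanout ** (relay_tiers + 1)
    return servers, viewers

# One origin feeding X relays reaches X*X viewers, as described above:
print(relay_tree(1000, 1))  # → (1001, 1000000)
```

Each extra tier multiplies reach by X but also multiplies the server count, which is why the per-instance cost of stateful connections dominates the economics.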
With WebRTC being focussed on low latency video delivery, it does come with some (dis)advantages. For one, the protocol comes with built-in handling of lost packets (WebRTC usually runs over UDP): when the network connection to a client deteriorates, packets are dropped automatically, and when the next frame is rendered the latency remains low. While good for some use cases (for example a conference call, where a lost frame matters less), this can be undesirable for others, such as premium sports delivery. Thanks to the stateful connection, a server-peer could however detect data being lost and scale down the video quality in a server-side ABR fashion. Similarly, the stateful connection could allow for heavy personalisation of the feed, not just for network efficiency, but also for cases like server-side ad insertion (SSAI). While WebRTC does not provide this out of the box, it is an interesting option. All of these things do require even more compute on the edge, which in turn reduces the number of clients a single server can handle.
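As a hypothetical sketch of that server-side ABR idea (the thresholds and bitrate ladder below are assumptions for illustration, not part of WebRTC itself), a server-peer could react to packet-loss feedback along these lines:

```python
def pick_rendition(ladder_kbps, loss_fraction, current_kbps):
    """Step down the bitrate ladder on heavy loss, step up when clean.

    ladder_kbps: available renditions in kbps, ascending.
    loss_fraction: packet loss the server-peer observes on the stateful
                   connection (e.g. via RTCP feedback). Thresholds are
                   illustrative assumptions.
    """
    if loss_fraction > 0.05:  # heavy loss: drop one rung
        lower = [b for b in ladder_kbps if b < current_kbps]
        return lower[-1] if lower else ladder_kbps[0]
    if loss_fraction < 0.01:  # clean link: try one rung up
        higher = [b for b in ladder_kbps if b > current_kbps]
        return higher[0] if higher else ladder_kbps[-1]
    return current_kbps       # in between: hold the current rendition

ladder = [400, 1200, 3000, 6000]
print(pick_rendition(ladder, 0.08, 3000))  # → 1200
```

Running this decision per connection is exactly the extra edge compute mentioned above: every viewer now costs state and logic, not just bytes.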
How do we really compare WebRTC against HTTP based streaming?
When really comparing the two options, it often boils down to a handful of questions:
- What kind of latency do you need?
- How large is the audience going to be?
- How much are you willing to spend?
Answering the latency question depends on which latency range you are targeting.
- "Latency is not important"
In this case, HTTP based streaming is likely the best choice. It will be the easiest and cheapest to set up.
- "Latency should be below 8 seconds"
HTTP based streaming is still the best choice here. LL-HLS and LL-DASH are probably the best protocols for the job. Even with relatively standard configurations, the 4-second mark is well within reach.
- "Latency should be around 1 second"
Here the choices are more limited. While LL-HLS and LL-DASH can be tuned to approach this, they are not very well suited, and dipping below the 1-second mark becomes practically impossible for them at scale. Among the HTTP based protocols, HESP is an option, as it can provide sub-second latency at scale with standard CDNs, including a strongly reduced time to first frame for stream startup and channel change; the current downside is that the number of vendors providing support is still relatively limited (although growing consistently). WebRTC remains a valid option here as well, especially when the audience is small. It can even become the must-have if you need a deployment where every millisecond counts (but in that case, the audience is likely small anyway).
- "Latency in the 200-300ms range"
In this case, you will likely need WebRTC. Protocols like HESP can get close, but it’s really pushing the limit here. While WebRTC will also have trouble reaching this if the network is not optimal, that impact applies to other protocols as well. Scaling WebRTC in this latency range comes with some disadvantages: a higher cost compared to WebRTC at higher latencies, a hit on some QoE metrics (such as time to first frame and perceptual quality) and a lack of client-side ABR. If hitting this latency target is a must, however (and for most common cases it’s not), there is little alternative available.
Looking at audience size, there is a clear victory for HTTP based protocols when reaching large audiences. While WebRTC “CDNs” do exist, the cost is often extremely high. Looking at popular events and streaming services, HTTP based delivery is still the way to go. (I would be VERY interested if someone ever runs a Super Bowl style event on WebRTC, but personally I don’t see how the economics would work.) A new trend I do see is the hybrid model: for a small audience WebRTC is used, and for larger distribution there is a switch to a slightly higher latency with standard HTTP based protocols. This is used, for example, to serve a handful of people actively placing bets or bidding in an auction at low latency, while those who are simply watching don’t suffer from being one or two seconds behind.
Cost-wise, there is relatively little information to compare: most services which allow you to stream over WebRTC don’t share public numbers and when these are shared, there is quite a big difference based on the committed volumes. One thing which is very clear however, is that the cost is often a multiple of the delivery cost of a traditional CDN. As delivery is one of the highest running costs for most large streaming services, this is not something which should be discarded lightly. Same as with audience size (audience size often means cost anyway), the hybrid approach can make sense from a cost perspective. If you are however looking to keep costs in check, HTTP based streaming will be the way to go here.
What we recommend
My personal recommendation is probably somewhat coloured. What I do believe is that things should be used for what they were built: abusing a specific technology for something it wasn’t built for often causes issues down the road.
For most cases I would recommend the usage of HTTP based protocols. The reasons are simple: they scale at a reasonable cost, and the most common latency targets can be reached. The latency target most streaming services have is not in the hundreds-of-milliseconds range (and even WebRTC can have difficulty there if the network isn’t ideal). The result is that HTTP streaming will cover the largest part of the market with ease. I am a firm believer that HTTP based protocols such as HESP can provide the answer to go even below the latencies of LL-HLS and LL-DASH, and solve issues not even solved by WebRTC, such as channel change speed and network independence. (Do note I work for THEO, which is a strong supporter of HESP, hence the colour of my recommendation here.)
This doesn’t mean WebRTC is without cases where it shines. If a really low latency is needed, it is ideally suited to solve the problem, especially when the audience is small, or when a hybrid approach is used in which a small number of users is served over WebRTC and the larger audience is served with HTTP based streaming.
Do you have any questions left unanswered? Things you agree/disagree with? Feel free to reach out to us, we’re happy to discuss your use case and see which approach would fit best!
Want to talk to us about what is the best solution for you? Contact our THEO experts.