Webinar

A State-of-the-Industry Webinar: Apple’s LL-HLS is finally here.

Watch a State-of-the-Industry Webinar Recording: Apple’s LL-HLS is finally here

This talk from Pieter-Jan Speelmans, CTO and Founder of THEOplayer, explains how Apple has modified its approach to low latency. Starting with a reminder of the latency problem with HLS, Pieter-Jan explains how Apple originally wanted to implement LL-HLS with HTTP/2 push and the problems that caused. This has changed now, and this talk gives us a first glimpse of how well the new approach works.

Pieter-Jan discusses how LL-DASH streams can be repurposed for LL-HLS, explains the protocol overheads, and covers the optimal settings for segment and part length. He explains how segment length affects not only overall latency but also start-up latency and the ability to navigate the ABR ladder without buffering.


Webinar Transcript

Introduction

Pieter-Jan: So, hello and welcome everybody. Today we will be talking about low latency, a topic which I find very interesting myself, and more specifically we'll be talking about low latency HLS. As most of you already know, low latency has been a topic in the industry for quite a long time. For me personally, it has been going on for almost five years by now, investigating what kind of solutions can be used to improve the latency of the HLS specification itself.

So, during this webinar, what I will try to do is transfer some of the experience that we've built up over the past years. First, I'll give a quick introduction of how low latency HLS came to be; there were a lot of twists and turns in that story, so it's quite important to use the right versions. After that, I will give a high-level overview of what I see as the three basic mechanisms which are the most notable and have the highest impact on latency inside the protocol. In the LL-HLS specification, or at least in the updated HLS specification, there are a lot more changes than only those three, but these are the three which are definitely the most important to how the protocol works. Of course, there are also quite a lot of knobs and dials in this protocol, so I'll go through some of those, which can greatly impact latency but also some other very important QoE metrics for all use cases. And after that, I will close off with a quick look ahead at where you will be able to use the protocol, which platforms are supported and which will be supported, and give you a sneak preview answer to the question: will low latency HLS be usable everywhere as a single protocol, or will there still be a mix of low latency HLS and DASH or something else? As well as a quick highlight of the challenges that we see with the current implementations of packagers and players.

So, let's get started! And first things first, for those of you who don't know, very briefly - what is HLS? Well, HTTP Live Streaming is a protocol released by Apple in 2009, and its biggest goal was to solve the scaling problem when streaming. It is extremely simple: it basically takes a very long video stream, splits it up into small segments, and lists those in a playlist so that you can discover them, download them and then just play them one after the other. So, the basic download loop is very simple: you download what is called a master playlist, you download the playlist for a specific rendition you want to play, and then you just keep alternating between downloading a segment and refreshing the rendition playlist, and so on. Very simple. It has resulted in HLS being very popular, through its simplicity but also through some of its other advantages, such as scale.
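To make that loop concrete, here is a minimal sketch in Python of the discovery step: pulling the segment URIs out of a rendition playlist. The playlist text and file names are hypothetical; a real player would fetch the playlist over HTTP, download each listed segment, then refresh the playlist and repeat.

```python
# Minimal sketch: extract segment URIs from an HLS rendition playlist.
# Lines starting with '#' are tags; everything else is a segment URI.
def parse_segments(playlist_text):
    return [line for line in playlist_text.strip().splitlines()
            if line and not line.startswith("#")]

playlist = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:100
#EXTINF:6.0,
segment100.ts
#EXTINF:6.0,
segment101.ts
"""
print(parse_segments(playlist))  # ['segment100.ts', 'segment101.ts']
```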

Unveiling Latency Issues

Pieter-Jan: So, the real question is: what is the problem? Why do we need to change this? It is simple, it is working great, it is used across the board very often - so why change it? And of course, as the title of this webinar already mentions, the problem is latency. More specifically, HLS is specified to have very large buffers, which means that there is a latency in the protocol of about four segment durations. In order to make it work well, you also need to have keyframes at the start of every segment. And you don't want your segments to be too small as a result, because otherwise your media compression will suffer, which in the end has a big impact on your QoE and on the quality that you deliver to your customers. But because of the latency, you don't want your segments to be too big either.

So as a result, what everybody has been seeing in this industry is that common latencies range from 8 seconds up to 30 seconds, and I've even seen cases where it would go up to one and a half minutes. That is quite a lot for some use cases; if you want some interactivity, even that 8 second mark is still quite high. So, there were a lot of cases where people wanted to enable interaction with the viewers or with the content, through second screen apps or Twitter, where that latency became a problem.

Now, one of the places where that became a problem was Periscope, Twitter's live streaming platform. They presented the solution that they had built in one of their articles, and we presented it at Streaming Media West in 2016, where we described how we had helped Periscope play back what they called LHLS: a streaming protocol which, instead of the 24-30 second HLS deployments they had before, would go into the lower single digit second range, around two to five seconds in a stable scenario at scale. It was very fun when that blog post was released, because we could finally start discussing with other people how it was working. And it was actually very simple as well: it just started using HTTP/1.1 chunked transfer for segments. And it would predict what the next segment in the playlist would be and announce it in advance so that a client could already start requesting it, basically reducing the time a client needs to discover and download a segment.

So, with this being made public, it was a very interesting time; of course, other people wanted to start using this as well. A big milestone came in 2018 when a community initiative was started in an attempt to standardize a lower latency HLS version. This happened on the sidelines of the HLS.js initiative. The way they approached it was slightly different from the Periscope approach: they would add a new tag to highlight that a certain segment was a predicted segment, basically to allow a player to identify, if it starts a download of this segment, whether it will be predicted and take a while before any feedback arrives, or whether it will get a response immediately.

But then, around the time when that first specification was completed, Apple happened. They launched the first preliminary specification of Apple low latency HLS. Just to make it a little bit clearer, I'll call it LL-HLS throughout the rest of the webinar. This is also the name most people in the industry seem to use, given that Apple dubs it low latency HLS themselves. The preliminary specification that they launched broke completely from the approach that Periscope and the community LHLS took. It had a number of strange requirements as well - well, not strange, but it definitely raised some eyebrows within the industry.

So, in essence, they made it quite simple: instead of sending out one big segment over a chunked transfer, LL-HLS would split up a segment into different parts. A part can be added at the end of the playlist near the live point, and it can be removed when it is no longer needed or when you are no longer close to the live point. So, it gives you the advantage of larger segments in a replay case, but the benefit of small segments when you are close to the live point.
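As a sketch of what this looks like in a playlist (segment and part names are made up), a completed segment is listed as usual, while the in-progress segment near the live point is announced part by part with the EXT-X-PART tag:

```
#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-PART-INF:PART-TARGET=1.0
#EXT-X-MEDIA-SEQUENCE:270
#EXTINF:6.0,
segment270.mp4
#EXT-X-PART:DURATION=1.0,URI="segment271.part0.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=1.0,URI="segment271.part1.mp4"
```

Once segment 271 completes, its part entries can be dropped from the playlist and replaced by a single full-segment entry, which is exactly the "large segments for replay, small parts near live" trade-off described above.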

In parallel with that, there was another change, and it was quite a major one, because in the past HLS would, as I explained earlier, keep refreshing that playlist: it would send a playlist request to a server, get a response, wait for a certain timeout, and then request a new playlist, hoping that it would contain new data. This had the advantage that the server could be a passive origin - it didn't need to be active, which is extremely useful - but on the other hand you could get stale data as a response.

Getting stale data means that you need larger buffers to handle it, so in order to cope with that, LL-HLS introduced blocking playlist requests. Even more than that, it added query parameters to the playlist URLs, which is a very interesting approach, and with those simple mechanisms it actually allowed latencies to come down significantly.
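A sketch of those query parameters (host, path and numbers are hypothetical): the server advertises blocking support in the playlist, and the player asks it to hold the response until a specific part of a specific media sequence number exists, using the _HLS_msn and _HLS_part directives from the specification:

```
# Playlist advertises blocking playlist reloads:
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.5

# Player request, held by the server until part 2 of segment 272 is ready:
GET /live/720p.m3u8?_HLS_msn=272&_HLS_part=2 HTTP/2
```

Because the server answers the moment the data exists, the player no longer has to guess when to poll, which is the buffer reduction discussed later in the talk.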

On top of that, there were a few other things. For one, it also required HTTP/2 push, and there was quite a lot of commotion about that: CDNs weren't really ready to deliver with HTTP/2 push. It raised questions on scalability, it raised questions on all kinds of other fronts. Also, having the media playlist and the media data itself delivered from the same edge, which would be needed with push, is far from desirable for content replacement. It would mean that you can't just use a simple manifest manipulator anymore; that manifest manipulator would also need to serve the content, which is not how most server-side ad insertion use cases work, for example. So that caused quite an uproar. People started implementing it, nonetheless.

But then, at the beginning of this year, Apple happened again. Because of all the uproar, something magical happened: Apple rethought the HTTP/2 push approach and removed the HTTP/2 push requirement completely. What they did instead was add a new tag called preload hint - or EXT-X-PRELOAD-HINT, to be more precise - which identifies future data that will be needed. If I say that, you'll probably be thinking: oh, so they went the community low latency HLS route - a parallel which has been drawn more often, because it is actually quite similar. Later on, Apple updated the official specification, stating that this is the approach they will take. And then during WWDC, they announced that the results of using preload hints compared to HTTP/2 push were actually better in a lot of cases. So that was very interesting.
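In playlist terms (file names hypothetical), the hint is a single tag at the end of the playlist announcing the next part before it exists, so the player can open the request early and receive the bytes the moment they are produced:

```
#EXT-X-PART:DURATION=1.0,URI="segment272.part1.mp4"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="segment272.part2.mp4"
```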

Also during WWDC there was the announcement, which was more or less to be expected, that iOS 14 will fully support Low Latency HLS; we expect that to be released around September, officially marking the first Apple devices with native support for Low Latency HLS built in. So that's quite the roller coaster, going from legacy HLS at the very beginning in 2009 all the way to Low Latency HLS supported by Apple here in 2020.

Key Changes for Low Latency

Pieter-Jan: So, what are the most important things? I already mentioned them earlier, and we already highlighted them as well: the three biggest changes that Apple made to HLS to enable low latency:

The first is of course the part tag, EXT-X-PART: the fact that you can split segments into more granular pieces when you're close to the edge - the live edge, that is. That's a very important first change. Linked to that is the ability for a player to anticipate which data will be needed next; that's the second big one. The EXT-X-PRELOAD-HINT tag is extremely important in order to reduce latency. And last, but definitely not least, are the blocking requests that you can make.

So, I already explained how a blocking request can be made for a playlist, but it can also be made for a preload hint. It might seem trivial, but if you perform a blocking request, the server can start sending the data as soon as it becomes available. That is a game changer compared to just sitting around as a player, hoping that when a timer expires the data you receive will be recent and not stale, or trying to synchronize exactly when the timer has to fire. That has a big impact on the buffer time you need. So that's a massive reduction in your playback buffer that you can achieve by just making those three small changes.

There are other changes as well; these are definitely not the only changes made to HLS. There are also things like rendition reports, which let you know what the current part number and segment are for any other rendition, which is very important if you want to do adaptive bitrate switches. There are also concepts such as delta playlists, which allow the server to send an update of a playlist rather than an entire copy, reducing the overhead of the playlist in general. All of those changes are very useful. But if I have to pick the three changes with the biggest impact on latency, then these three on the screen are definitely it. Parts, preload hints and blocking requests are truly game changers for HLS when it comes to latency.
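Two quick sketches of those extra mechanisms (URIs and numbers hypothetical): a rendition report in one playlist describing where another rendition currently is, and a delta playlist update requested with the _HLS_skip directive and answered with an EXT-X-SKIP tag in place of the skipped segments:

```
# In the current rendition's playlist, reporting on a sibling rendition:
#EXT-X-RENDITION-REPORT:URI="../720p/playlist.m3u8",LAST-MSN=273,LAST-PART=3

# Requesting a delta update instead of the full playlist:
GET /live/1080p.m3u8?_HLS_skip=YES HTTP/2
# The response replaces older entries with:
#EXT-X-SKIP:SKIPPED-SEGMENTS=15
```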

Harnessing Low Latency HLS

Pieter-Jan: Another question that I often get is: “Why should I care about low latency HLS? Why wouldn't I use low latency DASH?” After all, low latency DASH has been around a little bit longer than low latency HLS, and server support is up and running. Low latency HLS still needs to get that ecosystem in place, and there are still a lot of use cases where implementing it will be a challenge.

But of course, the one thing that low latency HLS has, which is extremely interesting, is the Apple ecosystem. By the end of this year, almost all Apple devices will be capable of playing low latency HLS out of the box. That's a very hard ecosystem to ignore. On the other hand, low latency DASH is there as well. So, the big advantage is that you can reuse the streams set up for low latency DASH to generate segments and parts for low latency HLS as well.

Low latency DASH is actually quite simple as well: you just get a manifest and then you start downloading a segment, and you normally just loop over downloading the next segment, and the next, and the next. Similar to community low latency HLS, segments are delivered in chunked transfer mode, sending the media from the server as it becomes available.

Now, with low latency HLS, what you can do is make segments and parts byte ranges of each other. You can announce a preload hint for an entire segment; a player will start downloading that preload hint, and once the playlist gets updated, you can slice out the beginning of that segment as a byte-range part that you announce in the playlist. You then simply update the preload hint to start from the end of that first part. By repeating that, you can reuse the same segment in a chunked transfer manner, just as it would be sent out for DASH.
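A sketch of what that reuse looks like in the playlist (names and byte counts hypothetical): the parts are byte ranges of the single growing segment file, and the preload hint's BYTERANGE-START is moved forward each time a new part is sliced off:

```
#EXT-X-PART:DURATION=1.0,URI="segment273.mp4",BYTERANGE="50000@0"
#EXT-X-PART:DURATION=1.0,URI="segment273.mp4",BYTERANGE="48000@50000"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="segment273.mp4",BYTERANGE-START=98000
```

Because every part points at the same file, the exact bytes served for an LL-DASH chunked segment can back the LL-HLS parts as well.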

So for low latency HLS, the pipeline changes to something like this: you get the playlist, start the download of the segment, keep it running until you've downloaded it all the way, while in the meantime refreshing that playlist to identify which parts of the segment being downloaded are the relevant ones. The reason to keep downloading that playlist - another question I sometimes get - is basically to allow the server to intervene when something needs to happen: for example, when you need to start an advertisement, do a blackout, or anything like that. That's some additional flexibility that low latency HLS has there.

So, being able to reuse the exact same media data of course has a very positive impact on different parts of the pipeline: it allows you to reduce storage, it allows you to increase caching efficiency, and of course everybody is very happy if you can reuse the same files regardless of the streaming protocol.

Navigating the Parameters

Pieter-Jan: So, the question is then: what is next? What happens if you decide to start investigating low latency HLS? Are there specific limitations? In theory it all sounds nice, but of course the number of low latency HLS deployments is still relatively low. At THEO, we've been playing around with this since its conception, basically, so we did learn quite a lot here. First and foremost, what we've seen is that there are three parameters which are the most important ones to tune for low latency HLS. It's very simple: segment size, GOP size, and part size.

So, segment size was already important in legacy HLS due to its impact on latency. In low latency HLS, it has no impact on latency anymore; tuning the latency will be done through the part size, but more on that later. Where segment size does have a big impact is the overhead of the playlist. And of course, it also constrains the maximum GOP size that you should use. So, what do I mean by that? Even if you have delta playlists, there will on average be three segments containing parts in your playlist. In the example on the left-hand side, I used six second segments with one second parts, resulting in six parts per segment. This results in a playlist of about 1 kilobyte in total, at least, meaning about 8 kilobits per second of overhead on the network to deliver that playlist for this rendition. On the right-hand side, there are 10 second segments with half a second parts, which results in 20 parts per segment. That playlist would be around 3 kilobytes, more or less, making the overhead three times bigger.

Additionally, because I tuned the part size to half a second instead of a second, the playlist has to be loaded twice as frequently. That accumulates to almost 50 kilobits per second of data for the playlist alone. And this problem gets worse when you consider that you normally play two renditions, one for audio and one for video, which again doubles the overhead of the playlist. So, if you have a one megabit per second content stream, the playlist on the left-hand side would be 1.6% overhead, while the example on the right-hand side is already at 10%. And this percentage increases further if you have, for example, a 400 kilobits per second feed as the lowest bitrate for your viewers; then that playlist would actually be close to 25% overhead with audio and video in there, which is quite significant. So definitely, segment size is still a parameter to tune.
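The arithmetic behind those two examples can be sketched as follows; the 1 KB and 3 KB playlist sizes are the rough figures from the slides, and the playlist is assumed to be reloaded roughly once per part duration:

```python
# Rough playlist-overhead estimate for the two examples above.
# Assumption: the playlist is reloaded about once per part duration.
def playlist_overhead_kbps(playlist_kb, part_seconds):
    return playlist_kb * 8 / part_seconds  # kilobits per second

left = playlist_overhead_kbps(1, 1.0)    # 6 s segments, 1 s parts
right = playlist_overhead_kbps(3, 0.5)   # 10 s segments, 0.5 s parts
print(left, right)                        # 8.0 vs 48.0 kbps per rendition

# With separate audio and video playlists the overhead doubles; against a
# 400 kbps lowest rendition the right-hand setup approaches 25% overhead.
print(round(2 * right / 400 * 100))       # roughly 24 percent
```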

As I already mentioned, GOP size tuning is very important as well. Why? It's important for your QoE and for your startup time. Most people underestimate this impact, but it is actually very big. The GOP size determines how often you insert a keyframe, or IDR frame, into your stream, which is basically a full image and as a result a lot bigger than an incremental frame. Putting keyframes in your stream often means that you are increasing the bandwidth - or, if you want to keep the bandwidth the same, reducing the quality that you deliver to your customers. So, if the overhead of inserting a lot of keyframes becomes too big, you'll have a problem. We actually see this becoming too big when you go below a two second keyframe interval. It varies a little bit depending on your type of content, but in general you want to keep your GOP sizes larger than that. On the other hand, it's also quite important not to make your GOP size too big. Why? First of all, it's constrained by the segment size, because every segment should really start with a keyframe. But also, you always have to start decoding from a keyframe, and if you don't have enough keyframes, that is where it impacts the startup time and the latency at start.

So, take these three examples: they all have 500 millisecond parts; the first one has six second segments and six second GOPs, the second one ten second segments, and the last one six second segments with two second GOPs. According to the LL-HLS specification, players should maintain at least a three-part buffer before the end of the live stream. The player also needs a keyframe to start from. So, I indicated on this slide where the player would actually start. For the first example, that would be five seconds before the current live point; for the second stream, seven seconds before the current live point; while the lowest one, the third one, has a latency at start of about three seconds. There are of course other ways to handle this - your player could wait for the next keyframe, for example - but even in that case it would be quite difficult, because you could be waiting for quite a long time.
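A simplified model of that startup position, assuming the player needs at least three parts of buffer and must begin at a keyframe, i.e. a GOP boundary (the live-point offsets are illustrative, not from the spec):

```python
# Sketch: how far behind the live point playback must start, assuming the
# player needs at least `min_parts` parts of buffer and must begin at a
# GOP boundary. `live_offset` is how far the live point is into the
# current GOP; all values in seconds. An illustrative model only.
def startup_latency(gop, part, live_offset, min_parts=3):
    need = min_parts * part       # minimum buffer behind the live point
    behind = live_offset          # distance back to the latest keyframe
    while behind < need:          # step back GOP by GOP until we have enough
        behind += gop
    return behind

print(startup_latency(gop=6.0, part=0.5, live_offset=5.0))  # 5.0 s behind
print(startup_latency(gop=2.0, part=0.5, live_offset=1.0))  # 3.0 s behind
```

With six second GOPs the player can land several seconds behind live at startup, while two second GOPs keep that startup latency close to the three-part minimum.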

Another approach that is quite common, also used in low latency DASH, is to speed up or slow down playback so that you can gain or reduce latency at start. But especially when there is audio, it will just feel unnatural - especially when there's a song being played. That's a big problem as well, so you're not always able to do that.

Where else is the GOP size very important? When you're switching renditions, for example due to network behavior. First, the player will need to identify a keyframe, load that data, and load the data between the keyframe and the live point. And before you can do all of that, it's very common for your buffer to run out. An alternative could be to wait for the next keyframe and hope that you have enough time for it to arrive. But if your network is going down fast, which is usually what's happening, then you will probably be in trouble, and your users will be waiting, watching a spinner - and nobody really likes that, let's be honest.

In contrast, if you have smaller GOPs, that problem becomes a lot smaller. The amount of data that you need to download shrinks, and a lot more convenient switching points become available. If you update that slide with a two second GOP size, as I did on this slide, you see that there's a much better balance: going back in time is a lot easier, and waiting for the next keyframe becomes a lot easier as well. Based on our results, that two second GOP size is actually a pretty good balance. Why? You preferably always have a keyframe in your buffer, or have the next part that you download start with, or at least contain, a keyframe. And that two second GOP size, especially with a 500 millisecond part size, gets you right into that sweet spot.

And that gets me to part size, because, as you've already noticed in the last examples, part size has a very important role as well. One of its most important impacts is on latency. In contrast with legacy HLS, in low latency HLS the part size hugely influences latency. As mentioned earlier, the first way it impacts latency is that you have to keep that minimum of three parts as a buffer. This is similar to the legacy HLS requirement of having three segments in your buffer. But it means that you're three parts behind the live point, and it can be even worse, because there can be one more part being generated at any point in time, which would put you up to four parts behind the live point. In the example that I made here, you can see what the impact of the part duration is, and you can see that the buffer becomes extremely big if you have very big part sizes. Why? If you have a two second part size, that means you could be up to eight seconds behind the live point - or seven seconds in this example. So it has a very big impact. And even though those three parts are not always needed in the strict sense - in a lot of cases you can creep a little bit closer to the live point - that has some impact on the stability of your stream.
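The relationship between part size and how far behind the live point you sit can be sketched like this (three parts of buffer, plus up to one part still being produced):

```python
# Protocol latency attributable to part size alone: the player stays three
# parts behind the live point, plus at most one part still being generated.
def part_latency(part_seconds, parts_behind=3):
    low = parts_behind * part_seconds          # best case: 3 parts behind
    high = (parts_behind + 1) * part_seconds   # worst case: 4 parts behind
    return low, high

print(part_latency(0.5))  # (1.5, 2.0) seconds
print(part_latency(2.0))  # (6.0, 8.0) seconds, as in the talk's example
```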

What this would suggest is that having a short part size is likely better. That's true in some respects, but it is important to keep that playlist overhead in mind as well. If the part size becomes smaller, it doesn't only mean that there are more parts to be listed, but also that the playlist needs to be reloaded a lot more frequently. I've actually seen streams with one-frame parts, which is quite horrible - it's really a terrible idea, because it means that your playlist has to be reloaded continuously. Having short parts introduces a really significant overhead.

Based on our tests, we actually see a graph similar to this: when the part size becomes too small, latency tends to go up rather than down; the overhead simply becomes way too big. As a result, what we recommend is: try not to go below half a second. Sticking somewhere between half a second and one second parts is pretty okay, hitting the two-to-four second kind of protocol latency, which is quite acceptable. If you have, for example, 500 millisecond parts, a two second GOP size and six second segments, you can achieve low latency at a pretty reliable scale with a reasonable startup time. It's our assumption, at least, that we will see a lot of deployments like this.

What we actually recommend is something along these ranges. Don't try to make your segment size too small, but don't make it too big either; somewhere between two and six seconds seems reasonable to us. Same with GOP size: you can have smaller GOPs if you have larger segments, but again, don't make it too big. And for the parts, I really would urge you to avoid going below 500 milliseconds - the benefit just isn't there anymore, and the overhead really becomes way too big. These kinds of setups really can make it work for you.

Expanding Platform Compatibility

Pieter-Jan: So, another question that I very frequently get is: where will I be able to use this? As we've already discussed, iOS 14, tvOS 14 and the new macOS Big Sur update will all support this natively, with a release which we anticipate will be in September. But what does that mean for other platforms? There is some good news here: we actually anticipate that you will be able to use low latency HLS on most platforms.

To give you a quick overview: if we look at Android devices and similar platforms like Android TV, Fire TV and smart TVs, as well as streaming devices, low latency HLS can be used. We've actually built a first beta player, which we've been testing quite intensively over the past months. What is unclear to us is whether native players, for example ExoPlayer, will implement this. So far, there hasn't really been any movement, and it's relatively unlikely that Google will bet on low latency HLS; it would make a lot more sense for them to bet on low latency DASH. But definitely from a player standpoint, Android is a check in the box: it can definitely be supported there. So that's good.

HTML5 is another very important platform, of course. I see this as web browsers, but also Tizen, webOS, Vizio and all kinds of other platforms. We don't actually expect native support to be broadly implemented; if I look at the HLS versions supported on those platforms, they're usually version 3 or version 4 - extremely old versions. But by using Media Source Extensions, it is actually possible to add support for low latency HLS. This is something we've done in one of our beta players as well, and we have successfully tested low latency HLS on web browsers and on smart TVs. So definitely a check in the box. We have also seen some encouraging messages on the video-dev Slack channels indicating that other players and open-source players will implement this as well. There doesn't appear to be a lot of effort started yet, but I do anticipate that once that September timeframe passes, a lot of other players will also announce or start work on low latency HLS availability for web browsers.

The last platform, or last environment, that I see as a big question mark as of today is the Roku platform. It's quite notorious for trying to enforce its own native player. But their current support pages do seem to suggest that they support the specification from March 2019. That's not too bad actually - it's lagging three versions behind, but I am mildly optimistic. I don't really see a big reason for Roku not to support it. On the other hand, they don't seem to have done any low latency DASH so far either. So, it might still take a while, but I am quite hopeful that Roku will eventually pick up low latency HLS, and I'm pretty sure that otherwise there will be other approaches becoming available to get a low latency HLS stream on Roku as well.

Closing Remarks

Pieter-Jan: So, we're actually approaching the end of the webinar now. And as I mentioned, there are a few other things that you have to keep in mind:

The low latency specification for HLS is relatively new, and most vendors are not ready to deploy it just yet. We have tested this with half a dozen different packagers, different CDN combinations, and all kinds of different encoding configurations and settings. In general, results are extremely good, but there are still some question marks, some issues to resolve before deploying it into production. For example, some basics such as subtitles, but also server-side ad insertion and content protection are still more or less virgin territory for a lot of the packaging vendors; implementations are very sparse or virtually non-existent, at least to my knowledge. And so far, every deployment that we've done still seems to have a different combination of preferences in encoding, different content, different use cases. They're still quite unique, and I guess it will still take a while before the different vendors in the industry have agreed on how they will really implement this and make sure that compatibility is there. We're working on it, together with all of the different partners, but it will of course still take a while before the entire ecosystem has the same level of support across the board.

And during our next webinar we will actually discuss some of those challenges and some practical implementations together with a number of partners, to see how low latency HLS can be deployed in time for an iOS 14 launch - so definitely stay tuned for that. In the meantime, if you have any questions, really don't hesitate to ask them in our Q&A session in about a minute. And of course, you can always reach out to our team or to me after this webinar, through the theoplayer.com website.

After this webinar we will also publish our new low latency HLS test page - basically the beta player that we have available right now - to allow everybody to start testing their low latency HLS streams and start building those new experiences for their viewers.

So, thank you everyone for attending the webinar. I hope you stick around for the Q&A session just after this, and of course, take care and stay safe.


Want to deliver high-quality online video experiences to your viewers, efficiently?

We’d love to talk about how we can help you with your video player, low latency live delivery and advertisement needs.