
Webinar on Low Latency

Improving the user experience of online video by cutting down latency and channel change times
 
LL-HLS State of the Industry: Webinar Recording

Watch the webinar recording in which we unveil the future of streaming and explore the High Efficiency Streaming Protocol (HESP):

  • Implementing HESP in real-world scenarios for lightning-fast streaming experiences.
  • Revolutionary solutions to streamline streaming latency on a global scale.
  • Tackling the challenges of camera-to-screen synchronization for seamless viewer immersion.
  • Extending HESP's reach to a multitude of connected devices for universal accessibility.
  • Witness an exclusive demonstration unveiling the next generation of streaming solutions.
 

Webinar transcript

Introduction

Johan: Welcome to this webinar on low latency, fast startup and fast channel changes. Three critical components for a top-notch user experience, combining the best of main screen TV with the interactivity and the device coverage of online viewing.

In this webinar we will explain what causes latency, where we stand with the state-of-the-art current generation protocols such as low latency DASH and low latency HLS, and how best to use them. And ultimately, we will give an intro to the future with a preview of HESP, our next generation protocol.

For the people subscribed to this webinar, it's no surprise that online video is booming. We see year after year an increase in online video consumption, both in terms of views and in terms of online video revenues. We see the number of platforms and devices growing year after year. This calls for online streaming solutions that are scalable and deployable on virtually every connected device with a screen. HTTP adaptive streaming is used for that reason. It ensures scalability over the internet and adaptability to variations in the available bandwidth. The use of the HTTP protocol leads to universal access on every device, in every location.

But it's not only the volume of online video streaming that is skyrocketing; the expectations of viewers are more demanding than ever. Who still remembers the time when a thumbnail-sized, crappy video was delighting people? Who is still ready to go for a cup of coffee before the video really starts? Today's viewers are no longer satisfied with delays between live events and online viewing. The current 30 to 40 seconds, or even a minute in extreme cases, are no longer accepted. Significantly more people would watch online if the delay were reduced. People also expect low zapping times, and will abandon a service if the zapping time is too long. And these expectations simply continue to increase. Today's viewers want the same quality experience they know from mainstream TV, and this on any device, everywhere, and with all the interactivity bells and whistles that the internet has to offer.

Understanding Latency

Johan: Before diving into more details per protocol, let us clarify what we mean by latency. There are several aspects of latency, and as we will see later, in many cases trade-offs will have to be made. The most obvious definition of latency is the end-to-end latency, also called the glass-to-glass latency or the live latency. That is the latency between the action occurring in real life and the action being viewed on a screen. So, think of an athletics race - let us say a 100-meter sprint, from the athletes' start to the finish. The live latency is the delay users face between the real race and the race on the online screen. And for a 100-meter race, with traditional streaming protocols, chances are nearly 100% that the athletes have already finished the race before the online viewer has seen the start. Clearly this is not what viewers hope for.

In between the live event and the display on the viewer's device, there are several steps, such as the encoder, the network and CDN, and the buffer in the player. The protocol latency focuses on the delay introduced between the output of the encoder and the display on the screen. In this webinar, we will focus on the protocol latency, because that's what THEO is expert in. But there is also a startup latency, or the channel change latency. That's the time that it takes between the moment the user indicates he or she wants to watch a new channel and the moment that this channel actually starts playing. Obviously, people want instantaneous zapping and startup times.

To provide online video at scale, HTTP adaptive protocols such as HLS and MPEG-DASH have been designed. These approaches are built on the capabilities of the Internet. The video is cut into small files, called segments, that can efficiently be transferred over the internet and CDNs. Universal reach is guaranteed through the use of HTTP. The HTTP protocol can obviously be used for streaming video in a browser, but it can also be used on many other devices. Moreover, the HTTP transfer protocol facilitates easy network transfer without the hassle of firewalls and other blocking elements, because a client simply requests the subsequent files with subsequent HTTP requests.

By using different files for the video, it's also straightforward to change the bitrate of the video if the network capacity changes. One simply needs to request the same segment at a lower bitrate. Unfortunately, this approach of segmenting the video into small files is also the source of latency. Each small file must first be encoded. When it's encoded, it can be transferred over the network. Once it's received by the player, it is put in a buffer. This buffer is typically three segments deep to cope with network fluctuations. Only then is the video in the segments played out on the client device. In combination with the historical guideline of 10-second video segments, all this store-and-forward and buffering easily adds up to 40 to 50 seconds of latency.
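
To make that arithmetic concrete, here is a minimal sketch of how segment duration and buffer depth add up to the delays mentioned above. All figures are illustrative assumptions, not measurements or THEO's model.

```typescript
// Rough, illustrative estimate of protocol latency for classic segmented HTTP adaptive streaming.
interface SegmentedStreamConfig {
  segmentDurationSec: number;    // e.g. the historical 10-second guideline
  playerBufferSegments: number;  // typically ~3 segments to absorb network fluctuations
  encodePackageDelaySec: number; // time to finish encoding and packaging a segment
  networkCdnDelaySec: number;    // transfer through origin and CDN
}

function estimateProtocolLatency(cfg: SegmentedStreamConfig): number {
  // A segment can only be fetched once it is fully encoded and packaged...
  const segmentReady = cfg.segmentDurationSec + cfg.encodePackageDelaySec;
  // ...then transferred, and finally it sits behind the player buffer.
  const bufferDelay = cfg.playerBufferSegments * cfg.segmentDurationSec;
  return segmentReady + cfg.networkCdnDelaySec + bufferDelay;
}

// 10 s segments with a 3-segment buffer: roughly 40+ seconds end to end.
console.log(estimateProtocolLatency({
  segmentDurationSec: 10,
  playerBufferSegments: 3,
  encodePackageDelaySec: 1,
  networkCdnDelaySec: 1,
}));
```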

So, coming back to our athletics example: not only do the athletes in the 100-meter sprint arrive at the finish before you see the start, the same holds for the 400 meters.

An obvious solution is to reduce the segment size. Latency with 6-second segments is lower than latency with 10-second segments. And as can be seen from the drawing, latency with 2-second segments is lower still. The reason is very simple: the segments are shorter, so we can more rapidly access and display the video packed inside them. And the buffer, typically sized as a number of segments, is shorter as well.

There are a number of disadvantages though. First of all, when using very short segments, the encoder has far fewer options in terms of GOP size, and each segment needs an IDR frame, which is much larger than the P and B frames in between. This results in either higher bandwidth for the same quality, or lower quality for the same bandwidth. Secondly, the smaller the segments, the more frequently the player needs to make requests to the CDN, and the more frequently the playlist that lists all the segments and where to find them must be updated, as illustrated by the sketch below.
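
As a hedged illustration of that second drawback, the snippet below shows how shrinking segments multiplies the number of segment and playlist requests per viewer. The refresh cadence is an assumption for illustration; real players and manifests vary.

```typescript
// Illustrative only: request overhead grows as segments shrink.
function requestsPerMinute(segmentDurationSec: number): {
  segmentRequests: number;
  playlistRefreshes: number;
} {
  const segmentRequests = 60 / segmentDurationSec;
  // In classic live HLS/DASH the playlist or manifest is typically refreshed
  // roughly once per segment duration, so it scales the same way.
  return { segmentRequests, playlistRefreshes: segmentRequests };
}

console.log(requestsPerMinute(10)); // ~6 segment + 6 playlist requests per minute
console.log(requestsPerMinute(2));  // ~30 + 30 requests per minute, per viewer
```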

The solution is to cut the segments into smaller pieces, without the need for smaller segments. Low latency DASH, or CMAF-CTE, uses chunked transfer encoding. Each segment is divided into a series of non-overlapping chunks. Every chunk is independently generated by the encoder, sent out over the network and CDN, and received by the player. We no longer have to wait until a segment is completely finished before the first chunks can be sent to the player. The first chunks of a segment can already be encoded, sent and received before the following chunks are available. This approach significantly reduces the latency.

The latency can then, at best, be as small as a few chunk durations. In practice, we also need to take manifest handling, time synchronization and buffering for network issues into account. But thanks to chunked transfer encoding, we can reduce the latency to a few seconds instead of the multiple tens of seconds that we had before.
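
In a browser-based player, a chunked-transfer response is typically consumed by reading the body progressively instead of waiting for the complete segment. A minimal sketch of that idea follows; the URL and the downstream buffering call are placeholders, not a specific player API.

```typescript
// Minimal sketch: read the chunks of a segment as they arrive over
// chunked transfer encoding, instead of waiting for the full segment.
async function streamSegment(url: string, onChunk: (bytes: Uint8Array) => void): Promise<void> {
  const response = await fetch(url);
  if (!response.body) throw new Error("Streaming response bodies not supported");

  const reader = response.body.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;           // segment fully received
    if (value) onChunk(value); // hand each HTTP chunk to the buffer/decoder as soon as possible
  }
}

// Hypothetical usage: append every chunk to a Media Source Extensions source buffer.
// streamSegment("https://example.com/live/seg_0042.m4s", (b) => sourceBuffer.appendBuffer(b));
```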

Low latency HLS follows a comparable but different approach. The low latency HLS segments are divided into a number of smaller pieces, much like the CMAF chunks. These smaller pieces of video are called HLS partial segments, or parts for short. Because each part has a short duration, it can be packaged and published much earlier than its parent segment. The low latency HLS parts are individually addressable, either as tiny files or as byte ranges in the complete segment. The use of byte ranges allows handling the segment as one file, out of which the parts can be separately addressed and requested based on the start position and the length of the part in the segment. Moreover, in combination with preload hints, this allows the client to ask for all parts of a segment in a single request. In contrast, when the parts are packaged as individual files, each part is separately requested by the client. The low latency HLS approach necessitates very frequent playlist updates containing information on the newly created parts, hence very frequent playlist requests, and the use of HTTP/2 to reduce the overhead per request.
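
To illustrate the byte-range variant: once a player has learned a part's offset and length from the playlist (the EXT-X-PART tag carries a BYTERANGE attribute), it can request just that slice of the parent segment. The sketch below uses placeholder URLs and values.

```typescript
// Minimal sketch: fetch a single LL-HLS part addressed as a byte range of its
// parent segment. Offsets and lengths are illustrative; in practice they come
// from the EXT-X-PART tags in the media playlist.
async function fetchPart(segmentUrl: string, offset: number, length: number): Promise<ArrayBuffer> {
  const response = await fetch(segmentUrl, {
    headers: { Range: `bytes=${offset}-${offset + length - 1}` },
  });
  if (response.status !== 206) {
    throw new Error(`Expected a partial (206) response, got ${response.status}`);
  }
  return response.arrayBuffer();
}

// e.g. fetchPart("https://example.com/live/seg_0042.m4s", 20000, 18000);
```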

Ensuring Compatibility: The Crucial Link Between Players and Packagers

Pieter-Jan: Thank you, Johan, that was indeed a good overview. So, we have two options: low latency DASH and low latency HLS. One early stage, one still in definition. And they allow us to do low latency streaming. Of course, it's not that simple yet; it's all still early stage. And it's very important to make sure that you have a player and a packager which are completely compatible with each other. It might seem trivial at this early stage, but it's actually not. What we've seen so far is that with the specifications not being completely clear, everybody still has different interpretations, there are different design choices, and as a result there are occasionally compatibility issues.

One that I actually saw last week was a discussion with a potential customer wanting to use low latency HLS, but the server that they were using was still implementing the H2 push approach, while our player is actually using the latest version of the specification, which is the preload hint approach. The same we sometimes see with low latency DASH, not with different implementations of the specification, but for example with certain recommendations which are not followed, such as using a time server, which has an impact on the end-to-end latency. So, it's actually pretty critical to test this properly and to make sure that you have a player and a packager which are tuned to each other.

For us, of course, what does that mean? We test out of the box with a lot of different player and packager vendors. If you are listening and you're a player or packager vendor who is not on this list yet or who is not working with us yet, let us know. We're always happy to test and set up a collaboration, not just to cover end-to-end streaming and make sure that end-to-end compatibility is guaranteed, but preferably also to learn which parameters tune the setup best, to get the highest user experience and the best approach for the customers in the end.

Optimizing Adaptive Bitrate Streaming Strategies

Pieter-Jan: So, compatibility is one issue, but there are a number of others. One very clear issue that we're still seeing is, of course, ABR. Lower latency means smaller buffers in most cases, because buffer is simply overhead in latency. But the big problem with squeezing those buffers is that they have a purpose, and that purpose is, simply put, to ensure continuous playback. Of course, you don't want to impact your end user experience, so you really need a network management strategy, a strategy to cope with network changes. And we actually see two kinds of network changes:

On one side, we need buffer to absorb short-term fluctuations within the network. And on the other hand, there are capacity changes, where adaptive bitrate algorithms are needed to pick the right quality and the right bitrate for the stream.

Something which has been going around in the industry fairly often lately is ACTE, a way to do ABR, adaptive bitrate, for chunked transfer encoding. I like it, but I don't like it too much. One of the reasons is that when we started testing with it, it became very clear very fast that there are some assumptions which don't always hold. It's not always the case that idle periods only happen in front of a CMAF chunk; idle periods tend to happen in between HTTP chunks as well. And that's really tricky, because ACTE assumes that all the CDNs and all the network clients, for example the browsers, don't do re-chunking and just pass on every HTTP chunk as it becomes available on the origin. But that's not really the case.

Additionally, ACTE expects all of those chunks to be delivered in a continuous way. But that's also not guaranteed. When re-chunking happens in the network or somewhere in other clients, this becomes a problem, because it reduces the ability for ACTE to get frequent samples. That's a pity, also because ACTE averages the bandwidth over the last three samples, the last three chunks, which again doesn't allow you to react fast to network changes. So, there are a number of other approaches which are needed there. And on top of that, there is the linear prediction that ACTE's model is doing, which is not really true for all of the different networks that we're seeing out there. There are bursts in wireless communication, there are varying round trip times, there are network drops when you're switching between networks. They're not really considered in the ACTE prediction models, but they are very common if we look at production use cases today.

So, we've actually been looking, for what seems like years by now, at how to optimize this, and the approach that we've come up with is to work purely based on the HTTP chunks instead of looking at the CMAF chunks as a larger whole. It also allows us to get more sample points, which allows us to react a lot faster to changes in the network. It doesn't mean that you have to switch bandwidth or switch quality fast, but it is useful to also predict things like how your buffer is going to evolve. And it's not just bandwidth and raw throughput that we're measuring. It's really important to get a full fingerprint of your network: measure round trip times, measure differences in round trip times, measure differences in bandwidth, because the average alone doesn't really give you all the information that you need to get a clear view on your network and its capabilities.
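
As a hedged sketch of what such a network fingerprint could look like (not THEOplayer's actual implementation), one can collect a throughput sample per HTTP chunk and track both the average and the variability rather than a single mean.

```typescript
// Illustrative per-HTTP-chunk bandwidth sampling with a simple variability measure.
interface ThroughputSample { bytes: number; durationMs: number; }

class NetworkFingerprint {
  private samples: number[] = []; // bits per second, one sample per HTTP chunk

  addSample({ bytes, durationMs }: ThroughputSample): void {
    if (durationMs <= 0) return;                          // ignore chunks that arrived "instantly"
    this.samples.push((bytes * 8 * 1000) / durationMs);
    if (this.samples.length > 50) this.samples.shift();   // keep a sliding window
  }

  get meanBps(): number {
    const n = this.samples.length;
    return n ? this.samples.reduce((a, b) => a + b, 0) / n : 0;
  }

  // Standard deviation as a crude measure of how bursty the network is.
  get jitterBps(): number {
    const mean = this.meanBps;
    const n = this.samples.length;
    if (n < 2) return 0;
    const variance = this.samples.reduce((a, b) => a + (b - mean) ** 2, 0) / (n - 1);
    return Math.sqrt(variance);
  }
}
```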

And it's not just ABR either. I mean, our ABR is not super complex, but it's not super simple either. We actually try to calculate, for every quality and for every frame, both in the buffer and outside of the buffer, how long it would take to upgrade it or to get access to that frame so that it can be used for playback, including things like downloading a playlist, downloading a manifest, downloading initializers or maps in HLS, and then checking that our buffer stays full and that we can hit all of those deadlines. That's quite important! It's quite simple as an algorithm, or as a concept at least, but it is proving to work pretty well.
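
A very simplified version of that deadline check might look like the sketch below. Names and numbers are hypothetical, and it collapses "every frame" into "one segment ahead"; it only shows the concept of selecting the highest quality that can still be downloaded before the buffer runs dry.

```typescript
// Simplified sketch of the "can we hit the deadline" idea, not THEOplayer's algorithm.
interface QualityOption { bitrateBps: number; initBytes: number; }

function selectQuality(
  qualities: QualityOption[],
  measuredBps: number,       // from the network fingerprint
  bufferAheadSec: number,    // content we can still play before stalling
  segmentDurationSec: number,
  playlistFetchSec: number,  // estimated cost of playlist/manifest refresh
): QualityOption | undefined {
  const feasible = qualities.filter((q) => {
    const mediaBytes = (q.bitrateBps * segmentDurationSec) / 8;
    const downloadSec = ((q.initBytes + mediaBytes) * 8) / measuredBps;
    return playlistFetchSec + downloadSec < bufferAheadSec; // must finish before the deadline
  });
  // Highest feasible bitrate wins.
  return feasible.sort((a, b) => b.bitrateBps - a.bitrateBps)[0];
}
```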

And of course, adaptive bitrate is one thing, but the other thing that we actually try to do is to make sure that the buffer size is managed based on that network fingerprint as well. You don't want the situation where you're switching up and down continuously because you're in a fluctuating wireless environment. In that case you want a continuous quality, preferably not too high but also not too low, somewhere right in that sweet spot, but also with the appropriate buffers in place to make sure that your buffer doesn't starve. That's just the other tricky thing with ACTE.
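
One way to express "the appropriate buffers in place", purely as a hedged heuristic rather than a prescribed formula, is to grow the target buffer with the measured variability of the network.

```typescript
// Illustrative heuristic only: size the target buffer from the network fingerprint,
// so a bursty wireless link gets more headroom than a stable wired one.
function targetBufferSec(
  minBufferSec: number,       // floor, e.g. one part or chunk duration
  meanBps: number,            // average throughput from the fingerprint
  jitterBps: number,          // variability from the fingerprint
  currentBitrateBps: number,  // bitrate of the quality we are playing
): number {
  if (meanBps === 0) return minBufferSec;
  const burstiness = jitterBps / meanBps;       // 0 = rock solid, 1+ = very bursty
  const headroom = currentBitrateBps / meanBps; // how close we run to capacity
  return Math.max(minBufferSec, minBufferSec * (1 + 2 * burstiness) * Math.max(1, headroom));
}
```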

People are trying to use ACTE at this point for both HLS and DASH. For DASH, that's what it was designed for, for the chunked transfer encoding part. For low latency HLS, the same algorithms don't really apply. There is a lot of other information: there are preload hints, which can still be used with chunked transfer encoding, but delivery is a lot more bursty. There is all kinds of other information you can reuse, such as the duration of the parts and the knowledge of independent parts, but there are also the restrictions that come with blocking playlist reloads, which you should really take into account.

Challenges in Device Support

Pieter-Jan: Other than that, of course, there's device support. In general, it looks pretty okay. There's of course the iOS case, where we don't expect low latency DASH to be used in production too much. The restrictions on the App Store are a little bit too painful, or, for most companies, a little bit too risky to really go into production without low latency HLS deployed there.

I think the trickiest thing will actually be smart TVs and some connected devices. On connected devices, and especially on older devices, if you are restricted to native players or limited players that cannot be tuned to support the low latency use case because they can't upgrade the protocol versions the player supports, that's going to be painful. The smart TVs themselves are mostly Chromium based at this point, or at least have a web browser that's highly similar. The good thing is that Chromium 43, which is used in webOS 4+ or Tizen since 2017, does have support for Fetch. The bigger problem is that if you really want to optimize, you would want abortable Fetch to be available as well, but then you're stuck with smart TVs from 2020 onwards. So, there will be some tweaking that needs to be done there. It's not fun, but it's what we are used to by now anyway.
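
The "abortable Fetch" mentioned above refers to cancelling an in-flight request with an AbortController, which is useful for dropping a chunked download the moment the ABR decides it is obsolete. A minimal sketch, with a placeholder URL:

```typescript
// Minimal sketch of an abortable download: cancel an in-flight (chunked) request
// when a quality switch makes it obsolete. Requires AbortController support,
// which is missing on older smart-TV browsers.
function abortableDownload(url: string): { promise: Promise<ArrayBuffer>; abort: () => void } {
  const controller = new AbortController();
  const promise = fetch(url, { signal: controller.signal }).then((r) => r.arrayBuffer());
  return { promise, abort: () => controller.abort() };
}

// const dl = abortableDownload("https://example.com/live/seg_0042.m4s");
// ... later, when the ABR switches quality: dl.abort();
```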

Balancing Zapping Times

Pieter-Jan: And then if we look even further, it's not just the platforms - another big challenge is still zapping times. A lot of this has been communicated in other talks already. I think the biggest thing people need to be aware of is that zapping time is a trade-off. You can have extremely fast zapping times in low latency environments, because you can get access to those first frames very quickly, but in that case you might have higher latency at startup. If you don't want the higher latency at startup, you can perfectly well wait until the next segment starts, but then, of course, you might have a very high startup time.

Our approach to this is actually to allow our customers to tune it. We have an auto mode that calculates how long it would take to switch channel based on the current state of the stream, but you can tune what your preference is, and let's be honest, this is going to be a business decision in the end. But if you have six-second segments, I really beg you not to wait until the edge of that next segment pops up, because waiting six seconds for a stream to start will be quite painful.
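
The trade-off can be made explicit with a small calculation (illustrative only, not the actual auto mode): joining immediately from the start of the current segment is fast but carries the elapsed part of that segment as extra latency, while waiting for the next segment boundary does the opposite.

```typescript
// Illustrative sketch of the zap-time versus latency trade-off when tuning in.
// All timings are simplified; a real decision also depends on buffers, ABR, etc.
function zapOptions(segmentDurationSec: number, positionInSegmentSec: number) {
  const untilNextSegment = segmentDurationSec - positionInSegmentSec;
  return {
    // Option A: start from the last available segment/keyframe right away.
    joinImmediately: { startupSec: 0.5, extraLatencySec: positionInSegmentSec },
    // Option B: wait for the next segment boundary and start at minimal latency.
    waitForBoundary: { startupSec: untilNextSegment + 0.5, extraLatencySec: 0 },
  };
}

// With 6 s segments and a zap 1 s into a segment, option B makes the viewer
// wait ~5.5 s before anything plays - exactly the case argued against above.
console.log(zapOptions(6, 1));
```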

Addressing Subtitle Challenges and Server-Side Ad Insertion

Pieter-Jan: Beyond that, there are two other issues that I quickly wanted to highlight. The first one is subtitles. A lot of people tend to forget that the subtitle specifications which are in place today, like WebVTT or TTML, usually require you to put both the start time and the end time of every cue that needs to be displayed straight in the header, which makes it quite impossible to send them out in a chunked way. Because of course you don't know when that subtitle will end; if it's a cue that needs to be on screen for five seconds, you don't want to buffer five seconds before you get that subtitle information.

There are ways around it. What we are seeing is that it's probably best to just have very short-lived subtitles. So, you publish for example subtitles with a duration of about a second, and if you need to repeat a subtitle you simply repeat it and send it out again. It does add some overhead, and it does require knowledge in the player to identify the duplication of those subtitles, so that you can avoid blinking, where the subtitle disappears and reappears. It also requires some buffer management specifically for the subtitles, but it's something that we're seeing become more and more standard, based on observations with the different partners we're working with.
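
A hedged sketch of that de-duplication, using a hypothetical cue shape rather than a specific player API: when a repeated short-lived cue arrives with the same text, extend the cue that is already on screen instead of re-adding it, so it never blinks.

```typescript
// Illustrative sketch: merge repeated short-lived subtitle cues so that a cue
// republished every second does not flicker on screen.
interface Cue { startSec: number; endSec: number; text: string; }

function mergeRepeatedCue(active: Cue[], incoming: Cue): Cue[] {
  const existing = active.find(
    (c) => c.text === incoming.text && incoming.startSec <= c.endSec + 0.05,
  );
  if (existing) {
    // Same text, contiguous in time: just extend the cue that is already shown.
    existing.endSec = Math.max(existing.endSec, incoming.endSec);
    return active;
  }
  return [...active, incoming];
}
```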

And then another one is of course server-side ad insertion. We have a lot of customers rolling out server-side ad insertion today, especially in combination with DRM. It's hard. If you combine it with low latency, it gets even harder. Especially with last-minute period switches, you have to make sure that you know that period is coming up, which means, for DASH, more frequent refreshes of your manifest. But you also have to make sure that those ads are chunked or split into parts with similar settings as your content, to avoid starvation of your buffers, which is quite a big pain point as well.
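
As a small illustration of the "similar settings" point (field names are hypothetical), one can sanity-check an ad's conditioning against the content before stitching it in.

```typescript
// Illustrative pre-flight check for server-side ad insertion under low latency:
// ads should be chunked and conditioned like the content, or the player's buffer
// can starve across the period switch.
interface Conditioning { segmentDurationSec: number; partDurationSec: number; codecs: string; }

function adMatchesContent(content: Conditioning, ad: Conditioning): string[] {
  const problems: string[] = [];
  if (Math.abs(content.partDurationSec - ad.partDurationSec) > 0.05) {
    problems.push("part/chunk duration differs from content");
  }
  if (Math.abs(content.segmentDurationSec - ad.segmentDurationSec) > 0.5) {
    problems.push("segment duration differs from content");
  }
  if (content.codecs !== ad.codecs) problems.push("codec mismatch forces re-initialization");
  return problems; // empty means the ad can be stitched without starving buffers
}
```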

Exploring Protocol and DRM Compatibility Across Platforms

Pieter-Jan: Another thing, a question we often get: can we use one protocol with one DRM across all platforms? The answer is yes and no. Of course you can use one protocol for all platforms; with legacy protocols it's usually HLS that's being used. If you need DRM, well then, you're in a little bit of trouble. You can still go for HLS, but it's a little bit more tricky. With low latency there are some distinct differences between DASH and HLS. We do see that it will become possible to roll out one low latency protocol across all devices, but it will result in a slightly less optimal setup compared to having optimized protocols for every platform. A little bit similar to the legacy protocols, but we are pretty firm in our belief that you can reuse the same segments and the same chunks or parts across the board, which could already remove significant overhead.

It will still evolve, we're of course observing that closely, so once there's more information, we will keep you up to date as well.

Optimizing Configuration Parameters

Pieter-Jan: Another question that pops up is of course: “What should my configuration be like?” There are a lot of parameters, a lot of knobs. Based on what we've seen so far, chunk sizes for low latency DASH and part sizes for low latency HLS should probably be somewhere in the 400 milliseconds to one second ballpark. If you go lower, there will be a lot more overhead. If you go higher, well, you don't really have low latency anymore. 400 milliseconds gives you about two seconds of latency, a little bit less depending on the CDN that's in between. If you go to one second, we see something like three to four seconds of latency, which for most use cases today is still acceptable, and of course the overhead is a lot smaller. If you're only going for low latency DASH for now, you can actually use smaller chunk sizes. But for low latency HLS, I would really not recommend it.
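
Put as a configuration sketch, the guidance above boils down to something like the following. The field names and exact values are illustrative starting points, not the settings of any specific packager or player.

```typescript
// Illustrative low latency starting points, reflecting the ranges discussed above.
// Field names are made up for the example, not a real API.
const lowLatencyDefaults = {
  llDash: {
    segmentDurationSec: 6,  // the classic six-second guideline still works well
    chunkDurationSec: 0.5,  // ~400 ms to 1 s; lower is possible for DASH only
    targetLatencySec: 2.5,
  },
  llHls: {
    segmentDurationSec: 6,
    partDurationSec: 1.0,   // going much below ~400 ms adds a lot of overhead
    targetLatencySec: 3.5,
  },
};
```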

And then if you look at segment sizes: the six-second recommendation that Apple makes is still pretty okay. It mostly impacts your startup time and your startup latency, but you can fine-tune that. It's a little bit of a trade-off, as I mentioned earlier. And all in all, it's still pretty okay. You can always go lower, but it depends a little bit on the content that you have and how much additional overhead those extra keyframes will introduce.

If you have any questions on that, just feel free to reach out, and we're always happy to discuss what ideal parameters are for specific use cases as well.

All in all, if we look at the low latency part: keep in mind it's still in motion, and it's not something which is very simple, especially with low latency HLS not being finalized. But my really firm recommendation is: make sure that you test everything properly. Tune your parameters, and make sure the platforms that you want to support are supported in the way that you want to support them. And of course, make sure that you have somebody to guide you in this. I mean, there are a lot of great packaging vendors out there, and there are a lot of great player vendors out there as well. Make sure that you work with partners who know what they're talking about, and make sure to test. That's really the biggest piece of advice I can give, I think.

Johan: Well, this concludes the discussion of low latency DASH and low latency HLS. There will be a Q&A session at the end of this webinar, where we can dive deeper into your questions on these streaming technologies. However, if you would like to talk specifically about a concrete use case with us, please do not hesitate to book a meeting with one of our experts. After the webinar, we will send you an email with the link to the page where you can book this meeting.

Introducing HESP

Johan: Now, low latency DASH and low latency HLS are the current generation of low latency protocols. Even while the mass deployment of low latency DASH and low latency HLS is just starting, or has yet to start, the next generation protocol already appears on the horizon. In the coming slides, we will give an introduction to HESP - High Efficiency Streaming Protocol, THEO's next generation protocol.

Whereas low latency DASH and low latency HLS still stick to the notion of segments as the basic unit of transfer, next generation protocols follow a more streaming-oriented approach. In contrast with current generation protocols, where video frames are made available in chunks or in parts, next generation protocols deliver a stream of images, transferred over HTTP. Frames are made available for transfer immediately after being produced by the encoder, at frame rate, and this allows for a very low latency.

In contrast with current generation protocols, where the video playback starts at the beginning of a segment, next generation protocols can start playback at any frame. And obviously, this allows for a very fast startup and channel change time.

In the next slide, we will highlight the capabilities of HESP - the High Efficiency Streaming Protocol:

HESP is designed from the ground up to meet four important criteria. Firstly, HESP offers sub-second latency. This allows for a nearly simultaneous experience of the real event and the online viewing. Moreover, it makes near real-time interactivity possible.

Secondly, HESP offers significant cost reductions in delivery. It reduces the bandwidth needed compared to ultra-low latency current generation protocols, and HESP aims to be a single protocol for all devices.

Thirdly, HESP is a very scalable solution. It works with the existing HTTP internet infrastructure, it's compatible with standard encoders and it can be delivered over standard CDNs.

Fourthly, HESP enables instant zapping times and instant seek times and comes with an adaptive bitrate algorithm that can very closely follow the available bandwidth in the network.

HESP vs. LL-DASH

Johan: We made a comparison between HESP and low latency DASH. We took a camera feed coming from the THEO office in Belgium, encoded it into a 720p signal and transported it to the AWS Cloud in Ireland, where it was packaged and served back to the THEO office. We could then measure the latency between the original signal and the received signal in a Chrome browser window.

When we compare HESP to three different LL-DASH flavours - segments of 1, 2 and 6 seconds and chunks of 1, 5 frames - we see that HESP significantly outperforms low latency DASH. HESP outperforms low latency DASH in terms of latency, with 7 times smaller delivery delays. HESP outperforms low latency DASH in terms of channel change times, with 20-fold faster zapping times. And HESP outperforms LL-DASH in terms of bandwidth consumption, with up to 20% savings. Now, this was just an introduction; we plan to give a webinar dedicated to HESP on May 13th, where more information and more technical details will be given.

Closing Remarks

Johan: This sneak preview of next generation protocols concludes our webinar. Thank you for listening. We hope this session was interesting for you and that you learned where we are with current generation low latency protocols, what the pitfalls are, and how to put low latency into practice today using low latency DASH and low latency HLS.

We do realize that in this short timeframe we cannot give a lot of detail. If you still have questions, or if you want to know more about low latency in general and THEOplayer in particular, please stay for the Q&A session, or reach out to us at theoplayer.com for more information or to schedule a discussion with one of our experts. You can find the link on this slide. I also hope you got some appetite to learn more about the next-generation HESP protocol. We plan to have a webinar exclusively focused on HESP on May 13th.


Speakers


PIETER-JAN SPEELMANS

Founder & CTO at THEO Technologies


JOHAN VOUNCKX

VP of Innovation at THEO Technologies

Want to deliver high-quality online video experiences to your viewers, efficiently?

We’d love to talk about how we can help you with your video player, low latency live delivery and advertisement needs.