
Watch the recording of the roundtable discussion between THEO, ATEME & AKAMAI


  • Mickaël Raulet, VP of Innovation at Ateme.
  • Will Law, Chief Architect at Akamai. 
  • Pieter-Jan Speelmans, Founder and CTO at THEO Technologies.

Recorded Live on 19 August 2021






Webinar transcript


Sassan: Greetings and welcome to our panel discussion this afternoon, or whatever time it may be in your part of the globe. I am Sassan Pejhan, VP of Technology at ATEME. I'll be moderating this session on a unified and efficient solution for low-latency streaming.

OTT streaming has become ubiquitous. There's no debate about that anymore, and it is now comparable in terms of subscriber numbers to traditional pay TV. And it has several advantages over broadcast, as most of you are aware: interactivity, personalization, mobility. But it does also have its shortcomings, or at least there have been some criticisms levelled at it. One of these is that it has a very high end-to-end latency, typically 30 to 60 seconds, which is an issue really for live applications. In the past couple of years, new techniques such as chunked encoding and chunked transfer have emerged, and they promise to reduce this latency to just a few seconds. This was the topic of our webinar back in May, for those of you who attended or were with us then.

The other criticism that's typically levelled against streaming is created by this dichotomy, if you will, between the two dominant protocols: there's DASH and there's HLS. And this has led to duplicate storage and bandwidth costs for service providers. Our speakers today will explain how a unified and efficient solution for low latency streaming is possible thanks to two recent technical advances. One of these is the specification of the Common Media Application Format, CMAF for short, which aims to provide a single container format for assets that are to be streamed by either DASH or HLS protocols. And the other is what's known as byte range addressing, or basically requesting media by specifying a range of bytes rather than a media file.

Now, these new techniques impact all the major components in a typical workflow. And our speakers will look at each of these separately. Our first speaker is Mickaël Raulet, and he will explain how the encoder, the packager, and the origin server operate in this new world of CMAF and byte range. Mickaël is the Chief Technology Officer of ATEME. He received his PhD degree in 2006 and joined Ateme in 2015. His team is leading numerous standardization activities on behalf of Ateme at various bodies and forums, including ATSC, DVB, 3GPP, ITU, MPEG, the DASH Industry Forum, the UHD Forum, and a few more that I have left off. He is managing several collaborative R&D projects between Ateme and strategic partners.

Our second speaker is Will Law, and he will explain to us the interoperability between low latency HLS and low latency DASH, especially from the perspective of the CDN. Will is Chief Architect within the Media and Carrier Division at Akamai. He's a leading media delivery technologist. He's been involved with streaming media on the internet for the last 18 years, with a strong focus on client-side development. Will is currently working on low latency streaming, MPEG-DASH, technology evaluation, UHD distribution, CMAF, WebRTC, and WebTransport. He is the chairman of the Consumer Technology Association WAVE project (that's the Web Application Video Ecosystem), and he's a past president of the DASH Industry Forum. He holds a master's degree in aerospace engineering and an MBA, and he was a recipient of Akamai's Danny Lewin Award. He previously worked for Adobe, Internap, and a series of five engineering and media-related startups.

Our third speaker, Pieter-Jan Speelmans, will illustrate the behaviour of the player in the context of these latest technology advances. Pieter-Jan is the CTO at THEO Technologies, where he guides the technical team. He built their first solution, THEOplayer, from the ground up, which helped him understand every element in the video stack. Today he is a thought leader in the industry and is published regularly on industry websites and blogs for his insights into online video technologies.

So, a very warm welcome to all three of you, and thank you for being with us today. Just some logistics here: there is a demo at the end of one of the talks, and these will be followed by a roundtable and also Q&A from you, the audience. If you will, please use the Q&A button at the bottom of your Zoom application. You can ask your questions at any point during the talks; you don't have to wait for the end. And we'll answer as many of these questions as we possibly can. Our speakers have said that they will stay on beyond the allotted time if there are questions that people want to ask.

Before I turn this over to our first speaker, Mickaël, let me just go through a quick poll. And there are two questions here, if you would kindly answer them. The first is with regard to your plans to deploy either Low Latency DASH or Low Latency HLS, or both, or neither. So, you can vote on those. And the second one has to do with your target end-to-end latency for live OTT services: one second, two seconds, three to five seconds, longer, et cetera. So, I'll give you a few seconds to go through these questions. This will be really useful feedback for both me and our speakers.

And okay, we still have lots of answers coming in, so I'll give you a few more seconds until this stabilizes. Okay, we can see some trends already, looks like we have a stable response and let me share these results for you. So, it's very interesting, both low latency DASH and low latency HLS, almost half of you said you will be deploying these. And as far as the latency question, three to five seconds seems to be the winner, but generally below five seconds. Well, I appreciate your feedback, thanks. And without much further ado, I will ask Mickaël to please start his talk. Mickaël, over to you.

Navigating the Landscape of Low Latency Streaming Technologies

Mickaël: Are you seeing the slides? Still have the poll here.

Sassan: Yes, I can see the slides.

Mickaël: So I will speak, as Sassan was pointing out, about low latency DASH and low latency HLS. And since we have two other speakers to cover the CDN and the player, I will concentrate on the head-end side: the encoder, the packager, and the origin server. And then I will pass the ball to Will.

So that's the end-to-end ecosystem that we will put in place for the demo that we will have at the end. First we have SDI ingest into Titan, which is the encoder-packager that we will be using, a single box here. Then we have an origin server, where we push from the Titan to this particular origin server, and we have two steps of pull: from the player to the CDN, and from the CDN to the origin. So, we push to the origin and we pull from it for both low latency HLS and low latency DASH. And we want to have a common format for that.

So, this particular common format that we want to use is the Common Media Application Format, that is CMAF. CMAF is not yet another delivery format for OTT; it is a format to containerize fMP4. The idea was to create a joint format in between the two main delivery formats, which were MPEG-2 TS and the ISO base media file format. The ISO base media file format is used by DASH, and MPEG-2 TS is used by HLS. So, this particular new format is a restriction of what we had in DASH before as fMP4, and they call it CMAF. On top of CMAF you can then have a manifest or a playlist, but that's not part of the specification itself. The idea is to improve cacheability, to do more with low latency HLS and low latency DASH, and to reduce CDN cost, by keeping this particular common format from the very beginning until the end.

So, when did we start this particular specification? In 2016, with some discussions between Apple and Microsoft at first. And then it was fast-tracked at MPEG, since we already had the ISO base media file format that was available before, and we also had the DASH specification available at MPEG. So it was natural to build a new segmented format like CMAF on top of these. It was a fast track at MPEG because everybody was willing to go in the same direction, and it's mostly a restriction of the behaviour of what we can do with DASH plus fMP4. So, in the end it was quite fast in terms of standardization, and we got the standard published in January 2018.

So, I spoke about CMAF; now I will speak about low latency using DASH and CMAF segments. There was a guideline on low latency coming from the DASH Industry Forum, released in March of this year, where they were willing to demonstrate an ecosystem with open-source tools that are ready to be demonstrated. They use FFmpeg, an origin server coming from FFlabs, and dash.js from the DASH Industry Forum. Based on that, a lot of demos came up from various people, but that was quite a good step, and we did some early demonstrations with Akamai just after that, and the webinar in May. Then we got a specification from Apple, from Roger Pantos. This particular specification is now in revision eight, and with it we can do low latency HLS using either MPEG-2 TS or CMAF segments. In this particular case, we will concentrate on CMAF. With it we can generate partial segments, what we call parts. The specification also covers playlist delta updates, so we can send only what has changed in the playlist; blocking of requests, so it specifies how to hold the playlist request until the playlist is updated before the player reloads it; and some other important things besides.

Implementing Low Latency Streaming

Mickaël: Now I will concentrate on what we are doing inside Ateme and what we are trying to converge on. In this particular case, we want to push both the manifest and the playlist and to have a common format. So, I will explain what we are doing. The idea is to push, in this particular case, a chunk rather than a full fragment. We'll see that this is what we were doing for DASH. With each 2-second segment, we need to wait before the encoder can output something: we have to wait until the full 2 seconds are available. In the low latency case, when we have a chunk that is smaller than a fragment (in this case I will have four chunks), I can send the output of the encoder every 500 milliseconds. So, I don't have to wait for the full fragment to be available, and then I can replicate that. With chunked transfer encoding, the pull mode is well supported by the HTTP/1.1 specification, so the origin server can support that when you do pull. But when you do push, you need some advanced features on the origin server: you need to cache the information and serve it back out, so you need a mechanism on top when you push to the origin server so that you can then pull from it. And we want to use the same mechanism for the push mode as the one we use for low latency DASH. The way we can do that, so that it looks the same as what we are doing for DASH, is to push in byte range mode with HLS. I will explain what that is a little bit more later in this presentation.

So, to get backward compatibility between DASH and low latency DASH, there are some special keywords to enable low latency mode. In this particular case, we use a 500-millisecond low latency chunk, and the full fragment is four seconds, so the availabilityTimeOffset will be 3.5 seconds. Something that is also quite important when you are in low latency DASH is that you want to be NTP-synchronized, because most of the time your PC, or your player, is not. In this particular case, it can synchronize with an external NTP server and then access this particular NTP time. Here we used the NTP server from Akamai, and this is something that is requested by the specification to be sure that the player can play at the live edge, compared to what we are doing with a traditional DASH player.
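The arithmetic behind the numbers quoted here is simple; a minimal sketch, assuming only the two durations mentioned in the talk (4-second fragments, 500 ms chunks):

```python
def availability_time_offset(segment_s: float, chunk_s: float) -> float:
    """In low latency DASH, a segment becomes partially available as soon as
    its first chunk is encoded, so the MPD can advertise it
    (segment duration - chunk duration) seconds before the whole segment exists."""
    if not 0 < chunk_s <= segment_s:
        raise ValueError("chunk duration must be positive and <= segment duration")
    return segment_s - chunk_s

# The example from the talk: 4-second fragments, 500 ms chunks.
print(availability_time_offset(4.0, 0.5))  # → 3.5
```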

Here I explain what we are pushing from the encoder: a fragment, which here is v11.m4s. And then you have the parts before it in byte range mode, because this is the use case that we are using. So, we are progressing through this byte range mode by starting at zero with a length, then we go from 134,000 with its particular length, and so on. So, we mimic what we are doing with chunked transfer encoding when we push with DASH: exactly the same thing. The last keyword is EXT-X-PRELOAD-HINT, for a partial segment that we are starting to push from the encoder, and this is where we are in the live stream.

So, do we need HTTP/1.1 or HTTP/2? What we are doing right now, in the demo at the end, is HTTP/1.1 between the encoder and the origin, HTTP/1.1 between the origin and the CDN, and then we have various players that can use either HTTP/2 natively (that is the one from Apple when we use Safari) or still HTTP/1.1.

So, LL-DASH is a bit more mature at the moment than LL-HLS, I would say, at least on our side, because we have done more testing there. We have various players that we have tested with, various CDNs that we have tested with, and we have also tested the push mode in various regions. In this particular demo, what we were able to test is that Ateme provides the origin, and then Akamai connects the CDN to this particular origin. And then we have tested with various players: THEOplayer, Apple's Safari, and Shaka Player.

So now I will concentrate on the origin to the CDN. I will pass the ball to Will just after, but I will just explain what we are doing at the origin side: what pull requests can be made from the CDN. We push in byte range, so we still have the ability to serve byte ranges from the origin. What I haven't put here is that we also have the MPD, because we push the MPD from the encoder.

On this particular origin, what we do is a translation from byte range to low latency chunks, the part mode, where we can extract a small file for each of the low latency chunks that we have in the byte range. This is something that we do on the fly, principally for the demo and also for testing the two modes. So, we do generation of partial segments from byte range to this particular part mode. The delta playlist is the same in both, and blocking of playlist reload is the same in both. And the last one is that we also support open-ended requests in byte range mode, following the IETF specification, and this works with THEOplayer. So now it's time for Will to speak about the CDN.

Maximizing CDN Efficiency with Byte Range Addressing in Low Latency Streaming

Will: Thank you, Mickaël. Okay, are my slides showing? Yes. Okay, let me go. So, thank you. Let's get straight into it! From a CDN perspective, we care about the cache footprint at the edge. So, let's imagine we have a low latency HLS stream with four-second segments and one-second parts. What's building on the left is all the individual objects that will appear in our edge cache over a four-second window. Now, this window will slide forward, and these objects may change in place, but we will always have this count of objects. There are quite a few of them, and they're not all similar in size, so if I want to talk about the cache footprint, I need to scale them. The largest one is actually going to be the video segment itself, and it's matched by the video parts. This assumes we have separate video segments and separate video parts. Our audio segments are a lot smaller; in the example I show here, it's four megabit per second video and 96 kilobit per second audio. We also have audio parts, and then our playlist updates are actually relatively small. So, this is what our footprint looks like.
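The relative scaling Will describes can be reproduced from the bitrates and durations he quotes (4 Mbps video, 96 kbps audio, 4-second segments, 1-second parts); a rough sketch, approximating each object as bitrate times duration:

```python
def object_bytes(bitrate_bps: int, duration_s: float) -> int:
    """Approximate media object size: bitrate x duration, converted to bytes."""
    return int(bitrate_bps * duration_s / 8)

VIDEO_BPS, AUDIO_BPS = 4_000_000, 96_000   # bitrates quoted in the talk
SEGMENT_S, PART_S = 4.0, 1.0               # segment and part durations

footprint = {
    "video segment": object_bytes(VIDEO_BPS, SEGMENT_S),  # 2,000,000 bytes
    "video part":    object_bytes(VIDEO_BPS, PART_S),     #   500,000 bytes
    "audio segment": object_bytes(AUDIO_BPS, SEGMENT_S),  #    48,000 bytes
    "audio part":    object_bytes(AUDIO_BPS, PART_S),     #    12,000 bytes
}
for name, size in footprint.items():
    print(f"{name}: {size:,} bytes")
```

Since each segment coexists in cache with its own parts when they are discrete objects, the video footprint roughly doubles, which is the duplication the next paragraphs discuss.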

The footprint here shown in red is the objects that would be loaded by a standard latency client. In other words, this is an HLS client from last year that doesn't understand low latency, or also a low latency client that's choosing to scrub back into its archive window. The ones on the right would be loaded by a low latency client that understands LL-HLS and is playing only at the live edge. But they're duplicates of one another, so we need twice as much cache space to cater to both these use cases.

Now let's throw DASH in on top of that. DASH has fewer objects that appear at the edge. However, the footprint for the DASH client is very similar to that of the standard latency client. In fact, it's one segment ahead of the standard latency client in terms of what it's pulling. But this is a lot of duplicate use of cache space. What we would really like is to coalesce all these together and have a common cache footprint.

And we can achieve that through the use of CMAF and byte range addressing. So, here's an example of an LL-HLS playlist. This is a discrete part; I'm not using byte range. However, I could alternatively describe this section of media by pointing at the segment and then saying the byte range that I want. There are two types of byte ranges: one where we know the beginning and the end, and the other for the hinted part, where we only know the start, because the encoder and packager haven't finished actually making that part yet.

So, you can have a conventionally described LL-HLS playlist. A duplicate of it would be the same content described with byte range. You can think of this as just two different languages, English and French, describing the same thing. There are two different ways of describing a flow of media data. Notice that we have ranges where we know the beginning and the end, and then we have this case of the byte range start, where we only know the beginning.
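The byte-range "language" for the same flow of parts can be sketched mechanically. A minimal illustration, assuming hypothetical part sizes (the 134,000 figure echoes Mickaël's example; the `BYTERANGE="length@start"` and `BYTERANGE-START` attribute shapes follow the LL-HLS draft):

```python
def byterange_parts(part_sizes: list[int]) -> tuple[list[tuple[int, int]], int]:
    """Given the byte sizes of the completed parts of a growing segment,
    return the (start, length) range for each completed part, plus the
    start offset of the still-encoding hinted part."""
    ranges, offset = [], 0
    for size in part_sizes:
        ranges.append((offset, size))
        offset += size
    return ranges, offset

ranges, hint_start = byterange_parts([134_000, 130_000, 132_000])
for start, length in ranges:
    print(f'#EXT-X-PART:BYTERANGE="{length}@{start}",URI="v11.m4s"')
print(f'#EXT-X-PRELOAD-HINT:TYPE=PART,URI="v11.m4s",BYTERANGE-START={hint_start}')
```

The discrete form and this range form describe exactly the same bytes; only the addressing changes.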

Now, what are some of the implications of that open range request? What if I actually make a request where I know the start, but I don't know the end? If you read through the spec carefully (I'm showing v7, but v8 is the latest), two things are important. The first is that the server must restrain itself from transmitting the bytes of a part until the part is fully complete. This is to guarantee that the parts are sent at wire speed and not at encode speed, and therefore the client can make a better estimate of the throughput. But the second part, which I've highlighted at the end, is actually interesting: if the requested range contains more than one part, then the server must keep enforcing this delivery guarantee. And that's exactly the case with my hinted part, because there are other parts after it. So, what's going to happen is the origin must deliver the hinted part and then pause, wait till the next part is fully ready, and then burst it down the same connection. So the origin begins its open-ended response starting at the offset.

What's curious here is that the single request for this hinted part is going to return all the remainder of that segment. So, here's a diagram. It's similar to what Mickaël was showing, so I'm not going to go into too much detail. I'm producing my segments and parts. A conventionally addressed LL-HLS player will request segment one, part one. The origin blocks it until part one is ready, and then it bursts part one down the wire, and part one is delivered in less time than its media duration because I've got more throughput than I have encode speed. And that process simply continues as I play through.

What does a range-requesting client do? Well, it's going to request segment one. At this time, the origin is still going to block for the duration of the part, and then the origin begins its aggregated response, where it bursts the data at exactly the same time as it was doing in the discrete addressing case. Then there are no bytes over the wire during the gap, but down the same connection it continues to burst subsequent parts. And that's the behaviour with which we can actually serve both the HLS client and the DASH client, because the DASH client wants exactly the same behaviour from the origin and from the edge server.

Will: So, what does the startup request flow look like? Let's imagine you're a player and you're given this media playlist here. Well, you're going to walk your way back up, and you could simply duplicate the behaviour of a discretely addressed player and make individual range requests for each of these objects, the last one being open-ended. That would be seven requests in total. However, there's an optimization here: you could get by with just one request. If you made a request from zero forward, the origin would return all of the data it has. It would burst everything it had up until the start point, and then it would continue to release it at those part boundaries. So, we can substitute seven requests with one request. And that's a great efficiency for us, because one of the detriments of HLS is the high number of requests we need to make.

But we've got a problem here, which might not be apparent: your CDN's a little confused. So, let's imagine the CDN edge. There's a client request for range equals zero against an object whose full size you don't know, but you've already got 100 bytes of it. In other words, it's a media segment that's building. And let's imagine its actual size, which you don't know, is a thousand bytes. So, you've got two things you could do. You could wait until you receive an end-of-file signal from the origin and then return a 200 response code, a Content-Length of one thousand, and the entire object. So there'd be a delay. This is in fact the default behaviour of most CDNs today. Or you could immediately burst back the 100 bytes that you have, then begin a 200 response and continue delivering the media as you receive it until the object is terminated. This is the behaviour we want for our streaming case, but the edge server doesn't know whether it's in the middle of a streaming case or not. These are both valid use cases.

So how can the client signal to the server the type of behaviour it wants? Turns out there's an RFC for this; someone thought it through: RFC 8673. And what it says is the client should never make an open-ended range request if it's expecting an aggregating response from a fixed offset from a CDN. What it does instead is it sets a start offset and, for the last byte position, substitutes in a very big number. This number is so big that it could never possibly be confused with the actual end of the range, and it's a hint. The convention proposed in this RFC is max safe integer. And this is a signal to the server that it should begin the 206 response instead of waiting and delivering a 200.
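As a concrete sketch of that convention: the client replaces the open-ended `bytes=20000-` with an explicit, absurdly large last byte position. The offset 20000 below is illustrative, not from the talk:

```python
# JavaScript's Number.MAX_SAFE_INTEGER, the convention proposed in RFC 8673.
MAX_SAFE_INTEGER = 2**53 - 1  # 9007199254740991

def live_range_header(start: int) -> str:
    """Build a Range header for an aggregating (live, still-growing) resource.
    Per RFC 8673, the huge explicit last-byte-pos tells the CDN to start a
    206 response immediately rather than wait for the object to complete.
    From byte zero no Range header is needed at all (empty string returned):
    a plain GET streams the growing object with a 200."""
    if start == 0:
        return ""
    return f"bytes={start}-{MAX_SAFE_INTEGER}"

print(live_range_header(20000))  # → bytes=20000-9007199254740991
```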

A reminder: this is only needed if the start byte position is not zero. If the start byte position is zero, you can make a conventional request and you'll get either a chunked transfer encoding response on an H1 connection or an aggregating response on an H2 connection.

So now let's re-examine our startup request flow. Imagine we've got a playlist here, and importantly this playlist has two independent parts per segment. We want to start with the second one, so we minimize our latency at this offset here. So, what the client would do is make a GET request starting at that offset, using this convention of the magic large number. And the server would say, ah, I know what you mean: I'm giving you a 206 response, I'm going to deliver bytes from this offset, I'm echoing back this large number, and I'm going to signal that my total content length is not known, so it's star. And this is fully standards compliant at this point. And off we would go.

Now the second case: say I'm starting at the start of the segment. So, this is a segment which only has one independent part, or it has multiple but I choose to start at the beginning. In this case, the client doesn't have to use that convention. It can simply make a request for the segment and the server will respond. It knows it's aggregating, so it won't return a Content-Length, but you'll still get a valid response. You get a 200 instead of a 206.

So, what does steady state look like? I might start off in the middle of one of my segments, but after that, I'm requesting segments at the boundary, and I can skip ahead. But you'll notice all I'm doing is making complete requests for these segments. So, two important observations here. The first is that an LL-HLS client using byte range addressing only has to make one request per segment duration, not per part duration, for each media type. That's a good saving in efficiency. The second observation, which is implicit and not so obvious, is that you can actually play a byte-range-addressed LL-HLS stream without making any byte range requests. In fact, in what Pieter-Jan is going to show you later, look carefully at the response codes: they're all 200s, not 206s. So, it's a curious fact.

Let's look at the benefit then: request rate improvements. If I have a four-second segment and one-second parts, by using byte range addressing I can get a 37% reduction in the number of requests my client makes to the server and the origin. Now, it still has to update the media playlist at the part target duration; it's got to do that to keep its visibility into the stream.

But if I have a four-second segment and half-second parts, which is what we're going to demo later, I get a 43% reduction. And that's significant. If we've got a million requests coming in for a large-scale live event, having 430,000 fewer requests against the CDN is a material saving in the cost of delivery.
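The quoted percentages fall out of a simple count: a minimal model, assuming (as the talk states) that the playlist is still fetched once per part duration while media is fetched once per part (discrete) or once per segment (byte range):

```python
def requests_per_segment(segment_s: float, part_s: float, byterange: bool) -> int:
    """Client requests per media type over one segment duration."""
    parts = int(segment_s / part_s)
    playlist_updates = parts  # playlist still polled at the part cadence
    media_requests = 1 if byterange else parts
    return media_requests + playlist_updates

def reduction_pct(segment_s: float, part_s: float) -> float:
    """Percentage reduction in request count from byte range addressing."""
    discrete = requests_per_segment(segment_s, part_s, byterange=False)
    ranged = requests_per_segment(segment_s, part_s, byterange=True)
    return 100.0 * (discrete - ranged) / discrete

print(f"{reduction_pct(4, 1):.1f}%")    # 1 s parts: ~37.5%
print(f"{reduction_pct(4, 0.5):.1f}%")  # 500 ms parts: ~43.8%
```

This matches the roughly 37% and 43% figures from the talk.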

A quick note on segment structure. You might have multiple independent parts in your segment, but then choose to make your contiguous segment a single GOP, because there's some encoding efficiency to having a larger GOP. However, if you do that, you're going to lose the one-to-one correspondence between the parts and the segment. So, you really want your segment to be a pure concatenation of your parts. You want it to look like this and not like that, and then you'll get the efficiency we're describing in this webinar.

So, a quick summary: combining CMAF media encapsulation with byte range addressing gives us increased cache efficiency at both the origin and at every CDN distribution tier. It gives us a marked decrease in request rate from our LL-HLS clients. It offers interop between standard latency HLS clients, low latency HLS clients, low latency DASH clients, and in fact standard latency DASH clients as well. And for the edge case where we want to start our range-based response in the middle of a segment, we do need an RFC convention to get us past some variance in CDN behaviour. And with that, I'll hand it over to Pieter-Jan to carry on with the player.

Exploring Low Latency Streaming Impact and Accessibility Considerations

Pieter-Jan: Thank you. I will start sharing as well, and if all goes well, you should now be seeing my screen. So, it's indeed the case that we're at the end of the spectrum: we're all the way to the right on the player side, which is the big "you are here" button. And you can really see this as a little bit the end of the line. We're very dependent as a player: we really depend on whatever the CDN is delivering to us, and on whatever integrations we have to pull with all kinds of other systems. And of course, if you're using CMAF with low latency DASH and low latency HLS, all of this still has to work in sync.

So, the question is, what is the impact? And if you look at it, in normal HLS and normal DASH, it's really simple. You will have a client request a segment, and the server will basically just send it, close the connection again, and that's it.

Now, of course, what we've been talking about is doing this for low latency, and doing it in a way that we can indeed keep the cache footprint on the CDNs the same. What will happen in that case is that the request will still be the same, but there will be smaller chunks being delivered over the same connection if you use byte ranges with CMAF. And that's actually a very big improvement. Even though there is some idle period in between, and even though there is some blocking of the playlists needed in order to get the data fast and efficiently, it does make a big difference.

And the question is, why does it make a difference? Imagine a scenario where you have two requests that you need to send: request one being a request for a playlist that's being loaded, request two for actual media data. And in parallel, you're actually generating the media data. What will happen is that those requests will block. At a certain point in time that data will become available, you'll get the playlist, you'll get the media data, and what does a client have to do? It has to set up basically a new request to get the new playlist and the new media data, and it has to continue, and it goes on and on. And as Will also showed, you will need a lot of requests for this.

If you compare this to the collapsed situation, you don't have to do that. If the client, instead of doing individual requests, makes a range request for the media data, it will still need to load the playlists, but all of the media data can actually be loaded using that same request. There will be idle time in between, but that's actually quite okay.

However, if you start looking at it, and if you really take a browser log (and this is quite specific to browsers; you don't have these kinds of disadvantages in native clients, or clients on other platforms), something that you see is that you don't have one request; you still get two requests. The first is a request with response code 200 which has almost no size: it contains almost no data.

What's the cause of this? If you start looking at it, it's actually quite interesting. That first request is a cross-origin resource sharing (CORS) preflight request. Your browser needs to be allowed to access the files there. By default, it can normally just fire a request and get back the headers that show it's allowed to do this. But if you're using range requests, it's no longer considered a simple request. So, what you actually have to do is ask permission upfront: you get an extra request checking, hey, can I use a Range header here? And then, of course, the second request will simply contain that Range header, and it will return the data.
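The rule that triggers the extra request can be sketched with a simplified model of the Fetch spec's CORS-safelisted request headers (this is an approximation: the real spec also constrains header values, not just names):

```python
# Simplified model: a cross-origin request avoids a preflight only if every
# header it sets is on the CORS safelist. Range is not on it, so a script
# adding "Range: bytes=..." forces an extra OPTIONS round trip.
SAFELISTED = {"accept", "accept-language", "content-language", "content-type"}

def needs_preflight(headers: dict[str, str]) -> bool:
    """True if any request header falls outside the CORS safelist."""
    return any(name.lower() not in SAFELISTED for name in headers)

print(needs_preflight({"Accept": "*/*"}))                      # → False
print(needs_preflight({"Range": "bytes=0-9007199254740991"}))  # → True
```

Dropping the Range header entirely, as the next paragraph suggests for whole-file requests, keeps the request "simple" and avoids the preflight.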

But that's of course one request that you could have avoided as well. And a simple approach, a very simple approach, and Will has already hinted towards it as well, is that you can actually avoid having to use that range header if you need to request that entire file from start to finish. It's again, one less request that you have to do, and it actually works quite well. There are of course possibilities that indeed the CDN, as Will mentioned, won't understand it properly or that there are other issues, but in the end-to-end chain that we've been testing here, it actually works very, very well. And the other advantage is this actually makes the request completely identical to the requests that are being sent by DASH clients. So, there's a massive benefit that you can get here, not just in saving the request, but also in simplifying the flow.

Of course, as I mentioned, there is blocking of playlists, and that does have an impact. Blocking on media segments has an impact too. Why? If you look at the basic ABR case, you need to calculate your network speed. And calculating network speed was very simple in the past: you would just look at the number of bytes transferred, divide it by the time that it took, and you'd have a reasonable bandwidth estimate. Of course, right now you're getting something like this: you get a piece of data, then you have an idle time in which you're waiting, and then you get more data. That's not really ideal, so you can't use that simple method of calculating your bandwidth anymore.

And the real approach you have to take is to get a reasonable estimate of that waiting time. A lot of different approaches have been proposed in the past: side-loading data in parallel, trying to identify those idle periods, trying to identify where the bursts are coming from. All of those things are great, but you do have to be quite smart about it. You really have to take care about what you are measuring, because if you get this algorithm wrong and end up with a bad estimate, it can really mess up your ABR.
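One common remedy for the idle-time problem is to measure throughput only over the intervals in which bytes are actually arriving. A minimal sketch, with illustrative names and a simplified input shape (per-burst receive intervals rather than raw socket events):

```python
# Sketch of a chunk-aware throughput estimate. With chunked delivery the
# download timeline is bursts separated by idle waits, so dividing total
# bytes by total wall-clock time underestimates the link.

def estimate_kbps(bursts):
    """bursts: list of (start_s, end_s, num_bytes) receive intervals."""
    total_bytes = sum(b for _, _, b in bursts)
    # Count only time spent actually receiving data, not waiting for the
    # encoder to produce the next chunk.
    busy_time = sum(end - start for start, end, _ in bursts)
    if busy_time <= 0:
        return 0.0
    return total_bytes * 8 / busy_time / 1000

# Naive estimate over the whole 2 s window: 500 kB / 2 s = 2000 kbps.
# Burst-aware estimate: 500 kB in 0.5 s of actual transfer = 8000 kbps.
bursts = [(0.0, 0.25, 250_000), (1.75, 2.0, 250_000)]
print(estimate_kbps(bursts))  # 8000.0
```

The gap between the naive 2000 kbps and the burst-aware 8000 kbps is exactly the 4x mis-estimate that can wreck an ABR decision.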

The big advantage, for example, in low latency HLS cases, is that there is extra metadata. There's extra information in your playlist about when parts start and when parts end. All of that data can be used to improve your bandwidth estimates quite significantly. And it's not just those kinds of things that have a big impact; there's actually another thing. Imagine that you have an origin which is sending out a certain number of bytes in one go — for example, let's say it's sending out 30 kilobytes — and you have a cache in between that can only send out 20 kilobytes at the same point in time. In that situation the data will be sent, but not all of it will arrive in the same block at the same point in time. So, in the network, there might be a part that is partially sent, then a part that is held back in the network and then forwarded again. And while it might seem that the impact is irrelevant, all of this actually has a massive impact on your network estimates. So as a client, you really have to take those things into account; that re-chunking in the network has a significant impact.

Pieter-Jan: Another question that I get quite often is: what about accessibility? What can you do with subtitles? The average subtitle actually stays on the screen quite long, but if you have a latency of only a few seconds, then you might not even know the end time of your cue. It might be that the presenter all of a sudden goes into a burst of different sentences and you have to pop up new cues. It makes things quite complex.

The best approach that I've seen so far is actually to also use CMAF for the transport. There are approaches like IMSC-1, which is supported both in HLS and in DASH. It basically means that you're wrapping TTML data inside ISO BMFF containers. It's very simple, it keeps that same cache footprint, it's very consistent and it works everywhere. That's great. But it also has some caveats, let's say, on the player side. You will be repeating different cues, because you will want to split them up into smaller parts as well. That can lead to blinking and all kinds of other, let's say, unwanted behaviour if you don't really take care of it.

And I mean, all of that is great, but probably the real question that you're asking is: where does this really work? Can I actually use this across all platforms? Of course, on iOS, iPadOS and most Apple devices — where we've been hearing about low latency HLS — well, they of course support this, and pretty widely. In parallel, of course, you have Android. There are a lot of great low latency HLS and low latency DASH clients out there available for mobile devices. On laptops and all kinds of desktop devices, there are a lot of different clients that already support this today; especially low latency DASH support is pretty broad. And the same can even be said about smart TVs. A lot of them are operating on Android, a lot of them are operating on HTML5. So that's a pretty good check in the box as well.

The biggest question that I usually get is: but what about connected devices, for example Roku? Well, actually, as Will also showed, the legacy-latency clients will also work with the same cache footprint. They will be a little bit behind, but it is a similar footprint as well. And even on those devices, a lot of them have clients for low latency DASH and low latency HLS available already. So, from my perspective at least, there is no real reason not to start using this kind of approach. It has benefits all around, and it is definitely a major improvement for the entire ecosystem, if you ask me.

Now, what I can show you — if everything works well — is a few recordings we did earlier this week. In the first recording, on the left-hand side you will see low latency HLS in action; on the right-hand side, low latency DASH in action. You don't have to worry too much about the difference in latency between the two; it's basically a configuration issue, and both are quite equal. But the most important thing is, on the one hand, you can see at the top here that the same connection is being reused quite nicely thanks to the use of HTTP2. But also, and more importantly, you will see those segments coming in here on the HLS side, and you will see exactly the same segments coming in on the DASH side as well. So, a really nice reuse of exactly the same data for both HLS and MPEG-DASH, which of course is great, because that's in the end what we're trying to achieve here: to be able to use the same data across all of the different devices that we will be using.

In parallel, we have another demo here, which shows a number of different devices. We have a big LG WebOS TV here running low latency HLS. We have low latency HLS also running on a Mac. And then we have an Android device running low latency DASH, and Safari on the iPad running low latency DASH as well. I'll replay it again. As I mentioned, you don't have to worry too much about the latency being different on one device or the other. It's quite difficult to synchronize them all to start playback at the same point in time, which can of course impact how far behind you are, but you can actually synchronize that on the client side as well if you want. So that's, yeah, actually pretty nice, that we can get support on so many different platforms already today.

And with that, I'll send it back to you, Sassan.

Understanding the Role of HTTP2 in Low Latency Streaming

Sassan: Thank you. Thank you, Pieter-Jan. And thank you to all three speakers for really lucid and clear explanations in very compact form. Just a reminder to our audience, feel free to use the Q&A button at the bottom of your Zoom application to send us your questions. We'd like to make this as interactive as possible and get you engaged.

But I'll start off the kind of round table or Q&A session, if you will. And I'll start with Will actually. In all three presentations, I heard the term HTTP2 being used. Mickaël had it, you had it, and so did Pieter-Jan. Help us understand more clearly what is the role played by HTTP2 and how critical it is in this whole ecosystem.

Will: So HTTP2 is a successor to HTTP1; it's an improvement on HTTP. There's a lot of confusion around this, so it's good to clarify. The Apple spec, when it came out in draft form, leveraged a feature of HTTP2, which was server push. This was later dropped from the spec and replaced with preload hints, which was a great move by Apple. It's a much simpler approach; it gives the same timing benefit and it's scalable. So, it was a good move forward. But Apple retained the requirement that the client must connect to the nearest server using H2. And the reason for this is other features of H2 that Apple feels it important to leverage. One of them is ping frames, to be able to estimate the actual latency between the edge and the client; the other is to set priorities for requests, because you're requesting both media objects and playlist updates down the same connection. Apple felt those were enough benefit that it wanted to mandate that a valid client would use H2 to connect to the edge.

Now, do you actually need that? No. Pieter-Jan is running players that are connecting with H1; you can happily run a low latency client. It's all about the optimization. Undoubtedly, it's good to have that information, but do you absolutely need it? No. And the CDN edge will talk H2 to the client, but it'll convert that back to an H1 request going back to the origin. The Ateme origin in this case will speak both H2 and H1. In the demos we're showing, we were actually just connecting with H1, and the CDN is doing the translation from the chunked transfer encoding of an HTTP 1.1 response into an H2 aggregating response, because there's no such thing as chunked transfer encoding with H2.

Sassan: So, I think both in your response and also in the diagrams that Mickaël showed, there doesn't seem to be much benefit in using H2 between the CDN and the head end — the origin server and so forth. There doesn't seem to be much there for us.

Will: I think there is a benefit. Admittedly, CDNs have issues today going forward to origins with H2; that's not as broadly supported as going forward with H1. But I think at the end of the day, having H2 everywhere, and eventually — I just saw a question pop up from Eric Hertz on H3 — so yeah, should we go to H3 for this in the future? Yes. QUIC has benefits in terms of portability; we see higher average throughputs on aggregate. I think it's more important from a system level that you support multiple connection types. You can't go out with a solution that can only use one of them, otherwise it's not going to work. So, it will work with H1, H2 and H3. I have a demo showing H3 connecting to the Akamai edge as well, and then we convert it back to H1 going back to the origin. So, all of those are possible.

In general, it's always better to use the later protocol. Each evolution brings additional benefits.

Navigating Ad Insertion Challenges in Low Latency Streaming

Sassan: Mickaël, let me turn to you for a different topic, which comes up almost every time I talk to my customers about low latency. The question I often get is, okay, “How is this going to impact ad insertion — dynamic ad insertion — because many of these services rely on ad revenues?” So how does this all work with ad insertion? What's the impact there?

Mickaël: So, I would say the impact is not so huge, except that you have to do everything in a shorter time. That's the main issue. But what we are doing with MPEG2-TS, with the pre-roll that we are having with SCTE-35 — we can preserve it when we are building the timeline of the DASH manifest or the timeline of the HLS playlist. So, you can create that in advance. You still have to negotiate with the ad server more quickly at some point. I would say that everything will need to be done in a low latency fashion at some point as well; that will be needed if we want to insert ads at the server side. And then probably Pieter-Jan will also have some answers for the client side.

Sassan: Before I switch to Pieter-Jan: the ad decision makers, the third-party servers that make those ad recommendations — are they, to your knowledge, up to date to work on these faster timelines? Or do they need to be upgraded for that?

I think Mickaël is on mute.

Mickaël: Sorry, what was the question?

Sassan: So, these third-party ad recommendation engines — they have to work now in faster time. Have they been upgraded to do that, to use these new technologies?

Mickaël: No, I don't think so at first. But if we want to have smooth playback and everything working fine, going from low latency content to ad chunks that are not in low latency mode may be quite difficult at some point for the playback. So having something that is more consistent — the same audio codec, the same video codec — I would say that everything going in the same direction is always better.

Sassan: Pieter-Jan, now your response and the same question, the ad insertion question from your perspective, the player.

Pieter-Jan: I actually think that the server side is probably harder, because indeed the decisioning servers need to come up with answers a lot faster. From a client side, the most important thing is that the stream still contains uniform parts. So, what you don't want is to switch to an advertisement and all of a sudden have larger parts — or potentially even shorter ones, because then when you're switching back to main content, they will be bigger again.

So, keeping it more or less uniform is always the best thing to do. Of course, from a player side, you usually don't have much time to switch decoders anyway, so that's not really going to be an impact. But definitely, if the stream is more uniform, it'll have a positive impact on the buffer. And that's something that you always have to look at, especially in low latency cases. The buffer is one of the main contributors to latency, but it's also one of the main contributors, of course, to stability of playback. And that's equally important. So, they have to be tuned very, very well to make sure that you have both low latency and a good viewer experience.

Sassan: And is that true — if the ads and the main content are encoded at very different resolutions and different bit rates, does that throw off the player behaviour in any way, your bandwidth estimation? Going back to the buffer: do you need a bigger buffer to accommodate those differences?

Pieter-Jan: It does have an impact, of course. As I mentioned, you potentially need to reset your decoder. Most platforms these days can do smooth transitions there, and if not, players can anticipate that and take care of it. Bandwidth-wise, for us at least, it doesn't really matter that much. It used to in the past, but these days we actually measure bandwidth per origin. So, if you suddenly switch to an origin serving advertisements, we will initially base the bandwidth on the last known estimates that we have, but we will take the switch of servers into account, so that we can anticipate a different bandwidth when we see the next ad popping up.


Sassan: I'm looking at some of the questions from our audience. This first one came in pretty early; I think it was before we had actually started the presentations. “What is the use of blocking the playlist?” Will, you may have answered this in your talk, but let's just clarify this again, if you will.

Will: I can answer that. Blocking the playlist is all about optimizing timing. The player needs to know when media data is available. Now, if you tell the player it's available at a certain time, the player might ask late for it, or it might ask early for it. If it's late, it's lost latency. If it's early, the origin might give it a 404. So, it's hard to nail it precisely down to the frame. So instead, Apple came up with a solution, which is: hey, Mr. Player, just ask early for it all the time. Ask for it a second or two before you know it's ready, so you're safe. We'll block — in other words, we'll hold that response open — until the data's available, and the fraction of a microsecond it's available, we're going to release it to you. So, you get the data as fast as possible. It's actually a very simple and efficient mechanism for minimizing the delay in retrieving segment data. It does put more load onto the origin in terms of intelligence.
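The "hold the response open" mechanism Will describes can be modelled in a few lines of asynchronous code. This is a toy model of origin behaviour, not a real server implementation; class and method names are our own:

```python
import asyncio

# Toy sketch of a blocking request at the origin: the handler holds the
# response open until the data is published, then releases it immediately.

class BlockingOrigin:
    def __init__(self):
        self._ready = asyncio.Event()
        self._data = None

    async def get_part(self) -> bytes:
        # "Ask early": this call blocks (response held open) until publish().
        await self._ready.wait()
        return self._data

    def publish(self, data: bytes):
        self._data = data
        self._ready.set()  # release every blocked request at once

async def main():
    origin = BlockingOrigin()
    request = asyncio.create_task(origin.get_part())  # client asks ahead of time
    await asyncio.sleep(0.1)                          # encoder still producing
    origin.publish(b"moof+mdat")                      # data lands, response released
    print(await request)

asyncio.run(main())
```

Note the extra statefulness this implies: the origin must track which objects exist and which requests are waiting, which is exactly the "origin intelligence" Will mentions next.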

And that's a big shift from prior versions of HLS to this latest version. It used to be that you could just put up an Apache server, a file server, and run HLS. You can't do that anymore. You need origin intelligence to make the low latency part work.

Sassan: Well, let's cover that in more detail. All this extra load — how does it work out when you have to scale to millions of users, as you alluded to in your talk?

Will: Again, that's the beauty of these solutions: it's cacheable content. So we've cached that segment at the edge, and we're coalescing forward requests. So, if a million people come to the edge, we'll only get 10 or 100 requests going forward to our cache, and eventually only one request may be going forward to the origin to pull the data. So, if you can cache content, you can scale it. That's how CDNs work. What chunked encoding does is decouple our latency from our segment duration. And what we're talking about today is taking that decoupling and combining it with a cross-format solution, so we minimize the objects at the edge. I cache one object, I deliver it to all my clients. And it really doesn't matter if there are a million of them. That's HTTP scaling, and it's what CDNs do today.
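Request coalescing — many concurrent requests for the same object collapsing into one forward fetch — can be sketched as follows. This is a toy model of CDN edge behaviour under our own names, not Akamai's implementation:

```python
import asyncio

# Toy sketch of edge request coalescing: concurrent requests for the same
# URL share one forward fetch, so many viewers collapse to one origin hit.

class Edge:
    def __init__(self, origin_fetch):
        self._origin_fetch = origin_fetch
        self._in_flight = {}

    async def get(self, url: str) -> bytes:
        if url not in self._in_flight:
            # First requester triggers the forward fetch; the rest await it.
            self._in_flight[url] = asyncio.ensure_future(self._origin_fetch(url))
        return await self._in_flight[url]

origin_hits = 0

async def origin_fetch(url: str) -> bytes:
    global origin_hits
    origin_hits += 1
    await asyncio.sleep(0.01)  # simulated origin latency
    return b"segment-data"

async def main():
    edge = Edge(origin_fetch)
    results = await asyncio.gather(*(edge.get("/seg1.m4s") for _ in range(1000)))
    print(len(results), origin_hits)  # 1000 clients served, 1 origin hit

asyncio.run(main())
```

The cross-format point then follows directly: if HLS and DASH clients request byte-identical objects, they all land in the same `_in_flight`/cache entry.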

Sassan: Excellent, thank you.

Sassan: Let me go to the next question from our audience. And this is for Pieter, I think. “Pieter mentioned synchronization of playback. Our application can tolerate some latency, but sync would give us some advantage — for example, presenting video with and without captioning subtitles in the same room. What mechanism would you recommend for player synchronization?”

Pieter-Jan: Well, there are actually a lot of different mechanisms that are possible. The DASH Industry Forum has also standardized some parts of it, or made some recommendations. An approach that you see coming back quite often is speeding up or slowing down playback. It definitely works quite well. If you keep the percentages low enough, then — unless it's audio content and you know the song really well — it will actually not, yeah, it will actually not—

Will: I'm going to help Pieter-Jan out. I happen to have a slide on this. So, this is what he's talking about here.

Pieter-Jan: You can always go ahead if you want, but in-

Will: No, no, go. I have a slide and I thought, and then I have a video after this showing it working. So, finish your explanation and then I'll show the video.

Pieter-Jan: Yes. So, I mean, the playback rate adjustment, as I was saying, is something that is quite common. And the other thing that Will has on his slide here is the external- time source and the common latency target. The external time source, why is it important? Of course, you need to be able to measure your latency. As Mickaël actually mentioned earlier, in DASH, there is the notion of time servers. You cannot trust the client's clock. I mean, you simply cannot. They're way too often that they're just plain wrong. So having a time server and having that external time source is very crucial if you want to start synchronizing.

Both your devices need to have a common ground. Once you have that common ground, and once you know where you want to go — once you have that latency target — you can start using playback rate adjustments to actually reach that target. In some cases, of course, the difference might be way too big, so you want to skip some frames and seek forwards or backwards, or just hold for a few moments. But in most cases, adjusting the playback rate is a very good approach. And I see that Will has this video ready, so...
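The control logic Pieter-Jan outlines — rate-adjust for small errors, seek for large ones — can be sketched like this. All thresholds and names here are illustrative assumptions, not values from any player:

```python
# Sketch of latency-driven playback rate control: speed up when behind the
# target, slow down when ahead, clamp to a barely perceptible range, and
# resync with a seek when the error is too large. Thresholds are assumptions.

def choose_rate(latency_s: float, target_s: float,
                max_drift: float = 0.05, seek_threshold_s: float = 4.0):
    error = latency_s - target_s  # positive: we are too far behind live
    if abs(error) > seek_threshold_s:
        return ("seek", error)  # jump instead of crawling for minutes
    # Proportional adjustment, clamped to +/- 5% so audio pitch shifts
    # stay (mostly) unnoticeable.
    rate = 1.0 + max(-max_drift, min(max_drift, error * 0.05))
    return ("rate", round(rate, 3))

print(choose_rate(3.0, 2.0))   # ('rate', 1.05)
print(choose_rate(1.5, 2.0))   # ('rate', 0.975)
print(choose_rate(10.0, 2.0))  # ('seek', 8.0)
```

Because each device runs this against the same external clock and the same target, they converge without ever talking to each other — the "eventually consistent" property Will demonstrates next.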

Will: It's not a video, it's just a screenshot, showing exactly what Pieter-Jan was saying. These are three different devices. They're not talking to each other at all; they've simply got their heads down playing a live stream. They know how to calculate latency, they have a time source they can trust, and they've all been told to play at a certain distance behind live. And as you can see from the timestamps here, they're within 60 milliseconds of each other in playback. You might call it a poor man's sync, but I think it's really practical. You can use it at scale. And it's also eventually consistent, because we're doing playback rate adjustment. Your player might start off three seconds away from the other players, but then they will all coalesce at the same time. And if one of them rebuffers and falls behind, it pulls itself back to live again.

Sassan: This is great when people are watching the same game and want to interact on social media.

Sassan: The next question, this is really a Mickaël question, “Do we anticipate that the increased bitrate variability with content adaptive encoding will wreak havoc on client bandwidth estimation in low-latency live streaming?” So, I get that question quite a bit every time I talk about CAE. What are your thoughts about that, Mickaël, and your experience?

Mickaël: So, I think it also has some impact on the player, but Pieter-Jan might have some answers to provide as well. The variation will be quite high in the low latency chunks that we have, depending on whether an i-frame is part of a chunk or not, and on the subsequent chunks that come behind it.

But in general, at least on the adaptivity side, all the ladders that we have will get exactly the same treatment, so they will follow exactly the same rules somehow. With low latency HLS, we provide this information inside the low latency playlist itself. So, we give this information to the player, with the start and the length of each of the low latency chunks, which gives it a sense of how big the variations between the chunks are. DASH might be a bit more challenging, but I believe Pieter-Jan has some tricks to handle that, because that's something we have already seen working with many players right now.

Sassan: Pieter-Jan, do you have any tricks?

Pieter-Jan: There are always a lot of tricks, and as I mentioned earlier, it's all about taking all the variables into account — and VBR definitely has a very big impact. What you will often see in those cases, as Mickaël was saying, is a burst when you have a very big keyframe. And then your bandwidth will appear to drop a lot lower if you have a relatively low-motion kind of stream. And it's indeed the case that if you have more data, you can measure more accurately. But most players are doing this by now: a lot of players are using things like weighted moving averages. If you weigh by the amount of data that you have, or use it as a parameter in your estimates, that usually works pretty well to still get a reliable estimate, even if you're using VBR. And let's be honest, there's no real reason to use a constant bitrate, unless you want that filler data on your network for some legacy reason.
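A bytes-weighted average of this kind can be sketched in a few lines. The weighting scheme below is one common choice (weight each sample's throughput by the bytes behind it), not necessarily what any particular player does:

```python
# Sketch of a bytes-weighted bandwidth average: larger samples (big keyframe
# bursts) carry more weight than tiny low-motion chunks, which damps
# VBR-induced swings in the estimate.

def weighted_bandwidth_kbps(samples):
    """samples: list of (num_bytes, duration_s) per measured download."""
    total_weight = sum(b for b, _ in samples)
    if total_weight == 0:
        return 0.0
    # Each sample's throughput, weighted by how many bytes backed it.
    return sum((b * 8 / d / 1000) * b for b, d in samples) / total_weight

# A 400 kB keyframe burst (8000 kbps) dominates three 20 kB low-motion
# chunks (1600 kbps each), so the estimate stays near the true link rate.
samples = [(400_000, 0.4), (20_000, 0.1), (20_000, 0.1), (20_000, 0.1)]
print(round(weighted_bandwidth_kbps(samples)))  # 7165
```

An unweighted mean of the four per-sample throughputs would sit at 3200 kbps, badly underselling the link; the bytes-weighted figure stays close to the keyframe-burst measurement, which carried the most evidence.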


Sassan: Next question is also for you, Mickaël, I think. The question is, “The benefit to a CDN is clear. But am I right in understanding that the primary benefit to a content creator is a simplification of the encoding workflow?” Is that true?

Mickaël: That's also something that we want to do between all of us, I believe: to create the end-to-end ecosystem. That said, one quite important topic is CMAF — the C in CMAF stands for Common, as in Common Media Application Format. So, the idea is having a single format going throughout the end-to-end ecosystem, maybe even including the playout that we may have in front of us in the future. Everybody is speaking about CMAF ingest, and CMAF ingest you can put at various places — maybe, in the future, in front of a transcoder. That's not the case right now, but it can be in the future. So having a single object that can go from the very beginning to the end will simplify the end-to-end ecosystem. We could use CMAF from the playout itself, going to the transcoder, to the just-in-time packager, to the edge, and to the player — a single format going through the entire end-to-end ecosystem.

Sassan: Yeah. The next question, which concerns all three of you actually, is: “How will low latency affect analytics and reporting? How will the measurement of success be different with low latency?” And I guess you could talk about analytics on the origin server and in the CDN, as well as the player. So maybe let's take it in that order.

Will: I can answer this, because at the WAVE project we care about the entirety of the system. It's going to change: analytics to date doesn't actually collect the latency of different players — it hasn't been a factor. So, we've got to start by collecting the latency coming back from the player. But then there's the reminder that low latency is always a trade-off with quality and risk. If you go too low on your latency, your player can start rebuffering with segmented media. So, as a content provider, you want to assure yourself that your end users are still getting the quality level you want, but at a lower latency. Now, if latency is a function of last-mile connectivity, the corollary is that maybe not all your users will get exactly the same quality.

So, you've got to change the mindset a little bit: I have a latency target, but if my player is smart enough to realize that for this user it can't meet that target, it needs to fall back from live to a stable place where the QoE that you're measuring is within a sufficient range. So, I think the metrics have to start looking at that: where in my latency range are my players sitting, and are my players smart enough to adjust their target latency to give the end user equivalent QoE metrics?

And we just defined Common Media Client Data (CMCD) over at the CTA as well. This is the client sending data to the CDN to make a feedback loop. And one of the metrics we're collecting is buffer length. With attributes like that, we can judge the health of these players — not individual sessions so much, but in the aggregate across the delivery surface — and know where we're having problems. So, some new changes coming to media analytics in a little bit.
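To make the CMCD idea concrete, here is a sketch of attaching a few CMCD (CTA-5004) keys to a segment request as a query argument: `bl` (buffer length, ms), `mtp` (measured throughput, kbps) and `sid` (session id). The key handling below is simplified; consult the specification for the full rounding, escaping and transport rules:

```python
from urllib.parse import quote

# Sketch: serialize a few CMCD keys into a query argument on a segment URL.

def cmcd_query(buffer_ms: int, throughput_kbps: int, session_id: str) -> str:
    payload = ",".join([
        f"bl={round(buffer_ms, -2)}",       # spec rounds bl to the nearest 100 ms
        f"mtp={round(throughput_kbps, -2)}",
        f'sid="{session_id}"',              # string values are quoted
    ])
    return "CMCD=" + quote(payload)

url = "https://example.com/seg1.m4s?" + cmcd_query(1840, 7165, "abc-123")
print(url)
```

Because the data rides on the request itself, the CDN can aggregate buffer health across the whole delivery surface without a separate analytics beacon — exactly the loop Will describes.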

Sassan: Pieter-Jan, on your side — obviously a huge amount of data is being collected on the player side. Anything you would like to add to what Will just explained?

Pieter-Jan: Yeah, something that I think would be very interesting to start measuring more often — I know some analytics systems do, but some don't — is not just the latency, but also how far off you are and how often you are actually rate-adjusting. Because if, for example, your buffer all of a sudden starts running low, then your client might have to make the decision to, well, hold off a little bit and increase the latency, to make sure that you're at a healthy buffer level at every point in time. And of course, we've seen that with ABR in the past — clients going up, down, up, down when their ABR algorithms aren't very intelligent. I do expect that we will see some clients with unintelligent synchronization algorithms, which will go too close, too far, too close, too far. And from a viewer perspective, that would be horrible as well. So that's definitely something.

In the end, you don't want to measure the individual aspects; you really want to measure the quality of experience for the viewer, because that's where you can make a difference. So, there's indeed a lot of data available. Latency is definitely one to measure. Buffer health is usually one that's being measured already. But also things like playback rates and the number of frames being skipped are definitely interesting to start monitoring.

Sassan: Another topic that seems to be popular with our audience is that of DRM. “What is the impact of adding DRM to the workflow?” And does it impact low latency? Shall we start with Mickaël, and then we'll go down the line?

Mickaël: It will impact things a bit, because you have to do the encryption after producing the first MP4. But looking at the end-to-end latency that we can achieve, nobody will see it in the end. What would be the latency that we add on top of the end-to-end ecosystem? We are speaking about chunks that are 500 milliseconds, maybe less — for DASH we did less; Apple is recommending a higher duration for the low latency chunks. But in the end, you won't see the difference.

I see another question about the latency in the end — so that's end to end. We can achieve five seconds end to end with DASH, from the SDI to the playback — even less; we have seen some demos at less — but in general we put the encoder in a mode that is not the broadcast mode.

So, if we compare with what we are doing for broadcast: we are having broadcast at between two and three seconds end-to-end, with a professional decoder and our broadcast encoder using MPEG2-TS. If we go to low latency DASH or low latency HLS, we can be at the same numbers, going through the internet and coming back to the local place. But then you are really close to the edge, so you are putting the player in a challenging mode — everything is in a challenging mode. But in the end, if you add DRM, you will not add much more than what we are doing today.

Sassan: OK. I'm assuming it's transparent to the CDN, so you don't care. So that goes to Pieter-Jan on the player side. Do you care?

Pieter-Jan: Of course we care. There are actually two things. One: I actually do think that, depending on which systems you're using, it could have an impact on the CDN — but that's more the discussion of whether you are using CTR mode or CBCS mode. Because, depending on the devices that you're delivering to, you might have to use both. If you're targeting legacy Android devices with older Widevine, then you have to use CTR mode for the encryption. If you're delivering to FairPlay devices, or newer devices running Widevine and PlayReady, you can actually use CBCS. And I actually think that that's the way forward, the way we have to go, which will allow us to still have that unified cache footprint.

But I do agree, as a second item, with what Mickaël said: latency-wise you won't see a big difference. We've done tests here; it doesn't really add significant latency. The biggest impact is actually on the client side: you don't have a lot of time anymore to get a DRM license in case you're rotating keys, or in case you're turning DRM on and off in a server-side ad insertion kind of mode — and that does have an impact. If you have a one-second buffer and you're starting to fill that buffer with encrypted data that you don't have a key for, that means you have one second to get a license with the appropriate decryption keys. And if we're in the scenario that Will was talking about, with a million different viewers, that's a million different hits on your license server that need to be answered within a second. And the longer it takes, the higher the chance that viewers are going to see a stall. So there, it definitely has an impact. Things like a clear lead can reduce that pain slightly, but in the end, scaling those license servers is going to be the biggest challenge — that's at least what I'm anticipating.
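The timing budget Pieter-Jan describes is simple arithmetic, and worth making explicit. The function names and numbers below are illustrative assumptions, not from any DRM client:

```python
# Sketch of the license-timing budget: once encrypted data enters a small
# low-latency buffer, the license round trip must complete before that
# buffer drains. A clear lead (unencrypted opening seconds) extends it.

def license_deadline_s(buffer_s: float, clear_lead_s: float = 0.0) -> float:
    """Seconds available to fetch a license before playback stalls."""
    return buffer_s + clear_lead_s

def will_stall(buffer_s: float, license_rtt_s: float,
               clear_lead_s: float = 0.0) -> bool:
    return license_rtt_s > license_deadline_s(buffer_s, clear_lead_s)

print(will_stall(1.0, 1.4))                    # True: 1.4 s RTT vs a 1 s buffer
print(will_stall(1.0, 1.4, clear_lead_s=1.0))  # False: the clear lead buys time
```

With a 30-second buffer in classic streaming, a 1.4-second license round trip was invisible; with a one-second buffer, it is a guaranteed stall unless the key arrives early or a clear lead covers the gap.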

Sassan: That's a very good comment about the mixture of content that's protected and not protected with ads. Very good point, thank you, appreciate that.

Sassan: There was a follow-up question on the previous discussion on synchronization where you were showing a screenshot and the question is, “Does DASH or HLS lend itself any better to synchronization? Is there a difference between the two?”

Pieter-Jan: You can go ahead, Will.

Will: Yeah, I would say no. That synchronization mechanism depends on the player's ability to adjust playback rate and to calculate its latency behind live. Both DASH and HLS provide mechanisms for doing that. So no, and you can sync HLS players to DASH players, DASH players to each other, and HLS players to each other.

Sassan: Would you concur with that, Pieter-Jan?

Pieter-Jan: I would definitely concur. I do think that DASH has a slight edge here, for the simple reason that there are ways inside the DASH specification to actually specify how high or how low that playback rate is allowed to be, in order to ensure that the client isn't doing anything wrong: not speeding up too much or slowing down too much, depending on the content. So, I do think there's a slight edge there, but in general it doesn't really matter that much.

Will: There's another nuance here. When you give timing information in DASH, it's very precise as to whether it's the timestamp as it goes into an encoder, as it comes out of an encoder, or as it goes into a packager. Because your encoder might take 500 milliseconds to encode something. Whereas with Apple, program date time is not specified; it's intentionally generic. It's up to the operator to choose whether that PDT represents the time the media went into the encoder, out of the encoder, or into the packager. So that's why, even when we were running these demos earlier, you might set your DASH player to two seconds and your LL-HLS player to two seconds, but their reference points in wall clock time might not be the same. So, you have to compare those two carefully.

Pieter-Jan: Yeah, you do have to figure out what the setup is for each protocol. But once you have that, and I mean, usually it's static, then you can synchronize all of the clients across the board and run the same latency regardless of the protocol, regardless of the device.
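The rate-adjustment mechanism the panellists describe can be sketched as follows. The function name, the gain, and the 0.96–1.04 bounds are illustrative assumptions, not values from the discussion; DASH can carry such min/max playback-rate bounds in its manifest, as Pieter-Jan notes:

```python
def sync_playback_rate(now_utc: float, pdt_of_playhead: float,
                       target_latency: float,
                       min_rate: float = 0.96, max_rate: float = 1.04,
                       gain: float = 0.05) -> float:
    """Return a playback rate that nudges the client toward the target
    live latency. Rates are clamped, mirroring the kind of min/max
    bounds DASH lets an operator specify."""
    latency = now_utc - pdt_of_playhead  # seconds behind live
    rate = 1.0 + gain * (latency - target_latency)
    return max(min_rate, min(max_rate, rate))

# Client is 3.5 s behind live but targets 2 s: speed up (clamped to 1.04).
print(sync_playback_rate(now_utc=100.0, pdt_of_playhead=96.5,
                         target_latency=2.0))  # 1.04
```

Because the loop is driven only by wall clock and a program-date-time anchor, the same logic can synchronize DASH and HLS players to each other, subject to the PDT reference-point caveat Will raises.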

Sassan: A couple more questions. I appreciate you guys staying over time, but I think it's great to engage our audience here. “Is there any impact on end-to-end latency when using HTTP/1.1 for low-latency HLS?” You may have touched upon this before, Will, but maybe you can add to it.

Will: Well, “impact” is hard to say. H2 provides some benefits. You might see aggregate throughput increase slightly, in the 10 to 12% range, so that might help you avoid a rebuffer. So, will there be an impact? Yes. In general, if H2 is available, use it. If you can't use it, low-latency HLS can still be made to work with HTTP/1.1. At the latency levels people indicated were popular, three to five seconds, I don't think there's much material difference. But as we get down to two seconds and below, it starts to become more important, and you need to tweak it a little more.

Sassan: Okay. And along those same lines, I think this is another question which we may have answered, but perhaps some clarification would be helpful. “What I understand is that H2 is mandatory for the Apple player to play low-latency HLS, and other players can still use H1.” I don't think you said it was mandatory, but maybe you can clarify again.

Will: It is mandatory if you read the spec; it says you must use H2. So, you could go build a player that uses H1; Apple's not going to do that. Apple want H2 and they're using H2. Other people can build players that use H1. Now, an origin might choose to only support H2, and that origin might say, well, that's what the spec says, that's all I'm going to support. But it clearly works with H1, and there's still broader support for H1 than H2. As a CDN, we support both types, and we convert the H2 to H1 to go back to an origin. And I think most origin builders are going to support both.

Sassan: Well, we have an origin builder here with Mickaël. Your origins, are you supporting both?

Mickaël: We support both on the origin, but I would say the simpler, the better on our side. If we could support just one of the two at the moment, that would be easier for us to manage; having multiple connection types inside the origin also makes the origin more complex than what we are doing right now. I just want to come back to the previous answer from Will. What we have tested so far with Apple is that all the players that are using AVPlayer need H2 right now. So, if you don't have H2 with AVPlayer, it doesn't enable the low latency mode. AVPlayer is the player inside Safari; if you want to connect to it directly, then you need H2 to enable the low latency. And I have tested that with various implementations, so I can confirm it. Maybe in a new version of iOS we may see a different behaviour, but at least in the latest one, that is still the behaviour we are seeing.

Sassan: Okay. And on the player side Pieter-Jan, you're building players for both platforms presumably. Any thoughts on that?

Pieter-Jan: For us, I mean we treat it as transparent. We've seen that H1 works perfectly fine. So, if it's being served on H1, then we will take that. If it's being served on H2, well, all the better. As Will mentioned, it does allow you to, well, reduce the latency if you are on a higher protocol, but in the end, I mean, the difference isn't too significant based on our tests so far.

Sassan: It seems our audience is really interested in this topic, so I'll continue with one or two more questions if you would bear with me. “Can the server push feature of HTTP/2 be supported by CDNs?”

Will: So, different answers for different CDNs; I can speak for Akamai. Within our media CDN, we don't support server push. We do support server push for our website platform, but you should have read, just in the last week or two, that Chrome, or Blink actually, gave notification that they're deprecating H2 server push from the Chrome browser. Everybody's getting excited, but their point was that it's really not being used a lot. And part of it could be having to maintain that forward into H3, which is a lot of work for them. So, I would be very hesitant at this point to start relying on server push as part of your streaming workflow under H2. I think there are very interesting protocols coming up, like WebTransport, which allows unidirectional and bidirectional reliable streams between client and server, as well as unreliable datagrams. And those two in combination can give you a very efficient streaming structure. So, we might look more towards WebTransport in the future than towards server push under H2 to implement our streaming. And server push doesn't give any benefits here compared to what Apple have done with preload hints. It would give exactly the same latency, yet it's a lot more complexity.

Pieter-Jan: I even think that Apple commented, or that Roger commented at a certain point in time, that the results were in some cases even better using preload hints, given the infrastructure that is available on the web right now.

Sassan: That's interesting. Thank you both for those titbits of information. I think that's very helpful to our audience. And again, as you guys speak, I think our audience are motivated to ask more questions about this.

Sassan: This last question is, “If I have a transparent proxy between player and CDN, can we have player-to-proxy on H2 and proxy-to-CDN on HTTP/1.1? Can we have a combination?”

Pieter-Jan: I mean, it's basically the same as what Will was explaining: they're proxying H2 to H1 in the background towards the origin. You could have that translation anywhere in the network, I would say.

Will: Just be careful. Normally with proxies, you're dealing with objects of known and discrete size. As we get into low latency streaming, the defining change is that everything is an object of unknown size. And you'd be surprised, even when we go down into the CDN code, how much implicit logic there was in how things were built, assuming that I knew the size of an object when I started serving it or requesting it. And the point is, with this transfer mode, we don't. So, you've got to be careful: is your proxy correctly proxying an H1 chunked-transfer-encoding response into an H2 aggregated response as it goes forward? And what happens with things like RFC 8673 support in your transparent proxy?
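The unknown-size property Will describes is visible in HTTP/1.1 chunked transfer encoding itself. A minimal parser sketch (simplified: it ignores chunk extensions and trailers) shows that the total object size is only known once the zero-length terminal chunk arrives, which is exactly what the implicit "I know the size up front" logic in proxies and CDNs breaks on:

```python
import io

def iter_chunks(stream):
    """Incrementally yield body chunks from an HTTP/1.1
    chunked-transfer-encoded stream. The object's total size is
    unknown until the zero-length terminal chunk arrives, so a
    proxy must forward each chunk as it lands rather than wait."""
    while True:
        size_line = stream.readline().strip()
        size = int(size_line, 16)   # chunk size is hex-encoded
        if size == 0:               # terminal chunk: only now is the size known
            break
        yield stream.read(size)
        stream.readline()           # consume the trailing CRLF after the data

# Two chunks arrive over time; each can be forwarded immediately.
raw = io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")
print(b"".join(iter_chunks(raw)))  # b'Wikipedia'
```

An H2 or H3 hop carries the same body as a stream of DATA frames with no length declared up front, so a well-behaved proxy can translate between the two without buffering the whole object.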

Sassan: Mickaël, this last question would go to you. “How reliable is ABR with low-latency DASH and low-latency HLS?”

Mickaël: We are doing exactly the same thing that we were doing before. So, there is no change, except the chunking; it's more about the way we are packaging things. We are using exactly the same encoding, so there is no change in what we are doing for the stack. And we can get something even better, I believe, with the CAE we were speaking about before: content-adaptive encoding is even better in low latency mode, since the player can react more smoothly to the...

Will: Yeah, the reliability question comes on the player side. It's certainly harder to do ABR in low latency than it is to do ABR with a 20-second buffer. So, I think whenever a player vendor says they can do low latency, you should ask to see the demo where there are two or three bitrates and the player is switching, because that's fundamentally harder to do than a single bitrate, which is the vast majority of low latency demos.

Pieter-Jan: Yeah, that's definitely the case. I mean, it's a little bit related also to the topic that I had: the bandwidth estimates are significantly more difficult as soon as you have that idle period on your connection. The other thing that is actually very important is, as Will was saying, you don't have as big a buffer anymore, so you need to be able to react very fast. And there are still players out there that are, for example, running requests into the ground: they don't break off requests when they see that their buffer is going down. There's a lot of new logic that had to be built into our player as well to be able to optimize this, but it is definitely possible to get it as reliable as it was at higher latencies. It does require a lot of tuning. A lot of it is also related to the sizes of the parts, the sizes of the segments, and the intervals of independent frames, because of course you need those to be able to switch to an alternative quality.

Something that I'm actually expecting is that a lot of clients will first go back to the old mode of crashing down to the lowest quality when something seems to be wrong. But based on what we've seen at least, and what we've been implementing, that's not really needed; there are definitely ways to anticipate this. In DASH, it is slightly more difficult, I must say, compared to HLS. In HLS, you do have the independent annotations, which are quite useful from a player perspective, because you might in some cases know that you can switch in the middle of a segment instead of having to start at the beginning. For example, if you have a six-second segment and you're at, let's say, second five at the point when something goes wrong, the question is: do you have a second to continue buffering and switch at that point, or do you have to re-download the last five seconds as well at a lower bitrate? It's that kind of behaviour that you really need to anticipate on the player's side.
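The mid-segment switching decision described here can be sketched roughly as follows. The function and the part layout are hypothetical, assuming HLS-style parts flagged as independent (i.e. starting with a frame that can be decoded on its own):

```python
def switch_point(position_in_segment: float,
                 independent_part_starts: list[float]) -> float:
    """Where can a player begin fetching a lower rendition?
    Parts flagged as independent allow a mid-segment switch at the
    next such part; if none remains, the player must refetch from
    the segment start (returned as 0.0), re-downloading media it
    has already buffered."""
    candidates = [t for t in independent_part_starts
                  if t >= position_in_segment]
    return min(candidates) if candidates else 0.0  # 0.0 = restart segment

# Six-second segment, independent parts every 2 s:
print(switch_point(5.0, [0.0, 2.0, 4.0]))  # 0.0: nothing after 5 s, restart
print(switch_point(3.0, [0.0, 2.0, 4.0]))  # 4.0: switch at the next part
```

This is why part sizes and the interval between independent frames, which Pieter-Jan mentions as tuning parameters, directly bound how quickly and cheaply a low-latency player can react.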

Sassan: Thank you, thank you for that insight. And it seems that we have exhausted all of our audience’s questions.

Closing Remarks

Sassan: And again, thank you all for graciously staying over time; 20 minutes beyond our allotted time is a lot. So, I appreciate your time, and I appreciate the audience who stayed with us. Hopefully this was useful to everybody. Thank you again to our panellists and our speakers, and hopefully we'll see you in the not-too-distant future in another webinar, to follow up and see where we are with low latency.

And I've just got a question: “Will this be available for colleagues to watch?” Yes, we will make both the slides and the recording available to you. If you've registered, you will receive a link, and within the next few days, hopefully, we'll have it all ready to watch and download. So, with that, I'll turn it over to our panellists and speakers. Any last comments?

Pieter-Jan: I would say good evening or good afternoon or good day to everybody!

Sassan: Yeah, wherever you are on the planet. All right, thank you so much. Thank you all for listening. Thank you Will, thank you Mickaël, thank you Pieter-Jan, and thank you the audience. Goodbye.
