Webinar - VOD - HESP - THEO Technologies

Join us for an exclusive webinar unveiling the revolutionary High Efficiency Streaming Protocol (HESP), led by industry experts Pieter-Jan and Johan.

Dive into the cutting-edge technology shaping the future of streaming. Discover how HESP transforms latency, scalability, and user experience, and stay tuned for a live demonstration showcasing its seamless synchronization and lightning-fast zapping capabilities. Don't miss your chance to be at the forefront of streaming innovation.

Webinar transcript

Pieter-Jan: Hello and welcome everybody to our webinar on HESP, the High Efficiency Streaming Protocol!

Just first to get started, as you may know we have some time for a Q&A session after this talk, so if you want to submit questions, feel free to submit them through the chat box at any point in time and we will tackle those during the Q&A session.

Pieter-Jan: Why did we start with developing HESP? As most people know already in this webinar, online video is booming. There is more and more demand for streaming services and there are more and more problems to get your content across all of the devices and all of the networks. Especially viewers, their demand, their requirements, they are continuously increasing.

They're more demanding than ever and it's not always acceptable for viewers to have a bad experience. And we see that abandonment by viewers who have a bad experience is increasing incredibly. So, with multiple streaming services popping up, it is extremely important to keep that user engagement and to make sure that there is no subscriber churn.

If we look at OTT streaming in general, setting up a distribution pipeline is usually a big trade-off. It's not hard to achieve low latency, so you can get real-time interaction like conference call software, or you could also set up a pipeline that allows you to scale over massive networks with pretty good compression and cost-efficient delivery, like you need for example for an event like the Super Bowl, or you can set up streaming pipelines that allow you to get excellent viewer experience without any stalls, with excellent zapping times.

But setting up a pipeline that combines all of them, well, that's where it gets tricky. And that's not simple at all. For example, if you would use something like WebRTC or RTMP, you can achieve ultra-low latency. But the problem is that you need active streaming servers, which means that it's more difficult to scale. You also have problems with getting clients to support the protocols across all different platforms. There are higher bandwidth requirements which are usually needed and it all results in ramping up your costs to set up your streaming pipelines quite quickly.

On the other hand, if you look at examples like HLS or MPEG-DASH, the HTTP-based protocols, they don't have that scaling problem like WebRTC and RTMP, but they have as a massive disadvantage the latency, which is extremely high, because files need to be generated, they need to be transported over multiple requests, then they need to get buffered in clients, and it's always an entire file at a time, or today in small chunks, a chunk at a time. But it always impacts latency, or it always impacts viewer experience because all of those things need to get tuned, and it always stays a trade-off.

Over the last year we've seen a massive movement away from the legacy type of high latency, 60 seconds and even more, with long segments in HLS and DASH. We've seen that migrate to shorter segments to achieve around 10 second latency, but it's still not good enough. And over the past months we've seen that more and more people are moving even further, with chunked transfer and Apple's Low-Latency HLS approaches, to go to single-digit-second latencies. But all of those changes which are being done are actually patches, workarounds on protocols that were not really designed to achieve low latency. And as a result, it comes with massive trade-offs which are just ever increasing when you look at bandwidth usage and user experience and zapping times. And that's for us the issue with this kind of approach.

It's not an ‘or’ that you have to achieve. You really need to go to a protocol that can handle all of those different things at the same time, not just using RTMP to get ultra-low latency or using HLS or MPEG-DASH, which were designed for massive scale. But you need a protocol such as the High Efficiency Streaming Protocol that we designed, HESP in short, that really takes care of all of those different things.

And that's what HESP was actually designed for. It has been designed to bring a broadcast-like experience and scale to OTT, combined with sub-second latency to allow for interactivity and delivering to multiple devices, or at least a massive amount of devices, in a synchronized way at the same time. So, in order to allow this protocol to also scale in a cost-effective way, it does not just reduce the bandwidth overhead which we see in traditional streaming protocols and which is being added to even further with LL-HLS and LL-DASH to achieve a lower latency. It also uses just standard HTTP infrastructure. It allows you to scale on existing infrastructure with standard HTTP CDNs, and it can even fit in a stack with standard encoders. So, this basically allows HESP to be implemented in a streaming stack with minimal intervention.

On top of that, HESP has been designed to allow near-instant zapping and seeking, and it provides full adaptive bitrate capabilities, which basically allows users of the protocol to dynamically adapt to various network conditions, avoiding any stalls or the spinner-riddled experiences that customers have started to dread.

Pieter-Jan: When we compare it to other protocols, because that's of course where it gets interesting, we see that it doesn't just score better in some of the more focused areas like low latency and bandwidth, but we also tuned it to score extremely well in zapping times, scalability, cross-platform availability, so basically the reach of the amount of devices that you can reach with one single protocol, but also of course, the adaptive bitrate capabilities.

And it becomes even more interesting when we look at the numbers. So, when we put this in numbers, HESP can actually achieve up to seven times less delivery delay compared to, for example, low latency DASH using chunk transfer encoding.

Zapping times show even more of a difference, because HESP reduces zapping times by about 20 times. It actually brings zapping times down to good old analog TV zapping experiences, which is a massive difference from OTT zapping. And this is not just for zapping, but this is also the case, for example, when you are seeking. So even when you are seeking, you get instant continuation of playback. Instantly, no spinners, no spending time watching silly little circles turning around.

And of course, another very important one is bandwidth savings. HESP actually eliminates all of the overhead, or as much of the overhead as possible, compared to other streaming protocols. It can allow up to 20% of bandwidth savings compared to, for example, low latency DASH, and this can have a massive impact on your cost structure when delivering media to your viewers.

On top of that, of course, we are a player company, so when we designed HESP, players were very important for us as well. Where most of the adaptive streaming protocols, the HTTP-based ones, have a pretty good coverage on web and mobile, and are doing okay on smart TVs and streaming devices, HESP has also been designed to not just bring these experiences to standard environments, but to also bring it to mass distribution environments and to really bring it to the big screen. To make it available on set-top boxes with set-top box clients across different stacks and to make sure that every platform can basically run HESP in the end or an HESP enabled client.

Johan: Thank you, Pieter-Jan. You probably all wonder how this is possible. Well, the key of HESP is to use existing technologies and to combine them in a very clever way. And we focus on simplicity. We focus on the essential to do what streaming video is all about: bringing the video as efficiently and as scalable as possible to the viewers.

To start, HESP uses a manifest. An HESP manifest contains the minimal information on the video, such as where it can be found, and what qualities there are, and that's about it. Now, that information does not change often. Typically, only when a new piece of video, such as an advertisement, arrives, or when a new quality or a new audio track is exposed. That also means that you do not need to fetch the manifest just before starting the playback, just to get the exact information on the latest segment updates.

HESP is delivered over HTTP/1.1. It uses chunked transfer encoding at a very small granularity to allow a very low latency. It uses byte range requests because byte range requests allow you to start at a given position in the video and not just at the start of a segment. And that's obviously very beneficial to reduce the startup latency. Now, alternatively, HTTP/2 frame-based streaming could also be used.
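As a rough illustration of the byte range idea, the following Python sketch builds a request that starts at a given byte offset rather than at the start of a segment. The URL, the offset and the helper name `build_range_request` are assumptions made for illustration; the transcript does not say how a player derives the offset. No network call is made.

```python
import urllib.request


def build_range_request(url: str, start_byte: int) -> urllib.request.Request:
    """Build a GET request that begins at start_byte instead of the segment start."""
    if start_byte < 0:
        raise ValueError("start_byte must be non-negative")
    # An open-ended range ("bytes=N-") asks for everything from offset N onward,
    # which suits a live segment whose final length is not yet known.
    return urllib.request.Request(url, headers={"Range": f"bytes={start_byte}-"})


# Hypothetical URL and offset, for illustration only.
request = build_range_request("https://cdn.example.com/video/continuation.mp4", 4096)
print(request.get_header("Range"))
```

A server that honours the range answers with a partial response, and with chunked transfer encoding the body can keep arriving as new frames are produced.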

HESP relies on two complementary streams to achieve its astonishing results. Our first stream, the initialization stream, contains the keyframes of all the images. Now, this stream is not regularly used. It's only used when we start a new stream. At that moment, we request the most recent image that's available in the initialization stream, or another image if you want to start at a specific location. And because the initialization stream's images are keyframes, playback can start at once. Now, keyframes are expensive in terms of bandwidth. So, we do not want to continue playing out the following images from the initialization stream. And that's when the continuation stream kicks in. After the keyframe of the initialization stream, images are requested from the continuation stream. And that's a stream that's regularly encoded for low latency purposes. And we request the images from this continuation stream starting at exactly the right location by using a byte range request.

Now let us visualize this: assume we have an initialization stream with frames capital A1, capital B1, and so on. At some point in time, a user wants to start watching video. So, the player will then request the most recent initialization frame, let us say capital C1. Next, the player will automatically request the following images from the corresponding continuation stream at exactly the right location, so small d1. And then it's followed by small e1, small f1, and so on. And it's the protocol details of HESP that will define how we exactly find the small d1 and e1 after having received capital C1. Now assume that the user wants to change the channel. The player then requests a new initialization frame, the most recent initialization frame of the second video, let us say capital G2. And after that, the continuation stream takes over with small h2, small i2, and so on.
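The frame sequence described here can be mimicked with a small Python sketch. It only models the order in which frames reach the player, using plain lists of labels; how a real player locates the matching continuation frame is left to the protocol details and is not modelled. The function name and the list layout are assumptions for illustration.

```python
def frames_after_join(init_stream, cont_stream, join_index, count):
    """Return the first `count` frames a player receives when joining at join_index.

    init_stream[i] and cont_stream[i] are assumed to encode the same picture:
    a keyframe in the initialization stream, a regular frame in the continuation stream.
    """
    if not 0 <= join_index < len(init_stream):
        raise IndexError("join_index is outside the initialization stream")
    # One keyframe from the initialization stream lets playback start at once.
    received = [init_stream[join_index]]
    # Every following frame comes from the cheaper continuation stream.
    received.extend(cont_stream[join_index + 1 : join_index + count])
    return received


# Channel 1: the most recent initialization frame is C1.
init_1 = [f"{letter}1" for letter in "ABCDEFGHI"]
cont_1 = [f"{letter}1" for letter in "abcdefghi"]
print(frames_after_join(init_1, cont_1, join_index=2, count=4))

# Channel change: the same logic applied to the second video, joining at G2.
init_2 = [f"{letter}2" for letter in "ABCDEFGHI"]
cont_2 = [f"{letter}2" for letter in "abcdefghi"]
print(frames_after_join(init_2, cont_2, join_index=6, count=3))
```

Starting playback and changing channel are the same operation in this model: one keyframe, then the continuation stream.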

To implement the HESP protocol, some components in the video delivery chain need to be modified. Now, we can still rely on standard contribution feeds to the encoder, nothing changes there. We can still use regular encoders, however, with a specific configuration tuned for low latency in general and HESP in particular. The packager obviously needs to support the HESP protocol and the HESP feeds are then distributed over a regular CDN. Now this CDN should just support chunked transfer encoding, as used for low latency DASH, and byte ranges, as used for low latency HLS. And then finally, obviously the player needs to be HESP compliant as well. Now, while THEO provides a player, we do not commercialize packagers. For that, we are working together with packaging vendors such as Synamedia.

HESP will come in two flavours, depending on the exact needs of the users. Firstly, we'll have a profile that's highly optimized to achieve the lowest latency, the lowest bandwidth, the lowest zapping times. And this profile works with long CMAF-CTE segments, minutes let's say, and with ultra short chunks. One chunk is one frame. It only uses P and I frames for the continuation stream and each P/I frame should only reference one previous frame.

Secondly, we have a profile that's optimized for maximum compatibility. The target here is to reuse low latency DASH and low latency HLS streams. Obviously, in that case, we will not have bandwidth savings, but we can still benefit from significant improvements in latency, zapping times and startup times. Now in this profile, the continuation stream is a regular CMAF-CTE stream with segments typically around 5 to 6 seconds and with chunk sizes of a few frames, so let us say 200 ms. In the compatibility profile, we also support B frames. In this case, a chunk will correspond to a sub-GOP.
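To put the chunk sizes of the two profiles side by side, here is a small Python calculation. The frame rate of 25 frames per second is an assumption chosen for illustration; the talk does not state one.

```python
def frames_per_chunk(chunk_duration_ms: int, frames_per_second: int) -> float:
    """Number of frames that fit in one chunk of the given duration."""
    return chunk_duration_ms * frames_per_second / 1000


def chunk_duration_ms(frames: int, frames_per_second: int) -> float:
    """Duration in milliseconds of a chunk holding the given number of frames."""
    return frames * 1000 / frames_per_second


ASSUMED_FPS = 25  # assumption, not a figure from the talk

# Compatibility profile: chunks of roughly 200 ms.
print(frames_per_chunk(200, ASSUMED_FPS))

# Optimized profile: one chunk is one frame.
print(chunk_duration_ms(1, ASSUMED_FPS))
```

At the assumed frame rate, a 200 ms chunk holds 5 frames, while a one-frame chunk lasts 40 ms, which is why the optimized profile can hand frames to the player sooner.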

Johan: We started this explanation with a statement that the manifest file is only occasionally updated, more specifically to indicate the arrival of a new piece of video. We will now explain this in more detail.

We can think of the video as the complete program offered to viewers. So, let's say a sports game. The complete video of a sports game is made up of several parts. A first quarter, then an advertisement insert, then a second quarter, then a pause with some commentary, then a third quarter and so on. These parts or presentations are the lowest granularity in the manifest. The sequence of presentations is all that the manifest gives as information to the player.

Of course, a presentation can be split into several segments. The player will then automatically request the segments in a presentation. Segment addressing happens automatically within a presentation because we want to have an efficient and continuous delivery of the continuation stream. The identification of the segments increases monotonically.

Now, since the manifest is not regularly fetched by the player, we obviously need an additional mechanism to inform the player that the manifest file changed. And we do that by inserting a marker in the continuation stream. The marker itself does not contain manifest information. It simply triggers the player to download the manifest file again. And that new manifest file then contains information on the new presentation that will come. So let us say - an advertisement insert. At the right moment, the player will request a new initialization packet corresponding to this new presentation. And then it will request the frames from the continuation stream.
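The marker mechanism can be sketched in Python as a playback loop that only re-downloads the manifest when it meets a marker in the continuation stream. The marker value, the list-based stream and the `fetch_manifest` callable are all hypothetical stand-ins; the actual marker format is not described in the talk.

```python
MANIFEST_MARKER = "MANIFEST_CHANGED"  # hypothetical in-band marker, for illustration


def play(continuation_items, fetch_manifest):
    """Consume continuation items, re-fetching the manifest only when a marker appears."""
    manifest = fetch_manifest()  # fetched once at startup
    fetch_count = 1
    frames = []
    for item in continuation_items:
        if item == MANIFEST_MARKER:
            # The marker carries no manifest data; it only triggers a new download.
            manifest = fetch_manifest()
            fetch_count += 1
        else:
            frames.append(item)
    return frames, manifest, fetch_count


# Two manifest versions: before and after an advertisement is announced.
versions = iter([
    ["first quarter"],
    ["first quarter", "advertisement insert"],
])
stream = ["d1", "e1", MANIFEST_MARKER, "f1", "g1"]
frames, manifest, fetch_count = play(stream, lambda: next(versions))
print(frames, manifest, fetch_count)
```

In this model the manifest is downloaded twice for the whole session, once at startup and once after the marker, rather than once per segment.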

An example use case for these markers is to insert advertisements. The ad server will then be the driver to modify the manifest and to insert a marker.

Pieter-Jan: So, in contrast to most currently used streaming protocols, HESP has actually been designed with all of the common use cases that operators are using today in mind.

On the content protection side, it has been designed to take care of encryption, but also DRM support. They're an intrinsic part of the specification and user experience has been optimized even when those usually limiting factors with streaming protocols are being used.

The same can even be said for metadata transport, ranging from subtitles to other media and other metadata that enable rich and interactive viewer experiences. The metadata also allows capabilities such as server-side ad insertion to transport metadata for tracking, but foundations have also been laid to allow things like stitching and personalized content, blackouts, those kinds of interactive use cases which could have a requirement to be played with or without DRM, with or without metadata or other accompanying information. And this is in strong contrast to most other streaming protocols today where handling those use cases becomes a big headache for everybody involved.

And of course, it has already been mentioned a few times, something which we find extremely important is making sure that you don't just reduce the buffers for your streaming protocol, but that you can make sure that all of those spinners stay avoided, so you have no buffering. And in order to do that, HESP has been designed and additional ABR algorithms have been designed to make sure that we can always identify network capabilities extremely well for HESP, which basically allows us to avoid those buffering instances. And with HESP network sensing capabilities, we can actually identify network drops within a few dozen milliseconds to make sure that we can make the right decision at any time, so that those stalls can be avoided, or that the quality for a viewer can be increased immediately after additional network capacity becomes available.
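The ABR algorithms designed for HESP are not detailed in this talk. As a generic illustration of the kind of decision involved, the Python sketch below picks the highest bitrate that fits within a measured throughput, with a safety margin. The bitrate ladder, the margin of 0.8 and the function name are assumptions, and this is a common textbook rule rather than the HESP algorithm.

```python
def pick_bitrate(available_kbps, measured_throughput_kbps, safety_factor=0.8):
    """Pick the highest available bitrate that fits within the throughput budget."""
    if not available_kbps:
        raise ValueError("available_kbps must not be empty")
    # Keep some headroom so a small dip in throughput does not cause a stall.
    budget = measured_throughput_kbps * safety_factor
    candidates = [rate for rate in sorted(available_kbps) if rate <= budget]
    # If nothing fits, fall back to the lowest quality rather than stalling.
    return candidates[-1] if candidates else min(available_kbps)


ladder = [500, 1500, 3000, 6000]  # hypothetical bitrate ladder in kbit/s
print(pick_bitrate(ladder, 4000))  # comfortable network
print(pick_bitrate(ladder, 1000))  # network drop detected
```

The faster a drop in throughput is detected, the sooner a rule like this can switch down, which is where sensing within tens of milliseconds matters.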

Johan: Let us start with displaying these three webcams together.

One camera feed is shown in a large window, that's the main camera you want to look at. The other cameras are displayed in smaller windows. And you can, simply by clicking on the smaller video windows, select one of the camera feeds for display in the main window. So, if we click on a smaller video window, we see that the corresponding camera feed is now displayed in the larger video window. Now imagine that these videos are different camera feeds coming from the same sports event. In that case, you want the camera feeds to be in sync. HESP luckily supports synchronized playback. We show this by having the cameras pointing at the same clock. And we can then see the clock time values displayed in the three windows. And if there is any difference, we could measure and see the difference between the time values in the three windows. Now, obviously it's hard to see, so, if we freeze the image, we can see that the playback in the three windows is actually nicely in sync.

The following demonstration shows the high zapping speed of HESP. We only display one camera feed at a time, and by clicking on the remote pointer, just like a remote control would do, we can toggle between the three different camera feeds. So, you see camera 1, camera 2, camera 3. And we measure the zapping time by measuring the time it takes between the request to change to a different video channel and the moment that the first image of this new video feed is sent to the display. And so, you see that we end up with zapping times in the order of magnitude of 50 milliseconds: 55, 60 here, 50 again, 52. So always instantaneous zapping, because this 50-ish milliseconds is hardly perceivable by the human eye.
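The measurement described here is a simple time difference, and the readings called out during the demonstration can be summarized in a few lines of Python. The helper name and the example timestamps are assumptions; the five sample values are the ones mentioned in the demonstration.

```python
from statistics import mean


def zapping_time_ms(request_time_ms: float, first_frame_time_ms: float) -> float:
    """Time between the channel-change request and the first new frame being displayed."""
    return first_frame_time_ms - request_time_ms


# Readings called out during the demonstration, in milliseconds.
samples_ms = [50, 55, 60, 50, 52]

print(zapping_time_ms(1000.0, 1052.0))  # hypothetical timestamps
print(mean(samples_ms), min(samples_ms), max(samples_ms))
```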

The end-to-end latency is measured by displaying a reference clock. Then we film this reference clock, and then we display the filmed clock. The latency can then simply be derived by calculating the time difference between the time when the images from the reference clock are captured and the time that these images are displayed back in the video player. Now, in this case, the clock is simply displayed in the web page. We capture the web page with the webcam, and we display the image back in the player in the web page. Obviously, like this, it's very hard to see the difference, but we made a screenshot to freeze the image. So here you can see the difference between the time when an image was captured by the webcam and the time that exactly the same image is displayed on the screen. So, you can see here that we are at 370 milliseconds. Now in general, with this setup, we'll be somewhere between 300 and 400 milliseconds glass-to-glass latency.
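The screenshot method boils down to subtracting two clock readings taken from the same frozen image. A minimal Python sketch follows, assuming both clocks are shown as `HH:MM:SS.mmm`; the two readings are made-up values chosen to reproduce the 370 milliseconds mentioned in the demonstration, and the sketch does not handle a reading that crosses midnight.

```python
from datetime import datetime, timedelta

CLOCK_FORMAT = "%H:%M:%S.%f"  # assumed display format of both clocks


def glass_to_glass_ms(reference_clock: str, filmed_clock: str) -> float:
    """Latency between the live reference clock and the filmed clock in the player."""
    reference = datetime.strptime(reference_clock, CLOCK_FORMAT)
    filmed = datetime.strptime(filmed_clock, CLOCK_FORMAT)
    # The filmed clock lags the reference clock by the end-to-end latency.
    return (reference - filmed) / timedelta(milliseconds=1)


# Hypothetical readings taken from one frozen screenshot.
print(glass_to_glass_ms("12:00:05.870", "12:00:05.500"))
```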

So, to wrap up these demonstrations, HESP offers sub-second latencies, instantaneous channel change & startup times, synchronized viewing, and all of that at scale over existing infrastructures, and with bandwidth savings as an additional bonus.

We anticipate that the first HESP certified solutions will be commercially available in Q4 this year. So please look for announcements at IBC. And of course, products will come with DRM support, metadata, server-side ad insertion, advanced ABR, and more.

Pieter-Jan: We anticipate that the first fully HESP certified solutions will become commercially available later this year in the fourth quarter. So definitely look out for announcements around IBC. Products will become available with DRM support, metadata, server-side ad insertion, advanced ABR, and all of the other capabilities that we've seen and that we've discussed during this webinar.

So, it only leaves me with a few more things to mention. First of all, thank you all for listening. Thank you for sticking around until the end. More information will be available soon. We are planning to make gradually more and more information on HESP available. But if you have any questions, as I mentioned at the beginning of this talk, feel free to submit them through the chat box for the upcoming Q&A session. Or don't hesitate to reach out to our experts for a more in-depth examination of the use cases that you see. So, thank you all and take care.

PIETER-JAN SPEELMANS

Founder & CTO at THEO Technologies

Pieter-Jan is the Founder and the head of the technical team at THEO Technologies. He is the brain behind THEOplayer, HESP and EMSS. With a mission to ‘Make Streaming Video Better Than Broadcast’, he is innovating the way video is delivered online from playback all the way to ultra-low latency streaming. Pieter-Jan is committed to enabling media companies to easily offer exceptional video experiences across any device.

JOHAN VOUNCKX

VP of Innovation at THEO Technologies

Johan Vounckx is VP of Innovation for THEO Technologies. He works on inventive methods to improve the delivery of streaming video. Prior to joining THEO, he worked for major players in the video and broadcast industry, such as EVS Broadcast Equipment and Telenet (Liberty Global Group). Dr Vounckx received an MSc and a PhD from the University of Leuven, Belgium.
