Operating HESP with low encoding costs
by Pieter-Jan Speelmans on October 6, 2021
The High Efficiency Streaming Protocol (HESP) comes with a lot of advantages. It allows for sub-second latency over standard HTTP CDNs (with the cost-to-scale benefit they bring) and unrivalled channel change times. The high QoE HESP delivers is the result of its main difference from alternative protocols: the use of Initialization Streams, which are streams containing IDR frames at a very high rate. By using an IDR frame from these Initialization Streams to kick-start the so-called Continuation Streams (which are similar to your run-of-the-mill HLS or MPEG-DASH segments), playback of a new track can start within a few hundred milliseconds, which in turn enables fast channel change and allows buffer sizes and latency to be reduced dramatically. Initialization Streams do come at a cost: even though they are not streamed to end viewers, they need to be encoded. In this article, we’ll dive into the actual impact of HESP on encoding compared to normal (LL-)HLS and (low latency) MPEG-DASH encoding.
Comparing HESP from a cost perspective
The HESP specification was published through the IETF earlier this summer. Ever since, we’ve seen a significant increase in interest from the industry. The arguments heard most often are the need to increase the overall quality of experience and ever-growing viewer expectations. It’s an arms race in which services strive for snappy startup times, high quality viewing and low latency delivery (we all want to see it first) to grow and maintain their footprint among the many streaming services popping up left and right. As in every arms race, keeping your costs in check is crucial in order to keep up. Even though we would all want it, budgets are not infinite.
When comparing HESP with other streaming protocols, we see several advantages from a cost perspective. Streaming protocols that target sub-second latencies, such as WebRTC, are often session based. This mostly means stateful streaming servers are needed. As many of us experienced when scaling RTMP servers in the past, this can be very painful and costly. This pain and cost (together with the fact that RTMP traffic is often blocked by firewalls) is for a large part why HTTP based streaming protocols such as HLS and MPEG-DASH took over: they are stateless, which allows them to scale over standard HTTP CDNs. HESP also benefits from its HTTP based transport approach and scales over those same CDNs. So far so good.
Most HTTP based low latency protocols, however, have another disadvantage: the GOP size dramatically impacts the end-to-end latency that can be achieved with low latency HLS and MPEG-DASH profiles. Reducing the GOP size lowers your compression efficiency. While the type of content greatly influences the size of this decrease, the huge volumes of viewing minutes popular services see, and the associated bandwidth cost, force us to monitor even small percentages. Here HESP brings another advantage, as its latency is not linked to its GOP size: the (often large) independent frames can be placed where it makes sense for the content, and no set intervals are needed, allowing for a highly optimal quality-to-bit ratio. In some tests, bandwidth cost savings of about 20% become possible compared to HLS or MPEG-DASH, which is a very significant percentage.
There is, however, another side to that coin. The reason why HESP is able to decouple its GOP size from its latency is the existence of Initialization Streams. For every quality at which the content is presented, two feeds need to be generated. In a worst case scenario, simple logic would dictate that this doubles the encoding cost in practice. With 4K encoders being expensive, and plenty of services seeing massive usage on some live channels but only select audiences on others, the question is whether this additional encoding cost is worth it: will the encoding cost of using HESP be balanced out by the cost saved on delivery scaling and bandwidth, and be worth the increase in QoE?
Why are these Initialization Streams needed?
Encoding HESP Initialization Streams comes at a cost. The question, of course, is how large this cost is and whether we can influence it. To answer that, it is important to understand why exactly these streams are needed.
Initialization Streams offer the ability to start decoding content fast. They contain independent frames, which in normal streams are available only at the start of a GOP, meaning once every 2-10 seconds depending on content and configuration. A (video) decoder can only start playback at one of these independent frames, so one such frame is needed to start playback. This is important for stream startup, but also when seeking or when switching between alternative qualities using ABR. For HLS and MPEG-DASH (and the same can be said for other protocols such as WebRTC), a client has two choices: either
- wait for the next independent frame to appear in the stream, or
- download the last known independent frame and start playback from there.
In the first scenario, startup time will be impacted. Even for a small GOP size of two seconds, this means an average increase in startup time of one second, and a worst case scenario where two additional seconds are added to the startup time. This is usually not an approach one wants to take given the importance of startup time in viewer churn, especially not if your GOP size is larger than two seconds (just imagine the viewer churn if all viewers had to wait 5 additional seconds before playback starts!).
In the second scenario, you will increase your latency. Again, for a small GOP of two seconds you will see an average latency increase of one second, and a worst case scenario where the latency increases by two seconds. While that does not seem bad at all in a world where average latencies with HLS and MPEG-DASH streams are north of 20 seconds, it is bad when you want to reduce latency: every second counts. In the case of HESP, which offers sub-second latencies, this would mean the latency would increase by 250-500%.
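The arithmetic behind both scenarios can be sketched in a few lines of Python. This is only an illustration: it assumes viewers join at a uniformly random moment within a GOP, and the 0.4 second baseline latency is a hypothetical value picked to show how a range such as 250-500% can come about, not a measured figure.

```python
def join_penalty(gop_seconds: float) -> tuple:
    """Return the (average, worst-case) penalty in seconds for a player that
    joins at a uniformly random moment inside a GOP of the given length.

    The same numbers apply to both scenarios: extra startup time when waiting
    for the next independent frame, or extra latency when starting from the
    previous one.
    """
    if gop_seconds <= 0:
        raise ValueError("gop_seconds must be positive")
    return gop_seconds / 2, gop_seconds


def relative_increase_percent(base_latency_s: float, added_s: float) -> float:
    """Added latency expressed as a percentage of the base latency."""
    return 100.0 * added_s / base_latency_s


# Hypothetical sub-second baseline latency, used for illustration only.
ASSUMED_BASE_LATENCY_S = 0.4

for gop in (2.0, 5.0, 10.0):
    average, worst = join_penalty(gop)
    print(f"GOP {gop:>4.1f}s: average +{average:.1f}s, worst case +{worst:.1f}s")

average, worst = join_penalty(2.0)
print(
    f"Relative to {ASSUMED_BASE_LATENCY_S}s: "
    f"+{relative_increase_percent(ASSUMED_BASE_LATENCY_S, average):.0f}% on average, "
    f"+{relative_increase_percent(ASSUMED_BASE_LATENCY_S, worst):.0f}% worst case"
)
```

With other baseline latencies the percentages shift, but the absolute penalty always scales linearly with the GOP size.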
Thanks to the Initialization Streams, however, a player can at any point in time retrieve an independent frame, inject it into the decoder, and continue playback from the “normal” Continuation Stream. The advantage of this is not to be underestimated:
- Upon the start of playback, we can start at any point in time, at the right latency.
- When facing network issues, we can rapidly switch to a lower bitrate stream, reducing the need for large buffers which must account for the startup time of a new stream. This in turn has an impact on the latency at which we can stream.
- In case of buffer underruns, we can recover quickly by restarting playback in a fraction of a second, reducing possible stall durations.
- For viewers wanting to select a viewing angle, we can instantly switch to the other feed and provide an optimal viewer experience, without any delay or black frame insertion.
- Upon server issues, failover to alternative CDNs can be executed quickly and efficiently, minimising (or even eliminating altogether) downtime from a viewer perspective.
- When the network quality increases and throughput goes up, we can very quickly switch towards a higher bitrate stream and provide a better quality towards our viewers.
For the last scenario, one could argue that switching up does not have to be instantaneous (but it should not take extremely long either). As a result, when we look at commonalities between these cases, we see that in an ideal scenario we can swiftly switch towards:
- Our default quality at which we want the player to start playback, covering cases 1 and 4.
- A bitrate lower than the current playback bitrate (but preferably not just the lowest) which allows us to cover cases 2, 3 and 5.
One could argue that, as a result, we need Initialization Streams at the full frame rate for only a select number of bitrates offered in the ladder, and that for the others no Initialization Streams, or Initialization Streams with fewer independent frames, would suffice. In the next section, we’ll explore exactly this, as well as the impact on user experience and cost.
Tweaking your Initialization Streams
It is not a requirement that every quality in your bitrate ladder contains an Initialization Stream at the full frame rate of the Continuation Stream. A frame of the Initialization Stream is needed to do fast startup. If only a sparse initialization feed is available, startup will be slower. As a result one could argue that in an ideal scenario the full frame rate is required. However, not every stream/quality necessarily needs to start up fast. For example:
- when the network recovers after a dip, it can be acceptable to wait a few moments before switching to a higher bandwidth,
- when a viewer initiates a viewing session, you do not know the available network bandwidth, so you do not need fast startup on every quality, but you might want to pick a solid default at which to start.
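How much slower startup becomes with a sparse initialization feed can be estimated with a simple bound: in the worst case the player waits one full frame interval of the Initialization Stream. The sketch below assumes exactly that model; the function name and the 2 second default GOP length are illustrative choices, not part of HESP.

```python
from typing import Optional


def worst_case_extra_wait_ms(init_fps: Optional[float],
                             gop_seconds: float = 2.0) -> float:
    """Upper bound on the wait for the next usable independent frame.

    init_fps is the frame rate of the Initialization Stream. Pass None when
    no Initialization Stream exists, in which case the player has to wait
    for the next GOP boundary of the Continuation Stream.
    """
    if init_fps is None:
        return gop_seconds * 1000.0
    if init_fps <= 0:
        raise ValueError("init_fps must be positive")
    return 1000.0 / init_fps


for fps in (30, 10, 5, 2, 1, None):
    label = "none" if fps is None else f"{fps}fps"
    print(f"Initialization Stream {label:>5}: "
          f"up to {worst_case_extra_wait_ms(fps):.0f}ms")
```

For 10, 5, 2 and 1fps this bound gives 100, 200, 500 and 1000ms, which lines up with the up switching times reported in Table 1.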
It seems that, most often, we need to be able to start fast on lower bandwidth streams. This contrasts with the cost of encoding these Initialization Streams: higher bitrate encodes are often more expensive than lower bandwidth encodes. As a result, not encoding the Initialization Stream for the top quality (or encoding it at only a fraction of the frame rate) can have a big impact on the encoding cost. The top quality stream is usually not required for fast startup in common cases: the available bandwidth will be unknown, and starting up at too high a bandwidth will take an unnecessarily long time and be followed by a drop in quality. As a result, excluding it from the encoding setup (and reusing IDR frames from the Continuation Stream to create a sparse Initialization Stream) makes a lot of sense from a cost perspective.
There are a number of options available:
- Generating a full ladder of Initialization Streams.
- Generating only the lowest bitrate Initialization Stream to allow for rapid switching down and low latency playback.
- Generating the lowest bitrate Initialization Stream and (one or more) sensible default bandwidth Initialization Streams to allow for low latency playback and fast startup on those default bandwidths.
- Generating the initialization feeds of the second or third option, plus sparse Initialization Streams with a reduced number of frames per second for the other bandwidths, to allow for faster switching towards those bandwidths.
Based on our testing, we can see that the difference in terms of encoding capacity needed between these options is rather significant. In order to validate this, our team set up a test ABR ladder ranging from 360p to 1080p at 30fps, with:
- 1080p30@4000kbps
- 720p30@2500kbps
- 540p30@1000kbps
- 360p30@400kbps
We measured general CPU load across a number of different scenarios in which our encoder would take in one live RTMP feed and produce new RTMP feeds for the Initialization and Continuation Streams towards an HESP packager. As a reference, we used a setup in which no Initialization Streams are generated, as is the case for other streaming protocols like HLS and MPEG-DASH.
What we see is that generating all Initialization Streams increases the encoding needs by almost 70%. In contrast, generating just the lowest bandwidth Initialization Stream increases them by only about 3%.
| 1080p | 720p | 540p | 360p | Encoding CPU usage | Impact |
|---|---|---|---|---|---|
| / | / | / | / | 100.0% | High latency, high startup times (HLS & MPEG-DASH reference) |
| 30fps | 30fps | 30fps | 30fps | 168.9% | Low latency, low startup times, maximal additional encoding |
| / | 30fps | 30fps | 30fps | 130.7% | Low latency, higher ABR up switching time to 1080p (one GOP size) |
| / | / | 30fps | 30fps | 115.2% | Low latency, higher ABR up switching time to 1080p and 720p, ABR must switch down to 540p or 360p |
| / | / | / | 30fps | 103.1% | Low latency, higher ABR up switching time, ABR must switch down to 360p |
| 10fps | 10fps | 10fps | 30fps | 127.8% | Low latency, +100ms ABR up switching time |
| 5fps | 5fps | 5fps | 30fps | 115.2% | Low latency, +200ms ABR up switching time |
| 2fps | 2fps | 2fps | 30fps | 106.1% | Low latency, +500ms ABR up switching time |
| 1fps | 1fps | 1fps | 30fps | 103.4% | Low latency, +1000ms ABR up switching time, ABR must likely switch down to 360p |
| 1fps | 30fps | 1fps | 30fps | 123.4% | Low latency, fast up switching & startup for 720p, +1000ms for other qualities |
| 1fps | 1fps | 30fps | 30fps | 114.8% | Low latency, fast up switching & startup for 540p, +1000ms for other qualities |

Table 1. Comparison of encoding usage across different Initialization Stream configurations. The first four columns give the Initialization Stream setup per quality; "/" means no Initialization Stream is generated for that quality.
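To make the measurements in Table 1 easier to compare, the sketch below stores them as data, computes the overhead relative to the reference, and picks the cheapest measured configuration that still offers a full-rate Initialization Stream on the qualities you care about. The CPU figures are copied from the table; the `Profile` type and the helper names are hypothetical and only serve the illustration.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Profile(NamedTuple):
    # Initialization Stream frame rate for (1080p, 720p, 540p, 360p);
    # None means no Initialization Stream for that quality.
    init_fps: Tuple[Optional[int], ...]
    cpu_percent: float  # encoding CPU usage, reference = 100.0


# Values copied from Table 1.
MEASURED = [
    Profile((None, None, None, None), 100.0),
    Profile((30, 30, 30, 30), 168.9),
    Profile((None, 30, 30, 30), 130.7),
    Profile((None, None, 30, 30), 115.2),
    Profile((None, None, None, 30), 103.1),
    Profile((10, 10, 10, 30), 127.8),
    Profile((5, 5, 5, 30), 115.2),
    Profile((2, 2, 2, 30), 106.1),
    Profile((1, 1, 1, 30), 103.4),
    Profile((1, 30, 1, 30), 123.4),
    Profile((1, 1, 30, 30), 114.8),
]

FULL_RATE = 30  # frame rate of the Continuation Streams in the test ladder


def overhead_percent(profile: Profile) -> float:
    """Additional encoding CPU compared to the HLS/MPEG-DASH reference."""
    return round(profile.cpu_percent - 100.0, 1)


def cheapest_with_full_rate(indexes: Set[int]) -> Profile:
    """Cheapest measured profile with a full-rate Initialization Stream on
    every ladder index in `indexes` (0 = 1080p ... 3 = 360p)."""
    candidates = [
        p for p in MEASURED
        if all(p.init_fps[i] == FULL_RATE for i in indexes)
    ]
    return min(candidates, key=lambda p: p.cpu_percent)


# Example: fast startup needed on 540p and 360p only.
choice = cheapest_with_full_rate({2, 3})
print(choice.init_fps, f"+{overhead_percent(choice)}%")
```

The selection only considers configurations that were actually measured; it does not try to predict the cost of combinations that are missing from the table.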
Especially interesting are the scenarios where full Initialization Streams are present for some streams and sparse Initialization Streams are generated for others: here we see an increase of only 15 to 23%. Based on our tests, we would recommend profiles such as the 1/1/30/30 profile, or combinations such as 1/5/30/30, where we expect the total increase in encoding cost to be around 18-20%. Generic sparse frame generation for all but the lowest bandwidth also looks very good: at 1fps the increase is only about 3%, while the impact on switching up stays limited to roughly one second. While in these scenarios the cost for encoding would still go up, this is an increase which should be easily compensated by HESP’s benefits in cost to scale, improvements in GOP size (and the resulting cost reduction in egress traffic), and rise in QoE.
In conclusion, while generating a full set of Initialization Streams is possible, it is worth looking at your specific requirements to trim down the number of Initialization Streams, or to generate sparse Initialization Streams. Tweaking these parameters can be crucial for services which operate large numbers of streams with limited numbers of viewers per stream, and will allow you to easily keep your costs in check. If you have any questions on this, don’t hesitate to reach out to our team!
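Whether the extra encoding cost is compensated by lower egress cost depends on the number of viewers per stream. A rough break-even estimate can be sketched as follows. All prices are hypothetical placeholders, the 20% bandwidth saving is the "in some tests" figure mentioned earlier rather than a guarantee, and the model simplifies by assuming every viewer watches the same bitrate.

```python
def egress_gb_per_viewer_hour(bitrate_kbps: float) -> float:
    """Data delivered to one viewer in one hour, in decimal gigabytes."""
    kilobits_per_hour = bitrate_kbps * 3600
    return kilobits_per_hour / 8 / 1_000_000  # kbit -> kB -> GB


def break_even_viewers(encode_cost_per_hour: float,
                       encode_overhead: float,
                       egress_price_per_gb: float,
                       bitrate_kbps: float,
                       bandwidth_saving: float) -> float:
    """Concurrent viewers at which egress savings equal the extra encoding cost.

    encode_overhead and bandwidth_saving are fractions (0.148 = 14.8%).
    """
    extra_encode_cost = encode_cost_per_hour * encode_overhead
    saving_per_viewer_hour = (
        egress_gb_per_viewer_hour(bitrate_kbps)
        * egress_price_per_gb
        * bandwidth_saving
    )
    return extra_encode_cost / saving_per_viewer_hour


# Hypothetical inputs: 1.00 per encoder hour, 0.02 per GB of egress,
# every viewer on the 4000kbps rendition, 1/1/30/30 profile (+14.8% CPU).
viewers = break_even_viewers(
    encode_cost_per_hour=1.00,
    encode_overhead=0.148,
    egress_price_per_gb=0.02,
    bitrate_kbps=4000,
    bandwidth_saving=0.20,
)
print(f"Break-even at roughly {viewers:.1f} concurrent viewers")
```

Plugging in your own encoder and CDN prices gives a first indication of which channels justify a richer set of Initialization Streams.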