In the previous post, we covered the four key factors that affect the quality of a low-latency streaming experience with Apple's LL-HLS protocol. We also discussed the importance of GOP size and its impact on the overall viewing experience, and provided four recommendations for ensuring the best viewing quality. In this last post of the series, we explain how HESP works, how it differs from LL-HLS, and how the two compare in performance.
THIS IS A SNIPPET FROM OUR “OPTIMIZING LL-HLS FOR LOW LATENCY STREAMING” GUIDE WHICH YOU CAN DOWNLOAD HERE.
Introduction to HESP (High-Efficiency Streaming Protocol)
HESP is a next-generation online video delivery technology that outperforms current-generation protocols for low-latency streaming at scale. It is an ultra-low-latency streaming protocol delivered over HTTP/1.1 using Chunked Transfer Encoding and Range requests, with a minimalistic manifest that only needs infrequent updates. It requires two complementary streams:
- The Initialization stream, which contains keyframes that make it possible to start playback at any given moment, not only at the beginning of a segment or at a keyframe interval, and
- The Continuation stream, which contains the IPB frames and can be played right after the keyframe from the Initialization stream.
HESP offers a broadcast-like experience with sub-second latency and zapping time on any device or platform. It also delivers very low bandwidth consumption compared to other ultra-low latency streaming protocols such as WebRTC. Being delivered over HTTP, it is compliant with standard CDNs and offers low-cost scaling.
How does an end-to-end HESP solution work?
As mentioned above, HESP uses two streams for each quality/track: (1) an Initialization stream to rapidly start new streams, and (2) a Continuation stream for use in normal operation.
What is the Initialization Stream?
The initialization stream consists of initialization packets, one for each frame position. The packets are individually addressable and packaged in ISOBMFF format. Each contains an IDR frame for its frame position, which makes it possible to start playback at any given frame.
What is the Continuation Stream?
The continuation stream is packaged in CMAF-CTE, albeit with specific configurations for low latency, and can start playback immediately after an initialization frame, allowing for very fast channel start and switch times. It is addressed using byte-range requests and served using Chunked Transfer Encoding for low latency.
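As a rough illustration of byte-range addressing, the sketch below builds (but does not send) an open-ended HTTP Range request with Python's standard library. The URL, the byte offset, and the function name `build_continuation_request` are assumptions made for this example; they are not taken from the HESP specification.

```python
from urllib.request import Request


def build_continuation_request(url: str, start_byte: int) -> Request:
    """Build an open-ended byte-range GET request without sending it."""
    if start_byte < 0:
        raise ValueError("start_byte must be non-negative")
    # "bytes=N-" asks the server for everything from offset N onward,
    # which suits a live stream whose final length is not yet known.
    return Request(url, headers={"Range": f"bytes={start_byte}-"})


# Hypothetical URL and offset, for illustration only.
request = build_continuation_request(
    "https://cdn.example.com/live/video_720p.cmfv", 1_048_576
)
print(request.get_header("Range"))  # prints "bytes=1048576-"
```

A server that honours the range would normally answer with status 206 (Partial Content), and with Chunked Transfer Encoding it can keep sending data as new frames are produced.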
Segments in the continuation stream can be long without compromising low latency or fast zapping times. This makes a large GOP size possible, and with it lower bandwidth consumption and higher video quality.
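To make the interplay of the two streams more concrete, here is a toy model of how a player might join a stream at an arbitrary frame. It is a simplified sketch, not an implementation of the protocol: the `Frame` type, the function names, and the frame layout (one keyframe per GOP followed by P frames only) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    position: int
    kind: str  # "IDR", "I" or "P" in this toy model


def initialization_packet(position: int) -> Frame:
    # Assumption: every frame position has its own addressable IDR frame.
    return Frame(position, "IDR")


def continuation_stream(total_frames: int, gop_frames: int) -> list[Frame]:
    # Toy layout: a keyframe at each GOP start, P frames everywhere else.
    return [
        Frame(i, "I" if i % gop_frames == 0 else "P")
        for i in range(total_frames)
    ]


def start_playback(join_position: int, continuation: list[Frame]) -> list[Frame]:
    # Decode the IDR frame for the join position first, then carry on with
    # the regular continuation frames that follow it.
    first = initialization_packet(join_position)
    rest = [f for f in continuation if f.position > join_position]
    return [first] + rest


stream = continuation_stream(total_frames=8, gop_frames=250)
playback = start_playback(join_position=5, continuation=stream)
print([(f.position, f.kind) for f in playback])
# prints [(5, 'IDR'), (6, 'P'), (7, 'P')]
```

In this model the player joins at frame 5 even though the only keyframe in the continuation stream is at frame 0, which is why the GOP can stay large.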
Figure 1. HESP Implementation Diagram
To implement HESP, only two components of the video value chain need tailoring: the packager and the player. HESP works with regular encoders and regular CDNs, as long as the CDNs support Chunked Transfer Encoding (CTE) and byte-range requests.
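One simple way to get a first indication of whether a CDN meets these two requirements is to inspect the status code and headers of an HTTP/1.1 response to a Range request. The helper below is a heuristic sketch that works on already-received values; the function name and the returned keys are assumptions for this example, and a passing result is not a guarantee of full compatibility.

```python
from typing import Mapping


def check_cdn_response(status: int, headers: Mapping[str, str]) -> dict[str, bool]:
    """Heuristically check a response for byte-range and chunked support."""
    # Header names are case-insensitive, so normalise them first.
    normalized = {k.lower(): v.lower() for k, v in headers.items()}

    # 206 Partial Content means the Range request was honoured;
    # "Accept-Ranges: bytes" advertises support on a regular response.
    byte_ranges = status == 206 or normalized.get("accept-ranges", "").strip() == "bytes"

    # Transfer-Encoding may list several codings separated by commas.
    codings = [c.strip() for c in normalized.get("transfer-encoding", "").split(",")]
    chunked = "chunked" in codings

    return {"byte_ranges": byte_ranges, "chunked_transfer": chunked}


print(check_cdn_response(206, {"Transfer-Encoding": "chunked"}))
# prints {'byte_ranges': True, 'chunked_transfer': True}
```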
Comparing LL-HLS to HESP
HESP provides sub-second end-to-end latency together with large GOP sizes (10-12 seconds). Thanks to the initialization stream, quality switches in ABR are not limited to GOP boundaries and can happen at any moment, so HESP is not tied to a small GOP size. The GOP can therefore stay large while the buffer stays small (HESP has a sub-second target buffer), which makes low latency and smooth quality switches possible at any time without the risk of rebuffering.
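The relationship between GOP size, buffer size, and switch timing can be sketched with some simple arithmetic. The model below is deliberately simplified: it assumes that after a bandwidth drop the player receives no usable data until it can switch quality, and the numbers (10-second GOP, 25 fps, 0.8-second buffer) are illustrative values chosen for this example, not measurements.

```python
def worst_case_switch_wait(gop_seconds: float, fps: float, any_frame_switch: bool) -> float:
    """Longest wait, in seconds, before a quality switch can take effect."""
    if any_frame_switch:
        # A switch is possible at the very next frame.
        return 1.0 / fps
    # A switch has to wait for the next keyframe, up to one full GOP away.
    return gop_seconds


def rebuffer_risk(buffer_seconds: float, wait_seconds: float) -> bool:
    # Simplification: the buffer drains in real time while the player waits.
    return wait_seconds > buffer_seconds


GOP_SECONDS, FPS, BUFFER_SECONDS = 10.0, 25.0, 0.8  # illustrative values

keyframe_wait = worst_case_switch_wait(GOP_SECONDS, FPS, any_frame_switch=False)
any_frame_wait = worst_case_switch_wait(GOP_SECONDS, FPS, any_frame_switch=True)

print(keyframe_wait, rebuffer_risk(BUFFER_SECONDS, keyframe_wait))    # prints 10.0 True
print(any_frame_wait, rebuffer_risk(BUFFER_SECONDS, any_frame_wait))  # prints 0.04 False
```

Under these assumptions, a keyframe-bound switch with a 10-second GOP can outlast a sub-second buffer, while a switch at the next frame cannot.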
If you set the same latency target in HESP as in LL-HLS (~3 sec), you have more margin to encode the video efficiently. This results in a lower video bitrate for the same video quality, and therefore lower bandwidth consumption.
As described earlier, LL-HLS cannot fully exploit a small part size, because other consequences must be taken into account: no matter how small the part size is, quality switches in bad network conditions are limited to the keyframe interval. In HESP, on the other hand, starting playback is not limited to GOP boundaries. You therefore do not need to sacrifice video quality (by using a smaller GOP) to achieve the lowest end-to-end latency.
While LL-HLS cannot fully exploit a small part size to achieve low latency in bad network conditions, HESP offers a small buffer size, low latency, a large GOP size, and higher video quality all at the same time.