In the previous LL-HLS series, we covered how LL-HLS works, what an end-to-end solution should look like, suitable use cases, and THEO’s recommendations for LL-HLS implementations. In this blog series, we focus on how to tune LL-HLS for low latency streaming, with an introduction to HESP and where HESP improves on LL-HLS.
THIS IS A SNIPPET FROM OUR “OPTIMIZING LL-HLS FOR LOW LATENCY STREAMING” GUIDE WHICH YOU CAN DOWNLOAD HERE.
LL-HLS builds on HLS, the successful method for streaming video, originally to Apple devices. Whereas HLS, much like its DASH counterpart, uses segments (typically a few seconds up to 10 seconds long) as the basic unit for fetching video content, LL-HLS allows fractions of a segment, called parts, to be individually addressed and fetched.
This has direct implications for latency and zapping times. The latency is defined not by the segment size but by the part size, since a part can be fetched as soon as it is available rather than waiting for the whole segment. This makes LL-HLS suited to low latency applications where end-to-end latencies of a few seconds are required and playback closely follows the live event. The smaller parts also allow playback to start more rapidly while keeping the latency small, because the player can start before the live segment is completely available. Moreover, as we will explain throughout this series, under the right conditions playback can start at a part and not only at segment boundaries.
Optimizing for Low Latency Streaming: What are the main factors?
Depending on the use case and the desired latency, bandwidth consumption, and scalability, any of the low latency streaming protocols may be the best option. Here we go through the most important encoding and packaging parameters, as well as the buffer size, and discuss their impact on latency, video quality, bandwidth consumption, and resilience to network variations.
The GOP size, or the size of your Group of Pictures, is one of the main encoding parameters. It has a direct impact on video bitrate and video quality and an indirect impact on end-to-end latency. It determines how often a keyframe (or IDR frame) is available. In LL-HLS, the player requires a keyframe to start decoding, meaning it can start playback only at GOP boundaries. Longer GOPs cause a higher start-up delay and higher latency.
In LL-HLS, the player is not limited to starting playback at segment boundaries; it can start at any independent part (a part that starts with a keyframe).
The part size has a direct influence on the end-to-end latency in LL-HLS. The smaller the part size, the lower the latency. But it is not that simple.
Apple says that parts can be as short as 200 ms. But we need to keep in mind that in LL-HLS, the player must start playback with a keyframe. If a part does not start with a keyframe (which happens when the part size is smaller than the GOP size), the player must either seek back to a part that starts with a keyframe or wait for the next keyframe. For example, consider a GOP size of 2 seconds, a part size of 500 ms, and a playback request sent in the middle of a 6-second segment. The player needs a keyframe to start playback. It must either wait for the next keyframe, which arrives in the third part from now and means a zapping time of at least 1.5 seconds, or seek back two parts to the previous keyframe, which adds 1 second to the end-to-end latency.
Figure 1. Smaller part size does not necessarily lead to smaller zapping time.
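The arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration, not player code: the function name join_options_ms is hypothetical, and it assumes that the segment starts with a keyframe, that the GOP size is a whole number of parts, and that a part can only be fetched once it is complete.

```python
def join_options_ms(gop_ms: int, part_ms: int, join_ms: int) -> tuple[int, int]:
    """Return (wait_ms, seek_back_ms) for a player joining at join_ms.

    join_ms is measured from the start of the segment. Assumptions made for
    this sketch: keyframes sit at multiples of gop_ms, gop_ms is a multiple
    of part_ms, and a part is only usable once it is fully published.
    """
    if gop_ms % part_ms != 0:
        raise ValueError("this sketch assumes the GOP is a whole number of parts")
    last_keyframe_ms = (join_ms // gop_ms) * gop_ms
    next_keyframe_ms = last_keyframe_ms + gop_ms
    # Option 1: wait until the part that starts with the next keyframe is complete.
    wait_ms = next_keyframe_ms + part_ms - join_ms
    # Option 2: start from the most recent keyframe, which adds this much latency.
    seek_back_ms = join_ms - last_keyframe_ms
    return wait_ms, seek_back_ms


# GOP of 2 s, parts of 500 ms, joining in the middle of a 6-second segment.
wait_ms, seek_back_ms = join_options_ms(gop_ms=2000, part_ms=500, join_ms=3000)
print(wait_ms, seek_back_ms)  # 1500 1000
```

Under the same assumptions, shrinking the parts to 200 ms with an unchanged 2-second GOP only reduces the wait to 1.2 seconds and leaves the seek-back cost at 1 second, which is the point the figure makes.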
The segment size in LL-HLS does not impact latency as directly as it does in traditional HLS. In general, longer segments are desirable because they allow a larger GOP size, which means higher video quality and lower bandwidth consumption. On the other hand, in LL-HLS a large segment size increases the number of parts that must be listed in the playlist. This increases the size of the playlist, and with it the amount of data that must be loaded in parallel with the media data. Long segments can therefore significantly increase the playlist size, causing network overhead and impacting streaming quality. Segments can’t be too small either, since that imposes a smaller GOP size and therefore lower video quality and higher bandwidth consumption.
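To get a feel for the playlist impact, note that the number of part entries grows with the ratio of segment size to part size. The helper below is a rough sketch: part_entries is a hypothetical name, and it assumes every counted segment lists all of its parts, which is a simplification because a server may list parts only for the most recent segments.

```python
import math


def part_entries(segment_ms: int, part_ms: int, segments_with_parts: int = 1) -> int:
    """Estimate how many part entries a playlist carries.

    segments_with_parts is an assumed count of segments whose parts are
    still listed; the real number depends on the packager.
    """
    parts_per_segment = math.ceil(segment_ms / part_ms)
    return parts_per_segment * segments_with_parts


# Compare segment sizes for 500 ms parts, assuming 3 segments still list parts.
for segment_ms in (2000, 6000, 10000):
    print(segment_ms, part_entries(segment_ms, part_ms=500, segments_with_parts=3))
```

In this simplified model, moving from 2-second to 10-second segments multiplies the part entries by five for the same part size.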
Buffer size, Network tolerance and ABR in Low Latency Streaming
There is always a trade-off between guaranteeing smooth playback in all (network) conditions and achieving the lowest possible latency. To cope with network and other variations, LL-HLS maintains a buffer that absorbs jitter and unforeseen hiccups in the video transmission. The larger the buffer, the higher the tolerance for network issues, but also the higher the latency. In LL-HLS we have a default of 3 part durations in the buffer.
For example, with parts of 400 ms, the buffer will target a size of 1.2 seconds. Based on our tests, with correct settings for the part and GOP size and a slightly larger part size (for example, around 1 second), the number of parts in the buffer can be slightly decreased without impact on the user experience. However, as a baseline, the buffer should never hold fewer than 2 parts.
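The buffer target above is simple arithmetic, sketched here for convenience. The function name target_buffer_ms and its parameters are illustrative and do not correspond to any player API.

```python
def target_buffer_ms(part_ms: int, parts_in_buffer: int = 3) -> int:
    """Buffer target as a whole number of part durations (default: 3 parts)."""
    if parts_in_buffer < 2:
        # Baseline from the text: never go below 2 parts.
        raise ValueError("the buffer should hold at least 2 parts")
    return part_ms * parts_in_buffer


print(target_buffer_ms(400))      # 1200
print(target_buffer_ms(1000, 2))  # 2000
```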
But network conditions are not always perfect. Besides jitter, we also encounter drops and variations in network capacity. To cope with this varying bandwidth, ABR is needed. For ABR to work effectively, the buffer must be long enough to accommodate a quality switch before any glitch or rebuffering occurs in the playback. Consider the worst case: the buffer size is 2 seconds, the segment is 6 seconds, the GOP size is 3 seconds, and the network bandwidth drops to half of the video bitrate near the end of the segment. The player needs to download a part from a lower quality that starts with a keyframe. Because we are near the end of the segment and the GOP size is 3 seconds, neither the current part nor the previous part (with 1-second parts) contains a keyframe, so the player has to go back to the third prior part to be able to switch down. You would therefore need to download 3 seconds of data while you have only 2 seconds of buffer. Even if you reduce the GOP size to 2 seconds, you may still get stalls during the ABR switch, since up to 2 seconds of data must then be downloaded with only 2 seconds of buffer.
Therefore, you need to increase the buffer size to make sure you can have a smooth quality switch. A larger buffer size means higher latency. You might consider reducing the GOP size to get a proper ABR switch-down without stalling, but as discussed earlier, a smaller GOP size comes with lower video quality and higher bandwidth consumption, which brings an extra challenge to the ABR itself.
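The worst case described above reduces to a rule of thumb that can be written down directly. The sketch below is deliberately simplistic: may_stall_on_switch_down is a hypothetical helper, it assumes the bandwidth drop happens just before the next keyframe so that a whole GOP of the lower quality must be re-downloaded, and it ignores the time the download itself takes, which in practice makes things worse.

```python
def may_stall_on_switch_down(buffer_s: float, gop_s: float) -> bool:
    """Flag a possible stall when switching down in the worst case.

    Worst case assumed here: the player must re-download one full GOP of the
    lower quality, starting from the last keyframe, while the buffer drains.
    """
    worst_case_refetch_s = gop_s
    return worst_case_refetch_s >= buffer_s


# 2-second buffer, as in the example above.
for gop_s in (3, 2, 1):
    print(gop_s, may_stall_on_switch_down(buffer_s=2, gop_s=gop_s))
```

With a 2-second buffer, GOP sizes of 3 and 2 seconds are both flagged, while a 1-second GOP is not. This simple model says nothing about the quality and bandwidth cost of the shorter GOP.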
In the next blog, we will explore the impact of GOP on the viewing experience. You can download the complete version of this topic in our “HOW TO OPTIMIZE LL-HLS FOR LOW LATENCY STREAMING” guide here.