In the previous blog, we covered the four key factors that affect the quality of the low-latency streaming experience with Apple's LL-HLS protocol. In this blog, we dive into the importance of the Group of Pictures (GOP) and its impact on the overall viewing experience.
As a refresher, the GOP size (the length of a Group of Pictures) is one of the main encoding parameters. It has a direct impact on video bitrate and video quality, and an indirect impact on end-to-end latency. It determines how often a keyframe (IDR frame) is available. In LL-HLS, the player needs a keyframe to start decoding, which means it can only start playback at GOP boundaries. Longer GOPs therefore cause longer start-up delays and higher latency.
This is a snippet from our “Optimizing LL-HLS for Low Latency Streaming” guide, which you can download here.
Impact of GOP size on video bitrate
Apple recommends a GOP size of 2 seconds. Typical LL-HLS implementations achieve around 3 seconds of end-to-end latency when the GOP is set to 1 second. However, small GOP sizes come at the cost of higher bandwidth consumption: the smaller the GOP, the more frequent the keyframes. Depending on the video, keyframes can be 10 times larger than P frames, so short keyframe intervals increase the video bitrate and hence the bandwidth consumption.
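To see why the keyframe interval matters so much, here is a deliberately simplified model, a sketch rather than a description of how a real encoder allocates bits. It assumes every GOP contains exactly one keyframe that is `keyframe_ratio` times the size of a P frame, and that all P frames are the same size. The 30 fps frame rate and the ratio of 10 are assumed example values, and `relative_bitrate` is a hypothetical name.

```python
def relative_bitrate(gop_seconds: float, fps: float, keyframe_ratio: float) -> float:
    """Average frame size, in units of one P frame, for a fixed GOP length.

    Simplified model (assumption): each GOP holds exactly one keyframe that is
    `keyframe_ratio` times larger than a P frame, and all P frames are equal.
    A result of 1.0 would mean no keyframe overhead at all.
    """
    frames_per_gop = round(gop_seconds * fps)
    if frames_per_gop < 1:
        raise ValueError("the GOP must contain at least one frame")
    # One keyframe plus (frames_per_gop - 1) P frames, spread over the whole GOP.
    gop_size = keyframe_ratio + (frames_per_gop - 1)
    return gop_size / frames_per_gop


# Hypothetical example: 30 fps video whose keyframes are 10x the size of P frames.
for gop in (0.5, 1, 2, 3, 6, 10):
    overhead = relative_bitrate(gop, fps=30, keyframe_ratio=10) - 1
    print(f"GOP {gop:>4} s: {overhead:.0%} extra bitrate from keyframes")
```

Under these assumptions the keyframe overhead is 60% at GOP 0.5 s but only 3% at GOP 10 s. How much a real video saves depends on how large its keyframes are relative to its P frames, which is why the savings differ so much between video types.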
Table 1 shows how the video bitrate changes with GOP size. To get a comprehensive view, four different types of video were tested:
- A movie (Tears of Steel)
- An animation (Big Buck Bunny)
- A bike race TV program
- A static screen-streaming video
| Video (bitrate in kbps) | GOP 0.5 s | GOP 1 s | GOP 2 s | GOP 3 s | GOP 6 s | GOP 10 s |
|---|---|---|---|---|---|---|
| Tears of Steel | 4758 | 4060 | 3770 | 3699 | 3554 | 3554 |
| Big Buck Bunny | 8105 | 7094 | 6741 | 6687 | 6607 | 6593 |
| Static Video (screen streaming) | 1934 | 1161 | 775 | 652 | 504 | 453 |
Table 1. Video bitrate (kbps) for different GOP sizes
For this test, the Constant Rate Factor (CRF) was kept the same for all GOP sizes, which forces the encoder to target the same video quality regardless of GOP size. As the table shows, larger GOP sizes keep the same video quality while using less bandwidth.
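A constant-CRF, fixed-GOP encode like this one could be set up with ffmpeg and libx264 roughly as follows. This is a sketch under assumptions: the test does not name the encoder, the CRF value or the frame rate, so libx264, CRF 23 (libx264's default) and 30 fps are placeholders, and `crf_encode_command` is a hypothetical helper. `-g` sets the keyframe interval in frames, and `-sc_threshold 0` stops libx264 from inserting extra keyframes at scene cuts, which would otherwise shorten some GOPs.

```python
import shlex


def crf_encode_command(src: str, dst: str, gop_seconds: float,
                       fps: int = 30, crf: int = 23) -> list[str]:
    """Build an ffmpeg/libx264 command with a fixed GOP length and a constant CRF."""
    gop_frames = round(gop_seconds * fps)  # GOP length expressed in frames
    return [
        "ffmpeg", "-y", "-i", src,
        "-an",                    # drop audio: only the video bitrate matters here
        "-c:v", "libx264",
        "-crf", str(crf),         # same quality target for every GOP size
        "-r", str(fps),           # fixed output frame rate, so frames = seconds * fps
        "-g", str(gop_frames),    # keyframe interval in frames
        "-sc_threshold", "0",     # no extra keyframes on scene cuts
        dst,
    ]


# Print a shell-ready command for a 1-second GOP (file names are placeholders).
print(shlex.join(crf_encode_command("input.mp4", "gop_1s.mp4", gop_seconds=1)))
```

Running one such command per GOP size and reading the resulting file sizes would give a bitrate comparison of the kind shown in Table 1.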
The bitrate reduction from a larger GOP depends on the type of video. For the static video (screen streaming), the bitrate drops by about 77% from GOP 0.5 seconds to GOP 10 seconds. For the other videos, the reduction is still between roughly 19% and 25%.
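The bitrate reductions discussed here can be computed directly from Table 1. The short script below does that; the function and variable names are only illustrative.

```python
# Bitrates in kbps from Table 1, ordered by GOP size (seconds).
GOP_SIZES = [0.5, 1, 2, 3, 6, 10]
BITRATES_KBPS = {
    "Tears of Steel": [4758, 4060, 3770, 3699, 3554, 3554],
    "Big Buck Bunny": [8105, 7094, 6741, 6687, 6607, 6593],
    "Static Video (screen streaming)": [1934, 1161, 775, 652, 504, 453],
}


def bitrate_reduction(rates: list[int], from_gop: float, to_gop: float) -> float:
    """Fractional bitrate saving when moving from `from_gop` to `to_gop`."""
    before = rates[GOP_SIZES.index(from_gop)]
    after = rates[GOP_SIZES.index(to_gop)]
    return (before - after) / before


for name, rates in BITRATES_KBPS.items():
    saving = bitrate_reduction(rates, 0.5, 10)
    print(f"{name}: {saving:.1%} lower bitrate at GOP 10 s than at GOP 0.5 s")
```

From GOP 0.5 s to GOP 10 s this reports about 25.3% for Tears of Steel, 18.7% for Big Buck Bunny and 76.6% for the static screen stream.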
Impact of GOP size on video quality
GOP size also affects video quality: generally, the larger the GOP, the higher the video quality. At the same bitrate, a larger GOP spends fewer bits on keyframes, so more detail can go into the P frames.
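A simplified frame-size model illustrates this: at a fixed bitrate, fewer keyframes leave more bits for each P frame. This is a sketch under assumptions (one keyframe per GOP that is `keyframe_ratio` times the size of a P frame, all P frames equal in size); the 4,000 kbps, 30 fps and ratio of 10 are example values, and `p_frame_budget_kbit` is a hypothetical name.

```python
def p_frame_budget_kbit(bitrate_kbps: float, fps: float,
                        gop_seconds: float, keyframe_ratio: float) -> float:
    """Bits available per P frame (in kbit) at a fixed bitrate.

    Simplified model (assumption): one keyframe per GOP that is `keyframe_ratio`
    times the size of a P frame, and all P frames the same size.
    """
    frames_per_gop = round(gop_seconds * fps)
    gop_budget_kbit = bitrate_kbps * frames_per_gop / fps  # total kbit spent per GOP
    # gop_budget = keyframe_ratio * p + (frames_per_gop - 1) * p, solved for p
    return gop_budget_kbit / (keyframe_ratio + frames_per_gop - 1)


for gop in (0.5, 1, 2, 10):
    budget = p_frame_budget_kbit(4000, fps=30, gop_seconds=gop, keyframe_ratio=10)
    print(f"GOP {gop:>4} s: {budget:6.1f} kbit per P frame")
```

In this toy setting each P frame gets roughly 55% more bits at GOP 10 s than at GOP 0.5 s, which is the extra room for detail described above.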
We studied how GOP size affects video quality using the VMAF metric, which is briefly explained below.
What is VMAF?
Video Multimethod Assessment Fusion (VMAF) is a video quality metric developed by Netflix that combines four different metrics:
- Visual Information Fidelity (VIF): considers fidelity loss at four different spatial scales
- Detail Loss Metric (DLM): measures detail loss and impairments which distract viewer attention
- Mean Co-Located Pixel Difference (MCPD): measures the temporal difference between frames on the luminance component
- Anti-noise signal-to-noise ratio (AN-SNR)
VMAF scores range from 0 to 100, where 100 means the encoded video is identical to the reference video. A difference of 6 VMAF points is considered noticeable. The default VMAF model was used in this test.
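For reference, one common way to compute a VMAF score is ffmpeg's `libvmaf` filter, which requires an ffmpeg build with libvmaf enabled. The sketch below is an assumption about tooling, not necessarily how this test was run; `vmaf_command` is a hypothetical helper and the file names are placeholders. The encoded (distorted) video is passed as the first input and the reference as the second, and both should have the same resolution and frame rate. Without a `model` option, libvmaf uses its default model.

```python
import shlex


def vmaf_command(distorted: str, reference: str, log_path: str = "vmaf.json") -> list[str]:
    """Build an ffmpeg command that scores `distorted` against `reference` with libvmaf."""
    return [
        "ffmpeg",
        "-i", distorted,   # first input: the encoded video being evaluated
        "-i", reference,   # second input: the original reference video
        # Scores are written to a JSON log file at `log_path`.
        "-lavfi", f"libvmaf=log_fmt=json:log_path={log_path}",
        "-f", "null", "-",  # decode and score only; discard the output video
    ]


print(shlex.join(vmaf_command("gop_1s.mp4", "reference.mp4")))
```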
Table 2 shows how GOP size affects video quality for the different video types. For each encoded video, a VMAF score was calculated against the reference video. How much the VMAF score drops at lower GOP sizes depends on the video type. Except for the static screen-streaming video, all videos show a clear VMAF drop between GOP 10 seconds and GOP 0.5 seconds. Big Buck Bunny drops by 8 points between GOP 10 seconds and GOP 1 second, which is a noticeable quality degradation.
Please note that the aim of this test was to see the impact of different GOP sizes on VMAF scores. All videos were encoded with a maximum bitrate of 4 Mbps. The chosen 4 Mbps may not be the bitrate with the highest VMAF score for each video's resolution, but finding that bitrate for each video was outside the scope of this test.
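A test matrix like this one, with one encode per GOP size and the bitrate capped at 4 Mbps, could be generated as sketched below. The exact rate-control settings of the test are not stated, so libx264 with `-b:v`/`-maxrate` at 4000 kbps, a `-bufsize` of twice the cap, and 30 fps are assumptions, as are the helper name and file names.

```python
import shlex

GOP_SIZES_S = [0.5, 1, 2, 3, 6, 10]  # GOP sizes tested, in seconds


def capped_encode_command(src: str, dst: str, gop_seconds: float,
                          fps: int = 30, max_kbps: int = 4000) -> list[str]:
    """ffmpeg/libx264 command with a fixed GOP and the bitrate capped at `max_kbps`."""
    gop_frames = round(gop_seconds * fps)
    return [
        "ffmpeg", "-y", "-i", src, "-an", "-c:v", "libx264",
        "-b:v", f"{max_kbps}k",           # target bitrate
        "-maxrate", f"{max_kbps}k",       # cap used by the rate-control buffer model
        "-bufsize", f"{2 * max_kbps}k",   # VBV buffer size (assumed 2x the cap)
        "-r", str(fps),
        "-g", str(gop_frames),            # keyframe interval in frames
        "-sc_threshold", "0",             # keep keyframes on the fixed interval
        dst,
    ]


for gop in GOP_SIZES_S:
    print(shlex.join(capped_encode_command("source.mp4", f"gop_{gop}s.mp4", gop)))
```

Keeping the cap identical across all encodes means the only variable between them is the GOP size, which is what the VMAF comparison is meant to isolate.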
| Video (VMAF score) | GOP 0.5 s | GOP 1 s | GOP 2 s | GOP 3 s | GOP 6 s | GOP 10 s |
|---|---|---|---|---|---|---|
| Tears of Steel | 79.80846 | 83.0559 | 84.5657 | 85.0617 | 85.6186 | 85.7711 |
| Big Buck Bunny | 49.5474 | 56.6863 | 61.0857 | 62.0536 | 63.8682 | 64.7515 |
| Static Video (screen streaming) | 95.7727 | 95.7692 | 95.8438 | 95.8537 | 95.8863 | 95.8856 |
Table 2. VMAF scores for different GOP sizes
The VMAF scores show that for some types of video, such as static screen streaming, quality barely improves with a larger GOP (about 0.1 VMAF points from GOP 0.5 seconds to GOP 10 seconds), while bandwidth consumption still drops sharply (Table 1). For other videos, such as Big Buck Bunny, quality improves by up to 15 VMAF points (GOP 10 seconds compared with GOP 0.5 seconds), which is considerable given that every 6 VMAF points is a visually noticeable difference. Tears of Steel shows a third pattern: the VMAF improvement stays below 6 points (about 2.7 points between GOP 1 second and GOP 10 seconds), yet the largest GOP size still reduces the bitrate by about 12% compared with GOP 1 second and about 25% compared with GOP 0.5 seconds (Table 1).
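The VMAF differences can be recomputed from Table 2 and compared against the 6-point threshold for a noticeable difference. The helper names below are illustrative.

```python
# VMAF scores from Table 2, ordered by GOP size (seconds).
GOP_SIZES = [0.5, 1, 2, 3, 6, 10]
VMAF = {
    "Tears of Steel": [79.80846, 83.0559, 84.5657, 85.0617, 85.6186, 85.7711],
    "Big Buck Bunny": [49.5474, 56.6863, 61.0857, 62.0536, 63.8682, 64.7515],
    "Static Video (screen streaming)": [95.7727, 95.7692, 95.8438, 95.8537, 95.8863, 95.8856],
}
NOTICEABLE_DELTA = 6.0  # VMAF points considered a noticeable difference


def vmaf_gain(scores: list[float], from_gop: float, to_gop: float) -> float:
    """VMAF points gained when moving from `from_gop` to `to_gop`."""
    return scores[GOP_SIZES.index(to_gop)] - scores[GOP_SIZES.index(from_gop)]


for name, scores in VMAF.items():
    gain = vmaf_gain(scores, 0.5, 10)
    verdict = "noticeable" if gain >= NOTICEABLE_DELTA else "below the noticeable threshold"
    print(f"{name}: +{gain:.2f} VMAF from GOP 0.5 s to 10 s ({verdict})")
```

From GOP 0.5 s to GOP 10 s this gives about +5.96 points for Tears of Steel, +15.20 for Big Buck Bunny and +0.11 for the static screen stream, so only Big Buck Bunny crosses the 6-point threshold over that range.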
Impact of GOP size on zapping time and latency
In LL-HLS live streaming, increasing the GOP size to reduce bandwidth consumption and improve video quality means sacrificing a short zapping time (the time it takes for playback to start after switching to a stream), low latency, or both.
Because the player needs a keyframe to start decoding, a large GOP affects both the zapping time and the latency of the stream. The player can either wait for the next GOP, which means a longer startup time but low latency, or start playback from the beginning of the current GOP, which means a short startup time but additional latency of up to one GOP duration. With large GOPs that contain only one keyframe every 6 seconds, for example, the player can only start playback at one position every six seconds. This doesn't mean your zapping time will be six seconds, but the player might have to start at a higher latency. In this 6-second example, starting playback immediately adds 3 seconds of latency on average at startup, and up to 6 seconds in the worst case.
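This trade-off can be expressed as a small model. It is a sketch that ignores network, part/segment and buffering delays: the viewer joins `offset_seconds` after the most recent keyframe, and the player either waits for the next keyframe or starts immediately from the previous one. The function names are hypothetical. Averaging over join times spread evenly across the GOP reproduces the figures in the 6-second example.

```python
def join_costs(gop_seconds: float, offset_seconds: float) -> dict[str, float]:
    """Startup wait and extra latency (seconds) for the two join strategies.

    `offset_seconds` is how long ago the most recent keyframe was produced
    when the viewer joins the live stream.
    """
    if not 0 <= offset_seconds < gop_seconds:
        raise ValueError("offset must lie within one GOP")
    wait = (gop_seconds - offset_seconds) % gop_seconds  # time until the next keyframe
    return {
        # Strategy 1: wait for the next keyframe, then play at the live edge.
        "wait_startup": wait,
        "wait_extra_latency": 0.0,
        # Strategy 2: start now from the keyframe that opened the current GOP.
        "immediate_startup": 0.0,
        "immediate_extra_latency": offset_seconds,
    }


def average_costs(gop_seconds: float, samples: int = 10_000) -> dict[str, float]:
    """Average each cost over join times spread evenly across one GOP."""
    offsets = [(i + 0.5) * gop_seconds / samples for i in range(samples)]
    totals = {key: 0.0 for key in join_costs(gop_seconds, 0.0)}
    for offset in offsets:
        for key, value in join_costs(gop_seconds, offset).items():
            totals[key] += value
    return {key: total / samples for key, total in totals.items()}


print(average_costs(6.0))
```

For a 6-second GOP both strategies cost 3 seconds on average: waiting costs startup time, while starting immediately costs extra latency, approaching 6 seconds for viewers who join just before the next keyframe.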
Figure 1. Smaller GOP sizes enable shorter latencies.
In the next blog, we will present four optimizations for the best low-latency streaming experience with the LL-HLS protocol. You can download the complete version of this topic in our “How to Optimize LL-HLS for Low Latency Streaming” guide here.