Software update: dav1d 0.3.0


dav1d is an open-source AV1 decoder developed by the VideoLAN, VLC and FFmpeg communities. The AV1 format was created by the Alliance for Open Media (AOMedia), whose members include Amazon, Cisco, Google, Intel, Microsoft, Mozilla and Netflix. AOMedia's reference implementation was quite large because it contained a lot of research code. dav1d aims to be as fast as possible on as many platforms as possible. The developers have released version 0.3.0 with the following announcement:

dav1d 0.3.0 Sailfish: ARMed to the teeth

TL;DR: dav1d 0.3.0 decodes AV1 videos 24% faster on SSSE3, 26% faster on SSE4.1 and 4% faster on AVX2 (all on PC), and 12% faster on Arm64 (mobile).

The open-source AV1 decoder dav1d was updated yesterday to version 0.3.0. In this third release, new assembly code brings serious performance gains on both PC and mobile platforms.

PC

On the x86 side, this release mostly improves the SSSE3 performance of dav1d. Xuefeng Jiang contributed chroma-from-luma (CfL) prediction and Paeth intra prediction functions, improving overall performance by 0.8% and 0.4% respectively.
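The Paeth predictor is simple enough to show in scalar form. The sketch below follows the Paeth rule as defined in the AV1 specification for 8-bit samples: each predicted pixel takes whichever of the top, left or top-left neighbours is closest to top + left - top_left. It is not dav1d's code and not the SSSE3 assembly from this release; the function names `paeth_pixel` and `paeth_predict` and the buffer layout are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Paeth rule for one 8-bit pixel (hypothetical helper name):
 * choose the neighbour closest to base = top + left - top_left,
 * preferring left, then top, then top_left on ties. */
static uint8_t paeth_pixel(uint8_t top, uint8_t left, uint8_t top_left)
{
    const int base = top + left - top_left;
    const int p_left = abs(base - left);         /* equals |top - top_left|  */
    const int p_top = abs(base - top);           /* equals |left - top_left| */
    const int p_top_left = abs(base - top_left);

    if (p_left <= p_top && p_left <= p_top_left)
        return left;
    if (p_top <= p_top_left)
        return top;
    return top_left;
}

/* Fill a w x h block (rows `stride` bytes apart) from the row above it,
 * the column to its left and the single top-left corner sample.
 * Hypothetical layout: above[0..w-1], left[0..h-1]. */
static void paeth_predict(uint8_t *dst, ptrdiff_t stride,
                          const uint8_t *above, const uint8_t *left,
                          uint8_t top_left, int w, int h)
{
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[y * stride + x] = paeth_pixel(above[x], left[y], top_left);
}
```

A SIMD implementation would typically evaluate the three distances for many pixels at once and replace the branches with compare-and-blend operations, which is why small per-pixel kernels like this one benefit from hand-written assembly.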

Liwei Wang continued his work on the inverse transforms, adding larger block sizes such as 8×32, 32×16, 32×32 and up to 64×64. This provided the largest speedup of this release, well over 10% on some videos.

dav1d 0.3.0 also introduces the first SSE4.1 assembly. In most cases the additional SSE4.1 instructions offer no benefit over SSSE3, but Victorien Le Couviour--Tuffet found a use case where they did. He optimized the CDEF filter, resulting in a 1.15x speedup at the module level and around 1.5% overall.
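Why does a 1.15x speedup of one filter translate into only about 1.5% overall? Amdahl's law relates the two: if a fraction f of decode time is spent in the accelerated module and that module becomes s times faster, the whole decode becomes 1 / ((1 - f) + f / s) times faster. The sketch below, with hypothetical helper names, computes this and its inverse. It assumes the quoted 1.5% means an overall speedup factor of 1.015 and that nothing else changed between the two measurements; the announcement states neither.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction `f` of the run time
 * is made `s` times faster and the remainder is unchanged.
 * Hypothetical helper, for illustration only. */
static double overall_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}

/* Inverse of the above: the fraction of run time a module must have
 * taken for a module-level speedup `s` to yield `overall`. */
static double implied_fraction(double overall, double s)
{
    assert(overall > 0.0 && s > 1.0);
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s);
}
```

Under those assumptions, `implied_fraction(1.015, 1.15)` comes out at roughly 0.113, meaning CDEF would have accounted for about 11% of decode time in that measurement. It also shows the ceiling: even an infinitely fast CDEF could not speed up the whole decoder by more than 1 / (1 - f).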

Meanwhile, Henrik Gramner wrote some very clever SSE2 code to speed up entropy decoding (bitstream reading), which had started to take up a large proportion of decode time, especially on AVX2. The assembly code resulted in a speedup on all 64-bit x86 platforms, measured at around 4% for AVX2 and 2% for SSSE3 and SSE4.1.

Overall these commits make dav1d 0.3.0 around 24% faster on SSSE3, 26% faster on SSE4.1 and 4% faster on AVX2 CPUs.

While single-threaded aomdec is still quite strong, with multiple threads dav1d 0.3.0 makes libaom an even smaller spot in the rear-view mirror.

Arm64

Martin Storsjö delivered two very nice commits speeding up the loop filter and the self-guided loop restoration filter with NEON assembly code. Both functions were sped up by about 3x, resulting in overall performance gains of 7% to 36%. Not only does this allow for higher resolutions, frame rates and bitrates, it also reduces power consumption when decoding the same content.

These updates push a 1080p video above 25 fps on a single core of a Snapdragon 835 for the first time. Using multiple threads, 30 fps is now rock solid and 60 fps is reachable on some content.

Normalizing the results, we see that the RED clip in particular benefits a lot, since it relies heavily on the loop filter. Single-thread gains are between 11% and 36% (19% on average), multi-thread gains between 7% and 16%.

Adoption

The adoption of dav1d is also going very well. The big news is that Chromium, the open-source project behind Google Chrome and now also Microsoft Edge, has adopted dav1d and will ship it by default in Chrome 74.

Firefox 67 also improves its dav1d integration considerably: dav1d was updated to 0.2.1, multiple tile threads are now used, and dav1d is now enabled by default on Linux and macOS in addition to Windows.

FFmpeg and VLC still use dav1d, and HandBrake is also looking at integrating dav1d as soon as FFmpeg 4.2 is released.

YouTube is also encoding more and more AV1 streams. It has even encoded a few videos in 4K and 8K resolution at up to 60 fps.

Version number: 0.3.0
Release status: Final
Website: dav1d
License type: Conditions (GNU/BSD/etc.)