Variable Rate Shading on Adreno GPUs
“With high screen DPI doesn’t come high GPU fillrate.” That’s the main problem of GPUs nowadays. Modern consoles struggle to sustain a stable 30 fps, let alone 60 fps, on large 4K screens. The common technique to increase FPS is rendering at a lower resolution and upscaling with techniques like DLSS and FSR. But modern VR-capable hardware has to target both very high frame rates and high image quality, and upscaling shows its limitations here: depending on the implementation, the image will be blurry, oversharpened, or will have ghosting artifacts. Variable rate shading (VRS) is a temporally stable approach to improving performance with (if applied correctly) virtually unnoticeable quality reduction.
Modern mobile Adreno GPUs by Qualcomm support Variable Rate Shading, and phones with these GPUs have been available since autumn 2021. Because our live wallpapers have to be power-efficient, we got a test device with an Adreno 642L to implement this feature in our apps.
What is Variable Rate Shading
The idea behind VRS is to run the fragment shader once for a small block of pixels and reuse the resulting color for all covered pixels in that block, instead of shading every pixel separately.
A good explanation of how VRS is implemented on Adreno GPUs can be found on the official Qualcomm Developer blog. This image from that blog post shows how simple the idea is:
VRS is better than generic downsampling of the whole frame because:
- It preserves geometry edges (except when the shape is determined by discarding fragments).
- It can be adjusted per draw call: one object can be rendered at full detail while another is rendered at reduced quality.
- It can be applied dynamically to keep the target FPS by gradually reducing image quality.
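To make the first point more concrete, here is a tiny CPU model of coarse shading in Python. It is only a sketch and not how the hardware works internally: the `render` function, its parameters and the choice to shade at the block centre are all assumptions made for illustration. Coverage is tested per pixel, while the shader runs once per block.

```python
def render(width, height, coverage, shader, block_w=1, block_h=1):
    """Toy rasterizer: one shader invocation per block_w x block_h block.

    coverage(x, y) -> bool is evaluated for every pixel, so edges stay sharp.
    shader(x, y) -> color is evaluated once per block that has covered pixels.
    """
    image = [[None] * width for _ in range(height)]
    invocations = 0
    for by in range(0, height, block_h):
        for bx in range(0, width, block_w):
            covered = [
                (x, y)
                for y in range(by, min(by + block_h, height))
                for x in range(bx, min(bx + block_w, width))
                if coverage(x, y)
            ]
            if not covered:
                continue
            # Assumption: shade at the block centre and reuse the result.
            color = shader(bx + block_w / 2, by + block_h / 2)
            invocations += 1
            for x, y in covered:
                image[y][x] = color
    return image, invocations


# A triangle (x <= y) on an 8x8 target, shaded with a vertical gradient.
triangle = lambda x, y: x <= y
gradient = lambda x, y: int(y)
_, full_rate = render(8, 8, triangle, gradient)           # 36 invocations
_, coarse_rate = render(8, 8, triangle, gradient, 2, 2)   # 10 invocations
```

The set of covered pixels is the same in both runs, only the number of shader invocations changes. This also hints at why discarded fragments are a problem: a shape cut out by the shader is decided per block, not per pixel.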
Current state of VRS-capable hardware
At the time of writing, VRS is quite a new feature in the mobile world. It is currently supported only by modern Qualcomm Snapdragon chips and some Samsung Exynos chips. A range of Adreno GPUs (642L, 650+, 660+, and 7xx) support this feature, and Samsung supports it in its newest Exynos 2200 SoCs with RDNA2-based Xclipse GPUs. Both vendors expose VRS through the same GL_QCOM_shading_rate extension. These chips are already used in quite a lot of new flagship and even mid-range phones.
I’ve tested our apps in the Samsung Remote Test Lab on a device with an Exynos 2200 SoC and Xclipse GPU. Unfortunately, the GL_QCOM_shading_rate extension seems to be broken on these GPUs. While the extension is advertised, the GL_EXT_fragment_invocation_density extension, which the spec requires and which lets fragment shaders adjust for the shading rate, is missing.
Samsung doesn’t use a dedicated OpenGL driver but instead translates GL API calls to Vulkan using ANGLE, and the GL extension for VRS seems to be broken because of this. Hopefully Samsung will update the Exynos 2200 drivers some day.
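Given this, one conservative option is to enable VRS only when both extensions are advertised. The Python sketch below shows the check on the space-separated string that `glGetString(GL_EXTENSIONS)` returns. The helper name `vrs_usable` is hypothetical, and treating the second extension as mandatory is our assumption based on the behaviour described above.

```python
REQUIRED_EXTENSIONS = (
    "GL_QCOM_shading_rate",
    "GL_EXT_fragment_invocation_density",
)


def vrs_usable(extensions_string):
    """Return True only if every required extension name is advertised.

    extensions_string is the space-separated extension list reported by
    the driver. Names are compared as whole tokens, not as substrings.
    """
    available = set(extensions_string.split())
    return all(name in available for name in REQUIRED_EXTENSIONS)
```

With a check like this, a device that advertises only GL_QCOM_shading_rate simply falls back to the regular rendering path.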
The recently announced Immortalis and Mali-G715/G615 GPUs by Arm will also have VRS support.
If we look at the other part of the mobile world, Apple mobile GPUs have supported VRS for quite some time: since the A13 chip, released in 2019. The VRS implementation in the Metal API is quite flexible and uses screen-space shading rate maps, which allow more precise adjustment of the shading rate. The rate is configured not per object but per screen region, so even a single object can have different shading rates if necessary. This is similar to the desktop implementation in NVIDIA’s Turing GPUs; you can read this article with good illustrations to see how it works. Apple also has a somewhat different name for it: Variable Rasterization Rate.
Since Apple’s GPUs are based on Imagination’s IP, we may also soon see PowerVR GPUs jumping on the VRS bandwagon; however, there are no official statements about this feature from Imagination.
OK, so how do you use VRS on supported hardware? On Snapdragon SoCs it is implemented with the GL_QCOM_shading_rate extension. Adreno GPUs support blocks of 1x1, 1x2, 2x1, 2x2, 4x2, and 4x4 pixels. Please note that some useful dimensions like 2x4 or 4x1 are not available because they are not supported by the hardware.
To apply VRS to certain objects, you simply call glShadingRateQCOM with the desired rate before the corresponding draw calls.
To disable VRS for geometries which should preserve details and be rendered at the native shading rate, simply call glShadingRateQCOM with the 1x1 block size.
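The call order can be sketched like this. It is not real rendering code: `RecordingGL` is a hypothetical stand-in that only records what a renderer would do, and `draw_scene` is an illustrative name. In an actual renderer, `shading_rate` would be a call to glShadingRateQCOM with the matching `GL_SHADING_RATE_*_PIXELS_QCOM` enum.

```python
# Block sizes (width, height) listed above as supported by Adreno hardware.
SUPPORTED_BLOCKS = {(1, 1), (1, 2), (2, 1), (2, 2), (4, 2), (4, 4)}


class RecordingGL:
    """Hypothetical stand-in for a GL context; it only records calls."""

    def __init__(self):
        self.calls = []

    def shading_rate(self, width, height):
        if (width, height) not in SUPPORTED_BLOCKS:
            raise ValueError(f"unsupported block size {width}x{height}")
        # Real code: glShadingRateQCOM(GL_SHADING_RATE_<W>X<H>_PIXELS_QCOM)
        enum_name = f"GL_SHADING_RATE_{width}X{height}_PIXELS_QCOM"
        self.calls.append(("glShadingRateQCOM", enum_name))

    def draw(self, name):
        self.calls.append(("draw", name))


def draw_scene(gl, objects):
    """Set the shading rate before each object's draw call."""
    for name, block in objects:
        gl.shading_rate(*block)
        gl.draw(name)
    # Return to the native rate so later passes are not affected.
    gl.shading_rate(1, 1)
```

Calling `draw_scene(RecordingGL(), [("sky", (4, 2)), ("leaves", (1, 1))])` records a 4x2 rate before the sky and a 1x1 rate before the leaves. Resetting to 1x1 at the end is a design choice of this sketch, since the shading rate is state that stays set until changed.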
One of the first apps we’ve added VRS support to is Bonsai Live Wallpaper. This is a good example because it has 3 very different types of geometries ranging from perfect candidates for VRS optimizations to the very unsuitable ones.
Let’s take a look at a typical scene from the app and how different parts of the image can benefit from a reduced shading rate:
The best candidates for VRS are geometries that are blurred and have little color variation between fragments. So for the sky background we apply a fairly heavy 4x2 VRS, which still introduces virtually no quality degradation, especially with a constantly moving camera.
At the opposite end of the scale is the leaves geometry. In the screenshot below we applied 4x4 VRS to the whole scene to showcase the issue with alpha-testing. Please note that the branches, while using the same heavy 4x4 reduction in this example, still have smooth, anti-aliased edges, clearly showing a benefit of VRS over traditional upscaling.
Needless to say, VRS is clearly not suitable for geometries with discarded fragments.
Also, because VRS is applied in screen space, it introduces significant distortions to transparent dust particles. Their size is comparable to a VRS block, and they start flickering during movement. I’ve noticed a somewhat similar rendering technique used in the COD:MW game on PC when enabling half-resolution particles: sparks and other small particles flicker way too much and look very blocky.
And somewhere between these two geometries lies the ground plane. This is where we apply 2x1 rate reduction. It results in OK image quality because there is a larger color difference between vertically adjacent pixels than between horizontally adjacent ones.
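The reasoning about horizontal versus vertical color differences can be turned into a rough offline heuristic. The Python sketch below is not what the app does (we picked the rates by eye); it is an illustration under assumptions: the image is a 2D list of grayscale values, and `suggest_block` and its `threshold` are hypothetical.

```python
def mean_abs_diff(image, dx, dy):
    """Average absolute difference between pixels offset by (dx, dy)."""
    height, width = len(image), len(image[0])
    total = 0
    count = 0
    for y in range(height - dy):
        for x in range(width - dx):
            total += abs(image[y + dy][x + dx] - image[y][x])
            count += 1
    return total / count if count else 0.0


def suggest_block(image, threshold=2.0):
    """Suggest a (width, height) block from directional color variation.

    Merge pixels along an axis only if neighbours on that axis are,
    on average, closer in value than the (hypothetical) threshold.
    """
    merge_x = mean_abs_diff(image, 1, 0) <= threshold
    merge_y = mean_abs_diff(image, 0, 1) <= threshold
    if merge_x and merge_y:
        return (2, 2)
    if merge_x:
        return (2, 1)
    if merge_y:
        return (1, 2)
    return (1, 1)
```

For an image whose values change only from row to row, the horizontal difference is zero and the function suggests 2x1, which matches the choice made for the ground plane.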
Where VRS definitely shines is when it is applied to geometries with very little color difference between adjacent fragments, and Bonsai wallpaper has a stylized silhouette mode where fragments use literally a single color:
Here we have 3 types of shaders:
- Alpha-testing for leaves. We already know that we should not apply VRS to these geometries.
- Solid black silhouette and ground. The heaviest 4x4 VRS introduces literally zero quality degradation.
- For the sky gradient we use 2x1 blocks. Technically, 4x1 or even 16x1 blocks would be perfect because the gradient changes vertically and horizontally adjacent fragments have identical color, but 2x1 is the widest single-row block Adreno hardware supports.
With all of these applied, the scene renders identically (a screenshot comparison found 0 differing pixels) and shading is 1.5x faster.
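A comparison like this boils down to counting pixels that differ between two screenshots. Here is a minimal Python sketch of that count, assuming both images are already decoded into rows of RGB tuples. The function name and the `tolerance` parameter are our own additions for illustration.

```python
def count_different_pixels(image_a, image_b, tolerance=0):
    """Count pixels whose channels differ by more than `tolerance`.

    Images are lists of rows, and each pixel is an (r, g, b) tuple.
    """
    same_shape = len(image_a) == len(image_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(image_a, image_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    different = 0
    for row_a, row_b in zip(image_a, image_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            if any(abs(a - b) > tolerance for a, b in zip(pixel_a, pixel_b)):
                different += 1
    return different
```

A result of 0 with `tolerance=0` means the two renders are bit-identical, which is the strictest check you can run on a VRS-optimized frame.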
All our wallpapers reduce GPU load in some way when the battery is low. Usually this is done by limiting FPS and omitting a couple of effects.
For more efficient power usage we apply stronger VRS to certain objects in low battery mode. Tree trunks are shaded with 2x1 blocks, and the sky and transparent effects (light shafts and vignette) are shaded with 4x4 instead of 4x2 or 2x2 blocks. This quality reduction is still almost unnoticeable but reduces GPU load by an additional 3%.
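One simple way to organize this is a table with two rates per object. The Python sketch below uses the rates mentioned in this article, but several entries are assumptions: the normal-mode rate for trunks (1x1) and for the transparent effects (2x2), and the unchanged low-battery rates for ground and leaves. The names `RATES` and `block_for` are hypothetical.

```python
# object name -> (normal block, low-battery block), blocks are (width, height)
RATES = {
    "sky": ((4, 2), (4, 4)),
    "light_shafts": ((2, 2), (4, 4)),  # normal-mode rate is an assumption
    "vignette": ((2, 2), (4, 4)),      # normal-mode rate is an assumption
    "trunk": ((1, 1), (2, 1)),         # normal-mode rate is an assumption
    "ground": ((2, 1), (2, 1)),        # assumed unchanged in low battery mode
    "leaves": ((1, 1), (1, 1)),        # alpha-tested, never reduced
}


def block_for(name, low_battery):
    """Pick the block size for an object; unknown objects stay at 1x1."""
    normal, saving = RATES.get(name, ((1, 1), (1, 1)))
    return saving if low_battery else normal
```

Defaulting unknown objects to 1x1 keeps new geometry at full quality until someone explicitly decides it is a good candidate for VRS.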
Performance gains vs quality tradeoff
You will be hard-pressed to find any difference between the original and VRS-optimized rendering: color deviation is negligible, and blocky artifacts are really hard to spot. Only ImageMagick was able to show the differing pixels:
Both the VRS-enabled and regular rendering pipelines run at a steady 120 FPS on our test device (Samsung Galaxy A52s). So we ran Snapdragon Profiler to analyze the performance and efficiency of the optimized build. Here are the numbers:
Bonsai 3D Live Wallpaper, regular mode.
Bonsai 3D Live Wallpaper, battery saving mode.
Bonsai 3D Live Wallpaper, silhouette mode.
In the silhouette scene we don’t use different VRS blocks for regular and power saving modes because it already uses the maximum block size and still renders an image identical to the non-VRS one.
Long story short, we’ve improved rendering efficiency by approximately 30% with little to (literally) no reduction in image quality.