How to improve MSAA performance of MTKView
The easiest and fastest way to use Metal in a macOS app is MTKView. It is a handy wrapper that initializes all the low-level stuff under the hood so you can get right to the fun part: implementing actual rendering.
However, because of its simplicity it has a couple of shortcomings and, for some reason, doesn’t provide access to everything under its hood. One of these minor inconveniences is the way it initializes multisampled render targets.
To understand why this is important, let’s look at how Metal handles MSAA. It supports multiple ways of implementing it:
- You can render to a multisampled render target and have it resolved to the on-screen render target automatically.
- You can use this multisampled render target with custom resolve shaders.
- On supported hardware, you can skip allocating the multisampled render target in main memory and resolve automatically, directly to the final render target. This still uses a multisampled render target, but it is memoryless.
- The same approach but with tile shaders doing the resolve (to apply custom tone mapping, etc.).
You can find a more detailed explanation of these methods, with a sample Xcode project, in this official Apple documentation article: https://developer.apple.com/documentation/metal/metal_sample_code_library/improving_edge-rendering_quality_with_multisample_antialiasing_msaa
Of particular interest to us are memoryless multisampled render targets. They are very efficient since they are transient and reside only in the (extremely fast and tiny) tile memory of the GPU. Because of this they don’t need main-memory allocations and don’t consume precious memory bandwidth.
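As a rough illustration of the memoryless variant (the third option in the list above), a render pass could be configured like this. This is a minimal sketch, not code from Apple's sample: the function name `makeMemorylessMSAAPass` and its parameters are hypothetical, and it assumes `msaaTexture` was created with memoryless storage and with the same size and pixel format as `resolveTarget`.

```swift
import Metal

// Hypothetical helper: configures a pass that renders into a multisampled
// texture and lets the GPU resolve it into `resolveTarget` at the end of the pass.
func makeMemorylessMSAAPass(msaaTexture: MTLTexture,
                            resolveTarget: MTLTexture) -> MTLRenderPassDescriptor {
    let pass = MTLRenderPassDescriptor()
    // Rendering goes to the multisampled texture...
    pass.colorAttachments[0].texture = msaaTexture
    // ...and the resolved image ends up here (for example, the drawable's texture).
    pass.colorAttachments[0].resolveTexture = resolveTarget
    // A memoryless texture has no backing store, so there is nothing to load:
    // start the pass with a clear instead.
    pass.colorAttachments[0].loadAction = .clear
    pass.colorAttachments[0].clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
    // Resolve the samples and discard the multisampled data rather than storing it.
    pass.colorAttachments[0].storeAction = .multisampleResolve
    return pass
}
```

The key detail is the pair of actions: nothing is loaded from memory at the start of the pass and nothing multisampled is stored at the end, which is what allows the multisampled data to live only in tile memory.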
Here is the typical 4x MSAA rasterization process with the default render pass created by MTKView:
And here is the same one but using efficient memoryless render target:
Basically, the only difference is that we replace the in-memory multisampled render target with a memoryless one, and this results in a huge improvement in memory allocation and bandwidth. Please note that according to the Metal Feature Set Tables, memoryless render targets are not supported by older devices. Namely, Intel-based Macs don’t support tiled rendering and cannot use them. But if you target shiny new Apple-silicon devices, you should definitely use them because they are extremely efficient.
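If the same code has to run on both kinds of hardware, the storage mode can be chosen at run time. The sketch below is only an illustration: the function names are hypothetical, and it assumes that any Apple-family GPU (checked here as `.apple2`) supports memoryless render targets, so verify that against the Metal Feature Set Tables for the devices you target.

```swift
import Metal

// Assumption: Apple-family GPUs (Apple2 and later) support memoryless
// render targets, while Intel-based Macs do not report an Apple family.
@available(macOS 11.0, iOS 13.0, *)
func supportsMemorylessTargets(_ device: MTLDevice) -> Bool {
    return device.supportsFamily(.apple2)
}

// Picks memoryless storage where it is expected to work and falls back
// to ordinary GPU-private memory elsewhere.
@available(macOS 11.0, iOS 13.0, *)
func preferredMSAAStorageMode(for device: MTLDevice) -> MTLStorageMode {
    return supportsMemorylessTargets(device) ? .memoryless : .private
}
```

On an Intel-based Mac this falls back to `.private`, which keeps the classic in-memory behaviour.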
The thing is that (possibly for better support of all hardware) MTKView initializes MSAA only with classic in-memory render targets: a multisampled one for rendering and the final one for resolving and presenting the result on screen.
In the aforementioned official Metal MSAA example you can find the proper way to initialize a memoryless MSAA resolve, but it doesn’t use the handy MTKView. Instead, there’s quite a lot of glue code to make it work.
However, I’ve found a hacky yet relatively simple and perfectly working way of initializing an efficient memoryless MSAA resolve using the default MTKView wrapper view.
Let’s take a look at what configuration options MTKView provides.
Obviously there’s sampleCount, which initializes the MSAA render targets. There are also the depthStencilPixelFormat and depthStencilStorageMode fields. You can switch depth+stencil to memoryless storage too by setting depthStencilStorageMode = .memoryless, which also saves a lot of RAM and bandwidth if you don’t need the depth information of your frames afterwards.
Here’s typical MTKView initialization code (for Apple-silicon GPUs, which support memoryless textures):
_metalView.depthStencilPixelFormat = .depth32Float
_metalView.depthStencilStorageMode = .memoryless
_metalView.preferredFramesPerSecond = 60
_metalView.sampleCount = 4 // hard-coded 4 samples but you can query max available samples for GPU and set it accordingly
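The comment above mentions querying the GPU instead of hard-coding 4 samples. One possible way to do that is sketched below using `supportsTextureSampleCount(_:)`; the helper name `maxSupportedSampleCount` and the candidate list are assumptions made for this example.

```swift
import Metal

// Hypothetical helper: returns the first (largest) sample count from
// `candidates` that the device reports as supported, or 1 (no MSAA) if none is.
func maxSupportedSampleCount(device: MTLDevice, candidates: [Int] = [8, 4, 2]) -> Int {
    return candidates.first(where: { device.supportsTextureSampleCount($0) }) ?? 1
}
```

It could then be used as `_metalView.sampleCount = maxSupportedSampleCount(device: device)`. Keep in mind that any multisampled texture you create yourself for this view has to use the same sample count.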
That’s cool, let’s switch the color render target to memoryless mode too! Unfortunately, for the color render target there is only colorPixelFormat available (typically set up automatically) and there is no colorStorageMode. So there’s no easy way to just set it up to use memoryless MSAA mode.
Still there’s a relatively simple way of switching it to the memoryless mode after it has been initialized!
The thing is that the Metal API allows you to change the multisampled color texture of the current render pass. The descriptor of this render pass is provided to you by MTKView and, obviously, it is pre-initialized with an in-memory texture.
So all you need to do is create a memoryless multisampled render target on the first frame you draw and substitute it for the default multisampled texture. The resolve texture, which is the drawable, stays as it is.
// Memoryless multisampled color texture that replaces the one created by MTKView
var textureMsaa: MTLTexture?
.................
func yourCodeToDrawStuff() {
    // Fetch the descriptor once and reuse this same object for the whole frame.
    guard let renderPassDescriptor = view.currentRenderPassDescriptor else { return }

    // Before rendering, create (or re-create after a resize) the memoryless MSAA RTT.
    // The resolve texture is the drawable, so it provides the required size and format.
    if let resolveTexture = renderPassDescriptor.colorAttachments[0].resolveTexture {
        let width = resolveTexture.width
        let height = resolveTexture.height
        if textureMsaa == nil || textureMsaa?.width != width || textureMsaa?.height != height {
            // Auto-purge the old unused in-memory multisampled texture created by MTKView
            renderPassDescriptor.colorAttachments[0].texture?.setPurgeableState(.volatile)
            do {
                textureMsaa = try create2DRenderTargetMemoryless(width: width, height: height, pixelFormat: resolveTexture.pixelFormat, metalDevice: device)
                textureMsaa?.label = "Main pass RTT"
            } catch {
                fatalError("Cannot create MSAA texture: \(error)")
            }
        }
    }
    // Use the new memoryless texture
    if let textureMsaa = textureMsaa {
        renderPassDescriptor.colorAttachments[0].texture = textureMsaa
    }
    // Do your rendering here as usual, creating the encoder from renderPassDescriptor
    .....................
}

// Minimal error type used below (it is not shown in the original snippet)
struct RuntimeError: Error {
    let message: String
    init(_ message: String) { self.message = message }
}

func create2DRenderTargetMemoryless(width: Int, height: Int, pixelFormat: MTLPixelFormat, metalDevice: MTLDevice) throws -> MTLTexture {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: pixelFormat, width: width, height: height, mipmapped: false)
    descriptor.textureType = .type2DMultisample
    descriptor.sampleCount = 4 // Yes I use hard-coded 4 samples here too :) It must match the view's sampleCount
    descriptor.usage = [.renderTarget]
    descriptor.resourceOptions = [.storageModeMemoryless]
    guard let result = metalDevice.makeTexture(descriptor: descriptor) else {
        throw RuntimeError("Cannot create texture with pixelFormat \(pixelFormat) of size \(width)x\(height)")
    }
    return result
}
This simple trick effectively substitutes the auto-created in-memory color render target with the memoryless one.
There is one important step to do — you must set the purgeable state to volatile for the old unused render target in order for it to free memory. Otherwise even if it won’t be used it will still keep a large amount of memory allocated for it. This is an extremely powerful and easy-to-use feature of Metal API which I love — if you don’t use some resource, API can get rid of it for you automagically. You don’t have to manually delete them as in OpenGL.
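If you want to check what state a texture ended up in, `setPurgeableState(_:)` can also be used as a query: passing `.keepCurrent` returns the current state without changing it. The helper below is a small sketch with a hypothetical name, not part of the original code.

```swift
import Metal

// Hypothetical helper: marks a texture as volatile and reports the state it
// ends up in. Passing .keepCurrent only queries the state without changing it.
func releaseUnusedTexture(_ texture: MTLTexture) -> MTLPurgeableState {
    _ = texture.setPurgeableState(.volatile)   // the system may now discard the contents
    return texture.setPurgeableState(.keepCurrent)
}
```

The result should be `.volatile` while the memory is still allocated, or `.empty` once the system has actually discarded the contents.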
Here are some final memory usage comparisons on a MacBook Air M1 with a full-screen 2560x1600 render target:
First, a default approach — MTKView with 4x MSAA in-memory resolve texture:
This multisampled texture uses 78 MB of memory which is being accessed (both write and read) on every frame!
And here is the memoryless one:
Notice the 78 MB texture is now listed among the unused resources. It actually uses 0 bytes and is only listed as a “dormant” 78 MB resource which could be re-allocated if it is ever used again.
This can be confirmed in the Activity Monitor. Before optimization:
And after:
Now my app uses just under 80 MB of total RAM instead of 150! This is a good result for a full-screen 3D app. It is actually about as much as just two stock macOS calculators! (Yes, you can check it yourself: Calculator uses ~40 MB of RAM, which seems a bit excessive.)
Hope this little tutorial will be useful and will make your Metal app more memory and power-efficient!