Introduction to Vulkan Render Passes

Vulkan graphics rendering is organized into render passes and subpasses. This article provides an introduction to these concepts and how to use them in the Vulkan API. If you haven’t done so, it is recommended that you read the article on ‘GPU Framebuffer Memory’ before reading this article.

Render Passes

When a GPU renders a scene, it is configured with one or more render targets, or framebuffer attachments in Khronos terminology. The size and format of the attachments determine how graphics work is configured across the parallelism available on all modern GPUs. For example, on a tile-based renderer, the set of attachments is used to determine the way the image is divided into tiles. In Vulkan, a render pass is the set of attachments, the way they are used, and the rendering work that is performed using them. In a traditional API, a change to a new render pass might correspond to binding a new framebuffer.

Subpasses

During normal rendering, it is not possible for a fragment shader to access the attachments to which it is currently rendering: GPUs have optimized hardware for writing to the attachments, and accessing the attachment interferes with this. However, some common rendering techniques such as deferred shading rely on being able to access the result of previous rendering during shading. For a tile-based renderer, the results of previous rendering can efficiently stay on-chip if subsequent rendering operations are at the same resolution, and if only the data in the pixel currently being rendered is needed (accessing different pixels may require access to values outside the current tile, which breaks this optimization). In order to help optimize deferred shading on tile-based renderers, Vulkan splits the rendering operations of a render pass into subpasses. All subpasses in a render pass share the same resolution and tile arrangement, and as a result, they can access the results of previous subpasses.

In Vulkan, a render pass consists of one or more subpasses; for simple rendering operations, there may be only a single subpass in a render pass.

Creating a VkRenderPass

In Vulkan, a render pass is described by an (opaque) VkRenderPass object. This provides a template that is used when beginning a render pass inside a command buffer. The render pass is used with a compatible VkFramebuffer object, which represents the set of images that will be used as attachments during execution of the render pass.

vkCreateRenderPass

Like many driver objects in Vulkan, a VkRenderPass object is created with a corresponding create function, vkCreateRenderPass():

`VkResult` `vkCreateRenderPass()`
`VkDevice`	`device`	Logical device used for rendering (from `vkCreateDevice`)
`const` `VkRenderPassCreateInfo*`	`pCreateInfo`	Parameters for creation
`const` `VkAllocationCallbacks*`	`pAllocator`	Host memory allocation callback (can be `NULL`)
`VkRenderPass*`	`pRenderPass`	Resulting render pass handle

As with many Vulkan creation functions, most parameters are passed through a creation structure. This approach makes it more efficient to create multiple identical objects, and provides a way to support type-safe additional parameters through extensions.

Many creation functions in Vulkan accept a callback for applications which wish to track host-side memory usage. While important for applications that wish to have precise control over resource allocation, and useful for debugging, in most cases this callback can be left as NULL to rely on the driver's default memory allocation scheme.

As with other Vulkan creation functions, the function returns an error code if anything goes wrong - although more information may be available through validation layers if the problem is an application error. The newly-created render pass description is returned via the pRenderPass pointer.

The interesting parameters are contained in the pCreateInfo structure.

VkRenderPassCreateInfo

`struct` `VkRenderPassCreateInfo`
`VkStructureType`	`sType`	Used for type safety and extensions, must be `VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO`
`const void*`	`pNext`	Allows extensions to provide extra parameters, must be `NULL` if not needed by an extension
`VkRenderPassCreateFlags`	`flags`	Reserved for future use (should be `0`)
`uint32_t`	`attachmentCount`	Number of framebuffer attachments used in this render pass (across all subpasses)
`const VkAttachmentDescription*`	`pAttachments`	Description of the attachments (array of size `attachmentCount`)
`uint32_t`	`subpassCount`	Number of subpasses
`const VkSubpassDescription*`	`pSubpasses`	Description of subpasses (array of size `subpassCount`)
`uint32_t`	`dependencyCount`	Number of dependencies between subpass pairs
`const VkSubpassDependency*`	`pDependencies`	Descriptions of dependencies between subpasses (array of size `dependencyCount`)

For the purposes of this article, we will begin with a simple rendering operation with only a single subpass (a render pass always consists of at least one subpass). In this case, subpassCount can be 1 and dependencyCount can be 0 (so pDependencies can be NULL - we'll come back to describe how else dependencies are used below).

VkAttachmentDescription

An attachment corresponds to a single Vulkan VkImageView. A description of the attachment is provided to the render pass creation, which allows the render pass to be configured appropriately; the actual images to be used are provided when the render pass is used, via the VkFramebuffer. It is possible to associate multiple attachments with a render pass; these may be used for example as multiple render targets, or in separate subpasses. More commonly, a color framebuffer and a depth buffer are separate attachments in Vulkan. Therefore the pAttachments member of VkRenderPassCreateInfo points to an array of attachmentCount elements.

`struct` `VkAttachmentDescription`
`VkAttachmentDescriptionFlags`	`flags`	Used by extensions; can be `0`
`VkFormat`	`format`	Image format of the attachment
`VkSampleCountFlagBits`	`samples`	Number of samples in the attachment (used for multi-sampling)
`VkAttachmentLoadOp`	`loadOp`	How the existing contents of the attachment are treated at the start of rendering (loaded, cleared, or ignored)
`VkAttachmentStoreOp`	`storeOp`	What should be done with the attachment after rendering
`VkAttachmentLoadOp`	`stencilLoadOp`	In the case of a depth/stencil attachment, how to access the stencil contents before rendering
`VkAttachmentStoreOp`	`stencilStoreOp`	In the case of a depth/stencil attachment, what should be done with the stencil after rendering
`VkImageLayout`	`initialLayout`	Layout of the attachment when first used in the render pass
`VkImageLayout`	`finalLayout`	Layout of the attachment after use in the render pass

For a simple rendering operation, we might decide to create two attachments:

	`Color attachment (pAttachments[0])`	`Depth attachment (pAttachments[1])`
`flags`	`0`	`0`
`format`	`VK_FORMAT_B8G8R8A8_UNORM`	`VK_FORMAT_D16_UNORM`
`samples`	`VK_SAMPLE_COUNT_1_BIT`	`VK_SAMPLE_COUNT_1_BIT`
`loadOp`	`VK_ATTACHMENT_LOAD_OP_DONT_CARE`	`VK_ATTACHMENT_LOAD_OP_CLEAR`
`storeOp`	`VK_ATTACHMENT_STORE_OP_STORE`	`VK_ATTACHMENT_STORE_OP_DONT_CARE`
`stencilLoadOp`	`VK_ATTACHMENT_LOAD_OP_DONT_CARE`	`VK_ATTACHMENT_LOAD_OP_DONT_CARE`
`stencilStoreOp`	`VK_ATTACHMENT_STORE_OP_DONT_CARE`	`VK_ATTACHMENT_STORE_OP_DONT_CARE`
`initialLayout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`	`VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL`
`finalLayout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`	`VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL`

Stencil is special because the combined depth/stencil attachment is a single attachment. Here, we aren't using stencil, so the stencilLoadOp and stencilStoreOp are irrelevant. Note that a "DONT_CARE" store op does not guarantee that memory is left untouched: a tile-based renderer may skip the write to memory entirely, but an immediate-mode renderer may already have written to memory while rendering. Similarly, a "DONT_CARE" load op avoids the need to read the previous framebuffer contents on a tiler, and also avoids the need for an explicit clear of the memory, which may be costly on an immediate-mode renderer.

Note: We're assuming that the images have been transitioned from VK_IMAGE_LAYOUT_UNDEFINED (on creation) to VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL and VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL before we use them, for example by using a VkImageMemoryBarrier.

There is a complication to this mechanism to be aware of: Consider the example of drawing a scene with two render passes, the second of which uses the results of the first (written with STORE_OP_STORE) as an input attachment (LOAD_OP_LOAD) but does not write to it. If this input attachment is still wanted after the second render pass, it must still have STORE_OP_STORE associated with it: using STORE_OP_DONT_CARE allows some hardware to perform an optimization and discard the attachment content after the second render pass, even though the first render pass used STORE_OP_STORE. You may think of this as a cache discard of the output of the first render pass, where the cache line was previously considered to be valid. This is potentially a good performance enhancement, but it does mean that users need to be prepared for surprising behavior!

VkSubpassDescription

`struct` `VkSubpassDescription`
`VkSubpassDescriptionFlags`	`flags`	Reserved for future use, must be `0`
`VkPipelineBindPoint`	`pipelineBindPoint`	Should be `VK_PIPELINE_BIND_POINT_GRAPHICS`
`uint32_t`	`inputAttachmentCount`	Number of input attachments to this subpass
`const VkAttachmentReference*`	`pInputAttachments`	Array of input attachments read by this subpass (array of size `inputAttachmentCount`)
`uint32_t`	`colorAttachmentCount`	Number of output attachments for this subpass
`const VkAttachmentReference*`	`pColorAttachments`	Array of color attachments written to by this subpass (array of size `colorAttachmentCount`)
`const VkAttachmentReference*`	`pResolveAttachments`	Attachments for antialiasing (`NULL` or array of size `colorAttachmentCount`)
`const VkAttachmentReference*`	`pDepthStencilAttachment`	One attachment reference describing the depth/stencil attachment
`uint32_t`	`preserveAttachmentCount`	Number of attachments preserved across this subpass
`const uint32_t*`	`pPreserveAttachments`	Array of attachment indices preserved across this subpass, of size `preserveAttachmentCount`, or `NULL`

In our first example, we only have a single subpass, and we'll render to it directly. We won't use pResolveAttachments (so we can set it to NULL) and we do not need to preserve any attachments (so preserveAttachmentCount can be 0 and pPreserveAttachments can be NULL). The fields we don't need now will be described in more detail below, but in our simple case we can configure the (single) subpass. Before we get there, we have one more level of Vulkan object to worry about:

VkAttachmentReference

`struct` `VkAttachmentReference`
`uint32_t`	`attachment`	Id of the attachment (index into `VkRenderPassCreateInfo::pAttachments`)
`VkImageLayout`	`layout`	Layout of the attachment during the subpass

In the example we're walking through, we have two attachments in total:

	`pColorAttachments[0]`	`pDepthStencilAttachment`
`attachment`	`0`	`1`
`layout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`	`VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL`

The layout can change between subpasses of a render pass, hence the need to describe it on a per-subpass basis.

Example render pass (complete create info)

In summary, in the simple render pass we've been using as an example, we have the following two attachments:

Attachment	At start of render pass	At end of render pass
Color attachment `VK_FORMAT_B8G8R8A8_UNORM`	`VK_ATTACHMENT_LOAD_OP_DONT_CARE` (All pixels will be overwritten)	`VK_ATTACHMENT_STORE_OP_STORE` (Write pixels to memory after rendering)
Depth(/stencil) attachment `VK_FORMAT_D16_UNORM`	`VK_ATTACHMENT_LOAD_OP_CLEAR` (Don't want previous depth values)	`VK_ATTACHMENT_STORE_OP_DONT_CARE` (Don't need depth after rendering)

In total, then, our simple render pass looks like this:

*pCreateInfo

Field	Value
`sType`	`VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO`
`pNext`	`NULL`
`flags`	`0`
`attachmentCount`	`2`
`pAttachments`	Array of two attachment descriptions (see pAttachments[0] and pAttachments[1])
`subpassCount`	`1`
`pSubpasses`	Array of one subpass description (see pSubpasses[0])
`dependencyCount`	`0`
`pDependencies`	`NULL`

pAttachments[0]

`flags`	`0`
`format`	`VK_FORMAT_B8G8R8A8_UNORM`
`samples`	`VK_SAMPLE_COUNT_1_BIT`
`loadOp`	`VK_ATTACHMENT_LOAD_OP_DONT_CARE`
`storeOp`	`VK_ATTACHMENT_STORE_OP_STORE`
`stencilLoadOp`	`VK_ATTACHMENT_LOAD_OP_DONT_CARE`
`stencilStoreOp`	`VK_ATTACHMENT_STORE_OP_DONT_CARE`
`initialLayout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`
`finalLayout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`

pAttachments[1]

`flags`	`0`
`format`	`VK_FORMAT_D16_UNORM`
`samples`	`VK_SAMPLE_COUNT_1_BIT`
`loadOp`	`VK_ATTACHMENT_LOAD_OP_CLEAR`
`storeOp`	`VK_ATTACHMENT_STORE_OP_DONT_CARE`
`stencilLoadOp`	`VK_ATTACHMENT_LOAD_OP_DONT_CARE`
`stencilStoreOp`	`VK_ATTACHMENT_STORE_OP_DONT_CARE`
`initialLayout`	`VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL`
`finalLayout`	`VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL`

pSubpasses[0]

`flags`	`0`
`pipelineBindPoint`	`VK_PIPELINE_BIND_POINT_GRAPHICS`
`inputAttachmentCount`	`0`
`pInputAttachments`	`NULL`
`colorAttachmentCount`	`1`
`pColorAttachments`	Array of one attachment reference (see pColorAttachments[0])
`pResolveAttachments`	`NULL`
`pDepthStencilAttachment`	Pointer to one attachment reference (see *pDepthStencilAttachment)
`preserveAttachmentCount`	`0`
`pPreserveAttachments`	`NULL`

pColorAttachments[0]

`attachment`	`0`
`layout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`

*pDepthStencilAttachment

`attachment`	`1`
`layout`	`VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL`

Fortunately, since render passes can be reused, you may not need to do this too often. We'll see later the flexibility exposed by this mechanism.

Creating a VkFramebuffer

A VkRenderPass is a template for how a render pass will be used. When we use the render pass, we need to provide the actual images which are to be used for rendering. The mechanism containing references to the actual images is a VkFramebuffer, which contains all the attachments used by the render pass.

vkCreateFramebuffer

As with vkCreateRenderPass() for a VkRenderPass, a VkFramebuffer is created with vkCreateFramebuffer():

`VkResult` `vkCreateFramebuffer()`
`VkDevice`	`device`	Logical device used for rendering (from `vkCreateDevice`)
`const VkFramebufferCreateInfo*`	`pCreateInfo`	Parameters for creation
`const VkAllocationCallbacks*`	`pAllocator`	Host memory allocation callback (can be `NULL`)
`VkFramebuffer*`	`pFramebuffer`	Resulting framebuffer handle

Again, to allow extensibility and reusability, the parameters are passed through a pCreateInfo pointer. (Yes, here we go again!)

VkFramebufferCreateInfo

`struct` `VkFramebufferCreateInfo`
`VkStructureType`	`sType`	Used for type safety and extensions, must be `VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO`
`const void*`	`pNext`	Used for extensions; `NULL` if no extensions are used to add parameters
`VkFramebufferCreateFlags`	`flags`	Reserved for future use, must be `0`
`VkRenderPass`	`renderPass`	The render pass (or a compatible one) with which the framebuffer will be used
`uint32_t`	`attachmentCount`	Number of attachments used in the render pass
`const VkImageView*`	`pAttachments`	Array of image views, which refer to actual images; array is of size `attachmentCount`
`uint32_t`	`width`	Width of framebuffer
`uint32_t`	`height`	Height of framebuffer
`uint32_t`	`layers`	Number of layers in framebuffer

Note that all the attachments used in the framebuffer are of the same width, height and number of layers - but that this is independent of the render pass, so the same render pass can be used with framebuffers of different sizes.

For our simple example, we need two image views: one referring to a VK_FORMAT_B8G8R8A8_UNORM image and one referring to a VK_FORMAT_D16_UNORM image. For efficiency, since we typically don't need the depth buffer to persist after rendering, the D16 image can be created with VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT in its usage flags, and can be bound to memory allocated from a memory type that has VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT set. In this case, a tile-based renderer may be able to avoid allocating any memory for the depth buffer, since it is only used for rendering operations which occur on-chip.

Using a VkRenderPass

Now that we have a VkRenderPass and a VkFramebuffer, we can use them in the rendering process.

vkCmdBeginRenderPass

To begin a render pass instance in a command buffer, call vkCmdBeginRenderPass():

`void` `vkCmdBeginRenderPass()`
`VkCommandBuffer`	`commandBuffer`	Command buffer into which to insert the render pass
`const VkRenderPassBeginInfo*`	`pRenderPassBegin`	Arguments
`VkSubpassContents`	`contents`	Indication whether secondary command buffers are in use (see below)

A render pass can only begin (and end) in a primary command buffer.

Once a render pass has begun on a command buffer, subsequent commands submitted to that command buffer will execute within the first (and in the case of our example, only) subpass of the render pass instance. In our simple case, we could use just the one command buffer and record rendering commands directly into it. In this case, contents should be VK_SUBPASS_CONTENTS_INLINE.

VkRenderPassBeginInfo

As with many functions, Vulkan uses an info structure for reusability and extensibility.

`struct` `VkRenderPassBeginInfo`
`VkStructureType`	`sType`	Must be `VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO`
`const void*`	`pNext`	Used for extensions; must be `NULL` if no extension is used which adds to this struct
`VkRenderPass`	`renderPass`	The render pass description created by `vkCreateRenderPass()`
`VkFramebuffer`	`framebuffer`	The framebuffer containing the images for rendering, created by `vkCreateFramebuffer()`
`VkRect2D`	`renderArea`	Bounds of the rectangular area affected by the render pass
`uint32_t`	`clearValueCount`	Number of clear values
`const VkClearValue*`	`pClearValues`	Values used for clearing attachments (array of size `clearValueCount`)

renderArea is used for rendering a subset of the framebuffer, for example for partial updates of dirty areas of the screen. The application is responsible for clipping rendering to this area. Rendering to less than the entire framebuffer can incur a performance penalty if the area is not aligned to the granularity reported by vkGetRenderAreaGranularity(), which for a tile-based renderer might be expected to correspond to the alignment of the tile grid. For most purposes, the render area can be set to the full width and height of the framebuffer.
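One way to keep a partial render area aligned is to grow the dirty rectangle outward to the reported granularity. The sketch below works on one axis at a time with plain integers; the `Span` type and `align_span` function are invented for this example. It assumes the granularity is non-zero and that the dirty span already lies inside the framebuffer.

```c
#include <stdint.h>

/* One axis of a render area: start offset and length, in pixels. */
typedef struct Span {
    uint32_t offset;
    uint32_t length;
} Span;

/* Hypothetical helper: expands `dirty` so that its start is a multiple of
 * `granularity`, and its end is either a multiple of `granularity` or the
 * framebuffer edge `fbLength`. */
static Span align_span(Span dirty, uint32_t granularity, uint32_t fbLength)
{
    /* Round the start down to the previous granularity boundary. */
    uint32_t begin = (dirty.offset / granularity) * granularity;

    /* Round the end up to the next boundary, then clamp to the framebuffer. */
    uint32_t end = dirty.offset + dirty.length;
    end = ((end + granularity - 1) / granularity) * granularity;
    if (end > fbLength)
        end = fbLength;

    return (Span){ begin, end - begin };
}
```

Applied once with the width and once with the height of the `VkExtent2D` returned by vkGetRenderAreaGranularity(), the two results give the offset and extent of the `VkRect2D` to use as renderArea.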

pClearValues is indexed by the attachment number and used if the attachment has a loadOp of VK_ATTACHMENT_LOAD_OP_CLEAR. In the case of our simple example, we clear the depth attachment at the start of rendering, and the depth attachment is at index 1 in our attachment array - so we need pClearValues[1] to represent the value to which we want to clear the depth buffer.

`union`	`VkClearValue`
`VkClearColorValue`	`color`	Value used when clearing color buffers
`VkClearDepthStencilValue`	`depthStencil`	Value used when clearing depth/stencil buffers

VkClearColorValue is a union of arrays of various channel types (float, signed integer and unsigned integer), with the member to use determined by the format of the attachment being cleared. VkClearDepthStencilValue always has a float depth value, and a uint32_t stencil value. For our simple example, only the float depth value is relevant, and should be set to the depth value we want for our rendering.

vkCmdEndRenderPass

After the last rendering commands for the render pass instance have been submitted to the command buffer, the application must end the render pass instance:

`void` `vkCmdEndRenderPass()`
`VkCommandBuffer`	`commandBuffer`

In this example, if we have been recording commands directly into the primary command buffer, the command buffer looks like this:

Command buffer

Previous render pass...

Current render pass

vkCmdBeginRenderPass()

vkCmdBind*...

vkCmdDraw*... etc.

vkCmdEndRenderPass()

Next render pass...

Multiple render passes can be inserted into the same command buffer, so long as one is ended before the next is begun. A render pass must both begin and end within a single primary command buffer (that is, a render pass cannot span multiple primary command buffers), so parallelism in command buffer building in this approach relies on parallel building of multiple render passes. In many rendering frameworks, this level of parallelism is still enough to allow the CPU cores to stay busy, and simplifies the task of resource management and state tracking.

Render passes and secondary command buffers

In some rendering scenarios, a large amount of work needs to be performed within a single rendering pass. For example, a large number of characters may be managed and animated by their own threads, but all appear on screen at once. This complicates the task of optimizing rendering order and minimizing state changes, but can still be necessary in some highly-parallel systems.

Vulkan's solution to this is to make use of secondary command buffers, which (for graphics rendering) are executed inside a render pass. A secondary command buffer is created by vkAllocateCommandBuffers() using a VkCommandBufferAllocateInfo with a level member of VK_COMMAND_BUFFER_LEVEL_SECONDARY.

Beginning a secondary command buffer

For graphics, the VkCommandBufferBeginInfo argument of vkBeginCommandBuffer() when beginning to record a secondary command buffer must have a valid pInheritanceInfo field:

`VkResult` `vkBeginCommandBuffer()`
`VkCommandBuffer`	`commandBuffer`	Command buffer to start using
`const VkCommandBufferBeginInfo*`	`pBeginInfo`	Arguments

`struct` `VkCommandBufferBeginInfo`
`VkStructureType`	`sType`	`VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO`
`const void*`	`pNext`	For extensions, should be `NULL`
`VkCommandBufferUsageFlags`	`flags`	Usage flags (see below)
`const VkCommandBufferInheritanceInfo*`	`pInheritanceInfo`	Inherited info (`NULL` for a primary command buffer, must be valid for a secondary)

`VkCommandBufferBeginInfo::flags`

flags has the following bit values:

`enum` `VkCommandBufferUsageFlagBits`
`VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT`	Set if the command buffer will only ever be used once (potential optimization)
`VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT`	Must be set for secondary graphics command buffers
`VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT`	Set if the command buffer will be submitted more than once concurrently (set only if needed when reusing command buffers)

Secondary command buffers can also be used for compute, and in this case their operations do not fall within a render pass. For graphics, we must set RENDER_PASS_CONTINUE_BIT, may be able to set ONE_TIME_SUBMIT_BIT, and may need to set SIMULTANEOUS_USE_BIT. These options affect the way secondary command buffers are implemented - for example, some may make the difference between whether a separate copy must be made of a secondary command buffer before use, or whether the existing copy may be used indirectly.

`VkCommandBufferBeginInfo::pInheritanceInfo`

pInheritanceInfo is used to allow the secondary command buffer to be configured correctly for the render pass:

`struct` `VkCommandBufferInheritanceInfo`
`VkStructureType`	`sType`	`VK_STRUCTURE_TYPE_COMMAND_BUFFER_INHERITANCE_INFO`
`const void*`	`pNext`	Used for extensions, and should be `NULL` unless needed
`VkRenderPass`	`renderPass`	The render pass (or a compatible one) which will be active when the command buffer is used
`uint32_t`	`subpass`	The subpass of the render pass that this command buffer will be used in
`VkFramebuffer`	`framebuffer`	The framebuffer to be used (if known), or `VK_NULL_HANDLE` if unknown
`VkBool32`	`occlusionQueryEnable`	Should be `VK_TRUE` if the primary command buffer might have an occlusion query active when this secondary command buffer executes, and `VK_FALSE` otherwise
`VkQueryControlFlags`	`queryFlags`	Queries that can be used in the primary command buffer when this secondary command buffer executes; `0` if unused
`VkQueryPipelineStatisticFlags`	`pipelineStatistics`	Set of pipeline statistics that can be counted by a query; `0` if pipeline statistics queries are disabled

If the framebuffer is known at the time the command buffer is recorded (for example, if the same framebuffer is always used for generating a shadow map) then providing an explicit framebuffer may be more efficient; otherwise (if the framebuffer argument is VK_NULL_HANDLE) the framebuffer is determined by the render pass in the primary command buffer, which allows secondary command buffers to be reused with different (compatible) framebuffers determined by the primary command buffer that is using the secondary command buffer.

Rendering commands are recorded into the secondary command buffer in the same way as for a primary command buffer, and having multiple secondary command buffers allows multiple threads to record rendering commands concurrently without need for synchronization.

Invoking a secondary command buffer

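When the secondary command buffers have been recorded, they can be invoked in a "parent" primary command buffer with vkCmdExecuteCommands(). The enclosing subpass must have been begun with contents set to VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS rather than VK_SUBPASS_CONTENTS_INLINE: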

`void` `vkCmdExecuteCommands()`
`VkCommandBuffer`	`commandBuffer`	Primary command buffer to be recorded into
`uint32_t`	`commandBufferCount`	Number of secondary command buffers to submit
`const VkCommandBuffer*`	`pCommandBuffers`	Array of `commandBufferCount` secondary command buffers to execute (in increasing array index order)

Secondary command buffers inside a subpass

Using the above techniques, work may be distributed as in the following example:

Thread 1

Record secondary
command buffer A
(frame 2)

Record secondary
command buffer A
(frame 3)

...

Thread 2

Record secondary
command buffer B
(frame 2)

Record secondary
command buffer B
(frame 3)

...

Thread 3

Record secondary
command buffer C
(frame 2)

Record secondary
command buffer C
(frame 3)

...

Thread 4

Primary command buffer (frame 1)
`vkCmdBeginRenderPass()`
`vkCmdExecuteCommands(A)`
`vkCmdExecuteCommands(B)`
`vkCmdExecuteCommands(C)`
`vkCmdEndRenderPass()`

Primary command buffer (frame 2)
`vkCmdBeginRenderPass()`
`vkCmdExecuteCommands(A)`
`vkCmdExecuteCommands(B)`
`vkCmdExecuteCommands(C)`
`vkCmdEndRenderPass()`

Primary command buffer (frame 3)
`vkCmdBeginRenderPass()`
`vkCmdExecuteCommands(A)`
`vkCmdExecuteCommands(B)`
`vkCmdExecuteCommands(C)`
`vkCmdEndRenderPass()`

Recording the primary command buffer should be faster than recording a significant amount of work into the secondary command buffers. However, there is typically some cost - especially for implementations which require the secondary command buffers to be copied into the primary command buffer. This approach also assumes that the secondary command buffers are at least double-buffered, and that the threads are suitably synchronized.

Since primary command buffers can be recorded in parallel and vkQueueSubmit() allows multiple command buffers to be submitted efficiently, exposing parallelism across secondary command buffers is not necessary in many applications, so this technique should be matched to the rendering work load. Note that it can also be possible to re-use secondary command buffers, although again this may carry some driver overhead (hopefully less than recording anew). Command buffer reuse should be used selectively, allowing for other optimizations such as frustum culling.

Destroying a VkRenderPass

Once a render pass is no longer needed, it can be deleted as follows:

`void` `vkDestroyRenderPass()`
`VkDevice`	`device`
`VkRenderPass`	`renderPass`
`const VkAllocationCallbacks*`	`pAllocator`

Note that it is up to the user to ensure that nothing is still rendering which referred to the render pass at the point vkDestroyRenderPass() is called - for example by using vkWaitForFences() with a VkFence handle previously passed to vkQueueSubmit().

Multi-sampling

Tiled rendering also provides a low-bandwidth way to implement antialiasing: we can render to the tiles normally, but average pixel values as part of the operation of writing the tile memory; this downsampling step is known as "resolving" the tile buffer.

Vulkan has the concept of a number of samples associated with an image. In a simple implementation the image might have several values stored at each pixel location; more complex implementations have compressed schemes. Therefore an image has a number of samples associated with it at image creation time. For multi-sampled rendering in Vulkan, the multi-sampled image is treated separately from the final single-sampled image; this provides separate control over what values need to reach memory, since - like the depth buffer - the multi-sampled image may only need to be accessed during the processing of a tile. For this reason, if the multi-sampled image is not required after the render pass, it can be created with VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT and bound to an allocation created with VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT, as described above for depth buffers. The multi-sampled attachment storeOp can then be set to VK_ATTACHMENT_STORE_OP_DONT_CARE in the VkAttachmentDescription, so that (at least on tiled renderers) the full multi-sampled attachment does not need to be written to memory, which can save a lot of bandwidth.

To control multi-sampling, the index of an attachment with more than one sample (whose image view is supplied in the pAttachments array of the VkFramebufferCreateInfo) should be used in the VkSubpassDescription's pColorAttachments array, and the index of a corresponding attachment with exactly one sample should be placed at the same index of the pResolveAttachments array; the multi-sampled image is then resolved to the single-sampled image at the end of the current subpass. To use pResolveAttachments for some attachments but not others, the attachment member of an entry in the pResolveAttachments array can be set to VK_ATTACHMENT_UNUSED to avoid resolving the corresponding multi-sampled image.

For example, if we had three multi-sampled attachments and only wanted the first and third to be resolved to single-sampled form, the VkSubpassDescription may have the following entries:

Index	`pColorAttachments[]`	`pResolveAttachments[]`
0	Index of first multi-sampled attachment	Index of first single-sampled attachment
1	Index of second multi-sampled attachment	`VK_ATTACHMENT_UNUSED`
2	Index of third multi-sampled attachment	Index of second single-sampled attachment

Remember that if we don't want to resolve any attachments in the subpass, pResolveAttachments can simply be set to NULL. Multi-sampled images can also be resolved to a single-sample image with vkCmdResolveImage() - but this happens outside the render pass and requires a separate access to memory, so it is a much less efficient solution if it can be avoided. Note that you can write both the resolved and multi-sampled images out of the same render pass by setting the storeOp of both attachments to VK_ATTACHMENT_STORE_OP_STORE.

Resolving an image outside a render pass

On some occasions, the attachment containing all samples may need to be written to memory for later processing (for example, use in a later render pass as an input attachment). It is possible to resolve a multi-sampled image to a single-sampled one without using it as an attachment in a render pass using the vkCmdResolveImage() command.

However, please bear in mind that this should be the exception to normal rendering, not the default approach. Writing out the multi-sampled attachment to off-chip memory (rather than using VK_ATTACHMENT_STORE_OP_DONT_CARE) has a high bandwidth cost, and vkCmdResolveImage() itself must then read all this data back, process it, and write the single-sampled output. It is very much more efficient to perform resolve operations inside a render pass where possible.

Multiple subpasses

The render pass mechanism described so far is quite verbose for use with a single subpass. The reason for this is the flexibility that it provides when using multiple subpasses.

Some rendering techniques, notably deferred shading and deferred lighting, traverse the scene geometry once to create a framebuffer, then use the rendering results in the framebuffer for further rendering operations. The same applies to, for example, tone mapping effects applied after rendering. In a tiled renderer, because each of these operations requires access only to the current pixel and not the entire framebuffer, all of these operations can be performed consecutively on a per-tile basis, avoiding the need to write intermediate values out to memory. This can provide a significant bandwidth (and therefore power and performance) improvement. There is a graphical example of how deferred shading is evaluated on a tiler towards the end of the Understanding Tiling article.

Note that because the render area size is defined by the width and height fields of the VkFramebufferCreateInfo object, the render area of each attachment is effectively the same size, and this is true for all subpasses in a render pass. If a rendering technique requires reading values outside the current fragment area (which on a tiler would mean accessing rendered data outside the currently-rendering tile), separate render passes must be used.

Taking the example of deferred lighting, we might render the scene in three "subpasses":

The first subpass renders the geometry and stores the depth, normal vector and specular spread function.
The second subpass renders each light's bounds, accumulating a specular and diffuse color for each light that is calculated with the position, normal and specular spread function from the first subpass.
Finally, the scene geometry is processed again with conventional forward shading, picking up the light contributions from the results of the second subpass.

Since the shading in the first subpass is highly simplistic, the shader run-time cost can be significantly reduced in this approach, although the degree of shader parallelism in the final subpass may still depend on fragment coverage. The related deferred shading technique can allow for better shader parallelism at the cost of reduced flexibility and increased intermediate storage requirements.

Multiple attachments for multiple subpasses

In our deferred lighting example, the depth buffer is used in all three subpasses. Only the first subpass should update it, but the lighting subpass needs the depth attachment both to provide accurate bounds for a light and to calculate the shading position in world space, and the final subpass can inherit the depth buffer to avoid unnecessary overdraw.

In this case, our render pass might use the following attachments:

ID	Field	Value	Notes
0	`flags`	`0`	Reserved
	`samples`	`VK_SAMPLE_COUNT_1_BIT`	Single-sampled
	`format`	`VK_FORMAT_B8G8R8A8_UNORM`	Surface normal and specular factor
	`loadOp`	`VK_ATTACHMENT_LOAD_OP_DONT_CARE`	Assuming this will be completely overwritten
	`storeOp`	`VK_ATTACHMENT_STORE_OP_DONT_CARE`	Intermediate storage (not written)
	`stencilLoadOp`	`VK_ATTACHMENT_LOAD_OP_DONT_CARE`	Unused
	`stencilStoreOp`	`VK_ATTACHMENT_STORE_OP_DONT_CARE`	Unused
	`initialLayout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`	Rendering to it
	`finalLayout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`	Rendering to it
1	`flags`	`0`	Reserved
	`samples`	`VK_SAMPLE_COUNT_1_BIT`	Single-sampled
	`format`	`VK_FORMAT_D16_UNORM`	Depth
	`loadOp`	`VK_ATTACHMENT_LOAD_OP_CLEAR`	Need empty depth buffer before use
	`storeOp`	`VK_ATTACHMENT_STORE_OP_DONT_CARE`	Intermediate storage (not written)
	`stencilLoadOp`	`VK_ATTACHMENT_LOAD_OP_DONT_CARE`	Unused
	`stencilStoreOp`	`VK_ATTACHMENT_STORE_OP_DONT_CARE`	Unused
	`initialLayout`	`VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL`	Rendering to it
	`finalLayout`	`VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL`	Rendering to it
2	`flags`	`0`	Reserved
	`samples`	`VK_SAMPLE_COUNT_1_BIT`	Single-sampled
	`format`	`VK_FORMAT_B8G8R8A8_UNORM`	Accumulated diffuse lighting contribution
	`loadOp`	`VK_ATTACHMENT_LOAD_OP_CLEAR`	Accumulating, so start at 0
	`storeOp`	`VK_ATTACHMENT_STORE_OP_DONT_CARE`	Intermediate storage (not written)
	`stencilLoadOp`	`VK_ATTACHMENT_LOAD_OP_DONT_CARE`	Unused
	`stencilStoreOp`	`VK_ATTACHMENT_STORE_OP_DONT_CARE`	Unused
	`initialLayout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`	Rendering to it
	`finalLayout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`	Rendering to it
3	`flags`	`0`	Reserved
	`samples`	`VK_SAMPLE_COUNT_1_BIT`	Single-sampled
	`format`	`VK_FORMAT_B8G8R8A8_UNORM`	Accumulated specular lighting contribution
	`loadOp`	`VK_ATTACHMENT_LOAD_OP_CLEAR`	Accumulating, so start at 0
	`storeOp`	`VK_ATTACHMENT_STORE_OP_DONT_CARE`	Intermediate storage (not written)
	`stencilLoadOp`	`VK_ATTACHMENT_LOAD_OP_DONT_CARE`	Unused
	`stencilStoreOp`	`VK_ATTACHMENT_STORE_OP_DONT_CARE`	Unused
	`initialLayout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`	Rendering to it
	`finalLayout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`	Rendering to it
4	`flags`	`0`	Reserved
	`samples`	`VK_SAMPLE_COUNT_1_BIT`	Single-sampled
	`format`	`VK_FORMAT_B8G8R8A8_UNORM`	Final output of rendering
	`loadOp`	`VK_ATTACHMENT_LOAD_OP_DONT_CARE`	Assuming rendering the whole frame
	`storeOp`	`VK_ATTACHMENT_STORE_OP_STORE`	Write output of rendering
	`stencilLoadOp`	`VK_ATTACHMENT_LOAD_OP_DONT_CARE`	Unused
	`stencilStoreOp`	`VK_ATTACHMENT_STORE_OP_DONT_CARE`	Unused
	`initialLayout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`	Rendering to it
	`finalLayout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`	Rendering to it

That is:

Attachment 0 holds the surface normal and specular factor output by the first subpass, and used by the second subpass.
Attachment 1 holds the depth buffer for the scene, and applies to all three subpasses.
Attachment 2 holds the diffuse contributions from light sources output by the second subpass and read by the third.
Attachment 3 holds the specular contributions from light sources output by the second subpass and read by the third.
Attachment 4 holds the final result of rendering generated by the third subpass.
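The five attachments described above could be filled in from C roughly as follows. This is a minimal sketch: the helper names `describe()` and `fill_attachments()` are hypothetical, the identifiers are spelled as in the Vulkan headers (`VK_FORMAT_*`, `VK_SAMPLE_COUNT_1_BIT`), and the stand-in declarations are an assumption included only so that the fragment compiles where `<vulkan/vulkan.h>` is not installed.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __has_include
#if __has_include(<vulkan/vulkan.h>)
#include <vulkan/vulkan.h>
#define SKETCH_HAS_VULKAN_H 1
#endif
#endif

#ifndef SKETCH_HAS_VULKAN_H
/* Assumption: minimal stand-ins mirroring the Vulkan headers, present only
 * so the sketch compiles without them. Real code includes vulkan.h. */
typedef uint32_t VkFlags;
typedef enum { VK_FORMAT_B8G8R8A8_UNORM = 44, VK_FORMAT_D16_UNORM = 124 } VkFormat;
typedef enum { VK_SAMPLE_COUNT_1_BIT = 1 } VkSampleCountFlagBits;
typedef enum { VK_ATTACHMENT_LOAD_OP_CLEAR = 1,
               VK_ATTACHMENT_LOAD_OP_DONT_CARE = 2 } VkAttachmentLoadOp;
typedef enum { VK_ATTACHMENT_STORE_OP_STORE = 0,
               VK_ATTACHMENT_STORE_OP_DONT_CARE = 1 } VkAttachmentStoreOp;
typedef enum { VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL = 2,
               VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL = 3 } VkImageLayout;
typedef struct VkAttachmentDescription {
    VkFlags flags;
    VkFormat format;
    VkSampleCountFlagBits samples;
    VkAttachmentLoadOp loadOp;
    VkAttachmentStoreOp storeOp;
    VkAttachmentLoadOp stencilLoadOp;
    VkAttachmentStoreOp stencilStoreOp;
    VkImageLayout initialLayout;
    VkImageLayout finalLayout;
} VkAttachmentDescription;
#endif

enum { ATTACHMENT_COUNT = 5 };

/* Describes one single-sampled attachment with unused stencil and the
 * same layout before and after the render pass. */
static VkAttachmentDescription describe(VkFormat format,
                                        VkAttachmentLoadOp loadOp,
                                        VkAttachmentStoreOp storeOp,
                                        VkImageLayout layout)
{
    VkAttachmentDescription d = {0};
    d.format = format;
    d.samples = VK_SAMPLE_COUNT_1_BIT;
    d.loadOp = loadOp;
    d.storeOp = storeOp;
    d.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
    d.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
    d.initialLayout = layout;
    d.finalLayout = layout;
    return d;
}

/* Fills the array intended for VkRenderPassCreateInfo::pAttachments. */
static void fill_attachments(VkAttachmentDescription out[ATTACHMENT_COUNT])
{
    const VkImageLayout color = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    const VkImageLayout depth = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;
    assert(out != 0);
    /* 0: normal + specular factor, intermediate only */
    out[0] = describe(VK_FORMAT_B8G8R8A8_UNORM, VK_ATTACHMENT_LOAD_OP_DONT_CARE,
                      VK_ATTACHMENT_STORE_OP_DONT_CARE, color);
    /* 1: depth, cleared before use, intermediate only */
    out[1] = describe(VK_FORMAT_D16_UNORM, VK_ATTACHMENT_LOAD_OP_CLEAR,
                      VK_ATTACHMENT_STORE_OP_DONT_CARE, depth);
    /* 2 and 3: diffuse and specular accumulation, cleared to start at 0 */
    out[2] = describe(VK_FORMAT_B8G8R8A8_UNORM, VK_ATTACHMENT_LOAD_OP_CLEAR,
                      VK_ATTACHMENT_STORE_OP_DONT_CARE, color);
    out[3] = out[2];
    /* 4: final output, the only attachment written back to memory */
    out[4] = describe(VK_FORMAT_B8G8R8A8_UNORM, VK_ATTACHMENT_LOAD_OP_DONT_CARE,
                      VK_ATTACHMENT_STORE_OP_STORE, color);
}
```

The resulting array would be passed as pAttachments, with attachmentCount set to 5.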

Relating attachments to subpasses

To associate the way these attachments are used with each subpass, we need a more complex array of VkSubpassDescription objects to pass to the pSubpasses member of our VkRenderPassCreateInfo object:

pSubpasses[0]

flags 0

pipelineBindPoint VK_PIPELINE_BIND_POINT_GRAPHICS

inputAttachmentCount 0

pInputAttachments NULL

colorAttachmentCount 1

pColorAttachments

pColorAttachments[0]

`attachment`	`0` (normal + specularity)
`layout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`

pResolveAttachments NULL

pDepthStencilAttachment

*pDepthStencilAttachment

`attachment`	`1` (depth)
`layout`	`VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL`

preserveAttachmentCount 0

pPreserveAttachments NULL

pSubpasses[1]

flags 0

pipelineBindPoint VK_PIPELINE_BIND_POINT_GRAPHICS

inputAttachmentCount 1

pInputAttachments

pInputAttachments[0]

`attachment`	`0` (normal + specularity)
`layout`	`VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL`

colorAttachmentCount 2

pColorAttachments

pColorAttachments[0]

`attachment`	`2` (diffuse lighting)
`layout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`

pColorAttachments[1]

`attachment`	`3` (specular lighting)
`layout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`

pResolveAttachments NULL

pDepthStencilAttachment

*pDepthStencilAttachment

`attachment`	`1` (depth)
`layout`	`VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL`

preserveAttachmentCount 0

pPreserveAttachments NULL

pSubpasses[2]

flags 0

pipelineBindPoint VK_PIPELINE_BIND_POINT_GRAPHICS

inputAttachmentCount 2

pInputAttachments

pInputAttachments[0]

`attachment`	`2` (diffuse lighting)
`layout`	`VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL`

pInputAttachments[1]

`attachment`	`3` (specular lighting)
`layout`	`VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL`

colorAttachmentCount 1

pColorAttachments

pColorAttachments[0]

`attachment`	`4` (final output)
`layout`	`VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`

pResolveAttachments NULL

pDepthStencilAttachment

*pDepthStencilAttachment

`attachment`	`1` (depth)
`layout`	`VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL`

preserveAttachmentCount 0

pPreserveAttachments NULL
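Expressed in C, the three subpass descriptions above might look like the sketch below. The array and helper names are hypothetical, and the stand-in declarations are an assumption included only so that the fragment compiles where `<vulkan/vulkan.h>` is not installed. Input attachment references use VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL here, since the specification does not accept a color attachment layout for an attachment that a subpass only reads as an input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __has_include
#if __has_include(<vulkan/vulkan.h>)
#include <vulkan/vulkan.h>
#define SKETCH_HAS_VULKAN_H 1
#endif
#endif

#ifndef SKETCH_HAS_VULKAN_H
/* Assumption: minimal stand-ins mirroring the Vulkan headers, present only
 * so the sketch compiles without them. Real code includes vulkan.h. */
typedef uint32_t VkFlags;
typedef enum { VK_PIPELINE_BIND_POINT_GRAPHICS = 0 } VkPipelineBindPoint;
typedef enum { VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL = 2,
               VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL = 3,
               VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL = 5 } VkImageLayout;
typedef struct VkAttachmentReference {
    uint32_t attachment;
    VkImageLayout layout;
} VkAttachmentReference;
typedef struct VkSubpassDescription {
    VkFlags flags;
    VkPipelineBindPoint pipelineBindPoint;
    uint32_t inputAttachmentCount;
    const VkAttachmentReference *pInputAttachments;
    uint32_t colorAttachmentCount;
    const VkAttachmentReference *pColorAttachments;
    const VkAttachmentReference *pResolveAttachments;
    const VkAttachmentReference *pDepthStencilAttachment;
    uint32_t preserveAttachmentCount;
    const uint32_t *pPreserveAttachments;
} VkSubpassDescription;
#endif

#define LAYOUT_COLOR VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
#define LAYOUT_DEPTH VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
#define LAYOUT_INPUT VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL

/* Attachment 1 (depth) is shared by all three subpasses. */
static const VkAttachmentReference kDepthRef = { 1, LAYOUT_DEPTH };
static const VkAttachmentReference kGeometryColor[] = { { 0, LAYOUT_COLOR } };
static const VkAttachmentReference kLightingInputs[] = { { 0, LAYOUT_INPUT } };
static const VkAttachmentReference kLightingColor[] = { { 2, LAYOUT_COLOR },
                                                        { 3, LAYOUT_COLOR } };
static const VkAttachmentReference kShadingInputs[] = { { 2, LAYOUT_INPUT },
                                                        { 3, LAYOUT_INPUT } };
static const VkAttachmentReference kShadingColor[] = { { 4, LAYOUT_COLOR } };

/* Array intended for VkRenderPassCreateInfo::pSubpasses. Members that are
 * not named (flags, resolve and preserve attachments) are zero or NULL. */
static const VkSubpassDescription kSubpasses[3] = {
    { .pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS,
      .colorAttachmentCount = 1, .pColorAttachments = kGeometryColor,
      .pDepthStencilAttachment = &kDepthRef },
    { .pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS,
      .inputAttachmentCount = 1, .pInputAttachments = kLightingInputs,
      .colorAttachmentCount = 2, .pColorAttachments = kLightingColor,
      .pDepthStencilAttachment = &kDepthRef },
    { .pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS,
      .inputAttachmentCount = 2, .pInputAttachments = kShadingInputs,
      .colorAttachmentCount = 1, .pColorAttachments = kShadingColor,
      .pDepthStencilAttachment = &kDepthRef },
};

/* Returns 1 when the subpass lists the attachment as an input attachment. */
static int reads_as_input(const VkSubpassDescription *subpass, uint32_t attachment)
{
    assert(subpass != NULL);
    for (uint32_t i = 0; i < subpass->inputAttachmentCount; ++i)
        if (subpass->pInputAttachments[i].attachment == attachment)
            return 1;
    return 0;
}
```

Keeping the attachment references in separate arrays makes it easier to see which attachment moves from being a color output in one subpass to an input in the next.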

Since all but the final output color attachment in this example are used only as intermediate values, they can be created with VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT set, and be bound to memory allocated with VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT. Tiling hardware typically has limitations on the number and type of attachments which can be kept in flight concurrently, so despite this optimization, it is possible that implementations will have to spill intermediate results to main memory.

More complex arrangements of subpasses are possible. If an attachment is not used during a subpass, but is needed in previous and subsequent subpasses, the attachment should appear in the pPreserveAttachments array of the subpass. Implementations can change the order in which subpasses are evaluated (while preserving dependencies) in order to reduce the need for spilling. In the above example, attachment 0 is not preserved, and the implementation may use the same internal tile memory for both it and the final output attachment. It is also possible to use multi-sampling with these approaches, but this complicates the intermediate read operations and may make it more likely that tilers will have to spill to external memory.
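The rule for pPreserveAttachments given above can be sketched as a small helper. This is not part of the Vulkan API: the usage matrix and the function name are hypothetical, and the helper only encodes the rule as stated here (unused in this subpass, used in an earlier one and in a later one).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { MAX_ATTACHMENTS = 8 };

/* used[s][a] is true when subpass s references attachment a in any role
 * (input, color, resolve or depth/stencil).
 * Writes to out[] the attachments that subpass s should list in
 * pPreserveAttachments and returns how many were written. */
static uint32_t find_preserve_attachments(const bool used[][MAX_ATTACHMENTS],
                                          uint32_t subpassCount,
                                          uint32_t attachmentCount,
                                          uint32_t s,
                                          uint32_t out[MAX_ATTACHMENTS])
{
    uint32_t count = 0;
    assert(s < subpassCount && attachmentCount <= MAX_ATTACHMENTS);
    for (uint32_t a = 0; a < attachmentCount; ++a) {
        bool before = false;
        bool after = false;
        if (used[s][a])
            continue; /* referenced here, so nothing to preserve */
        for (uint32_t p = 0; p < s; ++p)
            before = before || used[p][a];
        for (uint32_t n = s + 1; n < subpassCount; ++n)
            after = after || used[n][a];
        if (before && after)
            out[count++] = a;
    }
    return count;
}

/* Attachment usage of the deferred lighting example (attachments 0 to 4). */
static const bool kDeferredLightingUsage[3][MAX_ATTACHMENTS] = {
    /* subpass 0 */ { true,  true, false, false, false },
    /* subpass 1 */ { true,  true, true,  true,  false },
    /* subpass 2 */ { false, true, true,  true,  true  },
};
```

For the deferred lighting example the helper returns an empty list for every subpass, which matches the preserveAttachmentCount of 0 used throughout. If subpass 1 did not read attachment 0 but subpass 2 did, attachment 0 would need to be preserved in subpass 1.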

Subpass dependencies

When multiple subpasses are in use, the driver needs to be told the relationship between them. A subpass can depend on operations which were submitted outside the current render pass, or be the source on which later rendering depends. Most commonly, the need is to ensure that the fragment shader from an earlier subpass has completed rendering (to the current tile, on a tiler) before the next subpass starts to try to read that data. An array of subpass dependencies - if there are any - is passed to VkRenderPassCreateInfo, defining a set of dependencies between "source" (the thing being waited on) and "destination" (the thing doing the waiting). Each subpass dependency is defined as follows:

`struct` `VkSubpassDependency`
`uint32_t`	`srcSubpass`	The index of the subpass being depended upon by dstSubpass
`uint32_t`	`dstSubpass`	The index of the subpass depending on srcSubpass
`VkPipelineStageFlags`	`srcStageMask`	Which pipeline stages must have completed for the dependency
`VkPipelineStageFlags`	`dstStageMask`	Which pipeline stages are waiting on the dependency
`VkAccessFlags`	`srcAccessMask`	Which access types influence the dependency
`VkAccessFlags`	`dstAccessMask`	Which access types are waiting on the dependency
`VkDependencyFlags`	`dependencyFlags`	Other configuration of the dependency

Typically, for dependencies between fragment writes and fragment shader reads, we might expect the following settings:

`srcStageMask`	`VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT`	Fragment data has been written
`dstStageMask`	`VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT`	Don't start shading until data is available
`srcAccessMask`	`VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT`	Waiting for color data to be written
`dstAccessMask`	`VK_ACCESS_SHADER_READ_BIT`	Don't read things from the shader before ready
`dependencyFlags`	`VK_DEPENDENCY_BY_REGION_BIT`	Only need the current fragment (or tile) synchronized, not the whole framebuffer

In the case of our deferred lighting example, we have three subpasses, and we have dependencies between the first and second and between the second and third. That is, we need to set the dependencyCount member of our VkRenderPassCreateInfo to 2, and set the pDependencies member of our VkRenderPassCreateInfo to point to the following array:

`pDependencies[0]`	`srcSubpass`	`0`
	`dstSubpass`	`1`
	`srcStageMask`	`VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT`
	`dstStageMask`	`VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT`
	`srcAccessMask`	`VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT`
	`dstAccessMask`	`VK_ACCESS_SHADER_READ_BIT`
	`dependencyFlags`	`VK_DEPENDENCY_BY_REGION_BIT`
`pDependencies[1]`	`srcSubpass`	`1`
	`dstSubpass`	`2`
	`srcStageMask`	`VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT`
	`dstStageMask`	`VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT`
	`srcAccessMask`	`VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT`
	`dstAccessMask`	`VK_ACCESS_SHADER_READ_BIT`
	`dependencyFlags`	`VK_DEPENDENCY_BY_REGION_BIT`
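As a sketch, the same two dependencies could be built in C as shown below. The helper names are hypothetical, the values are those listed above, and the stand-in declarations are an assumption included only so that the fragment compiles where `<vulkan/vulkan.h>` is not installed. Note that the C member holding the flags is named dependencyFlags.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __has_include
#if __has_include(<vulkan/vulkan.h>)
#include <vulkan/vulkan.h>
#define SKETCH_HAS_VULKAN_H 1
#endif
#endif

#ifndef SKETCH_HAS_VULKAN_H
/* Assumption: minimal stand-ins mirroring the Vulkan headers, present only
 * so the sketch compiles without them. Real code includes vulkan.h. */
typedef uint32_t VkFlags;
typedef VkFlags VkPipelineStageFlags;
typedef VkFlags VkAccessFlags;
typedef VkFlags VkDependencyFlags;
enum {
    VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT = 0x00000080,
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT = 0x00000400,
    VK_ACCESS_SHADER_READ_BIT = 0x00000020,
    VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT = 0x00000100,
    VK_DEPENDENCY_BY_REGION_BIT = 0x00000001
};
typedef struct VkSubpassDependency {
    uint32_t srcSubpass;
    uint32_t dstSubpass;
    VkPipelineStageFlags srcStageMask;
    VkPipelineStageFlags dstStageMask;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
    VkDependencyFlags dependencyFlags;
} VkSubpassDependency;
#endif

/* Dependency from color writes in subpass src to fragment shader reads in
 * a later subpass dst, synchronized per region rather than per framebuffer. */
static VkSubpassDependency color_write_to_fragment_read(uint32_t src, uint32_t dst)
{
    VkSubpassDependency dep = {0};
    assert(src < dst);
    dep.srcSubpass = src;
    dep.dstSubpass = dst;
    dep.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
    dep.dstStageMask = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
    dep.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    dep.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
    dep.dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;
    return dep;
}

/* Fills the array intended for VkRenderPassCreateInfo::pDependencies. */
static void fill_dependencies(VkSubpassDependency deps[2])
{
    deps[0] = color_write_to_fragment_read(0, 1); /* geometry to lighting */
    deps[1] = color_write_to_fragment_read(1, 2); /* lighting to final shading */
}
```

Because both dependencies share the same stage and access masks, a single helper is enough in this example; a render pass with different kinds of dependency would set the masks individually.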

Using subpasses in a command buffer

As described above, when recording to a VkCommandBuffer, vkCmdBeginRenderPass() and vkCmdEndRenderPass() are used to wrap the render pass operations. After vkCmdBeginRenderPass() is called, subsequent commands are applied to the first subpass within the render pass.

To move operations to subsequent subpasses, vkCmdNextSubpass() should be called. Each call of this function moves operations to the next subpass index, in increasing order, until vkCmdEndRenderPass() is called. Synchronization between access to attachments described in subpass dependencies is handled automatically.

Using subpasses in shaders

In SPIR-V, the contents of an input attachment can be accessed with the OpImageRead operation, with an OpTypeImage that has a dim argument of SubpassData. The coordinate argument of the OpImageRead must be (0,0), and corresponds to accessing the input attachment at the current fragment location. When multi-sampling, the sample operand to OpImageRead can be used to access separate samples at the current fragment.

In GLSL, this functionality is exposed through the subpassLoad() function, with subpassInput types for the input attachments.
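As an illustration, the fragment below holds the GLSL source of a hypothetical fragment shader for the final subpass as a C string, ready to hand to a GLSL-to-SPIR-V compiler. It is a simplified sketch: it only sums the two lighting inputs, where a real final subpass would also apply material shading, and the variable names, descriptor set and binding numbers are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GLSL source (Vulkan flavor) for a hypothetical final-subpass shader. */
static const char kCombineLightingFrag[] =
    "#version 450\n"
    "\n"
    "// input_attachment_index N selects pInputAttachments[N] of the subpass.\n"
    "layout(input_attachment_index = 0, set = 0, binding = 0)\n"
    "    uniform subpassInput uDiffuseLight;\n"
    "layout(input_attachment_index = 1, set = 0, binding = 1)\n"
    "    uniform subpassInput uSpecularLight;\n"
    "\n"
    "layout(location = 0) out vec4 outColor;\n"
    "\n"
    "void main()\n"
    "{\n"
    "    // subpassLoad() takes no coordinate: it reads the current fragment.\n"
    "    vec3 diffuse  = subpassLoad(uDiffuseLight).rgb;\n"
    "    vec3 specular = subpassLoad(uSpecularLight).rgb;\n"
    "    outColor = vec4(diffuse + specular, 1.0);\n"
    "}\n";

/* Length of the source in bytes, excluding the terminating NUL. */
static size_t combine_lighting_frag_length(void)
{
    size_t length = strlen(kCombineLightingFrag);
    assert(length == sizeof kCombineLightingFrag - 1);
    return length;
}
```

With the subpass descriptions used in this article, input attachment index 0 would correspond to attachment 2 (diffuse lighting) and index 1 to attachment 3 (specular lighting).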

Summary

The Vulkan API acknowledges the fact that modern rendering techniques may perform multiple passes over the same image data, and is designed to ensure that these approaches are explicitly and efficiently supported on modern graphics hardware. The unfortunate consequence of this expressivity is the complexity of the description and the verbosity of simple examples, although the overhead in a practical, optimized renderer should be less significant.

In Vulkan, the render pass is an explicit concept within which rendering operations execute. A VkFramebuffer, with a list of associated attachments, is associated with the render pass when rendering work is recorded into a VkCommandBuffer. The render pass is divided into one or more subpasses, with explicitly-defined interactions between them. This explicit configuration, held in a VkRenderPass object, can be shared between rendering operations, which can limit the impact on real-world, complex applications. Providing this additional information to a driver can allow significantly reduced memory overhead, especially on tiled architectures, without the unpredictability of the heuristics applied to achieve good performance in more traditional APIs.

Additional reading

A simplified version of the content of this article may be found in a presentation on the subject at a UK developer event.
