Introduction to Vulkan Render Passes
Vulkan graphics rendering is organized into render passes and subpasses. This article provides an introduction to these concepts and how to use them in the Vulkan API. If you haven’t done so, it is recommended that you read the article on ‘GPU Framebuffer Memory’ before reading this article.
Render Passes
When a GPU renders a scene, it is configured with one or more render targets, or framebuffer attachments in Khronos terminology. The size and format of the attachments determine how graphics work is configured across the parallelism available on all modern GPUs. For example, on a tile-based renderer, the set of attachments is used to determine the way the image is divided into tiles. In Vulkan, a render pass is the set of attachments, the way they are used, and the rendering work that is performed using them. In a traditional API, a change to a new render pass might correspond to binding a new framebuffer.
Subpasses
During normal rendering, it is not possible for a fragment shader to access the attachments to which it is currently rendering: GPUs have optimized hardware for writing to the attachments, and accessing the attachment interferes with this. However, some common rendering techniques such as deferred shading rely on being able to access the result of previous rendering during shading. For a tile-based renderer, the results of previous rendering can efficiently stay on-chip if subsequent rendering operations are at the same resolution, and if only the data in the pixel currently being rendered is needed (accessing different pixels may require access to values outside the current tile, which breaks this optimization). In order to help optimize deferred shading on tile-based renderers, Vulkan splits the rendering operations of a render pass into subpasses. All subpasses in a render pass share the same resolution and tile arrangement, and as a result, they can access the results of previous subpasses.
In Vulkan, a render pass consists of one or more subpasses; for simple rendering operations, there may be only a single subpass in a render pass.
Creating a VkRenderPass
In Vulkan, a render pass is described by an (opaque) VkRenderPass object. This provides a template that is used when beginning a render pass inside a command buffer. The render pass is used with a compatible VkFramebuffer object, which represents the set of images that will be used as attachments during execution of the render pass.
vkCreateRenderPass
Like many driver objects in Vulkan, a VkRenderPass object is created with a corresponding create function, vkCreateRenderPass():
VkResult vkCreateRenderPass()
Type | Parameter | Description |
---|---|---|
VkDevice | device | Logical device used for rendering (from vkCreateDevice) |
const VkRenderPassCreateInfo* | pCreateInfo | Parameters for creation |
const VkAllocationCallbacks* | pAllocator | Host memory allocation callback (can be NULL) |
VkRenderPass* | pRenderPass | Resulting render pass handle |
As with many Vulkan creation functions, most parameters are passed through a creation structure. This approach makes it more efficient to create multiple identical objects, and provides a way to support type-safe additional parameters through extensions.
Many creation functions in Vulkan offer a callback for applications which wish to track host-side memory usage. While important for applications that wish to have precise control over resource allocation, and useful for debugging, in most cases this callback can be left as NULL to rely on the driver's default memory allocation scheme.
As with other Vulkan creation functions, the function returns an error code if anything goes wrong, although more information may be available through validation layers if the problem is an application error. The newly-created render pass handle is returned via the pRenderPass pointer.
The interesting parameters are contained in the pCreateInfo structure.
VkRenderPassCreateInfo
struct VkRenderPassCreateInfo
Type | Member | Description |
---|---|---|
VkStructureType | sType | Used for type safety and extensions, must be VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO |
const void* | pNext | Allows extensions to provide extra parameters, must be NULL if not needed by an extension |
VkRenderPassCreateFlags | flags | Reserved for future use (should be 0) |
uint32_t | attachmentCount | Number of framebuffer attachments used in this render pass (across all subpasses) |
const VkAttachmentDescription* | pAttachments | Description of the attachments (array of size attachmentCount) |
uint32_t | subpassCount | Number of subpasses |
const VkSubpassDescription* | pSubpasses | Description of subpasses (array of size subpassCount) |
uint32_t | dependencyCount | Number of dependencies between subpass pairs |
const VkSubpassDependency* | pDependencies | Descriptions of dependencies between subpasses (array of size dependencyCount) |
For the purposes of this article, we will begin with a simple rendering operation with only a single subpass (a render pass always consists of at least one subpass). In this case, subpassCount can be 1 and dependencyCount can be 0 (so pDependencies can be NULL; we'll come back to describe how dependencies are used below).
VkAttachmentDescription
An attachment corresponds to a single Vulkan VkImageView. A description of the attachment is provided at render pass creation, which allows the render pass to be configured appropriately; the actual images to be used are provided when the render pass is used, via the VkFramebuffer. It is possible to associate multiple attachments with a render pass; these may be used for example as multiple render targets, or in separate subpasses. More commonly, a color framebuffer and a depth buffer are separate attachments in Vulkan. Therefore the pAttachments member of VkRenderPassCreateInfo points to an array of attachmentCount elements.
struct VkAttachmentDescription
Type | Member | Description |
---|---|---|
VkAttachmentDescriptionFlags | flags | Used by extensions; can be 0 |
VkFormat | format | Image format of the attachment |
VkSampleCountFlagBits | samples | Number of samples in the attachment (used for multi-sampling) |
VkAttachmentLoadOp | loadOp | What should be done to access the attachment before rendering |
VkAttachmentStoreOp | storeOp | What should be done with the attachment after rendering |
VkAttachmentLoadOp | stencilLoadOp | In the case of a depth/stencil attachment, how to access the stencil contents before rendering |
VkAttachmentStoreOp | stencilStoreOp | In the case of a depth/stencil attachment, what should be done with the stencil after rendering |
VkImageLayout | initialLayout | Layout of the attachment when first used in the render pass |
VkImageLayout | finalLayout | Layout of the attachment after use in the render pass |
For a simple rendering operation, we might decide to create two attachments:
Member | Color attachment (pAttachments[0]) | Depth attachment (pAttachments[1]) |
---|---|---|
flags | 0 | 0 |
format | VK_FORMAT_B8G8R8A8_UNORM | VK_FORMAT_D16_UNORM |
samples | VK_SAMPLE_COUNT_1_BIT | VK_SAMPLE_COUNT_1_BIT |
loadOp | VK_ATTACHMENT_LOAD_OP_DONT_CARE | VK_ATTACHMENT_LOAD_OP_CLEAR |
storeOp | VK_ATTACHMENT_STORE_OP_STORE | VK_ATTACHMENT_STORE_OP_DONT_CARE |
stencilLoadOp | VK_ATTACHMENT_LOAD_OP_DONT_CARE | VK_ATTACHMENT_LOAD_OP_DONT_CARE |
stencilStoreOp | VK_ATTACHMENT_STORE_OP_DONT_CARE | VK_ATTACHMENT_STORE_OP_DONT_CARE |
initialLayout | VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL | VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL |
finalLayout | VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL | VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL |
Stencil is special because the combined depth/stencil attachment is a single attachment. Here, we aren't using stencil, so the stencilLoadOp and stencilStoreOp are irrelevant. Note that a "DONT_CARE" store op does not guarantee that memory is left untouched: a tile-based renderer may be able to skip the write entirely, but an immediate-mode renderer may still write to the attachment memory during rendering. Similarly, a "DONT_CARE" load op saves a tiler from reading the previous framebuffer contents, and also avoids an explicit clear of the memory, which may be costly for an immediate-mode renderer.
Note: We're assuming that the images have been transitioned from VK_IMAGE_LAYOUT_UNDEFINED (on creation) to VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL and VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL before we use them, for example by using a VkImageMemoryBarrier.
There is a complication to this mechanism to be aware of. Consider the example of drawing a scene with two render passes, the second of which uses the results of the first (written with STORE_OP_STORE) as an input attachment (loaded with LOAD_OP_LOAD) but does not write to it. If this input attachment is still wanted after the second render pass, it must still have STORE_OP_STORE associated with it in the second render pass: using STORE_OP_DONT_CARE allows some hardware to perform an optimization and discard the attachment content after the second render pass, even though the first render pass used STORE_OP_STORE. You may think of this as a cache discard of the output of the first render pass, where the cache line was previously considered to be valid. This is potentially a good performance enhancement, but it does mean that users need to be prepared for surprising behavior!
VkSubpassDescription
struct VkSubpassDescription
Type | Member | Description |
---|---|---|
VkSubpassDescriptionFlags | flags | Reserved for future use, must be 0 |
VkPipelineBindPoint | pipelineBindPoint | Should be VK_PIPELINE_BIND_POINT_GRAPHICS |
uint32_t | inputAttachmentCount | Number of input attachments to this subpass |
const VkAttachmentReference* | pInputAttachments | Array of input attachments read by this subpass (array of size inputAttachmentCount) |
uint32_t | colorAttachmentCount | Number of output attachments for this subpass |
const VkAttachmentReference* | pColorAttachments | Array of color attachments written to by this subpass (array of size colorAttachmentCount) |
const VkAttachmentReference* | pResolveAttachments | Attachments for antialiasing (NULL or array of size colorAttachmentCount) |
const VkAttachmentReference* | pDepthStencilAttachment | One attachment reference describing the depth/stencil attachment |
uint32_t | preserveAttachmentCount | Number of attachments preserved across this subpass |
const uint32_t* | pPreserveAttachments | Array of attachment indices preserved across this subpass, of size preserveAttachmentCount, or NULL |
In our first example, we only have a single subpass, and we'll render to it directly. We won't use pResolveAttachments (so we can set it to NULL) and we do not need to preserve any attachments (so preserveAttachmentCount can be 0 and pPreserveAttachments can be NULL). The fields we don't need now will be described in more detail below, but in our simple case we can configure the (single) subpass. Before we get there, we have one more level of Vulkan object to worry about:
VkAttachmentReference
struct VkAttachmentReference
Type | Member | Description |
---|---|---|
uint32_t | attachment | Id of the attachment (index into VkRenderPassCreateInfo::pAttachments) |
VkImageLayout | layout | Layout of the attachment during the subpass |
In the example we're walking through, we have two attachments in total:
Member | pColorAttachments[0] | pDepthStencilAttachment |
---|---|---|
attachment | 0 | 1 |
layout | VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL | VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL |
The layout can change between subpasses of a render pass, hence the need to describe it on a per-subpass basis.
Example render pass (complete create info)
In summary, in the simple render pass we've been using as an example, we have the following two attachments:
Attachment | At start of render pass | At end of render pass |
---|---|---|
Color attachment, VK_FORMAT_B8G8R8A8_UNORM | VK_ATTACHMENT_LOAD_OP_DONT_CARE (All pixels will be overwritten) | VK_ATTACHMENT_STORE_OP_STORE (Write pixels to memory after rendering) |
Depth(/stencil) attachment, VK_FORMAT_D16_UNORM | VK_ATTACHMENT_LOAD_OP_CLEAR (Don't want previous depth values) | VK_ATTACHMENT_STORE_OP_DONT_CARE (Don't need depth after rendering) |
In total, then, our simple render pass looks like this:
*pCreateInfo
Member | Value |
---|---|
sType | VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO |
pNext | NULL |
flags | 0 |
attachmentCount | 2 |
pAttachments | Array of the two VkAttachmentDescription entries described above (color, then depth) |
subpassCount | 1 |
pSubpasses | The single VkSubpassDescription described above, referencing pColorAttachments[0] and pDepthStencilAttachment |
dependencyCount | 0 |
pDependencies | NULL |
Fortunately, since render passes can be reused, you may not need to do this too often. We'll see later the flexibility exposed by this mechanism.
Creating a VkFramebuffer
A VkRenderPass is a template for how a render pass will be used. When we use the render pass, we need to provide the actual images which are to be used for rendering. The mechanism containing references to the actual images is a VkFramebuffer, which contains all the attachments used by the render pass.
vkCreateFramebuffer
As with vkCreateRenderPass for a VkRenderPass, a VkFramebuffer is created with vkCreateFramebuffer():
VkResult vkCreateFramebuffer()
Type | Parameter | Description |
---|---|---|
VkDevice | device | Logical device used for rendering (from vkCreateDevice) |
const VkFramebufferCreateInfo* | pCreateInfo | Parameters for creation |
const VkAllocationCallbacks* | pAllocator | Host memory allocation callback (can be NULL) |
VkFramebuffer* | pFramebuffer | Resulting framebuffer handle |
Again, to allow extensibility and reusability, the parameters are passed through a pCreateInfo pointer. (Yes, here we go again!)
VkFramebufferCreateInfo
struct VkFramebufferCreateInfo
Type | Member | Description |
---|---|---|
VkStructureType | sType | Used for type safety and extensions, must be VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO |
const void* | pNext | Used for extensions; NULL if no extensions are used to add parameters |
VkFramebufferCreateFlags | flags | Reserved for future use, must be 0 |
VkRenderPass | renderPass | The render pass (or a compatible one) with which the framebuffer will be used |
uint32_t | attachmentCount | Number of attachments used in the render pass |
const VkImageView* | pAttachments | Array of image views, which refer to actual images; array is of size attachmentCount |
uint32_t | width | Width of framebuffer |
uint32_t | height | Height of framebuffer |
uint32_t | layers | Number of layers in framebuffer |
Note that all the attachments used in the framebuffer are of the same width, height and number of layers - but that this is independent of the render pass, so the same render pass can be used with framebuffers of different sizes.
For our simple example, we need two image views: one referring to a VK_FORMAT_B8G8R8A8_UNORM image and one referring to a VK_FORMAT_D16_UNORM image. For efficiency, since we typically don't need the depth buffer to persist after rendering, the D16 image can be created with VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT in its usage flags, and can be bound to memory allocated with VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT set. In this case, a tile-based renderer may be able to avoid allocating any memory for the depth buffer, since it is only used for rendering operations which occur on-chip.
Using a VkRenderPass
Now that we have a VkRenderPass and a VkFramebuffer, we can use them in the rendering process.
vkCmdBeginRenderPass
To begin a render pass instance in a command buffer, call vkCmdBeginRenderPass():
void vkCmdBeginRenderPass()
Type | Parameter | Description |
---|---|---|
VkCommandBuffer | commandBuffer | Command buffer into which to insert the render pass |
const VkRenderPassBeginInfo* | pRenderPassBegin | Arguments |
VkSubpassContents | contents | Indication whether secondary command buffers are in use (see below) |
A render pass can only begin (and end) in a primary command buffer.
Once a render pass has begun on a command buffer, subsequent commands submitted to that command buffer will execute within the first (and in the case of our example, only) subpass of the render pass instance. In our simple case, we could use just the one command buffer and record rendering commands directly into it. In this case, contents should be VK_SUBPASS_CONTENTS_INLINE.
VkRenderPassBeginInfo
As with many functions, Vulkan uses an info structure for reusability and extensibility.
struct VkRenderPassBeginInfo
Type | Member | Description |
---|---|---|
VkStructureType | sType | Must be VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO |
const void* | pNext | Used for extensions; must be NULL if no extension is used which adds to this struct |
VkRenderPass | renderPass | The render pass description created by vkCreateRenderPass() |
VkFramebuffer | framebuffer | The framebuffer containing the images for rendering, created by vkCreateFramebuffer() |
VkRect2D | renderArea | Bounds of the rectangular area affected by the render pass |
uint32_t | clearValueCount | Number of clear values |
const VkClearValue* | pClearValues | Values used for clearing attachments (array of size clearValueCount) |
renderArea is used for rendering a subset of the framebuffer, for example for partial updates of dirty areas of the screen. The application is responsible for clipping rendering to this area, and rendering to less than the entire screen can incur a performance hit if the area being drawn is not aligned to the granularity reported by vkGetRenderAreaGranularity(), which for a tile-based renderer might be expected to correspond to the alignment of the tile grid. For most purposes, the render area can be set to the full width and height of the framebuffer.
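To make the alignment point concrete, the sketch below grows a dirty rectangle outward until its offset is a multiple of the granularity and its far edge is either a multiple of the granularity or the framebuffer edge. The Extent2D and Rect2D types and align_render_area() are hypothetical stand-ins (mirroring VkExtent2D and VkRect2D) so that the example compiles without the Vulkan headers; in a real application the granularity would come from vkGetRenderAreaGranularity().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plain-C stand-ins for VkExtent2D and VkRect2D. */
typedef struct Extent2D { uint32_t width, height; } Extent2D;
typedef struct Rect2D   { int32_t x, y; uint32_t width, height; } Rect2D;

/* Round v down or up to a multiple of step (step must be non-zero). */
static uint32_t round_down(uint32_t v, uint32_t step) { return v - (v % step); }
static uint32_t round_up(uint32_t v, uint32_t step)
{
    return round_down(v + step - 1, step);
}

/* Expand a dirty rectangle to granularity boundaries, clamped to the
 * framebuffer. Assumes the dirty rectangle lies inside the framebuffer. */
static Rect2D align_render_area(Rect2D dirty, Extent2D granularity, Extent2D fb)
{
    assert(granularity.width > 0 && granularity.height > 0);
    assert(dirty.x >= 0 && dirty.y >= 0);
    assert((uint32_t)dirty.x + dirty.width  <= fb.width);
    assert((uint32_t)dirty.y + dirty.height <= fb.height);

    uint32_t x0 = round_down((uint32_t)dirty.x, granularity.width);
    uint32_t y0 = round_down((uint32_t)dirty.y, granularity.height);
    uint32_t x1 = round_up((uint32_t)dirty.x + dirty.width,  granularity.width);
    uint32_t y1 = round_up((uint32_t)dirty.y + dirty.height, granularity.height);

    /* The far edge may stop at the framebuffer edge instead of a multiple. */
    if (x1 > fb.width)  x1 = fb.width;
    if (y1 > fb.height) y1 = fb.height;

    Rect2D out = { (int32_t)x0, (int32_t)y0, x1 - x0, y1 - y0 };
    return out;
}
```

For instance, with a 32x32 granularity on a 640x480 framebuffer, a dirty rectangle at (100, 50) of size 30x20 becomes a render area at (96, 32) of size 64x64.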
pClearValues is indexed by the attachment number and used if the attachment has a loadOp of VK_ATTACHMENT_LOAD_OP_CLEAR. In the case of our simple example, we clear the depth attachment at the start of rendering, and the depth attachment is at index 1 in our attachment array, so we need pClearValues[1] to represent the value to which we want to clear the depth buffer.
union VkClearValue
Type | Member | Description |
---|---|---|
VkClearColorValue | color | Value used when clearing color buffers |
VkClearDepthStencilValue | depthStencil | Value used when clearing depth/stencil buffers |
VkClearColorValue is a union of arrays of various channel types, with the format chosen by the attachment format being cleared. VkClearDepthStencilValue always has a float depth value, and a uint32_t stencil value. For our simple example, only the float depth value is relevant, and should be set to the depth value we want for our rendering.
vkCmdEndRenderPass
After the last rendering commands for the render pass instance have been submitted to the command buffer, the application must end the render pass instance:
void vkCmdEndRenderPass()
Type | Parameter |
---|---|
VkCommandBuffer | commandBuffer |
In this example, if we have been recording commands directly into the primary command buffer, the command buffer contains, in order: vkCmdBeginRenderPass(), the rendering commands for the (single) subpass, and vkCmdEndRenderPass().
Multiple render passes can be inserted into the same command buffer, so long as one is ended before the next is begun. A render pass must both begin and end within a single primary command buffer (that is, a render pass cannot span multiple primary command buffers), so parallelism in command buffer building in this approach relies on parallel building of multiple render passes. In many rendering frameworks, this level of parallelism is still enough to allow the CPU cores to stay busy, and simplifies the task of resource management and state tracking.
Render passes and secondary command buffers
In some rendering scenarios, a large amount of work needs to be performed within a single rendering pass. For example, a large number of characters may be managed and animated by their own threads, but all appear on screen at once. This complicates the task of optimizing rendering order and minimizing state changes, but can still be necessary in some highly-parallel systems.
Vulkan's solution to this is to make use of secondary command buffers, which (for graphics rendering) are executed inside a render pass. A secondary command buffer is created by vkAllocateCommandBuffers() using a VkCommandBufferAllocateInfo with a level member of VK_COMMAND_BUFFER_LEVEL_SECONDARY.
Beginning a secondary command buffer
For graphics, the VkCommandBufferBeginInfo argument of vkBeginCommandBuffer() must have a valid pInheritanceInfo field when beginning to record a secondary command buffer:
VkResult vkBeginCommandBuffer()
Type | Parameter | Description |
---|---|---|
VkCommandBuffer | commandBuffer | Command buffer to start using |
const VkCommandBufferBeginInfo* | pBeginInfo | Arguments |
struct VkCommandBufferBeginInfo
Type | Member | Description |
---|---|---|
VkStructureType | sType | VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO |
const void* | pNext | For extensions, should be NULL |
VkCommandBufferUsageFlags | flags | Usage flags (see below) |
const VkCommandBufferInheritanceInfo* | pInheritanceInfo | Inherited info (NULL for a primary command buffer, must be valid for a secondary) |
VkCommandBufferBeginInfo::flags
flags has the following bit values:
enum VkCommandBufferUsageFlagBits
Value | Description |
---|---|
VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT | Set if the command buffer will only ever be used once (potential optimization) |
VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT | Must be set for secondary graphics command buffers |
VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT | Set if the command buffer will be submitted more than once concurrently (set only if needed when reusing command buffers) |
Secondary command buffers can also be used for compute, and in this case their operations do not fall within a render pass. For graphics, we must set RENDER_PASS_CONTINUE_BIT, may be able to set ONE_TIME_SUBMIT_BIT, and may need to set SIMULTANEOUS_USE_BIT. These options affect the way secondary command buffers are implemented: for example, they may determine whether a separate copy must be made of a secondary command buffer before use, or whether the existing copy may be used indirectly.
VkCommandBufferBeginInfo::pInheritanceInfo
pInheritanceInfo is used to allow the secondary command buffer to be configured correctly for the render pass:
struct VkCommandBufferInheritanceInfo
Type | Member | Description |
---|---|---|
VkStructureType | sType | VK_STRUCTURE_TYPE_COMMAND_BUFFER_INHERITANCE_INFO |
const void* | pNext | Used for extensions, and should be NULL unless needed |
VkRenderPass | renderPass | The render pass (or a compatible one) which will be active when the command buffer is used |
uint32_t | subpass | The subpass of the render pass that this command buffer will be used in |
VkFramebuffer | framebuffer | The framebuffer to be used (if known), or VK_NULL_HANDLE if unknown |
VkBool32 | occlusionQueryEnable | Should be VK_TRUE if the primary command buffer might have an occlusion query active, and VK_FALSE otherwise |
VkQueryControlFlags | queryFlags | Queries that can be used in the primary command buffer when this secondary command buffer executes; 0 if unused |
VkQueryPipelineStatisticFlags | pipelineStatistics | Set of pipeline statistics that can be counted by a query; 0 if pipeline statistics queries are disabled |
If the framebuffer is known at the time the command buffer is recorded (for example, if the same framebuffer is always used for generating a shadow map) then providing an explicit framebuffer may be more efficient; otherwise (if the framebuffer argument is VK_NULL_HANDLE) the framebuffer is determined by the render pass in the primary command buffer, which allows secondary command buffers to be reused with different (compatible) framebuffers determined by the primary command buffer that is using the secondary command buffer.
Rendering commands are recorded into the secondary command buffer in the same way as for a primary command buffer, and having multiple secondary command buffers allows multiple threads to record rendering commands concurrently without need for synchronization.
Invoking a secondary command buffer
When the secondary command buffers have been recorded, they can be invoked in a "parent" primary command buffer with vkCmdExecuteCommands():
void vkCmdExecuteCommands()
Type | Parameter | Description |
---|---|---|
VkCommandBuffer | commandBuffer | Primary command buffer to be recorded into |
uint32_t | commandBufferCount | Number of secondary command buffers to submit |
const VkCommandBuffer* | pCommandBuffers | Array of commandBufferCount secondary command buffers to execute (in increasing array index order) |
Secondary command buffers inside a subpass
Using the above techniques, work may be distributed as in the following example:
Thread 1 | Record secondary command buffer A (frame 2) | Record secondary command buffer A (frame 3) | ... |
---|---|---|---|
Thread 2 | Record secondary command buffer B (frame 2) | Record secondary command buffer B (frame 3) | ... |
Thread 3 | Record secondary command buffer C (frame 2) | Record secondary command buffer C (frame 3) | ... |
Thread 4 | | | |
Recording the primary command buffer should be faster than recording a significant amount of work into the secondary command buffers. However, there is typically some cost - especially for implementations which require the secondary command buffers to be copied into the primary command buffer. This approach also assumes that the secondary command buffers are at least double-buffered, and that the threads are suitably synchronized.
Since primary command buffers can be recorded in parallel and vkQueueSubmit() allows multiple command buffers to be submitted efficiently, exposing parallelism across secondary command buffers is not necessary in many applications, so this technique should be matched to the rendering workload. Note that it can also be possible to re-use secondary command buffers, although again this may carry some driver overhead (hopefully less than recording anew). Command buffer reuse should be used selectively, allowing for other optimizations such as frustum culling.
Destroying a VkRenderPass
Once a render pass is no longer needed, it can be deleted as follows:
void vkDestroyRenderPass()
Type | Parameter |
---|---|
VkDevice | device |
VkRenderPass | renderPass |
const VkAllocationCallbacks* | pAllocator |
Note that it is up to the user to ensure that nothing is still rendering which referred to the render pass at the point vkDestroyRenderPass() is called, for example by using vkWaitForFences() with a VkFence handle previously passed to vkQueueSubmit().
Multi-sampling
Tiled rendering also provides a low-bandwidth way to implement antialiasing: we can render to the tiles normally, but average the sample values of each pixel as part of writing the tile out to memory; this downsampling step is known as "resolving" the tile buffer.
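As a purely illustrative model of what a resolve computes, the sketch below averages the samples of each pixel for one 8-bit channel. The function name resolve_box() and the memory layout (all samples of a pixel stored contiguously) are assumptions made for this example; real hardware may store samples differently and is not required to use a plain box filter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Resolve one 8-bit channel by averaging `samples` values per pixel.
 * Assumed layout: the samples of pixel p live at msaa[p * samples + s]. */
static void resolve_box(const uint8_t *msaa, uint8_t *out,
                        size_t pixels, unsigned samples)
{
    assert(msaa != NULL && out != NULL);
    assert(samples > 0);

    for (size_t p = 0; p < pixels; ++p) {
        unsigned sum = 0;
        for (unsigned s = 0; s < samples; ++s)
            sum += msaa[p * samples + s];
        /* Integer average, rounded to the nearest value */
        out[p] = (uint8_t)((sum + samples / 2) / samples);
    }
}
```

On a tiler, the point is that this reduction can happen while the tile is being written out, so only the single averaged value per pixel needs to reach memory.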
Vulkan has the concept of a number of samples associated with an image. In a simple implementation the image might have several values stored at each pixel location; more complex implementations have compressed schemes. Therefore an image has a number of samples associated with it at image creation time. For multi-sampled rendering in Vulkan, the multi-sampled image is treated separately from the final single-sampled image; this provides separate control over what values need to reach memory, since, like the depth buffer, the multi-sampled image may only need to be accessed during the processing of a tile. For this reason, if the multi-sampled image is not required after the render pass, it can be created with VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT and bound to an allocation created with VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT, as described above for depth buffers. The multi-sampled attachment storeOp can then be set to VK_ATTACHMENT_STORE_OP_DONT_CARE in the VkAttachmentDescription, so that (at least on tiled renderers) the full multi-sampled attachment does not need to be written to memory, which can save a lot of bandwidth.
To control multi-sampling, the index of an attached image view (in the pAttachments array of the VkFramebufferCreateInfo) with more than one sample should be used in the VkSubpassDescription's pColorAttachments array, and the index of a corresponding image view with exactly one sample should be placed at the same index of the pResolveAttachments array; the multi-sampled image is then resolved to the single-sampled image at the end of the current subpass. To use pResolveAttachments for some attachments but not others, the entry in the pResolveAttachments array can be set to VK_ATTACHMENT_UNUSED to avoid resolving the corresponding multi-sampled image.
For example, if we had three multi-sampled attachments and only wanted the first and third to be resolved to single-sampled form, the VkSubpassDescription may have the following entries:
Index | pColorAttachments[] | pResolveAttachments[] |
---|---|---|
0 | Index of first multi-sampled attachment | Index of first single-sampled attachment |
1 | Index of second multi-sampled attachment | VK_ATTACHMENT_UNUSED |
2 | Index of third multi-sampled attachment | Index of second single-sampled attachment |
Remember that if we don't want to resolve any attachments in the subpass, pResolveAttachments can simply be set to NULL. Multi-sampled images can also be resolved to a single-sample image with vkCmdResolveImage(), but this happens outside the render pass and requires a separate access to memory, so it is a much less efficient solution if it can be avoided. Note that you can write both the resolved and multi-sampled images out of the same render pass by setting the storeOp of both attachments to VK_ATTACHMENT_STORE_OP_STORE.
Resolving an image outside a render pass
On some occasions, the attachment containing all samples may need to be written to memory for later processing (for example, use in a later render pass as an input attachment). It is possible to resolve a multi-sampled image to a single-sampled one without using it as an attachment in a render pass using the vkCmdResolveImage() command.
However, please bear in mind that this should be the exception to normal rendering, not the default approach. Writing out the multi-sampled attachment to off-chip memory (rather than using VK_ATTACHMENT_STORE_OP_DONT_CARE) has a high bandwidth cost, and vkCmdResolveImage() itself must then read all this data back, process it, and write the single-sampled output. It is very much more efficient to perform resolve operations inside a render pass where possible.
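A back-of-the-envelope estimate shows the scale of the difference. The sketch below counts attachment bytes moved to and from memory under deliberately simple assumptions: an uncompressed image, no caching, and a tiler on which an in-pass resolve only writes the single-sampled result. ResolveTraffic and resolve_traffic() are hypothetical names, and real numbers will vary with the implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated memory traffic in bytes for the two resolve strategies. */
typedef struct ResolveTraffic {
    uint64_t in_pass;  /* resolve attachment inside the render pass */
    uint64_t separate; /* store MSAA image, then vkCmdResolveImage() */
} ResolveTraffic;

static ResolveTraffic resolve_traffic(uint32_t width, uint32_t height,
                                      uint32_t bytes_per_pixel,
                                      uint32_t samples)
{
    assert(samples >= 1);

    uint64_t single = (uint64_t)width * height * bytes_per_pixel;
    uint64_t multi  = single * samples;

    ResolveTraffic t;
    /* Only the resolved image is written out */
    t.in_pass  = single;
    /* Write all samples, read them all back, write the resolved image */
    t.separate = multi + multi + single;
    return t;
}
```

Under these assumptions, a 1920x1080 target at 4 bytes per pixel with 4 samples moves about 8.3 MB per frame when resolved in the render pass, against about 74.6 MB when the multi-sampled image is stored and resolved separately: a factor of nine.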
Multiple subpasses
The render pass mechanism described so far is quite verbose for use with a single subpass. The reason for this is the flexibility that it provides when using multiple subpasses.
Some rendering techniques, notably deferred shading and deferred lighting, traverse the scene geometry once to create a framebuffer, then use the rendering results in the framebuffer for further rendering operations. The same applies to post-processing such as tone mapping after rendering. In a tiled renderer, because each of these operations requires access only to the current pixel and not the entire framebuffer, all of these operations can be performed consecutively on a per-tile basis, avoiding the need to write intermediate values out to memory. This can provide a significant bandwidth (and therefore power and performance) improvement. There is a graphical example of how deferred shading is evaluated on a tiler towards the end of the Understanding Tiling article.
Note that because the render area size is defined by the width and height fields of the VkFramebufferCreateInfo object, the render area of each attachment is effectively the same size, and this is true for all subpasses in a render pass. If a rendering technique requires reading values outside the current fragment area (which on a tiler would mean accessing rendered data outside the currently-rendering tile), separate render passes must be used.
Taking the example of deferred lighting, we might render the scene in three "subpasses":
- The first subpass renders the geometry and stores the depth, normal vector and specular spread function.
- The second subpass renders each light's bounds, accumulating a specular and diffuse color for each light that is calculated with the position, normal and specular spread function from the first subpass.
- Finally, the scene geometry is processed again with conventional forward shading, picking up the light contributions from the results of the second subpass.
Since the shading in the first subpass is very simple, the shader run-time cost can be significantly reduced in this approach, although the degree of shader parallelism in the final subpass may still depend on fragment coverage. The related deferred shading technique can allow for better shader parallelism at the cost of reduced flexibility and increased intermediate storage requirements.
Multiple attachments for multiple subpasses
In our deferred lighting example, the depth buffer is used in all three subpasses. It should only be updated by the first, but the lighting subpass needs the depth attachment both to provide accurate bounds for a light and to calculate the shading position in world space, and the final subpass can inherit the depth buffer to avoid unnecessary overdraw.
In this case, our render pass might use the following attachments:
ID | Field | Value | Notes
---|---|---|---
0 | flags | 0 | Reserved
0 | samples | 1 | Single-sampled
0 | format | VK_FORMAT_B8G8R8A8_UNORM |
0 | loadOp | VK_ATTACHMENT_LOAD_OP_DONT_CARE | Assuming this will be completely overwritten
0 | storeOp | VK_ATTACHMENT_STORE_OP_DONT_CARE | Intermediate storage (not written)
0 | stencilLoadOp | VK_ATTACHMENT_LOAD_OP_DONT_CARE | Unused
0 | stencilStoreOp | VK_ATTACHMENT_STORE_OP_DONT_CARE | Unused
0 | initialLayout | VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL | Rendering to it
0 | finalLayout | VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL | Rendering to it
1 | flags | 0 | Reserved
1 | samples | 1 | Single-sampled
1 | format | VK_FORMAT_D16_UNORM | Depth
1 | loadOp | VK_ATTACHMENT_LOAD_OP_CLEAR | Need empty depth buffer before use
1 | storeOp | VK_ATTACHMENT_STORE_OP_DONT_CARE | Intermediate storage (not written)
1 | stencilLoadOp | VK_ATTACHMENT_LOAD_OP_DONT_CARE | Unused
1 | stencilStoreOp | VK_ATTACHMENT_STORE_OP_DONT_CARE | Unused
1 | initialLayout | VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL | Rendering to it
1 | finalLayout | VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL | Rendering to it
2 | flags | 0 | Reserved
2 | samples | 1 | Single-sampled
2 | format | VK_FORMAT_B8G8R8A8_UNORM | Accumulated diffuse lighting contribution
2 | loadOp | VK_ATTACHMENT_LOAD_OP_CLEAR | Accumulating, so start at 0
2 | storeOp | VK_ATTACHMENT_STORE_OP_DONT_CARE | Intermediate storage (not written)
2 | stencilLoadOp | VK_ATTACHMENT_LOAD_OP_DONT_CARE | Unused
2 | stencilStoreOp | VK_ATTACHMENT_STORE_OP_DONT_CARE | Unused
2 | initialLayout | VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL | Rendering to it
2 | finalLayout | VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL | Rendering to it
3 | flags | 0 | Reserved
3 | samples | 1 | Single-sampled
3 | format | VK_FORMAT_B8G8R8A8_UNORM | Accumulated specular lighting contribution
3 | loadOp | VK_ATTACHMENT_LOAD_OP_CLEAR | Accumulating, so start at 0
3 | storeOp | VK_ATTACHMENT_STORE_OP_DONT_CARE | Intermediate storage (not written)
3 | stencilLoadOp | VK_ATTACHMENT_LOAD_OP_DONT_CARE | Unused
3 | stencilStoreOp | VK_ATTACHMENT_STORE_OP_DONT_CARE | Unused
3 | initialLayout | VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL | Rendering to it
3 | finalLayout | VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL | Rendering to it
4 | flags | 0 | Reserved
4 | samples | 1 | Single-sampled
4 | format | VK_FORMAT_B8G8R8A8_UNORM | Final output of rendering
4 | loadOp | VK_ATTACHMENT_LOAD_OP_DONT_CARE | Assuming rendering the whole frame
4 | storeOp | VK_ATTACHMENT_STORE_OP_STORE | Write output of rendering
4 | stencilLoadOp | VK_ATTACHMENT_LOAD_OP_DONT_CARE | Unused
4 | stencilStoreOp | VK_ATTACHMENT_STORE_OP_DONT_CARE | Unused
4 | initialLayout | VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL | Rendering to it
4 | finalLayout | VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL | Rendering to it
That is:
- Attachment 0 holds the surface normal and specular factor output by the first subpass, and used by the second subpass.
- Attachment 1 holds the depth buffer for the scene, and applies to all three subpasses.
- Attachment 2 holds the diffuse contributions from light sources output by the second subpass and read by the third.
- Attachment 3 holds the specular contributions from light sources output by the second subpass and read by the third.
- Attachment 4 holds the final result of rendering generated by the third subpass.
Relating attachments to subpasses
To associate the way these attachments are used with each subpass, we need a more complex array of VkSubpassDescription objects to pass to the pSubpasses member of our VkRenderPassCreateInfo object:
[Table: contents of pSubpasses[0], pSubpasses[1] and pSubpasses[2]]
Since all but the final output color attachment in this example are used only as intermediate values, their images can be created with the VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT usage flag set, and be bound to memory allocated with VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT. Tiling hardware typically has limitations on the number and type of attachments which can be kept in flight concurrently, so despite this optimization, it is possible that implementations will have to spill intermediate results to main memory.
More complex arrangements of subpasses are possible. If an attachment is not used during a subpass, but is needed in previous and subsequent subpasses, the attachment should appear in the pPreserveAttachments array of the subpass. Implementations can change the order in which subpasses are evaluated (while preserving dependencies) in order to reduce the need for spilling. In the above example, attachment 0 is not preserved, and the implementation may use the same internal tile memory for both it and the final output attachment. It is also possible to use multi-sampling with these approaches, but this complicates the intermediate read operations and may make it more likely that tilers will have to spill to external memory.
Subpass dependencies
When multiple subpasses are in use, the driver needs to be told the relationship between them. A subpass can depend on operations which were submitted outside the current render pass, or be the source on which later rendering depends. Most commonly, the need is to ensure that the fragment shader from an earlier subpass has completed rendering (to the current tile, on a tiler) before the next subpass starts to try to read that data. An array of subpass dependencies - if there are any - is passed to VkRenderPassCreateInfo, defining a set of dependencies between "source" (the thing being waited on) and "destination" (the thing doing the waiting). Each subpass dependency is defined as follows:
struct VkSubpassDependency
Type | Field | Description
---|---|---
uint32_t | srcSubpass | The index of the subpass being depended upon by dstSubpass
uint32_t | dstSubpass | The index of the subpass depending on srcSubpass
VkPipelineStageFlags | srcStageMask | Which pipeline stages must have completed for the dependency
VkPipelineStageFlags | dstStageMask | Which pipeline stages are waiting on the dependency
VkAccessFlags | srcAccessMask | Which access types in the source must complete for the dependency
VkAccessFlags | dstAccessMask | Which access types in the destination are waiting on the dependency
VkDependencyFlags | dependencyFlags | Other configuration about the dependency
Typically, for dependencies between fragment writes and fragment shader reads, we might expect the following settings:
Field | Value | Notes
---|---|---
srcStageMask | VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT | Fragment data has been written
dstStageMask | VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT | Don't start shading until data is available
srcAccessMask | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | Waiting for color data to be written
dstAccessMask | VK_ACCESS_SHADER_READ_BIT | Don't read things from the shader before ready
dependencyFlags | VK_DEPENDENCY_BY_REGION_BIT | Only need the current fragment (or tile) synchronized, not the whole framebuffer
In the case of our deferred lighting example, we have three subpasses, and we have dependencies between the first and second and between the second and third. That is, we need to set the dependencyCount member of our VkRenderPassCreateInfo to 2, and set the pDependencies member to point to the following array:
Element | Field | Value
---|---|---
pDependencies[0] | srcSubpass | 0
pDependencies[0] | dstSubpass | 1
pDependencies[0] | srcStageMask | VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
pDependencies[0] | dstStageMask | VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT
pDependencies[0] | srcAccessMask | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT
pDependencies[0] | dstAccessMask | VK_ACCESS_SHADER_READ_BIT
pDependencies[0] | dependencyFlags | VK_DEPENDENCY_BY_REGION_BIT
pDependencies[1] | srcSubpass | 1
pDependencies[1] | dstSubpass | 2
pDependencies[1] | srcStageMask | VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
pDependencies[1] | dstStageMask | VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT
pDependencies[1] | srcAccessMask | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT
pDependencies[1] | dstAccessMask | VK_ACCESS_SHADER_READ_BIT
pDependencies[1] | dependencyFlags | VK_DEPENDENCY_BY_REGION_BIT
Using subpasses in a command buffer
As described above, when recording to a VkCommandBuffer, vkCmdBeginRenderPass() and vkCmdEndRenderPass() are used to wrap the render pass operations. After vkCmdBeginRenderPass() is called, subsequent commands are applied to the first subpass within the render pass.
To move operations to subsequent subpasses, vkCmdNextSubpass() should be called. Each call of this function moves operations to the next subpass index, in increasing order, until vkCmdEndRenderPass() is called. Synchronization between accesses to attachments, as described in the subpass dependencies, is handled automatically.
Using subpasses in shaders
In SPIR-V, the contents of an input attachment can be accessed with the OpImageRead operation, with an OpTypeImage that has a Dim operand of SubpassData. The coordinate argument of the OpImageRead must be (0,0), and corresponds to accessing the input attachment at the current fragment location. When multi-sampling, the Sample operand to OpImageRead can be used to access separate samples at the current fragment.
In GLSL, this functionality is exposed through the subpassLoad() function, with subpassInput types for the input attachments.
Summary
The Vulkan API acknowledges the fact that modern rendering techniques may perform multiple passes over the same image data, and is designed to ensure that these approaches are explicitly and efficiently supported on modern graphics hardware. The unfortunate consequence of this expressivity is the complexity of the description and the verbosity of simple examples, although the overhead in a practical, optimized renderer should be less significant.
In Vulkan, the render pass is an explicit concept within which rendering operations execute. A VkFramebuffer, with a list of associated attachments, is associated with the render pass when rendering work is recorded into a VkCommandBuffer. The render pass is divided into one or more subpasses, with explicitly-defined interactions between them. This explicitly configured VkRenderPass object can be shared between rendering operations, which can limit the impact on real-world, complex applications. Providing this additional information to a driver can allow significantly reduced memory overhead, especially on tiled architectures, without the unpredictability of the heuristics applied to achieve good performance in more traditional APIs.
Additional reading
A simplified version of the content of this article may be found in a presentation on the subject at a UK developer event.