Step 1: Testing out the base sample app

The base sample app takes the following classic, conservative approach to synchronization and rendering:

  • Pipeline barriers are submitted with full synchronization

  • Render passes use complete layouts and dependencies

  • A single subpass is used in every render pass

Build and run the base sample app

  1. To build the base sample app, select the 1-BASE build configuration in Visual Studio.

  2. Press Launch showcase or press Ctrl + F5, then check the device for the running application.

    Alternatively, run the sample app using the command prompt with the following command:

  3. Check the performance of the application using the GPU Watch


The GPU Watch result shows that performance is poor, at only 8 FPS.

Step 2: Optimization using Barriers and Render Pass

Barriers are needed to synchronize GPU resource usage. In this section, you will modify the pipeline barriers to minimize waiting, and improve the render pass logic. The shading passes in the sample app use geometry textures generated in the previous step, as seen in the figure below. Before using the textures, the shading pass has to wait for all writes to them to finish. This transition of the textures from the geometry pass to the shading pass is the barrier we focus on in this section.

The base sample app uses full synchronization, the classic conservative method. This forces later stages to wait before executing. What we want instead is to reduce the waiting time by using more relaxed barriers, as seen below.

Additionally, the base app uses a separate render pass for every light source. A better approach is to use a single render pass for all light sources at once, since starting and stopping a render pass has a performance cost.

Implementing Barriers

  1. Build and run 2-BARRIERS from the project file in Visual Studio.

    Alternatively, run the sample app using the command prompt with the following command: cmake\build-step2.bat

  2. Check the changed parts of the code, which are wrapped in the ENABLE_FAST_BARRIERS macro.

    #elif defined(ENABLE_FAST_BARRIERS)
         VkPipelineStageFlags SourceStages = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
         VkPipelineStageFlags DestStages = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT;
  3. Access the sample app on the device and check its performance using the GPU Watch.


While running the app, you'll observe a visual issue: the rendering is broken. The cause is that the relaxed barrier from TOP_OF_PIPE to BOTTOM_OF_PIPE does not actually wait for anything, so the shading pass can read the geometry textures before the writes to them are complete. In real-world scenarios, such visual artifacts can be more unpredictable and harder to debug. Barriers should therefore be set with these guidelines in mind:

  • Full synchronization lowers performance

  • Too little synchronization often leads to visual artifacts

Step 3: Optimization using Fixed Barriers

In this step, two things must be fixed: the broken rendering from the previous step, and the barriers must be set properly so the app remains optimized.

Implementing Fixed Barriers

  1. In Visual Studio, build and run 3-FIXED BARRIERS.

    Alternatively, run the sample app using the command prompt with the following command: cmake\build-step3.bat

  2. Find the modified code under the ENABLE_PROPER_BARRIERS macro, as seen below.

    #elif defined(ENABLE_PROPER_BARRIERS)
         VkPipelineStageFlags SourceStages = GetStageFlags(ImageBarrier.oldLayout);
         VkPipelineStageFlags DestStages = GetStageFlags(NewLayout);
         ImageBarrier.srcAccessMask = GetAccessMask(ImageBarrier.oldLayout);
         ImageBarrier.dstAccessMask = GetAccessMask(NewLayout);
  3. Access the device and test the performance of the sample app using the GPU Watch.


As the image below shows, the fixed barriers bring some improvement over the base version.

The FPS increased from 8 to about 9 or 10, roughly a 15% improvement.

Step 4: Optimization using Multi Pass

Single Pass vs. Multi Pass

In a single-pass approach, a new render pass is started for every output attachment; in multi pass, the output attachments are grouped together as subpasses of one render pass. With single pass, the pixels have to be written to a texture and read back in every render pass, which is expensive. With multi pass, the pixels stay in local memory and are written to the texture only once, at the end.

The base sample app has two render passes: the first stores geometry information in separate textures, and the second applies shading using those geometry textures.

The base app is optimized by combining the two render passes into one: the first subpass stays the same as before, while the second subpass takes the output of the first subpass as an input attachment.

Implementing Multi Pass

  1. Build and run 4-MULTIPASS from the sample project in Visual Studio.

    Alternatively, run the sample app using the command prompt with the following command: cmake\build-step4.bat

  2. Find the modified code under ENABLE_MULTIPASS macro as seen below.

    #elif defined(ENABLE_MULTIPASS)
         vkCmdNextSubpass(cmd, VK_SUBPASS_CONTENTS_INLINE);
  3. Using the GPU Watch, check the performance of the optimized sample app.


With multi pass, the FPS improved from 10 to 59-60, which is about a 500% improvement.

Step 5: Performance comparison

In this Code Lab, two optimization cases were implemented and examined for rendering with the Vulkan API. In summary, proper pipeline barrier usage increased performance by about 15%. Moreover, changing the rendering approach to multi pass improved performance by 500% and reduced total GPU load to about 70%.

You're done!

Congratulations! You have successfully achieved the goal of this Code Lab activity. Now, you can optimize game rendering with Vulkan by yourself! But, if you're having trouble, you may check out the link below.

Game Optimization Complete Code (47.46 MB)