Game Optimization with Vulkan

Overview

Vulkan API

Vulkan is a low-overhead, cross-platform 3D graphics API. It targets high-performance real-time 3D graphics applications, such as video games and interactive media, across platforms. Compared with OpenGL and Metal, Vulkan is intended to offer higher performance and more balanced CPU/GPU usage. In general, Vulkan is said to deliver anywhere from a marginal to a polynomial speedup in run time relative to other APIs when implemented properly on the same hardware. The majority of current Samsung devices support the Vulkan API.

Showcase (Sample App)

The showcase (sample app) used for performance testing is shown below. It has the following characteristics:

  • Scene with a lot of light sources

  • Deferred shading (generally heavy for mobile)

  • Several optimization approaches

Figure 1 Sample App

GPU Watch

Samsung delivers its own profiling tool called GPU Watch. The profiling stats contain the following information:

  1. FPS (Frames Per Second) counters

    • Current FPS

    • Average FPS

  2. CPU / GPU load

    • CPU load

    • GPU load

    • GPU clock frequency

  3. Frame info

    • Render pass stats

    • Vertex / Fragment shader load

    Figure 2 GPU Watch
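As a rough illustration of how counters like these relate to frame timings, the sketch below derives a current and an average FPS value from a list of frame durations. It is not GPU Watch's implementation: the names FpsCounters and computeFps, and the choice to base the current FPS on the most recent frame only, are assumptions made for this example.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical result type for this sketch (not part of GPU Watch).
struct FpsCounters {
    double current;  // based on the most recent frame only
    double average;  // based on all recorded frames
};

// Computes FPS values from frame durations given in milliseconds.
// Returns zeros when there is no usable data.
FpsCounters computeFps(const std::vector<double>& frameMs) {
    if (frameMs.empty()) return {0.0, 0.0};
    const double totalMs = std::accumulate(frameMs.begin(), frameMs.end(), 0.0);
    if (totalMs <= 0.0 || frameMs.back() <= 0.0) return {0.0, 0.0};
    const double frames = static_cast<double>(frameMs.size());
    return {1000.0 / frameMs.back(), 1000.0 * frames / totalMs};
}
```

For example, three frames that took 20 ms, 20 ms and 10 ms give a current FPS of 100 and an average FPS of 60.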

Pipeline Barriers and Render passes

In Vulkan, pipeline barriers and render passes are part of the synchronization mechanisms, and both can be used to optimize the graphics performance of apps. Pipeline barriers are mainly concerned with transitions of GPU resources, including the source and destination of each transition. A render pass describes a set of framebuffer attachments, how they are used, and the rendering work done on them.

Objective

  • Learn how rendering with the Vulkan API can be optimized using two methods: fixed barriers and multi pass

  • Compare performance results for every optimization

Setup

The following software and tools are needed to optimize apps for Vulkan:

  • Visual Studio 2017 (latest update)

  • Android NDK (15+, tested on r15c)

  • Android SDK (install platform 21)

  • CMake

  • Ninja build

  • Java SE Development Kit (JDK) 8

Refer to the appendix of this document to install the software and tools listed above.

Initial Setup

Enable GPU Watch on your mobile phone as shown below:

Figure 3 Enabling GPU Watch

Application Development

1. Testing out the Base Sample App

The base sample app uses the following classic, simple approach:

  • Pipeline barriers are submitted with full synchronization

  • Render passes use complete layouts and dependencies

  • Single subpass in every render pass

    A. Build and Run Base Sample App

    Step 1. To build and launch the provided base sample app, select the build configuration 1-BASE in Visual Studio.

    Figure 4 Select 1-BASE

    Step 2. Press Launch showcase or press Ctrl + F5. Then, check the device for the running application.

    Figure 5 Run the app

    Alternatively, run the sample app using the command prompt with the following command: cmake\build-step1.bat

    Step 3. Check the performance of the application using GPU Watch.

    B. Results and Analysis

    GPU Watch shows that the performance is poor, at only 8 FPS.

    Figure 6 Sample App Performance

2. Optimization using Barriers and RenderPass

Barriers are needed to synchronize GPU resource usage. In this section, modify the pipeline barriers to minimize waiting, and improve the render pass logic. The shading passes in the sample app use geometry textures that are generated in the previous step, as seen in the figure below. Before using the textures, the shading pass has to wait for all writes to them to finish. The resource transition between writing the textures and reading them in the shading pass is the barrier that this section focuses on.

Figure 7 Illustration of Barriers

The base sample app uses full synchronization, which is the classic approach. This forces all later stages to wait before they execute. The goal here is to reduce the waiting time by using more relaxed barriers, as seen below.

Figure 8 Implementing Barriers
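To make the idea of a relaxed barrier concrete, here is a small self-contained model. It does not call Vulkan: the Stage enum is a simplified stand-in for the real pipeline stage flags, and stagesFreeToOverlap is a hypothetical helper. It lists the stages of the next pass that may start before the barrier is satisfied, namely those logically earlier than the first stage named in the destination mask.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified, ordered stand-in for Vulkan pipeline stage flags.
enum Stage : std::uint32_t {
    VertexInput    = 1u << 0,
    VertexShader   = 1u << 1,
    FragmentShader = 1u << 2,
    ColorOutput    = 1u << 3,
};

// Logical order of the stages in this model.
constexpr std::array<Stage, 4> kPipelineOrder = {
    VertexInput, VertexShader, FragmentShader, ColorOutput};

// Full synchronization: every stage is named in the destination mask.
constexpr std::uint32_t kAllStages =
    VertexInput | VertexShader | FragmentShader | ColorOutput;

// Returns the stages of the next pass that do not have to wait for the
// barrier: everything before the first stage contained in dstMask.
std::vector<Stage> stagesFreeToOverlap(std::uint32_t dstMask) {
    std::vector<Stage> freeStages;
    for (Stage s : kPipelineOrder) {
        if (dstMask & s) break;  // this stage and all later ones must wait
        freeStages.push_back(s);
    }
    return freeStages;
}
```

With every stage in the destination mask (full synchronization) nothing may overlap, whereas naming only the fragment shader stage lets vertex input and vertex shading of the shading pass run while the geometry pass is still writing.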

Additionally, the base demo uses a separate render pass for every light source. A better approach is to use a single render pass for all light sources at once, because starting and stopping a render pass has a performance cost.

Figure 9 Optimized Render Pass

A. Implementing Barriers

Step 1. Build and run 2-BARRIERS from the project file in Visual Studio.

Figure 10 Launch 2-BARRIERS

Alternatively, run the sample app using the command prompt with the following command: cmake\build-step2.bat

Step 2. Check the changed parts of the code, which are wrapped in the ENABLE_FAST_BARRIERS macro.

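#elif defined(ENABLE_FAST_BARRIERS)
     // Most relaxed choice: nothing is waited on at the source, nothing is blocked at the destination.
     VkPipelineStageFlags SourceStages = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
     VkPipelineStageFlags DestStages = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT;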

Step 3. Access the sample app in the device and check its performance using the GPU Watch.

B. Results

When you run the app, you will see a visual issue: the rendering is broken. This is likely because the barriers are now too relaxed and no longer order the texture writes before the shading reads. In real-world scenarios, visual artifacts can be more unpredictable and much harder to debug. Keep these guidelines in mind when modifying barriers:

  • Full synchronization lowers performance

  • Too little synchronization often leads to visual artifacts

    Figure 11 Broken Rendering
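The trade-off between the two guidelines can be illustrated with a simplified model of an execution dependency. This is a sketch rather than Vulkan code: Stage, Barrier and ordersWriteBeforeRead are stand-ins invented for this example, each barrier names a single source and destination stage, and memory access masks are ignored. In this model a barrier waits for all work at or before its source stage and blocks all work at or after its destination stage.

```cpp
#include <cassert>

// Ordered stand-in for pipeline stages, from earliest to latest.
enum class Stage : int {
    TopOfPipe = 0,
    VertexShader,
    FragmentShader,
    ColorOutput,
    BottomOfPipe
};

// Hypothetical barrier with a single source and destination stage.
struct Barrier {
    Stage src;
    Stage dst;
};

// True when the barrier guarantees that work in `writer` (previous pass)
// finishes before work in `reader` (next pass) starts.
bool ordersWriteBeforeRead(const Barrier& b, Stage writer, Stage reader) {
    const bool writerIsWaitedOn = static_cast<int>(writer) <= static_cast<int>(b.src);
    const bool readerIsBlocked = static_cast<int>(reader) >= static_cast<int>(b.dst);
    return writerIsWaitedOn && readerIsBlocked;
}

// Three barrier choices for comparison.
constexpr Barrier kFullSync{Stage::BottomOfPipe, Stage::TopOfPipe};
constexpr Barrier kTooRelaxed{Stage::TopOfPipe, Stage::BottomOfPipe};
constexpr Barrier kTargeted{Stage::ColorOutput, Stage::FragmentShader};
```

For a geometry pass that writes color output and a shading pass that reads in the fragment shader, kFullSync and kTargeted both order the write before the read, while kTooRelaxed does not. Only kTargeted does so without also blocking the vertex work of the next pass.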

3. Optimization using Fixed Barriers

In this step, two things must be done: fix the broken rendering from the previous section, and set the barriers properly to optimize the app.

A. Implementing Fixed Barriers

Step 1. In Visual Studio, build and run 3-FIXED BARRIERS.

Figure 12 Launch 3-FIXED BARRIERS

Alternatively, run the sample app using the command prompt with the following command: cmake\build-step3.bat

Step 2. Find the modified code under the ENABLE_PROPER_BARRIERS macro, as seen below.

#elif defined(ENABLE_PROPER_BARRIERS)
     // Derive the stages and access masks from the old and new image layouts.
     VkPipelineStageFlags SourceStages = GetStageFlags(ImageBarrier.oldLayout);
     VkPipelineStageFlags DestStages = GetStageFlags(NewLayout);
     ImageBarrier.srcAccessMask = GetAccessMask(ImageBarrier.oldLayout);
     ImageBarrier.dstAccessMask = GetAccessMask(NewLayout);
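The sample's helper functions are not shown in this document. As a hedged illustration of the idea, the sketch below maps an image layout to the pipeline stages and access types that typically touch an image in that layout. Layout, Stage, Access, StageFlagsForLayout and AccessMaskForLayout are stand-ins defined here so that the block compiles without the Vulkan SDK. The comments name the Vulkan constants each value is meant to represent, the bit values are arbitrary, and the mapping is an assumption for a deferred-shading setup, not the sample's actual table.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for VkImageLayout (only the layouts used in this sketch).
enum class Layout { Undefined, ColorAttachment, DepthStencilAttachment, ShaderReadOnly };

// Stand-in for VkPipelineStageFlagBits. Bit values are NOT Vulkan's.
enum Stage : std::uint32_t {
    TopOfPipe             = 1u << 0,  // VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT
    FragmentShader        = 1u << 1,  // VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT
    EarlyFragmentTests    = 1u << 2,  // VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT
    LateFragmentTests     = 1u << 3,  // VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT
    ColorAttachmentOutput = 1u << 4,  // VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT
    AllCommands           = 1u << 5,  // VK_PIPELINE_STAGE_ALL_COMMANDS_BIT
};

// Stand-in for VkAccessFlagBits. Bit values are NOT Vulkan's.
enum Access : std::uint32_t {
    NoAccess             = 0,
    ShaderRead           = 1u << 0,  // VK_ACCESS_SHADER_READ_BIT
    ColorAttachmentWrite = 1u << 1,  // VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT
    DepthStencilWrite    = 1u << 2,  // VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT
};

// Stages that typically use an image while it is in the given layout.
std::uint32_t StageFlagsForLayout(Layout layout) {
    switch (layout) {
        case Layout::Undefined:              return TopOfPipe;
        case Layout::ColorAttachment:        return ColorAttachmentOutput;
        case Layout::DepthStencilAttachment: return EarlyFragmentTests | LateFragmentTests;
        case Layout::ShaderReadOnly:         return FragmentShader;
    }
    return AllCommands;  // conservative fallback for unknown layouts
}

// Access types that typically apply to an image in the given layout.
std::uint32_t AccessMaskForLayout(Layout layout) {
    switch (layout) {
        case Layout::Undefined:              return NoAccess;
        case Layout::ColorAttachment:        return ColorAttachmentWrite;
        case Layout::DepthStencilAttachment: return DepthStencilWrite;
        case Layout::ShaderReadOnly:         return ShaderRead;
    }
    return NoAccess;
}
```

With a mapping like this, a transition from a color attachment layout to a shader read-only layout waits only for color attachment output and blocks only the fragment shader, instead of stalling the whole pipeline.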

Step 3. Access the device and test the performance of the sample app using the GPU Watch.

B. Results and Analysis

The image below shows some improvement over the base version.

The FPS increased from 8 to about 9 or 10, an improvement of roughly 15%.

Figure 13 Performance of Sample App with Fixed Barriers

4. Optimization using Multi pass

A. Single pass vs. Multi pass

With single pass, a new render pass is started for every output attachment, whereas with multi pass the output attachments are grouped together using subpasses. With single pass, the pixels have to be written to a texture first and then read back in every render pass, which is expensive. With multi pass, the pixels are passed along in local memory and written to a texture only once, at the end.
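A back-of-the-envelope model shows why the read-back is costly. The sketch below counts external memory traffic for the geometry textures under idealized assumptions: every attachment has the same size and format, each shading pass reads every geometry texture once, and with multi pass the geometry data never leaves on-chip memory. The names and the 1920x1080 example are hypothetical and are not measurements from the sample app.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical frame description for this estimate.
struct FrameConfig {
    std::uint64_t width;
    std::uint64_t height;
    std::uint64_t geometryAttachments;  // number of geometry textures
    std::uint64_t bytesPerPixel;        // per attachment
};

// Size of one full-screen attachment in bytes.
std::uint64_t attachmentBytes(const FrameConfig& c) {
    return c.width * c.height * c.bytesPerPixel;
}

// Separate render passes: geometry textures are written once and read once
// by each shading pass, and the final image is written once.
std::uint64_t singlePassTraffic(const FrameConfig& c, std::uint64_t shadingPasses) {
    const std::uint64_t geometry = attachmentBytes(c) * c.geometryAttachments;
    return geometry * (1 + shadingPasses) + attachmentBytes(c);
}

// Multi pass: only the final image reaches external memory in this model.
std::uint64_t multiPassTraffic(const FrameConfig& c) {
    return attachmentBytes(c);
}

// Example values, assumed for illustration only.
constexpr FrameConfig kExample{1920, 1080, 3, 4};
```

With three 4-byte geometry attachments at 1920x1080 and one shading pass, this model gives about 58 MB of traffic per frame for separate passes versus about 8 MB for multi pass.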

The base sample app has two render passes. The first render pass stores geometry information in separate textures, and the second render pass applies shading using those geometry textures.

The base app is optimized by combining the two render passes into one. The first subpass is the same as before, but the second subpass receives the output of the first subpass as input attachments.

Figure 14 Single pass vs Multi pass
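The dependency between the two subpasses can be sketched as data. The model below is not the Vulkan API: Subpass and inputsAreProduced are hypothetical names, and attachments are plain integer indices. It checks the property that makes the merged render pass valid, namely that every input attachment a subpass reads was written by an earlier subpass.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical description of one subpass: what it writes and what it reads.
struct Subpass {
    std::vector<int> colorOutputs;      // attachment indices written
    std::vector<int> inputAttachments;  // attachment indices read as input
};

// True when each input attachment was produced by an earlier subpass.
bool inputsAreProduced(const std::vector<Subpass>& subpasses) {
    std::set<int> written;
    for (const Subpass& sp : subpasses) {
        for (int in : sp.inputAttachments) {
            if (written.count(in) == 0) return false;  // read before any write
        }
        written.insert(sp.colorOutputs.begin(), sp.colorOutputs.end());
    }
    return true;
}

// Assumed layout: attachment 0 is the final image, 1..3 hold geometry data.
const Subpass kGeometry{{1, 2, 3}, {}};
const Subpass kLighting{{0}, {1, 2, 3}};
```

Ordering the geometry subpass before the lighting subpass passes the check, while the reverse order fails because the lighting subpass would read attachments that nothing has written yet.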

B. Implementing Multi pass

Step 1. Open Visual Studio and select 4-MULTIPASS in the sample project.

Figure 15 Launch 4-MULTIPASS

Alternatively, run the sample app using the command prompt with the following command: cmake\build-step4.bat

Step 2. Find the modified code under the ENABLE_MULTIPASS macro, as seen below.

#elif defined(ENABLE_MULTIPASS)
     // Move to the second subpass and apply lighting inside the same render pass.
     vkCmdNextSubpass(cmd, VK_SUBPASS_CONTENTS_INLINE);
     ApplyLightingMultipass(cmd);
     framebufferFinal[frameBufferIndex]->End(cmd);
#endif

Step 3. Using the GPU Watch, check the performance of the optimized sample app.

C. Results

With multi pass, the FPS improved from 10 to 59-60, which is an improvement of about 500%.

Figure 16 Launch 4-MULTIPASS

5. Performance Comparison

In this Code Lab, two optimizations were implemented and examined for rendering with the Vulkan API. In summary, proper pipeline barrier usage increased performance by about 15%. Moreover, changing the rendering approach to multi pass improved performance by about 500% and reduced the total GPU load to about 70%.

Figure 17 Performance Results Summary
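The percentages quoted in this Code Lab follow from the FPS readings with simple arithmetic. The helper below, named percentImprovement for this example, computes the relative gain between two FPS values.

```cpp
#include <cassert>

// Relative FPS gain in percent. Assumes beforeFps is greater than zero.
double percentImprovement(double beforeFps, double afterFps) {
    return (afterFps - beforeFps) / beforeFps * 100.0;
}
```

Going from 8 to 9 FPS is a 12.5% gain and from 8 to 10 FPS is 25%, so the quoted figure of about 15% sits inside that range. Going from 10 to 60 FPS gives 500%.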

Appendix

  1. Set environment variables:

    • JAVA_HOME to point to JDK installation

    • NDK_ROOT to point to Android NDK installation

    • ANDROID_HOME to point to Android SDK installation

    • Add the CMake bin folder to PATH

    • Add the Ninja installation folder to PATH (or copy ninja.exe to the CMake bin folder)

    • Add %JAVA_HOME%\bin to PATH

    • Add %ANDROID_HOME%\tools and %ANDROID_HOME%\platform-tools to PATH

  2. How to build and launch the project from Visual Studio:

    • Connect mobile device for testing

    • Select one of the configurations (1-BASE, 2-BARRIERS, etc.)

    • Select menu "Debug / Start Without Debugging" (or press Ctrl+F5)

    • This builds, uploads and launches the application on the connected device

    • Note: Visual Studio may report that the deployment failed. This error can be ignored

  3. How to build and launch the project from the terminal:

    • Run <ShowcaseRoot>/cmake/build-stepN.bat where N is the showcase configuration step.

    • Run <ShowcaseRoot>/cmake/deploy.bat to upload APK to device.

    Note

    The next steps are optional and only make things more convenient for Code Lab attendees.

  4. How to add a "Launch showcase" button to Visual Studio:

    • Go to menu "Tools / Customize / Commands"

    • Select "Toolbar" and "Standard"

    • Click "Add Command"

    • Pick "Debug / Start Without Debugging"

    • Press "Ok"

    • Select newly added item and click "Modify Selection"

    • Change "Name" to "Launch showcase" and select style "Image and Text"

    • Position of the item can be modified by "Move Down" and "Move Up"

  5. How to add a "Toggle GPUWatch" button to Visual Studio:

    • Go to menu "Tools / External Tools"

    • Click "Add"

    • Change "Title" to "Toggle GPUWatch"

    • Change "Command" to \cmake\toggle_gpuwatch.bat

    • Remember position of newly added command (first index is 1)

    • Press "Ok"

    • Go to menu "Tools / Customize / Commands"

    • Select "Toolbar" and "Standard"

    • Click "Add Command"

    • Pick "Tools / External Command N", where N is the index of the command added earlier

    • Press "Ok"