L2M - Binding and Format Optimization


Lineage 2 Mobile is one of the top-grossing games on Google Play. It is an MMORPG made by NCSoft in Korea.
We worked on it for almost 4 months, online and offline, with almost all of our GameDev colleagues. We integrated optimizations from previous projects, developed new ones, and analyzed how content changes affected performance so that we could make well-informed choices when optimizing that content.

This article introduces two changes: one is specific to Vulkan, and the other applies to rendering in general.



Optimized Vertex / Index Buffer Binding


The first optimization is related to binding. As you probably know, GLES is a state machine, so once a vertex or index buffer is bound, that state remains in place. Vulkan, in contrast, has no global state machine; instead, state is recorded into a command buffer, and bindings persist within that command buffer. So if consecutive draws recorded into the same command buffer use the same vertex or index buffer, it is enough to bind it just once.


Before - every draw binds both the vertex and index buffers


After - redundant vertex / index buffer binds are skipped


The engine now avoids rebinding the vertex/index buffer used by the previous draw call when the conditions allow it (i.e. same pipeline, same CommandBuffer, same frame number), since those buffers are already recorded in the Command Buffer for the upcoming draw.
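To make the condition check concrete, here is a minimal sketch of a redundant-bind filter. It is not the engine's actual code: `BindCache`, `DrawCall`, `countBinds`, and the integer handle types are hypothetical stand-ins chosen so the example compiles without the Vulkan headers. In a real renderer the handles would be `VkBuffer`, `VkPipeline`, and `VkCommandBuffer`, and the bind itself would be `vkCmdBindVertexBuffers` / `vkCmdBindIndexBuffer`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for Vulkan handles (assumption, not engine types).
using BufferHandle = std::uint64_t;
using PipelineHandle = std::uint64_t;
using CommandBufferHandle = std::uint64_t;

struct DrawCall {
    PipelineHandle pipeline;
    BufferHandle vertexBuffer;
    BufferHandle indexBuffer;
};

// Remembers the last recorded binding and the context it was recorded in.
class BindCache {
public:
    // Returns true when the draw has to record its vertex/index buffer binds.
    bool needsBind(CommandBufferHandle cmd, std::uint64_t frame, const DrawCall& draw) {
        const bool sameContext =
            valid_ && cmd == cmd_ && frame == frame_ && draw.pipeline == pipeline_;
        const bool sameBuffers =
            draw.vertexBuffer == vertexBuffer_ && draw.indexBuffer == indexBuffer_;
        if (sameContext && sameBuffers) {
            return false;  // already recorded in this command buffer
        }
        valid_ = true;
        cmd_ = cmd;
        frame_ = frame;
        pipeline_ = draw.pipeline;
        vertexBuffer_ = draw.vertexBuffer;
        indexBuffer_ = draw.indexBuffer;
        return true;
    }

private:
    bool valid_ = false;
    CommandBufferHandle cmd_ = 0;
    std::uint64_t frame_ = 0;
    PipelineHandle pipeline_ = 0;
    BufferHandle vertexBuffer_ = 0;
    BufferHandle indexBuffer_ = 0;
};

// Counts how many draws in a sequence would actually record a bind.
inline int countBinds(const std::vector<DrawCall>& draws,
                      CommandBufferHandle cmd, std::uint64_t frame) {
    BindCache cache;
    int binds = 0;
    for (const DrawCall& draw : draws) {
        if (cache.needsBind(cmd, frame, draw)) {
            ++binds;  // a real renderer would call the vkCmdBind* functions here
        }
    }
    return binds;
}
```

For five draws where the first three share a pipeline and buffers, the fourth switches vertex buffer, and the fifth switches pipeline, `countBinds` reports three binds instead of five. The sketch deliberately rebinds on any pipeline change, mirroring the conservative conditions listed above.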

To compare performance under identical conditions, we fixed the CPU and GPU frequencies. The optimization gives a benefit of around 1 fps.


Device    SM-G973F   API   Async Creation   Original
Chipset   Mali G76   FPS   42               40



Format Optimization


Where it is possible to use a format with fewer bits per pixel, doing so can benefit performance.
We found that the ‘CustomDepthStencil’ texture had 4 color channels, but the shader reads only 1 of them. In this case, it is fine to change the format from RGBA to R only, and doing so can reduce bandwidth demands by a factor of four.

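// Only the red channel of the custom stencil texture is read.
MaterialFloat Stencil = Texture2DSample(MobileSceneTextures.MobileCustomStencilTexture, MobileSceneTextures.MobileCustomStencilTextureSampler, UV).r * 255.0;
// Round to the nearest integer stencil value.
Stencil = floor(Stencil + 0.5);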



After the change, we can see that the framebuffer's color channels have changed as well. The attachment now shows up as red instead of gray, but because only 1 color channel is used, the final result is the same.


Original   MobileCustomStencil - 1440x720 1 mips - B8G8R8A8_UNORM
After      2D Color Attachment 816 - 1440x720 1 mips - R8_UNORM
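The factor of four can be checked with simple arithmetic on the 1440x720 attachment. The sketch below assumes a single mip level and ignores any driver-side compression, so it gives the uncompressed footprint only; `attachmentBytes` is a hypothetical helper, not an engine function.

```cpp
#include <cstdint>

// Uncompressed size of a single-mip attachment (simplifying assumption:
// no framebuffer compression, no padding).
constexpr std::uint64_t attachmentBytes(std::uint32_t width,
                                        std::uint32_t height,
                                        std::uint32_t bytesPerPixel) {
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel;
}

// B8G8R8A8_UNORM stores 4 bytes per pixel, R8_UNORM stores 1.
constexpr std::uint64_t kRgba8Bytes = attachmentBytes(1440, 720, 4);
constexpr std::uint64_t kR8Bytes = attachmentBytes(1440, 720, 1);

static_assert(kRgba8Bytes == 4147200, "RGBA8: about 4.1 MB per full write or read");
static_assert(kR8Bytes == 1036800, "R8: about 1.0 MB per full write or read");
static_assert(kRgba8Bytes == 4 * kR8Bytes, "R8 needs a quarter of the bytes");
```

Every pass that writes or samples the whole attachment moves roughly this many bytes, so the saving applies each time the texture is touched in a frame.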


Device    SM-G973U     API   Vulkan R8   Vulkan Original
Chipset   Adreno 640   FPS   53          52


Sometimes changing to a format with a lower bitrate can help performance even if that change discards some color data.
There is a ‘PostProcessMaterial’ pass, which draws the characters’ outlines. We found that it used 4xFP16 (64 bits per pixel), but it could be reduced to a packed 32-bit format because it is used only for drawing the outline.


Original   2D Color Attachment 29382 - 1736x824 1 mips - R16G16B16A16_FLOAT
After      2D Color Attachment 28890 - 1736x824 1 mips - R11G11B10_FLOAT

Device    SM-G975F   API   Optimized   Original
Chipset   Mali G76   FPS   35          34


If we cap the maximum frame rate at 30 fps, we can see the actual difference in GPU usage: a 2% benefit.


Device    SM-G975F   API             Optimized   Original
Chipset   Mali G76   GPU Usage (%)   96.13       98.13


The rendered scene is not exactly the same, but the difference is hard to recognize. If we changed the format used for base rendering, the difference would be easy to see, but because this pass only draws the characters’ outlines, it is usually hard to notice.
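The small visual difference comes from reduced precision. FP16 keeps 10 explicit mantissa bits per channel, while the packed 32-bit float format keeps 6 bits for its two 11-bit channels and 5 bits for its 10-bit channel, with no sign bit and no alpha. The sketch below illustrates the effect on a single value. `quantizeMantissa` is a hypothetical helper that truncates the mantissa; it ignores exponent range and denormals, and real hardware may round differently.

```cpp
#include <cmath>

// Truncates a positive value to `mantissaBits` explicit mantissa bits.
// Assumptions: truncation instead of the hardware's rounding mode, and the
// limited exponent range and denormals of the real formats are ignored.
inline double quantizeMantissa(double value, int mantissaBits) {
    if (value <= 0.0) {
        return 0.0;  // the packed unsigned float format cannot store negatives
    }
    int exponent = 0;
    // value == fraction * 2^exponent, with fraction in [0.5, 1).
    const double fraction = std::frexp(value, &exponent);
    // One implicit leading bit plus the explicit mantissa bits.
    const double steps = std::ldexp(1.0, mantissaBits + 1);
    const double truncated = std::floor(fraction * steps) / steps;
    return std::ldexp(truncated, exponent);
}
```

With this helper, `quantizeMantissa(1.01, 10)` gives 1.009765625 (FP16-like precision), while `quantizeMantissa(1.01, 6)` gives 1.0 (11-bit channel precision). The relative step grows from about 0.1% to about 1.6%, which is easy to see in a base color buffer but much harder to see in thin outlines.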



When changing an asset format, it is wise to check that the new format supports GPU driver compression such as Arm AFBC or Qualcomm UBWC. If the format does not support compression, the change could decrease performance.



Conclusion



With all our optimizations integrated, Vulkan gains 2 to 4 fps compared with GLES.


Device                Chipset    Vulkan (fps)   GLES (fps)
S9 Mali (SM-G965F)    Mali G72   38             34
S10 Mali (SM-G975F)   Mali G76   43             41


On lower-end devices such as the S8, the benefit is larger. These results were measured with the maximum frame rate capped at 30 fps.


Device                 Chipset      Vulkan (fps)   GLES (fps)
S8 Mali (SM-G955F)     Mali G71     26             18
S8 Adreno (SM-G955U)   Adreno 540   30             22


Both of these optimizations are fairly well known, and it is easy to assume each will bring only minor benefits. We all like to find those huge optimizations that make a game run twice as fast, but those are rare. Instead, diligently working through changes like these, while not so glamorous, is often the main opportunity to improve the user experience. Additionally, these changes were easy to implement, which made them a fully justified choice.



Thanks to the GameDev Engineers: Alon Or-bach, Aton Sinyavskiy, Dohyun Kim, Fedir Nekrasov, Igor Nazarov, Inae Kim, Joonyong Park, Junsik Kong, Lewis Gordon, Kostiantyn Drabeniuk, Munseong Kang, Nataliia Kozoriz, Oleksii Vasylenko, Serhii Pavliv, Seunghwan Lee