As mobile technology progresses, the quality and performance of graphical applications and games continue to improve.
Over time, this shows up as large jumps in the performance and quality of games on mobile.
In this article, we discuss one such jump, in which high-fidelity graphics run at 96Hz in the latest
installment of Microsoft’s Forza franchise: Xbox Game Studios’ Forza Street, developed by Electric Square and published
by Turn 10 Studios.
We will discuss some of the challenges of developing a mobile title to run at such a demanding frame rate,
and some of the engine changes that enabled this performance in Forza Street. The discussion focuses on the
Forza Street project, but the advice is general enough to carry over to other projects.
The Samsung Galaxy S20 is Samsung’s first series of devices to introduce a display with a refresh rate greater than 60Hz.
The S20 introduces support for both 96Hz and 120Hz, allowing it to far surpass the traditional limits of mobile devices.
To reach these levels of performance, typically reserved for desktop and console gaming, applications must be carefully optimized
in order to decrease the time required to render each frame. These frame-time boundaries get progressively harder to hit as the
desired FPS increases, and progressively tighter optimizations must be made.
While 120Hz is possible for Forza Street, the potential compromises between quality and performance led the team to opt for
96Hz while in-game. This allowed the application to deliver this next tier of performance, while still maintaining the graphical
quality users expect from a Forza title.
Traditionally, rendering a triple-A quality title at 60Hz on mobile is already a challenge.
To push performance limits as Forza Street has done, the bottlenecks initially preventing that step up to 96Hz had to be identified.
At first, due to a large number of draw calls, it was clear that the application was CPU bound, primarily on the render thread.
Since the application ran entirely on the OpenGL ES API, moving to Vulkan, with its lower CPU overhead per draw call, was an
obvious choice. After switching to Vulkan, and adding several optimizations to the Vulkan renderer, the performance gain was quite apparent.
However, once the graphics options originally desired were added back in, primarily MSAA and higher screen resolution,
the application became very clearly GPU bound. On Arm architecture in particular, it is expected that 4x MSAA should have
very little impact. Despite that, many applications still see large performance drops when enabling it.
This often has less to do with the actual fragment processing cost of MSAA, and more to do with the cost of transferring
the full MSAA image between tile memory and main memory between passes. This was also the case in Forza Street.
Two situations were the primary concern here. Firstly, the screen copy reflection pass made use of the main render target,
at full MSAA resolution. Secondly, the main colour pass and translucency pass both made use of the same buffer, which had to be
transferred between the two. In total, this meant that the MSAA colour and depth buffers were being transferred to and from
tile memory several times, for three separate passes. “Ouch!”
To remedy this, we first needed to assign independent targets to the reflections pass. It can sample the previous frame’s resolved
target, then copy directly to a single-sampled target suitable for sampling in the main colour pass. Similarly, an independent
depth target can also be used. Secondly, we could attempt to combine the main colour pass and the translucency pass. To do this,
we made use of Vulkan subpasses. Vulkan subpasses allow us to keep the memory on-tile throughout multiple passes, while still
allowing us to change the render target formats as necessary between the passes.
Combined, these changes allowed us to mitigate much of the performance cost previously incurred by moving large MSAA buffers between
the GPU’s fast tile memory and the device’s main memory. These changes go a long way towards allowing the device to render the
extra frames on a high speed display device, and they give a hefty performance boost to any device using MSAA.
Notably, on the Samsung S10 device (SM-G975F), on max graphics settings the application is no longer completely bottlenecked
by the GPU.
On the S20+ US model, using Vulkan and the combined CPU and GPU optimizations, 96 FPS is now achieved with room to spare.
The S10 series now also runs at its maximum refresh rate, 60 FPS, quite comfortably.
A key part of the technology used in the latest Samsung High Speed Display devices is the ability to not only go above conventional mobile refresh rate limits, but also to dynamically modify this limit at application runtime. A common misconception is the belief that, should a device have the ability to run at 120 frames per second, it should always do so. However, not only is this often not practical on mobile devices, it is also often not required, and may serve only to waste power. The ideal approach here is to make use of the maximum refresh rate only when it has the greatest benefit.
In the case of Forza Street, the game can be divided into quite distinct sections. There is the metagame section, which involves much of the 2D UI for currency management, settings, menus and race selection. There is the garage section, where the player can admire their favourite cars, and make various modifications to them. There are loading screens between the other sections, and finally the most important section: the race.
These sections do not all share the same performance requirements, nor do they benefit equally from improvements in
rendering quality. Using the Samsung GameSDK, it is possible for developers to request changes in the game’s
quality at runtime, either through modifying game-defined variables, such as resolution and rendering distances, or by making
requests to device-specific values, such as core frequencies or device refresh rate. For Forza Street, we primarily focused on
modifying the resolution and the device refresh rate.
There are a couple of considerations in deciding which section should have which refresh rate and resolution. Firstly, battery is
always a primary concern on mobile devices. Policies must find a way to conserve battery, or have cool-down periods between
heavy usage periods to prevent the device from overheating. Secondly, with the remaining performance budget, it is possible to
prioritize refresh rate over resolution or vice versa, depending on the visuals and interactivity of the section.
Multiple policies can be changed and tested quickly, to determine which is preferred by those playing the game. To combat the
battery issue, low impact sections of the game must be identified, during which low refresh rates and/or resolutions can be used.
The obvious candidate for this is the load screens. These can be rendered at 30 fps without any difference in user experience.
The next, more debatable, low impact section of the game is the garage. The garage has to look good, but may not require high refresh
rates to accomplish that, since user interaction is fairly minimal. For that reason, the garage can start at 60 fps, decreasing
if and when the device gets hotter. However, the garage is also where players are expected to admire their cars, so
it is allocated the highest resolution.
The final consideration is where to allocate the highest refresh rate: the metagame or the race. Initially, the idea that both
could run at 96 fps is very enticing. However, because the load times in Forza Street are quite short, there wasn’t
enough cool-down time, and the device could overheat. The obvious choice then may seem to be the race, since that is the focus
of the gameplay. However, after some local user testing, the conclusions were mixed. The increased responsiveness of the
higher refresh rate has a very high impact on the UI sections of the game, where interactivity is more important for tapping through
the menus. Therefore two final policies were drawn up, as below:
Heat Level 2 is rarely, if ever, reached on the S20 (SM-G986U) and is purely a precaution. The game is expected to perform for significant periods of time at Heat Level 0, before moving down to Heat Level 1.
Forza Street now stands as a testament to the possibilities of mobile gaming on modern hardware. With the help of Vulkan,
the Samsung Galaxy S20 and future Galaxy devices can now utilise their high speed display support to showcase the title at a frame rate
previously impossible. Through careful management and the use of Variable Refresh Rate in the Samsung Galaxy GameSDK,
it is possible to bring this new level of performance to one of the most graphically intense titles on the market, in its most
graphically intense scenes, all while maintaining a sustainable level of power consumption.
The Galaxy S20 series supports high speed refresh rates with options for multiple Vsync intervals, including 96Hz and 120Hz.
This allows games to break the traditional limit of 60Hz on Android, and create a whole new level of interactivity, reactivity
and smoothness for mobile gaming. This level of performance is now a very real possibility for many mobile titles in the future,
and marks a monumental step in bringing desktop quality gaming to mobile devices.
Thanks to the GameDev Engineers: Dohyun Kim, Seunhwan Lee, Joonyong Park, Benjamin Mitchell, Michael Parkin-White, Lewis Gordon, Kostiantyn Drabeniuk, Junsik Kong, Inae Kim, Munseoung Kang, Oleksii Vasylenko.