Adaptive Performance 1.0

Mobile devices have more physical limitations than PCs and consoles, which means they are more constrained when rendering complex games. The Adaptive Performance project was started to improve the performance of games under these stricter constraints on mobile devices. In version 1.0 we focused on reducing unnecessary power consumption on the device without impacting the game's performance, because battery consumption and performance management are big parts of a device's performance limitations. To manage these constraints, we implemented three features:

- Power Manager
- Bottleneck detection & Auto Performance Control
- Custom scaler using device thermal feedback

Details of each implementation are as follows.

Power Manager

The Power Manager (PM) was implemented to avoid thermal throttling, which suddenly drops performance when the device temperature is high, and to extend battery life. To achieve these goals, the PM predicts the optimal CPU/GPU levels for the game's needs and sets the hardware to those levels. The graph above shows the structure of the Power Manager system and its workflow. The operation order of the PM is as follows:

1. Compare and calculate previous frame information and real-time performance status information, such as the device temperature retrieved from GameSDK on the device.
2. Check the current status of the device through the calculated frame information and the performance status information. In this step, based on the information from GameSDK, it checks whether the device is achieving the target framerate and whether the game has a CPU or GPU bottleneck.
3. Find power levels that reduce power consumption without lowering performance (the Auto Performance Controller).
4. Transmit the CPU and GPU levels found by the Auto Performance Controller to the device.

While the game is running on the device, the Power Manager in Adaptive Performance repeats the steps above and adapts its behavior to changing device conditions.

Bottleneck detection & Auto Performance Control

Bottleneck detection is the process that identifies which component is delaying the overall rendering pipeline of a game. Adaptive Performance finds bottlenecks using frame time and CPU/GPU time information. With this information we can narrow down whether it is the CPU or the GPU that is stopping the game from reaching the target framerate. This is identified by calculating and comparing the CPU and GPU frame times with the overall frame time. For GPU time, more accurate information can be obtained through GameSDK, which makes it easier to find bottlenecks. Once you know whether the CPU or the GPU is causing the bottleneck, you can take appropriate action for each situation. For example, if the GPU is the bottleneck, we can reduce the CPU level, which in turn lowers the temperature; conversely, if the CPU is the bottleneck, you can lower the GPU level to reduce the temperature of that component. If the device temperature is already low, the PM can also let the game use the GPU as much as it needs. In addition, if the CPU and GPU are surpassing the target framerate, we can reduce both levels to lower power consumption, thereby increasing battery life.

Adaptive Performance provides this automatic power management system as Auto Performance Control. With it, the game automatically communicates with the device and optimizes its performance according to the device's status; the developer only has to turn the system on, without any additional work configuring CPU/GPU levels. When auto performance mode is turned on, it automatically activates bottleneck detection and sets the proper CPU/GPU levels through the Auto Performance Control system, which is the recommended usage. The example code below shows how to decrease heat and power consumption using Auto Performance Control. When the developer sets AutomaticPerformanceControl to true and sets the desired targetFrameRate, the system helps achieve the appropriate performance by controlling the power level.

```csharp
public void EnterMenu()
{
    // ap is an IAdaptivePerformance instance (Holder.Instance).
    if (!ap.Active)
        return;

    Application.targetFrameRate = 30;
    // Enable automatic regulation of CPU and GPU levels by Adaptive Performance.
    var ctrl = ap.DevicePerformanceControl;
    ctrl.AutomaticPerformanceControl = true;
}

public void EnterBenchmark()
{
    var ctrl = ap.DevicePerformanceControl;
    // Set higher CPU and GPU levels when benchmarking a level.
    ctrl.CpuLevel = ctrl.MaxCpuPerformanceLevel;
    ctrl.GpuLevel = ctrl.MaxGpuPerformanceLevel;
}
```

Developers can also set the CPU/GPU levels themselves using the Adaptive Performance APIs, but this is not recommended. What they adjust directly is an abstract level value, not the actual operating frequency, and this may cause unintended battery consumption and performance degradation depending on the performance differences between devices. For example, a level value tuned for device A may behave adversely on device B for different reasons. Therefore, rather than setting levels directly, it is recommended to use the Auto Performance Control tuned by the experienced developers at Unity and Samsung to achieve the best performance on any device. Nevertheless, if developers want to set CPU and GPU levels manually, they can set Instance.DevicePerformanceControl.AutomaticPerformanceControl to false and then change the levels through Instance.DevicePerformanceControl.CpuLevel and Instance.DevicePerformanceControl.GpuLevel. Again, this is not the recommended method. Note that even if you set the CPU/GPU level, the value is not guaranteed to be applied or maintained, due to device status or policy control.

Custom scaler with device thermal feedback

If game developers want not only Auto Performance Control with the power management service but also additional performance improvements, they can use a custom scaler. The Adaptive Performance API reports the device's current temperature through a warning level, and also provides a more detailed temperature level. Through these, the developer can track the device's temperature changes and the timing of throttling control. By scaling the quality of selected content using this timing information, developers can preemptively manage heat before the device's own thermal control kicks in, and allow the battery to last longer. The example below is an implementation sample that controls the global LOD bias as a custom scale factor using temperature information.

```csharp
using UnityEngine;
using UnityEngine.AdaptivePerformance;

public class AdaptiveLOD : MonoBehaviour
{
    private IAdaptivePerformance ap = null;

    void Start()
    {
        ap = Holder.Instance;
        if (!ap.Active)
            return;

        QualitySettings.lodBias = 1.0f;
        ap.ThermalStatus.ThermalEvent += OnThermalEvent;
    }

    void OnThermalEvent(ThermalMetrics ev)
    {
        switch (ev.WarningLevel)
        {
            case WarningLevel.NoWarning:
                QualitySettings.lodBias = 1;
                break;
            case WarningLevel.ThrottlingImminent:
                if (ev.TemperatureLevel > 0.8f)
                    QualitySettings.lodBias = 0.75f;
                else
                    QualitySettings.lodBias = 1.0f;
                break;
            case WarningLevel.Throttling:
                QualitySettings.lodBias = 0.5f;
                break;
        }
    }
}
```

Adaptive Performance 1.0 performance result

So far, we have looked at the key functions of Adaptive Performance 1.0. Below is the Adaptive Performance result we measured on Unity's MegaCity demo. The blue graph above shows the case using Adaptive Performance and its Auto Performance Control system, and the red one shows the case without Adaptive Performance. The target FPS in both cases is 30, but you can see that Adaptive Performance maintains 30 FPS for a longer period of time; Adaptive Performance is much more stable.

※ You can find the supported versions of Adaptive Performance 1.0, a more detailed user guide, and FAQs as follows:

Unity editor version | Adaptive Performance package version
Unity 2018 LTS+ | 1.1.9
Unity 2020.1+ / Unity 2019 LTS | 1.2.0

Detailed user guide from Unity
Unity Adaptive Performance FAQs
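The bottleneck detection and level adjustment policy described above reduces to a few lines of decision logic. The sketch below is illustrative only: the function names and thresholds are assumptions, not the Adaptive Performance or GameSDK API, which does this work inside the package.

```python
# Illustrative sketch of the bottleneck-detection policy described above.
# All names and thresholds are hypothetical; the real Power Manager lives
# inside the Adaptive Performance package and GameSDK.

def detect_bottleneck(frame_ms, cpu_ms, gpu_ms, target_ms):
    """Decide which component keeps the game from hitting the target framerate."""
    if frame_ms <= target_ms:
        return "none"            # target framerate achieved
    # Whichever side dominates the frame time is the limiting component.
    return "gpu" if gpu_ms >= cpu_ms else "cpu"

def adjust_levels(bottleneck, cpu_level, gpu_level, min_level=0):
    """Lower the level of the *other* component to reduce heat, or lower
    both when the device is already faster than the target."""
    if bottleneck == "gpu":      # GPU-bound: spare CPU power is wasted heat
        cpu_level = max(min_level, cpu_level - 1)
    elif bottleneck == "cpu":    # CPU-bound: lower the GPU level instead
        gpu_level = max(min_level, gpu_level - 1)
    else:                        # above target: save battery on both sides
        cpu_level = max(min_level, cpu_level - 1)
        gpu_level = max(min_level, gpu_level - 1)
    return cpu_level, gpu_level

# Example: a GPU-bound frame against a 30 FPS target (about 33.3 ms budget).
b = detect_bottleneck(frame_ms=40.0, cpu_ms=12.0, gpu_ms=38.0, target_ms=33.3)
print(b, adjust_levels(b, cpu_level=3, gpu_level=3))
```

The real system repeats this loop every frame and also folds in the thermal feedback, but the core "lower the idle side" idea is the same.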
Adaptive Performance 2.0

In version 2.0, released in August 2020, twelve months after Adaptive Performance 1.0 was first released, we focused on what we missed in 1.0. We improved three aspects of content control and enhanced usability for developers:

- Indexer and quality scalers
- Unity editor UX/UI
- Simulator

In version 1.0, developers had to implement a real-time content control module by themselves, but in 2.0 we implemented a quality manager system that provides various built-in scalers. We improved the UX/UI of the Unity editor to make Adaptive Performance easier to use, and there is also a simulator to help developers test Adaptive Performance without connecting devices. Details of each implementation are as follows.

Unity editor UX/UI

When you install the Adaptive Performance package and the Adaptive Performance Samsung Android package, you will find that [Adaptive Performance] has been added to Project Settings. To use it properly, you have to select providers first. To run on a device, select the Samsung Android provider; for the simulator, select the Device Simulator provider under the PC, Mac & Linux Standalone settings. Once the providers are selected, you can use Adaptive Performance simply through the Adaptive Performance section in Project Settings. Auto performance mode is part of the Power Manager provided since 1.0; it controls performance automatically to achieve the best performance and reduces battery consumption by changing the CPU/GPU levels. If you check auto performance mode in the settings and activate the Indexer settings, you can choose the scalers you want with simple checkboxes and scaling bars. In this case, the scaler settings apply to the entire project. If you want to configure scalers per scene, you can attach an Adaptive Performance setting to an object in the scene for more detailed control.

Simulator

The simulator was added in Adaptive Performance 2.0 so that we can test operation under different thermal settings, as well as the scaler values, all without a device. To use the simulator, download the Device Simulator package from the Package Manager. Then, as with the Unity editor UX/UI above, set each scaler at Project Settings > Adaptive Performance > Simulator. When you are done editing the scaler settings, open the simulator with Window > General > Device Simulator. Through the simulator, you can check how Adaptive Performance behaves under different temperature and bottleneck conditions.

Indexer and quality scalers

The Indexer is the quality management system of Adaptive Performance; it controls game quality depending on the temperature and the performance status of the device. A scaler is a tool that controls the quality of a scene based on the values below:

- Target
- Current bottleneck
- Lowest level
- Lowest visual impact

A scaler only operates when the Indexer is activated, which you can do in Project Settings.

Built-in scalers

Here are the built-in scalers that Adaptive Performance 2.0 provides.

General render scalers:

- Adaptive Framerate: automatically controls the application's framerate within the defined min/max range.
- Adaptive LOD: changes the LOD bias based on thermal and performance load.
- Adaptive Resolution: automatically controls the screen resolution of the application by the defined scale.

Universal Render Pipeline scalers. The scalers below require the Universal Render Pipeline (URP) and directly change URP settings, so they have no effect with any other render pipeline:

- Adaptive Batching: toggles dynamic batching based on thermal and performance load.
- Adaptive LUT: changes the LUT bias of URP.
- Adaptive MSAA: changes the anti-aliasing quality bias of URP. This scaler only affects the camera's post-processing subpixel morphological anti-aliasing (SMAA) quality level.
- Adaptive Shadow Cascades: changes the main light shadow cascades count bias of URP.
- Adaptive Shadow Distance: changes the max shadow distance multiplier of URP.
- Adaptive Shadow Quality: changes the shadow quality bias of URP.
- Adaptive Shadowmap Resolution: changes the main light shadowmap resolution multiplier of URP.
- Adaptive Sorting: skips the front-to-back sorting of URP.

We provide built-in scalers to make it easier for developers to control the quality of their content. Developers can use the built-in scalers easily and safely because they can choose which quality settings to control, and by how much, in a basic UI.

Custom scalers

Developers can also make their own custom scalers by creating a new class that inherits from AdaptivePerformanceScaler. Below is an example of a TextureQualityScaler.

```csharp
public class TextureQualityScaler : AdaptivePerformanceScaler
{
    public override ScalerVisualImpact VisualImpact => ScalerVisualImpact.High;
    public override ScalerTarget Target => ScalerTarget.GPU;
    public override int MaxLevel => 2;

    int m_DefaultTextureQuality;

    protected override void OnDisabled()
    {
        QualitySettings.masterTextureLimit = m_DefaultTextureQuality;
    }

    protected override void OnEnabled()
    {
        m_DefaultTextureQuality = QualitySettings.masterTextureLimit;
    }

    protected override void OnLevel()
    {
        switch (CurrentLevel)
        {
            case 0:
                QualitySettings.masterTextureLimit = 0;
                break;
            case 1:
                QualitySettings.masterTextureLimit = 1;
                break;
            case 2:
                QualitySettings.masterTextureLimit = 2;
                break;
        }
    }
}
```

Samsung provider scaler

When you select the Samsung provider scaler in Project Settings, you can use the Adaptive VRR scaler, a Samsung-specific feature related to VRR (variable refresh rate). Currently, among Galaxy devices, the Galaxy S20 supports VRR. You can use Automatic VRR with VRR-supported devices; it can be enabled in Project Settings, and it saves battery by setting the refresh rate automatically. With the Samsung provider, when you enable both Automatic VRR and Adaptive Framerate at the same time, the device operates at the refresh rate closest to the achievable framerate, meaning a framerate between the min/max values of the Automatic VRR and Adaptive Framerate settings. Through this, you can set a target framerate per scene and control the framerate within the range you want, so players are not disturbed by VRR adjustments. You can also check Adaptive VRR's operation through the Adaptive Framerate samples.

What will change with the quality scaler

Game developers are wary of quality control of content because it might degrade the quality of the game experience, but the quality manager in 2.0 provides additional options to minimize visual impact through scaler control. Each scaler has visual impact options that are used to retain the quality of a scene when performance is controlled. The images below are from Unity's Boat Attack demo; you can compare how much visual impact you see at each LOD level. If you look at the enlarged parts, you can see that some trees are missing at LOD level 3. As shown above, some scalers have an immediately recognizable visual impact, but some scalers barely have any. Below is an example of texture quality level differences: you might not see any differences with your eyes, but a RenderDoc capture shows that the real texture sizes are diminished. As these two examples show, visual impact varies depending on the type of scaler, and different visual impact can occur depending on how the range is set within the same scaler. Therefore, it is recommended that developers balance performance and quality for their own content.

※ You can find the supported versions of Adaptive Performance 2.0, a more detailed user guide, and FAQs as follows:

Unity editor version | Adaptive Performance package version
Unity 2020.2.0a7+ / Unity 2020.1.0b5+ / Unity 2019.3.11f1 | 2.0.0

Detailed user guide from Unity
Unity Adaptive Performance FAQs
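The scaler abstraction above (a maximum level per scaler, with the Indexer raising levels as thermal pressure grows) can be sketched in a few lines. The class and function names below are illustrative stand-ins, not the Unity C# API:

```python
# Minimal sketch of the scaler/indexer idea described above: each scaler
# exposes a maximum level, and the indexer raises levels as thermal
# pressure grows. Names are illustrative, not the Unity C# API.

class Scaler:
    def __init__(self, name, max_level):
        self.name = name
        self.max_level = max_level
        self.level = 0                      # 0 = full quality

    def apply(self, level):
        # Clamp to the scaler's own range, like MaxLevel in the C# example.
        self.level = min(level, self.max_level)

def index_levels(scalers, temperature_level):
    """Map a normalized temperature level (0.0-1.0) onto each scaler's
    level range: hotter device, more aggressive quality reduction."""
    for s in scalers:
        s.apply(round(temperature_level * s.max_level))

scalers = [Scaler("texture_quality", 2), Scaler("lod_bias", 3)]
index_levels(scalers, 0.5)
print({s.name: s.level for s in scalers})
```

The real Indexer also weighs each scaler's declared visual impact and the current bottleneck when deciding which scaler to push first; this sketch only shows the level-clamping mechanics.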
The Samsung Developer Conference 2023 (SDC23) happened on October 5, 2023, at Moscone North in San Francisco and online. Among the many exciting activities at the conference for developers and tech enthusiasts, Code Lab offered a unique opportunity to learn about the latest Samsung SDKs and tools. Code Lab is a hands-on learning experience, providing participants with a platform to explore the diverse world of Samsung development. Code Lab activities are accessible to developers of all skill levels and interests, ensuring that everyone, from beginners to experts, can find something exciting to explore. Covering a wide array of topics, Code Lab catered to the diverse interests of the participants. Here's a quick look at some of the SDC23 topics:

1. SmartThings. Participants had the chance to build a Matter IoT app using the SmartThings Home API and create virtual devices that they could control using the SmartThings app or their own IoT apps. They also learned how to develop a SmartThings Find-compatible device. These topics are all about connecting and enhancing the smart home experience.

2. Galaxy Z. Participants interested in foldable technology were able to develop a widget for the Flex Window. This topic opens new possibilities in app design and user interaction.

3. Samsung Wallet. Participants learned to integrate the "Add to Samsung Wallet" button into sample partner services. They also learned to implement in-app payment into a sample merchant app using the Samsung Pay SDK. These topics focus on enhancing the mobile wallet experience for Samsung users.

4. GameDev. Game developers and enthusiasts had the opportunity to optimize game performance with Adaptive Performance in Unity. They also learned to implement Flex mode into Unity games for foldable phones. These topics offer insights into the gaming industry's latest trends and technologies.

5. Watch Face Studio. Code Lab also provided an activity for participants to create a watch face design with customized styles using Watch Face Studio. Participants also learned how to convert the watch face design for the Galaxy Z Flip5's Flex Window display using the Good Lock plugin.

6. Samsung Health. The health-focused Code Lab topics covered measuring skin temperature on Galaxy Watch and transferring heart rate data from Galaxy Watch to a mobile device with the Samsung Privileged Health SDK. Participants also learned how to create health research apps using the Samsung Health Stack. These topics provide valuable insights into the health and fitness tech landscape.

From creating virtual devices to building health-related apps, participants left the conference with new knowledge they could apply to their development projects. The Samsung Developer Conference is a celebration of innovation and collaboration in the tech world. With a diverse range of topics in Code Lab, participants were equipped with the tools and knowledge to push the boundaries of what is possible in Samsung's ecosystem. Though SDC23 has ended, the innovation lives on! Whether you missed the event or just want to try other activities, you can visit the Code Lab page anytime, anywhere. We can't wait to see you and the innovations that will emerge from this conference in the coming years. See you at SDC24!
Christopher Marquez
Your impact made our SDC23 shine! Samsung's innovations include Bixby, Knox, SmartThings, and Tizen. See SDC23 for a connected ecosystem with multi-device experiences.

Samsung Developer Conference 2023. Thu, Oct 5, 2023, 10:00 AM PT, Moscone North in San Francisco and online.

Highlights: Though SDC23 has ended, the innovation lives on! Whether you missed the event or just want to revisit the highlights, you can watch the excitement on demand.

Keynote: Discover Samsung's broad ecosystem of powerful, next-level tech and hear how Samsung is building toward a smarter, safer, and more personally connected future.

Sessions: Dive into the future of connected customer experiences through tech sessions by developers offering further insight into the innovations introduced in the keynote.

- Gamepad on Tizen TV (mega session; screen experience, game, developer program, Tizen): valuable tips and techniques for game application developers and gamepad manufacturers.
- HDR10+ Gaming (mega session; screen experience, game): a panel discussion covering an overview of HDR10+ gaming and how game developers can support it.
- Games with Samsung Galaxy (mega session; mobile experience, game, Android, mobile): the latest in mobile gaming development technologies, responsive UI for Flex mode, and mobile cloud gaming.
- Exploring the Digital Health Ecosystem: Samsung Health as Digital Front Door (mega session; health experience, health, wearable, mobile): new Samsung Health features, the Samsung Privileged Health SDK, and collaboration for research with the Samsung Health Stack.
- SmartThings and Matter (tech session; platform innovation, IoT, open source, developer program): a brief introduction to Matter, new enhancements with SmartThings, and new developer tools that make it easy to integrate your devices.
- What's New and Next in Watch Face Studio 2023 (tech session; mobile experience, wearable, design, mobile): the main new features of Watch Face Studio 2023 and the new Watch Face Studio plugin experience.

Speakers: Check out the speakers who joined us at SDC23 to share their experience and expertise, and get a sense of what you can expect from next year's SDC event.

Code Labs: Get hands-on with the latest development features through new Code Lab topics and samples introduced for SDC23.

- SmartThings: Build a Matter IoT app with the SmartThings Home API (25 mins)
- SmartThings: Develop a SmartThings Find-compatible device (30 mins)
- Foldable: Develop a widget for the Flex Window (25 mins)
- Samsung Wallet: Integrate the "Add to Samsung Wallet" button into partner services (30 mins)
- GameDev / Galaxy Z: Implement Flex mode into a Unity game (30 mins)
- Watch Face Studio: Customize styles of a watch face with Watch Face Studio (30 mins)

Tech Square: Talk with product experts and experience innovations in Tech Square. Catch up on new updates from Samsung platforms and OSes like SmartThings, Knox, and Tizen, plus mobile & screen experience, home & health experience, and sustainability.

Samsung C-Lab: Meet six passionate entrepreneurs and start-ups accelerated by Samsung C-Lab, an in-house venture and start-up acceleration program. These start-ups are making waves in the healthcare and AI industries and are here to showcase their latest innovations.

Prior years: Watch highlights of selected sessions from past SDC events.

- SDC22: October 12, 2022, Moscone North and online, San Francisco, California
- SDC21: October 26, 2021, online
- SDC19: October 29-30, 2019, McEnery Convention Center, San Jose, California
- SDC18: November 8-9, 2018, Moscone West, San Francisco, California
- SDC17: October 18-19, 2017, Moscone West, San Francisco, California
- SDC16: April 27-28, 2016, Moscone West, San Francisco, California
Game performance can vary between different Android devices, because they use different chipsets and GPUs based on different Mali architectures. Your game may render at 60fps on a Galaxy S20+, but how will it perform on the Galaxy A51 5G? The A51 5G has good specs, but you may want to consider runtime changes based on the underlying hardware if your game is pushing the limits on flagship hardware. Similarly, you may need to optimize your game to run at lower frame rates on the Galaxy J and Galaxy M models, which are often found in emerging markets. Players quickly lose interest in a game if they experience drops in frame rate or slow load times. They won't play your game on long journeys if it drains the battery or overheats the device. To reliably deploy a game globally and ensure a great user experience, you need to performance test on a wide range of Android devices. The Arm Mobile Studio family of performance analysis tools provides games studios with a comprehensive game analysis workflow for Android, giving information and advice at appropriate levels of detail for technical artists, graphics developers, performance analysts, and project leaders.

Monitor CPU and GPU activity

Arm Streamline captures a comprehensive profile of your game running on an unrooted Android device and visualizes the CPU and GPU performance counter activity as you run your test scenario. You can see exactly how the CPU and GPU workloads are handled by the device, which helps you locate problem areas that might explain frame rate drops or thermal problems. However, spotting performance issues using Streamline can be time-consuming unless you know exactly what you're looking for. Your team may have varying levels of experience or expertise, so interpreting this data can be difficult.
Introducing Performance Advisor

Arm Performance Advisor is a lightweight reporting tool that transforms a Streamline capture into a simple report describing how your game performed on the device, and alerts you to problem areas that you should consider optimizing. It can be used by your whole team on a regular basis, to spot trends and diagnose problems early in the development cycle, when you're best placed to do something about it. Performance Advisor reports the application frame rate, CPU load, and GPU load, as well as content metrics about the workload running on the Mali GPU. If it detects problematic areas, Performance Advisor tells you whether it's the CPU or the GPU that is struggling to process your application, and links to optimization advice to help you rectify it. A frame rate analysis chart shows how the application performed over time; the background color of the chart indicates how the game performed. When you're hitting your target frame rate, the chart background is green. In this example, most of the chart is blue, telling us the GPU in the device is struggling to process fragment workloads. Performance Advisor can capture screenshots of your game at the point that FPS drops below a given threshold. This helps you identify which content might be causing the problem, provides valuable context when debugging, and can reveal common elements if repeated slowdowns occur. You can then investigate these frames more closely with Graphics Analyzer, to see exactly which graphics API calls were executing at that point. If you want to separately evaluate how different parts of the game performed (for example, loading screens, menu selection, and different levels or gameplay scenarios), you can annotate these regions in your game so that Performance Advisor can report data for them independently.
Performance budgeting

Because different devices have different performance expectations, it's a good idea to set your own performance budgets for each device. For example, if you know the top frequency of the GPU in the device and you have a target frame rate, you can calculate the absolute limit of GPU cost per frame:

GPU cost per frame = GPU top frequency / target frame rate

When you generate a report with Performance Advisor, you can pass in your performance budgets, which are then shown in the charts, so you can easily see if you've broken one. In the example below, we can see a correlation between high numbers of execution engine cycles and drops in FPS. Performance Advisor tells us that the GPU is busy with arithmetic operations and that the shaders could be too complex. The report provides a link to an advice page on the Arm developer website that explains how to reduce arithmetic load in shaders. More charts cover key performance metrics such as CPU cycles per frame and GPU bandwidth per frame, reported for read and write access. There are also charts showing the content workload (draw calls, primitives, and pixels per frame) and the level of overdraw per pixel. Download an example Performance Advisor report.

Automated performance analysis

It's far easier to fix problems as they arise than it is to patch problems later on. Performance Advisor's key application performance metrics are useful to monitor over daily runs, to see how changes to your application affect performance during development. Arm Mobile Studio Professional includes headless CI support, so you can easily deploy large-scale automated performance testing across multiple devices. Using a device farm with a CI workflow, you can generate performance data and optimization advice automatically, every night, for several Android devices.
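The budget formula above is simple arithmetic, so it is easy to wire into a build script. The frequency below is a made-up example value, not a real device spec:

```python
# GPU cost-per-frame budget from the formula above:
#   GPU cost per frame = GPU top frequency / target frame rate
# The 800 MHz figure is a hypothetical example, not a real device spec.

def gpu_cycles_per_frame_budget(top_frequency_hz, target_fps):
    """Absolute ceiling on GPU cycles available to render one frame."""
    return top_frequency_hz / target_fps

budget = gpu_cycles_per_frame_budget(800_000_000, 60)
print(f"{budget:,.0f} GPU cycles per frame")
```

Halving the target frame rate doubles the per-frame cycle budget, which is why the same content can be viable at 30 FPS on a mid-range device but not at 60.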
As you check in code, you can easily monitor how your content performs against your performance budgets over time, and raise alerts when you start to approach or break those budgets. The Professional edition also enables you to build bespoke data dashboards from the data that has been collected. Performance Advisor's machine-readable JSON reports can be imported into any JSON-compatible database and visualization platform, such as the ELK stack. Compare metrics between test runs to quickly determine which changes impacted performance and which type of workload is the likely cause of a regression. Query the data and compare performance against specific targets to identify the next optimization steps. Read more about how to integrate Arm Mobile Studio into a CI workflow on the Arm developer website.

Resources

Arm publishes various resources on their developer website to help you optimize performance:

- Optimization advice: a quick reference to help you avoid common problems.
- Mali Best Practices Guide: a comprehensive guide describing in detail how to ensure your content runs well on Mali GPUs.
- Developer guides, such as those for technical artists, covering best practices for geometry, textures, materials, and shaders.
- The Mali GPU datasheet, showing the different features and capabilities of Arm Mali GPUs, from the Midgard-based Mali-T720 to the latest Valhall-based Mali-G78.
- The Mali GPU counter reference, with detailed descriptions of all the performance counters you can analyze in each Mali GPU.

Get Arm Mobile Studio

Arm Mobile Studio is free to use for interactive performance analysis. To use it headlessly in your CI workflow, you need an Arm Mobile Studio Professional license. Download Arm Mobile Studio.
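A CI gate over a machine-readable report of this kind can be a short script. The report schema below is invented for illustration (consult the Performance Advisor documentation for the real field names); the budget-checking logic is the point:

```python
# Sketch of a CI gate over a Performance Advisor-style JSON report.
# The report schema here is hypothetical, invented for illustration;
# the real field names come from the Performance Advisor documentation.
import json

report_json = """
{
  "application": "MyGame",
  "metrics": {
    "fps_average": 54.2,
    "cpu_cycles_per_frame": 21000000,
    "gpu_bandwidth_mb_per_frame": 41.7
  }
}
"""

# Each budget is a (kind, limit) pair: "min" metrics must stay above the
# limit, "max" metrics must stay below it.
budgets = {
    "fps_average": ("min", 30.0),
    "cpu_cycles_per_frame": ("max", 25_000_000),
    "gpu_bandwidth_mb_per_frame": ("max", 50.0),
}

def check_budgets(report, budgets):
    """Return the list of metric names that break their budget."""
    broken = []
    for name, (kind, limit) in budgets.items():
        value = report["metrics"][name]
        if (kind == "min" and value < limit) or (kind == "max" and value > limit):
            broken.append(name)
    return broken

report = json.loads(report_json)
print(check_budgets(report, budgets))
```

In a nightly run, a non-empty result would fail the build or raise an alert, which is exactly the "approach or break those budgets" workflow described above.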
Arm Developer
The Samsung Developers team works with many companies in the mobile and gaming ecosystems. We're excited to support our partner, Arm, as they bring timely and relevant content to developers looking to build games and high-performance experiences. This Vulkan extensions series will help developers get the most out of the new and game-changing Vulkan extensions on Samsung mobile devices.

As I mentioned previously, Android is enabling a host of useful new Vulkan extensions for mobile. These new extensions are set to improve the state of graphics APIs for modern applications, enabling new use cases and changing how developers can design graphics renderers going forward. These extensions will be available across various Android smartphones, including the new Samsung Galaxy S21, which was recently launched on 14 January. Existing Samsung Galaxy S models, such as the Samsung Galaxy S20, also allow upgrades to Android R. I have already discussed two of these extensions in previous blogs: maintenance extensions and legacy support extensions. However, there are three further Vulkan extensions for Android that I believe are 'game changers'. In the first of three blogs, I explore these individual game-changer extensions: what they do, why they can be useful, and how to use them. The goal here is not to provide complete samples, but there should be enough to get you started. The first Vulkan extension is 'descriptor indexing'. Descriptor indexing can be available in handsets prior to the Android R release. To check which Android devices support descriptor indexing, check here. You can also directly view the Khronos Group Vulkan samples relevant to this blog here.

VK_EXT_descriptor_indexing introduction

In recent years, we have seen graphics APIs greatly evolve in their resource binding flexibility. All modern graphics APIs now have some answer to how we can access large swathes of resources in a shader.
Bindless

A common buzzword that is thrown around in modern rendering tech is "bindless". The core philosophy is that resources like textures and buffers are accessed through simple indices or pointers, and not singular "resource bindings". To pass down resources to our shaders, we do not really bind them like in the graphics APIs of old. Simply write a descriptor to some memory, and a shader can come in and read it later. This means the API machinery to drive this is kept to a minimum. This is a fundamental shift away from the older style, where our rendering loop looked something like:

    render_scene() {
        foreach(drawable) {
            command_buffer->update_descriptors(drawable);
            command_buffer->draw();
        }
    }

Now it looks more like:

    render_scene() {
        command_buffer->bind_large_descriptor_heap();
        large_descriptor_heap->write_global_descriptors(scene, lighting, shadowmaps);
        foreach(drawable) {
            offset = large_descriptor_heap->allocate_and_write_descriptors(drawable);
            command_buffer->push_descriptor_heap_offsets(offset);
            command_buffer->draw();
        }
    }

Since we have free-form access to resources now, it is much simpler to take advantage of features like multi-draw or other GPU-driven approaches. We no longer require the CPU to rebind descriptor sets between draw calls like we used to. Going forward, when we look at ray tracing, this style of design is going to be mandatory, since shooting a ray means we can hit anything, so all descriptors are potentially used. It is useful to start thinking about designing for this pattern going forward. The other side of the coin with this feature is that it is easier to shoot yourself in the foot. It is easy to access the wrong resource, but as I will get to later, there are tools available to help you along the way.

VK_EXT_descriptor_indexing features

This extension is a large one and landed in Vulkan 1.2 as a core feature. To enable bindless algorithms, there are two major features exposed by this extension.
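The `allocate_and_write_descriptors` step in the pseudocode above is typically just a bump allocator over one large descriptor array. A minimal sketch in C, with hypothetical names (this is not part of the Vulkan API, just an illustration of the heap-offset bookkeeping):

```c
#include <stdint.h>

/* Illustrative "large descriptor heap" bookkeeping: per-drawable descriptors
 * are written at a bump-allocated offset, and that offset is what gets pushed
 * to the command buffer before the draw. */
typedef struct {
    uint32_t capacity; /* total descriptor slots in the heap */
    uint32_t next;     /* next free slot */
} descriptor_heap;

/* Allocate 'count' contiguous descriptor slots; returns the base offset,
 * or UINT32_MAX if the heap is exhausted. */
uint32_t heap_alloc(descriptor_heap *h, uint32_t count)
{
    if (h->next + count > h->capacity)
        return UINT32_MAX; /* caller must grow or recycle the heap */

    uint32_t base = h->next;
    h->next += count;
    return base;
}
```

Resetting `next` to zero once the GPU has finished a frame is the usual recycling strategy for per-frame descriptors.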
Non-uniform indexing of resources

How resources are accessed has evolved quite a lot over the years. Hardware capabilities used to be quite limited, with a tiny bank of descriptors being visible to shaders at any one time. In more modern hardware, however, shaders can access descriptors freely from memory, and the limits are somewhat theoretical.

Constant indexing

Arrays of resources have been with us for a long time, but mostly as syntactic sugar, where we can only index into arrays with a constant index. This is equivalent to not using arrays at all from a compiler point of view.

    layout(set = 0, binding = 0) uniform sampler2D Textures[4];
    const int CONSTANT_VALUE = 2;
    color = texture(Textures[CONSTANT_VALUE], uv);

HLSL in D3D11 has this restriction as well, but it has been more relaxed about it, since it only requires that the index is constant after optimization passes are run.

Dynamic indexing

As an optional feature, dynamic indexing allows applications to perform dynamic indexing into arrays of resources. This allows for a very restricted form of bindless. Outside compute shaders, however, using this feature correctly is quite awkward, due to the requirement of the resource index being dynamically uniform. Dynamically uniform is a somewhat intricate subject, and the details are left to the accompanying sample in KhronosGroup/Vulkan-Samples.

Non-uniform indexing

Most hardware assumes that the resource index is dynamically uniform, as this has been the restriction in APIs for a long time. If you are not accessing resources with a dynamically uniform index, you must notify the compiler of your intent. The rationale here is that hardware is optimized for dynamically uniform (or subgroup uniform) indices, so there is often an internal loop emitted by either the compiler or the hardware to handle every unique index that is used. This means performance tends to depend a bit on how divergent resource indices are.
    #extension GL_EXT_nonuniform_qualifier : require
    layout(set = 0, binding = 0) uniform texture2D Textures[];
    layout(set = 1, binding = 0) uniform sampler Sampler;
    color = texture(nonuniformEXT(sampler2D(Textures[index], Sampler)), uv);

In HLSL, there is a similar mechanism where you use NonUniformResourceIndex, for example:

    Texture2D<float4> Textures[] : register(t0, space0);
    SamplerState Samp : register(s0, space0);
    float4 color = Textures[NonUniformResourceIndex(index)].Sample(Samp, uv);

All descriptor types can make use of this feature, not just textures, which is quite handy! The nonuniformEXT qualifier removes the requirement to use dynamically uniform indices. See the code sample for more detail.

Update-after-bind

A key component to make the bindless style work is that we do not have to … bind descriptor sets all the time. With the update-after-bind feature, we effectively block the driver from consuming descriptors at command recording time, which gives a lot of flexibility back to the application. The shader consumes descriptors as they are used, and the application can freely update descriptors, even from multiple threads. To enable update-after-bind, we modify the VkDescriptorSetLayout by adding new binding flags.
The way to do this is somewhat verbose, but at least update-after-bind is something that is generally used for just one or two descriptor set layouts throughout most applications:

    VkDescriptorSetLayoutCreateInfo info = { … };
    info.flags = VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT_EXT;

    const VkDescriptorBindingFlagsEXT flags =
        VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT_EXT |
        VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT_EXT |
        VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT_EXT |
        VK_DESCRIPTOR_BINDING_UPDATE_UNUSED_WHILE_PENDING_BIT_EXT;

    VkDescriptorSetLayoutBindingFlagsCreateInfoEXT binding_flags = { … };
    binding_flags.bindingCount = info.bindingCount;
    binding_flags.pBindingFlags = &flags;
    info.pNext = &binding_flags;

For each pBindings entry, we have a corresponding flags field where we can specify various flags. The descriptor_indexing extension has very fine-grained support, but UPDATE_AFTER_BIND_BIT and VARIABLE_DESCRIPTOR_COUNT_BIT are the most interesting ones to discuss. VARIABLE_DESCRIPTOR_COUNT deserves special attention, as it makes descriptor management far more flexible. Having to use a fixed array size can be somewhat awkward, since in a common usage pattern with a large descriptor heap, there is no natural upper limit to how many descriptors we want to use. We could settle for some arbitrarily high limit like 500k, but that means all descriptor sets we allocate have to be of that size, and all pipelines have to be tied to that specific number. This is not necessarily what we want, and VARIABLE_DESCRIPTOR_COUNT allows us to allocate just the number of descriptors we need per descriptor set. This makes it far more practical to use multiple bindless descriptor sets.
When allocating a descriptor set, we pass down the actual number of descriptors to allocate:

    VkDescriptorSetVariableDescriptorCountAllocateInfoEXT variable_info = { … };
    variable_info.sType =
        VK_STRUCTURE_TYPE_DESCRIPTOR_SET_VARIABLE_DESCRIPTOR_COUNT_ALLOCATE_INFO_EXT;
    variable_info.descriptorSetCount = 1;
    allocate_info.pNext = &variable_info;
    variable_info.pDescriptorCounts = &num_descriptors_streaming;
    VK_CHECK(vkAllocateDescriptorSets(get_device().get_handle(), &allocate_info,
                                      &descriptors.descriptor_set_update_after_bind));

GPU-assisted validation and debugging

When we enter the world of descriptor indexing, there is a flip side where debugging and validation are much more difficult. The major benefit of the older binding models is that it is fairly easy for validation layers and debuggers to know what is going on. This is because the number of resources available to a shader is small and focused. With UPDATE_AFTER_BIND in particular, we do not know anything at draw time, which makes this awkward. It is possible to enable GPU-assisted validation in the Khronos validation layers. This lets you catch issues like:

    UNASSIGNED-Descriptor uninitialized: Validation Error:
    [ UNASSIGNED-Descriptor uninitialized ] Object 0: handle = 0x55625acf5600,
    type = VK_OBJECT_TYPE_QUEUE; | MessageID = 0x893513c7 | Descriptor index 67
    is uninitialized. Command buffer (0x55625b184d60). Draw index 0x4.
    Pipeline (0x520000000052). Shader Module (0x510000000051).
    Shader Instruction Index = 59. Stage = Fragment.
    Fragment coord (x,y) = (944.5, 0.5).
    Unable to find SPIR-V OpLine for source information.
    Build shader with debug info to get source information.

or:

    UNASSIGNED-Descriptor uninitialized: Validation Error:
    [ UNASSIGNED-Descriptor uninitialized ] Object 0: handle = 0x55625acf5600,
    type = VK_OBJECT_TYPE_QUEUE; | MessageID = 0x893513c7 | Descriptor index 131
    is uninitialized. Command buffer (0x55625b1893c0). Draw index 0x4.
    Pipeline (0x520000000052). Shader Module (0x510000000051).
    Shader Instruction Index = 59. Stage = Fragment.
    Fragment coord (x,y) = (944.5, 0.5).
    Unable to find SPIR-V OpLine for source information.
    Build shader with debug info to get source information.

RenderDoc supports debugging descriptor indexing through shader instrumentation, and this allows you to inspect which resources were accessed. When you have several thousand resources bound to a pipeline, this feature is critical to making any sense of the inputs. If we are using the update-after-bind style, we can inspect the exact resources we used. In a non-uniform indexing style, we can inspect all the unique resources we used.

Conclusion

Descriptor indexing unlocks many design possibilities in your engine and is a real game changer for modern rendering techniques. Use it with care, and make sure to take advantage of all the debugging tools available to you. You need them. This blog has explored the first Vulkan extension game changer, with two more parts in this game changer blog series still to come. The next part will focus on 'buffer device address' and how developers can use this new feature to enhance their games.

Follow up

Thanks to Hans-Kristian Arntzen and the team at Arm for bringing this great content to the Samsung Developers community. We hope you find this information about Vulkan extensions useful for developing your upcoming mobile games. The original version of this article can be viewed at Arm Community. The Samsung Developers site has many resources for developers looking to build for and integrate with Samsung devices and services. Stay in touch with the latest news by creating a free account or by subscribing to our monthly newsletter. Visit the Marketing Resources page for information on promoting and distributing your apps and games. Finally, our Developer Forum is an excellent way to stay up to date on all things related to the Galaxy ecosystem.
The Samsung Developers team works with many companies in the mobile and gaming ecosystems. We're excited to support our friends at Arm as they bring timely and relevant content to developers looking to build games and high-performance experiences. This best practices series will help developers get the most out of the 3D hardware on Samsung mobile devices.

Developing games is a true cross-disciplinary experience for developers, requiring both technical and creative skills to bring their gaming project to life. But all too often, the performance and visual needs of a project can be at odds. Leading technology provider of processor IP, Arm has developed artists' best practices for mobile game development, where game developers learn tips on creating performance-focused 3D assets, 2D assets, and scenes for mobile applications. Before you cut those stunning visuals, get maximum benefit from Arm's best practices by reviewing these four topics: geometry, texturing, materials and shaders, and lighting.

Geometry

To get a project performing well on as many devices as possible, the geometry of a game should be taken seriously and optimized as much as possible. This section identifies what you need to know about using geometry properly on mobile devices. On mobile, how you use vertices matters more than on almost any other platform. Tips on how to avoid micro triangles and long thin triangles are great first steps in gaining performance. The next big step is to use levels of detail (LODs). An LOD system uses a lower-poly version of the model as an object moves further away from the camera. This helps keep the vertex count down and gives the artist control over how objects look far away. Otherwise, this would be left to the GPU, trying its best to render a high number of vertices in only a few pixels, costing the project performance. To learn more, check Real-time 3D Art Best Practices: Geometry.
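The LOD selection described above can be sketched as a simple distance test. The thresholds below are hypothetical; a real engine would tune them per asset (and often use screen-space coverage rather than raw distance):

```c
/* Minimal distance-based LOD selection sketch.
 * thresholds[i] is the camera distance at which LOD i+1 takes over;
 * lod_count is the number of mesh versions, LOD 0 being the highest detail. */
int select_lod(float distance, const float *thresholds, int lod_count)
{
    for (int i = 0; i < lod_count - 1; ++i)
        if (distance < thresholds[i])
            return i; /* close enough for this detail level */

    return lod_count - 1; /* furthest objects use the lowest-poly mesh */
}
```

With thresholds {10, 30} and three LODs, objects closer than 10 units render at full detail and objects beyond 30 units use the cheapest mesh.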
Texturing

Textures make up 2D UI and are also mapped to the surface of 3D objects. Learning about texturing best practices can bring big benefits to your game! Even a straightforward technique such as texture atlasing, where you build multiple smaller textures into one larger texture, can bring a major performance gain for a project. You should understand what happens to a texture when the application runs. When the texture is exported, the common texture format is a PNG, JPG, or TGA file. However, when the application is running, each texture is converted to specific compression formats that are designed to be read faster on the GPU. Using the ASTC texture compression option not only helps your project's performance, but also lets your textures look better. To learn other texturing best practices, such as texture filtering and channel packing, check Real-time 3D Art Best Practices: Texturing.

Materials and shaders

Materials and shaders determine how 3D objects and visual effects appear on the screen. Become familiar with what they do and how to optimize them. Pair materials with texture atlases, allowing multiple objects in the same scene to share textures and materials. The game engine batches these objects when drawing them to the screen, saving bandwidth and increasing performance. When choosing shaders, use the simplest shader possible (like Unlit) and avoid using unnecessary features. If you are authoring shaders, avoid complicated math operations (like sin, pow, cos, and noise). If you are in doubt about your shaders' performance, Arm provides tools to profile your shaders with the Mali Offline Shader Compiler. There is a lot more to learn, so check out Real-time 3D Art Best Practices: Materials and Shaders for more information.

Lighting

In most games, lighting can be one of the most critical parts of a visual style. Lighting can set the mood, lead gameplay, and identify threats and objectives. This can make or break the visuals of a game.
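One reason ASTC is attractive is that every compressed block occupies 128 bits regardless of its pixel footprint, so larger block sizes directly lower the bits-per-pixel cost (at some quality cost). A small helper makes the trade-off concrete:

```c
/* ASTC stores every block in 128 bits regardless of the block footprint,
 * so the effective bits-per-pixel rate falls as block dimensions grow. */
double astc_bits_per_pixel(int block_w, int block_h)
{
    return 128.0 / (double)(block_w * block_h);
}
```

A 4x4 block gives 8 bpp (comparable to older formats), while an 8x8 block compresses the same image down to 2 bpp, which is why artists pick block sizes per texture based on how much detail it actually contains.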
But lighting can quickly be at odds with the performance needs of the project. To help avoid this hard choice, learn about the difference between static and dynamic light, optimization of lights, how to fake lighting, and the benefits of the different types and settings of lights. On mobile, it is often worth faking as much as possible when it comes to shadows. Real-time shadows are expensive! Dynamic objects often use a 3D mesh, plane, or quad with a dark shadow texture for a shadow rather than resorting to dynamic lights. For dynamic game objects where you cannot fake lighting, use light probes. These have the same benefits as light maps and can be calculated offline. A light probe stores the light that passes through empty space in your scene. This data can then be used to light dynamic objects, which helps integrate them visually with lightmapped objects throughout your scene. Lighting is a large topic with lots of possible optimizations. Read more at Real-time 3D Art Best Practices in Unity: Lighting.

Arm and Samsung devices

Arm's Cortex-A CPUs and Mali GPUs power the world's smartphones, with Mali GPUs powering mobile graphics. This means you can find Arm GPUs in an extensive list of popular Samsung devices, including the Samsung Galaxy A51 and Galaxy S21. Arm provides practical tips and advice for teams developing real-time 3D or 2D content for Arm-based devices.

Mobile game performance analysis has never been more important

Every year, mobile gaming grows! It was worth 77.2 billion US dollars in revenue in 2020, and growth in this sector is expected to continue in 2021 and beyond. With more mobile devices coming out each year, it is important for your content to be able to run on as many devices as possible, while providing players with the best possible experience. The artist best practices are just one part of the educational materials from Arm.
Alongside these best practices, you can explore the Unity Learn course, Arm & Unity Presents: 3D Art Optimization for Mobile Applications. This course includes a downloadable project that shows off the many benefits of using the best practices. For more advanced users, check out Arm's Mali GPU Best Practices Guide and learn about performance analysis with Arm Mobile Studio.

Follow up

Thanks to Joe Rozek and the team at Arm for bringing these great ideas to the Samsung Developers community. We hope you put these best practices into effect in your upcoming mobile games. The Samsung Developers site has many resources for developers looking to build for and integrate with Samsung devices and services. Stay in touch with the latest news by creating a free account or by subscribing to our monthly newsletter. Visit the Marketing Resources page for information on promoting and distributing your apps and games. Finally, our Developer Forum is an excellent way to stay up to date on all things related to the Galaxy ecosystem.
The Samsung Developers team works with many companies in the mobile and gaming ecosystems. We're excited to support our partner, Arm, as they bring timely and relevant content to developers looking to build games and high-performance experiences. This Vulkan Extensions series will help developers get the most out of the new and game-changing Vulkan extensions on Samsung mobile devices.

Android is enabling a host of useful new Vulkan extensions for mobile. These new extensions are set to improve the state of graphics APIs for modern applications, enabling new use cases and changing how developers can design graphics renderers going forward. In particular, a whole set of Vulkan extensions has been added in Android R. These extensions will be available across various Android smartphones, including the Samsung Galaxy S21, which was recently launched on 14 January. Existing Samsung Galaxy S models, such as the Samsung Galaxy S20, also allow upgrades to Android R. One group of these new Vulkan extensions for mobile is the 'maintenance extensions'. These plug up various holes in the Vulkan specification. Mostly, a lack of these extensions can be worked around, but it is annoying for application developers to do so. Having these extensions means less friction overall, which is a very good thing.

VK_KHR_uniform_buffer_standard_layout

This extension is a quiet one, but I still feel it has a lot of impact, since it removes a fundamental restriction for applications. Getting to data efficiently is the lifeblood of GPU programming. One thing I have seen trip up developers again and again is the antiquated rules for how uniform buffers (UBOs) are laid out in memory. For whatever reason, UBOs have been stuck with annoying alignment rules which go back to ancient times, yet SSBOs have nice alignment rules. Why?
As an example, let us assume we want to send an array of floats to a shader:

    #version 450
    layout(set = 0, binding = 0, std140) uniform UBO
    {
        float values[1024];
    };
    layout(location = 0) out vec4 FragColor;
    layout(location = 0) flat in int vIndex;

    void main()
    {
        FragColor = vec4(values[vIndex]);
    }

If you are not used to graphics API idiosyncrasies, this looks fine, but danger lurks around the corner. Any array in a UBO will be padded out to have 16-byte elements, meaning the only way to have a tightly packed UBO is to use vec4 arrays. Somehow, legacy hardware was hardwired for this assumption. SSBOs never had this problem.

std140 vs std430

You might have run into these weird layout qualifiers in GLSL. They reference some rather old GLSL versions. std140 refers to GLSL 1.40, which was introduced in OpenGL 3.1, and it was the version in which uniform buffers were introduced to OpenGL. The std140 packing rules define how variables are packed into buffers. The main quirks of std140 are:

- Vectors are aligned to their size. Notoriously, a vec3 is aligned to 16 bytes, which has tripped up countless programmers over the years, but this is just the nature of vectors in general. Hardware tends to like aligned access to vectors.
- Array element sizes are aligned to 16 bytes. This one makes it very wasteful to use arrays of float and vec2.

The array quirk mirrors HLSL's cbuffer. After all, both OpenGL and D3D mapped to the same hardware. Essentially, the assumption I am making here is that hardware was only able to load 16 bytes at a time with 16-byte alignment. To extract scalars, you could always do that after the load. std430 was introduced in GLSL 4.30 in OpenGL 4.3 and was designed to be used with SSBOs.
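The array quirk above can be made concrete with a little stride arithmetic. This is a simplified sketch of just the array-stride part of the two packing rules for scalar and small-vector elements, not a full implementation of the layout specifications:

```c
/* Simplified array-stride rules for scalar/vector elements:
 * std140 rounds each array element's stride up to 16 bytes,
 * while std430 uses the element's natural alignment. */
unsigned array_stride_std140(unsigned elem_size)
{
    unsigned stride = elem_size;
    if (stride % 16)
        stride += 16 - stride % 16; /* round up to a 16-byte boundary */
    return stride;
}

unsigned array_stride_std430(unsigned elem_size, unsigned elem_align)
{
    unsigned stride = elem_size;
    if (stride % elem_align)
        stride += elem_align - stride % elem_align;
    return stride;
}
```

So a `float values[1024]` array occupies 16 KiB under std140 but only 4 KiB under std430, which is exactly the waste the extension lets UBOs avoid.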
std430 removed the array element alignment rule, which means that with std430 we can express this efficiently:

    #version 450
    layout(set = 0, binding = 0, std430) readonly buffer SSBO
    {
        float values[1024];
    };
    layout(location = 0) out vec4 FragColor;
    layout(location = 0) flat in int vIndex;

    void main()
    {
        FragColor = vec4(values[vIndex]);
    }

Basically, the new extension enables std430 layout for use with UBOs as well:

    #version 450
    #extension GL_EXT_scalar_block_layout : require
    layout(set = 0, binding = 0, std430) uniform UBO
    {
        float values[1024];
    };
    layout(location = 0) out vec4 FragColor;
    layout(location = 0) flat in int vIndex;

    void main()
    {
        FragColor = vec4(values[vIndex]);
    }

Why not just use SSBOs then? On some architectures, yes, that is a valid workaround. However, some architectures also have special caches which are designed specifically for UBOs. Improving memory layouts of UBOs is still valuable.

GL_EXT_scalar_block_layout?

The Vulkan GLSL extension which supports std430 UBOs goes a little further and supports the scalar layout as well. This is a completely relaxed layout scheme where alignment requirements are essentially gone; however, that requires a different Vulkan extension to work.

VK_KHR_separate_depth_stencil_layouts

Depth-stencil images are weird in general. It is natural to think of these two aspects as separate images. However, the reality is that some GPU architectures like to pack depth and stencil together into one image, especially with D24S8 formats. Expressing image layouts with depth and stencil formats has therefore been somewhat awkward in Vulkan, especially if you want to make one aspect read-only and keep the other aspect as read/write, for example. In Vulkan 1.0, both depth and stencil needed to be in the same image layout. This means that you are either doing read-only depth-stencil or read/write depth-stencil. This was quickly identified as not being good enough for certain use cases.
There are valid use cases where depth is read-only while stencil is read/write, in deferred rendering for example. Eventually, VK_KHR_maintenance2 added support for some mixed image layouts which let us express read-only depth with read/write stencil, and vice versa:

    VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_STENCIL_READ_ONLY_OPTIMAL_KHR
    VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_STENCIL_ATTACHMENT_OPTIMAL_KHR

Usually, this is good enough, but there is a significant caveat to this approach, which is that depth and stencil layouts must be specified and transitioned together. This means that it is not possible to render to a depth aspect while transitioning the stencil aspect concurrently, since changing image layouts is a write operation. If the engine is not designed to couple depth and stencil together, this causes a lot of friction in the implementation. What this extension does is completely decouple the image layouts for the depth and stencil aspects, making it possible to modify the depth or stencil image layout in complete isolation. For example:

    VkImageMemoryBarrier barrier = { … };
    // Normally, we would have to specify both depth and stencil aspects for
    // depth-stencil images. Now, we can completely ignore what stencil is
    // doing and only modify the depth image layout.
    barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_DEPTH_BIT;
    barrier.oldLayout = VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL_KHR;
    barrier.newLayout = VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_OPTIMAL;

Similarly, in VK_KHR_create_renderpass2, there are extension structures where you can specify stencil layouts separately from the depth layout if you wish.
    typedef struct VkAttachmentDescriptionStencilLayout {
        VkStructureType sType;
        void* pNext;
        VkImageLayout stencilInitialLayout;
        VkImageLayout stencilFinalLayout;
    } VkAttachmentDescriptionStencilLayout;

    typedef struct VkAttachmentReferenceStencilLayout {
        VkStructureType sType;
        void* pNext;
        VkImageLayout stencilLayout;
    } VkAttachmentReferenceStencilLayout;

Like image memory barriers, it is possible to express layout transitions that only occur in either the depth or the stencil attachment.

VK_KHR_spirv_1_4

Each core Vulkan version has targeted a specific SPIR-V version. For Vulkan 1.0, we have SPIR-V 1.0; for Vulkan 1.1, we have SPIR-V 1.3; and for Vulkan 1.2, we have SPIR-V 1.5. SPIR-V 1.4 was an interim version between Vulkan 1.1 and 1.2 which added some nice features, but this extension is largely meant for developers who like to target SPIR-V themselves. Developers using GLSL or HLSL might not find much use for it. Some highlights of SPIR-V 1.4 that I think are worth mentioning are listed here.

OpSelect between composite objects

OpSelect before SPIR-V 1.4 only supports selecting between scalars and vectors. SPIR-V 1.4 thus allows you to express this kind of code easily with a simple OpSelect:

    MyStruct s = cond ? MyStruct(1, 2, 3) : MyStruct(4, 5, 6);

OpCopyLogical

There are scenarios in high-level languages where you load a struct from a buffer and then place it in a function variable. If you have ever looked at SPIR-V code for this kind of scenario, glslang would copy each element of the struct one by one, which generates bloated SPIR-V code. This is because the struct type that lives in a buffer and a struct type for a function variable are not necessarily the same. Offset decorations are the major culprits here. Copying objects in SPIR-V only works when the types are exactly the same, not "almost the same". OpCopyLogical fixes this problem, as it lets you copy objects of types which are the same except for decorations.
Advanced loop control hints

SPIR-V 1.4 adds ways to express partial unrolling, how many iterations are expected, and similar advanced hints, which can help a driver optimize better using knowledge it otherwise would not have. There is no way to express these in normal shading languages yet, but it does not seem difficult to add support for them.

Explicit look-up tables

Describing look-up tables was a bit awkward in SPIR-V. The natural way to do this in SPIR-V 1.3 is to declare an array with Private storage scope with an initializer, access chain into it, and load from it. However, there was never a way to express that a global variable is const, which relies on compilers to be a little smart. As a case study, let us see what glslang emits when using a Vulkan 1.1 target environment:

    #version 450
    layout(location = 0) out float FragColor;
    layout(location = 0) flat in int vIndex;
    const float LUT[4] = float[](1.0, 2.0, 3.0, 4.0);

    void main()
    {
        FragColor = LUT[vIndex];
    }

    %float_1 = OpConstant %float 1
    %float_2 = OpConstant %float 2
    %float_3 = OpConstant %float 3
    %float_4 = OpConstant %float 4
    %16 = OpConstantComposite %_arr_float_uint_4 %float_1 %float_2 %float_3 %float_4

This is super weird code, but it is easy for compilers to promote to a LUT. If the compiler can prove there are no readers before the OpStore, and only one OpStore can statically happen, the compiler can optimize it to a const LUT.

    %indexable = OpVariable %_ptr_Function__arr_float_uint_4 Function
    OpStore %indexable %16
    %24 = OpAccessChain %_ptr_Function_float %indexable %index
    %25 = OpLoad %float %24

In SPIR-V 1.4, the NonWritable decoration can also be used with Private and Function storage variables. Add an initializer, and we get something that looks far more reasonable and obvious:

    OpDecorate %indexable NonWritable
    %16 = OpConstantComposite %_arr_float_uint_4 %float_1 %float_2 %float_3 %float_4
    // Initialize an array with a constant expression and mark it as NonWritable.
    // This is trivially a LUT.
    %indexable = OpVariable %_ptr_Function__arr_float_uint_4 Function %16
    %24 = OpAccessChain %_ptr_Function_float %indexable %index
    %25 = OpLoad %float %24

VK_KHR_shader_subgroup_extended_types

This extension fixes a hole in Vulkan subgroup support. When subgroups were introduced, it was only possible to use subgroup operations on 32-bit values. However, with 16-bit arithmetic getting more popular, especially float16, there are use cases where you would want to use subgroup operations on smaller arithmetic types, making this kind of shader possible:

    #version 450
    // For subgroupAdd:
    #extension GL_KHR_shader_subgroup_arithmetic : require
    // For FP16 arithmetic:
    #extension GL_EXT_shader_explicit_arithmetic_types_float16 : require
    // For subgroup operations on FP16:
    #extension GL_EXT_shader_subgroup_extended_types_float16 : require
    layout(location = 0) out f16vec4 FragColor;
    layout(location = 0) in f16vec4 vColor;

    void main()
    {
        FragColor = subgroupAdd(vColor);
    }

VK_KHR_imageless_framebuffer

In most engines, using VkFramebuffer objects can feel a bit awkward, since most engine abstractions are based around some idea of:

    MyRenderAPI::BindRenderTargets(colorAttachments, depthStencilAttachment)

In this model, VkFramebuffer objects introduce a lot of friction, since engines would almost certainly end up with one of two strategies:

- Create a VkFramebuffer for every render pass, and free it later.
- Maintain a hashmap of all observed attachment and render-pass combinations.

Unfortunately, there are some … reasons why VkFramebuffer exists in the first place, but VK_KHR_imageless_framebuffer at least removes the largest pain point: needing to know the exact VkImageViews that we are going to use before we actually start rendering. With imageless framebuffers, we can defer the exact VkImageViews we are going to render into until vkCmdBeginRenderPass. However, the framebuffer itself still needs to know about certain metadata ahead of time.
Some drivers need to know this information, unfortunately. First, we set the VK_FRAMEBUFFER_CREATE_IMAGELESS_BIT flag in vkCreateFramebuffer. This removes the need to set pAttachments. Instead, we specify some parameters for each attachment. We pass down this structure as a pNext:

    typedef struct VkFramebufferAttachmentsCreateInfo {
        VkStructureType sType;
        const void* pNext;
        uint32_t attachmentImageInfoCount;
        const VkFramebufferAttachmentImageInfo* pAttachmentImageInfos;
    } VkFramebufferAttachmentsCreateInfo;

    typedef struct VkFramebufferAttachmentImageInfo {
        VkStructureType sType;
        const void* pNext;
        VkImageCreateFlags flags;
        VkImageUsageFlags usage;
        uint32_t width;
        uint32_t height;
        uint32_t layerCount;
        uint32_t viewFormatCount;
        const VkFormat* pViewFormats;
    } VkFramebufferAttachmentImageInfo;

Essentially, we need to specify almost everything that vkCreateImage would specify. The only thing we avoid is having to know the exact image views we need to use. To begin a render pass which uses an imageless framebuffer, we pass down this struct in vkCmdBeginRenderPass instead:

    typedef struct VkRenderPassAttachmentBeginInfo {
        VkStructureType sType;
        const void* pNext;
        uint32_t attachmentCount;
        const VkImageView* pAttachments;
    } VkRenderPassAttachmentBeginInfo;

Conclusions

Overall, I feel like this extension does not really solve the problem of having to know images up front. Knowing the resolution and usage flags of all attachments up front is basically like having to know the image views up front either way. If your engine knows all this information up front, just not the exact image views, then this extension can be useful. The number of unique VkFramebuffer objects will likely go down as well, but otherwise, there is, in my personal view, room to greatly improve things. In the next blog on the new Vulkan extensions, I explore 'legacy support extensions'.

Follow up

Thanks to Hans-Kristian Arntzen and the team at Arm for bringing this great content to the Samsung Developers community.
We hope you find this information about Vulkan extensions useful for developing your upcoming mobile games. The Samsung Developers site has many resources for developers looking to build for and integrate with Samsung devices and services. Stay in touch with the latest news by creating a free account or by subscribing to our monthly newsletter. Visit the Marketing Resources page for information on promoting and distributing your apps and games. Finally, our developer forum is an excellent way to stay up-to-date on all things related to the Galaxy ecosystem.
Arm Developers
In this article I would like to introduce a hardware optimisation technique called variable rate shading (VRS) and how this technique can benefit games on mobile phones.

Introduction

Traditionally, each pixel in a rendered image is shaded individually, meaning we can shade very high detail anywhere in the image, which, in theory, is great. However, in practice, this can lead to wasteful GPU calculations in areas where detail is less important. In some cases, you do not need 1x1 shading of pixels to produce a high-quality image. For example, areas that represent unlit surfaces in shadow naturally contain less detail than brightly lit areas. Moreover, areas which are out of focus due to camera post-effects, or which are affected by motion blur, naturally do not contain high detail. In these cases we could benefit from letting multiple pixels be shaded by just a single calculation (such as a 2x2 or 4x4 area of pixels) without losing any noticeable visual quality. The high-resolution sky texture on the left looks very much like the lower-resolution sky texture on the right. This is due to the smooth colour gradients and lack of high-frequency colour variation. For these reasons, there is room for a lot of optimisation.

You could argue that optimisation is more essential for handheld devices, like mobile phones, than for stationary devices, like games consoles, for a couple of reasons. Firstly, the hardware in handheld devices is often less powerful than conventional hardware due to its smaller size and more limited electrical power supply. The compact size of handheld hardware is also the reason these devices are more likely to suffer from temperature issues, causing thermal throttling, where performance slows down significantly. Secondly, heavy graphics in games can quickly drain your phone's battery. So, it is crucial to keep GPU usage to a minimum when possible. Variable rate shading is a way to help do just that.
How does variable rate shading work?

In principle, variable rate shading is actually a very simple method which can be implemented without having to redesign an existing rendering pipeline. There are three ways to define the areas to be optimised using variable rate shading:

- Let an attachment in the form of an image serve as a mask.
- Execute the optimisation on a per-triangle basis.
- Apply the VRS optimisation per draw call.

Use an attachment as a mask

You can provide the GPU with an image that serves as a mask. The mask contains information about which areas need to be rendered in the traditional manner, by shading each pixel individually, and which areas can be optimised by shading a group of pixels at once. The image below visualises such a mask by colour-coding the different areas: the blue area has no optimisation applied (1x1), as this is where the player focuses while driving. The green area is optimised by covering four pixels (2x2) with only one shading calculation, as this area contains less detail due to motion blur. The red area can be optimised even more (4x4), as it is affected by a more aggressive motion blur. The yellow and purple areas are likewise shaded with fewer shading calculations.

The areas defined in the image above could be static, at least while the player is driving the boat at top speed, as the boat is positioned at the centre of the image at all times. However, the level of optimisation could be reduced when another boat is passing by, or when the boat slows down and the motion blur is therefore gradually reduced. There are times when a more dynamic approach is needed, as it can sometimes be difficult to know beforehand which areas should be optimised and which should be shaded in the traditional manner. In those cases, it could be beneficial to generate the mask more dynamically by rendering the geometry of the scene in an extra pass.
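To put rough numbers on the savings from a mask like the one described above, here is a small back-of-the-envelope sketch in plain Java. The 1920x1080 frame and the sizes of the 1x1, 2x2 and 4x4 regions are hypothetical, chosen only for illustration:

```java
public class VrsSavings {
    // One fragment shader invocation per rate x rate tile of pixels.
    // rate = 1 means traditional 1x1 shading, rate = 2 means 2x2, and so on.
    static long invocations(long pixels, int rate) {
        long tile = (long) rate * rate;
        return (pixels + tile - 1) / tile; // round up to whole tiles
    }

    public static void main(String[] args) {
        long frame = 1920L * 1080;        // 2,073,600 pixels
        long focusArea = frame / 2;       // 1x1: the area the player watches
        long blurArea = frame / 4;        // 2x2: mild motion blur
        long heavyBlurArea = frame / 4;   // 4x4: aggressive motion blur

        long withVrs = invocations(focusArea, 1)
                     + invocations(blurArea, 2)
                     + invocations(heavyBlurArea, 4);
        long withoutVrs = invocations(frame, 1);

        // With this (made-up) mask, roughly 42% of the shading work disappears.
        System.out.println("without VRS: " + withoutVrs + ", with VRS: " + withVrs);
    }
}
```

The exact benefit depends entirely on how much of the frame the coarser regions cover, which is why masks driven by motion blur or player focus tend to pay off most in fast-moving scenes.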
Simply colour the geometric elements in the scene and pass the result to the GPU as a mask for the variable rate shading optimisation. If the scene is rendered using deferred lighting, an extra pass may not be needed, as the mask could be based on the default geometry pass required for deferred shading.

Optimisation based on primitives

Another way of using variable rate shading is to take advantage of other extensions that allow you to define the geometric elements to be optimised, rather than using a mask. This can be done on a per-triangle basis or simply per draw call. Defining geometric elements can be a more efficient approach, as there is no need to generate a mask and it requires less memory bandwidth. With the per-triangle extension, you define the optimisation level in the vertex shader. With the per-draw-call method, the optimisation level is defined before the draw call takes place. Keep in mind that the three methods can be combined if needed.

The image below shows a rendering pass where all objects in a scene are shaded in different colours to define which areas should be shaded in the traditional manner (meaning no optimisation) and which areas contain less detail (and therefore need fewer GPU calculations). The areas shown can be defined by any of the three methods. In general, breaking a scene up into layers, where the elements nearest the camera get the least optimisation and layers in the background get the most, is an effective way to go about it. The image below shows the same scene, but this time we see the final output with VRS on and off. As you may have noticed, it is very hard to tell any difference when the VRS optimisation is turned on or off.

Experiences with variable rate shading so far

Some commercial games have already successfully implemented variable rate shading. The image below is from Wolfenstein: Youngblood.
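The layering idea above can be sketched as a tiny policy function: before each draw call, pick a shading-rate tile size from the object's distance to the camera. This is a hedged illustration in plain Java; the distance thresholds and tile sizes are hypothetical, and in a real renderer the returned value would be translated into the per-draw VRS call of your graphics API:

```java
public class LayeredShadingRate {
    // Return the shading tile width/height for a draw call:
    // 1 = full 1x1 shading, 2 = 2x2, 4 = 4x4.
    // Near objects keep full detail; background layers are coarsened.
    static int rateForDistance(float distanceToCamera) {
        if (distanceToCamera < 10.0f) return 1; // foreground layer
        if (distanceToCamera < 50.0f) return 2; // mid layer
        return 4;                               // background layer
    }

    public static void main(String[] args) {
        System.out.println("player boat:  " + rateForDistance(3.0f));
        System.out.println("passing boat: " + rateForDistance(25.0f));
        System.out.println("coastline:    " + rateForDistance(200.0f));
    }
}
```

Because the decision happens once per draw call rather than per pixel, this approach costs essentially nothing at runtime and needs no mask image in memory.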
As you may have noticed, there is barely any visual difference with VRS on or off, but you can tell a difference in frame rate. In fact, the game performs on average 10% or more better with VRS turned on. That may not sound like a lot, but considering that it is an easy optimisation to implement, causes barely any noticeable change in visual quality, and comes on top of other optimisation techniques, it is actually not a bad performance boost after all. Other games have shown an even higher performance boost. For example, Gears Tactics gains up to 30% when using variable rate shading. The image below is from that game.

Virtual reality

Variable rate shading can benefit virtual reality as well. Not only does virtual reality by nature require two rendered images (one for each eye), but the player wearing the headset naturally pays most attention to the central area of the rendered image. The areas seen from the corner of your eye naturally do not need the same amount of detail as the central area. That means that even though a static VRS mask can provide a reasonable overall optimisation, using an eye tracker could result in an even more efficient optimisation, and therefore a less noticeable quality reduction. A consistently high frame rate is crucial for virtual reality. If the frame rate is not relatively consistent, or rendering performance suffers from a consistently low frame rate, it quickly becomes uncomfortable to wear a VR headset, and the player might even get dizzy and feel physically sick. By reducing GPU calculations, variable rate shading not only boosts the frame rate, it also uses less battery on mobile devices. This is a huge win for systems like Samsung Gear VR, where long battery life is much appreciated as the graphics are running on a Galaxy mobile phone.
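The eye-tracking idea can be sketched as generating a radial mask around the gaze point: tiles near the gaze keep full detail, and the rate coarsens toward the periphery. A minimal plain-Java sketch; the tile grid dimensions and the two radius thresholds are hypothetical:

```java
public class FoveatedMask {
    // Build a per-tile shading-rate mask (1 = 1x1, 2 = 2x2, 4 = 4x4)
    // centred on the gaze position reported by an eye tracker.
    static int[][] buildMask(int tilesX, int tilesY, int gazeTileX, int gazeTileY) {
        int[][] mask = new int[tilesY][tilesX];
        for (int y = 0; y < tilesY; y++) {
            for (int x = 0; x < tilesX; x++) {
                double dist = Math.hypot(x - gazeTileX, y - gazeTileY);
                if (dist < 4.0) mask[y][x] = 1;       // foveal region: full rate
                else if (dist < 8.0) mask[y][x] = 2;  // near periphery
                else mask[y][x] = 4;                  // far periphery
            }
        }
        return mask;
    }
}
```

With an eye tracker, gazeTileX and gazeTileY would be updated every frame; without one, a fixed centre per eye gives the static mask described above.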
The image below shows a variable rate shading mask generated by eye-tracking technology for a virtual reality headset. The centres of the left and right images shade pixels in the traditional manner; the other colours represent areas with different degrees of optimisation.

Which Samsung devices support variable rate shading?

All hardware listed here supports variable rate shading.

- Mobile phones: Samsung Galaxy S22, S22+ and S22 Ultra
- Tablets: Samsung Tab S8, S8+ and S8 Ultra

Both Vulkan and OpenGL ES 2.0 (and higher) support variable rate shading. The OpenGL extensions for the three ways of using variable rate shading are the following:

- GL_EXT_fragment_shading_rate_attachment allows sending a mask to the GPU.
- GL_EXT_fragment_shading_rate_primitive works on a per-triangle basis, where writing a value to gl_PrimitiveShadingRateEXT in the vertex shader defines the level of optimisation.
- GL_EXT_fragment_shading_rate works per draw call, where glShadingRateEXT should be called to define the optimisation level.

The extension that enables variable rate shading for Vulkan is VK_KHR_fragment_shading_rate.

Conclusion

In this article, we have established the following:

- Variable rate shading is a hardware feature and is fairly easy to implement, as it does not require any redesign of existing rendering pipelines.
- Variable rate shading is an optimisation technique which reduces GPU calculations by allowing a group of pixels to be shaded with the same colour rather than each pixel individually.
- Variable rate shading is particularly useful for mobile gaming as well as Samsung Gear VR, as it boosts performance and prolongs battery life.
- The level of optimisation can be defined by passing the GPU a mask containing areas with different optimisation levels.
- Some implementations have been shown to boost the frame rate by 10% or more, while others increase the frame rate by up to 30%.

Note: some images in this post are courtesy of UL Solutions.
Søren Klit Lambæk
With the increasing popularity of foldable phones such as the Galaxy Z Fold3 and Galaxy Z Flip3, apps on these devices are adopting their foldable features. In this blog, you can get started on how to utilize these foldable features in Android game apps. We focus on creating a Java file containing an implementation of the Android Jetpack WindowManager library that can be imported into game engines like Unity or Unreal Engine. This creates an interface allowing developers to retrieve information about the folding feature on the device. At the end of this blog, you can go deeper by going to Code Lab.

Android Jetpack WindowManager

Android Jetpack, in their own words, is "a suite of libraries to help developers follow best practices, reduce boilerplate code, and write code that works consistently across Android versions and devices so that developers can focus on the code they care about." WindowManager is one of these libraries, and is intended to help application developers support new device form factors and multi-window environments. The library had its 1.0.0 release in January 2022, targeting foldable devices. According to its documentation, future versions will be extended to more display types and window features.

Creating the Android Jetpack WindowManager setup

As previously mentioned, we are creating a Java file that can be imported into either Unity or Unreal Engine 4 to create an interface for retrieving information on the folding feature and passing it over to the native or engine side of your application.

Set up the FoldableHelper class and data storage class

Create a file called FoldableHelper.java in Visual Studio or any source code editor.
Let's start off by giving it a package name:

```java
package com.samsung.android.gamedev.foldable;
```

Next, let's import all the necessary libraries and classes in this file:

```java
//Android imports
import android.app.Activity;
import android.graphics.Rect;
import android.os.Handler;
import android.os.Looper;
import android.util.Log;

//Android Jetpack WindowManager imports
import androidx.annotation.NonNull;
import androidx.core.util.Consumer;
import androidx.window.java.layout.WindowInfoTrackerCallbackAdapter;
import androidx.window.layout.DisplayFeature;
import androidx.window.layout.FoldingFeature;
import androidx.window.layout.WindowInfoTracker;
import androidx.window.layout.WindowLayoutInfo;
import androidx.window.layout.WindowMetrics;
import androidx.window.layout.WindowMetricsCalculator;

//Java imports
import java.util.List;
import java.util.concurrent.Executor;
```

Start by creating a class, FoldableHelper, that is going to contain all of our helper functions. Then create variables to store a callback object, as well as the WindowInfoTrackerCallbackAdapter and WindowMetricsCalculator. Let's also create a temporary declaration of the native function that will pass the data from Java to the native side of the application once we start working in the game engines.

```java
public class FoldableHelper {
    private static LayoutStateChangeCallback layoutStateChangeCallback;
    private static WindowInfoTrackerCallbackAdapter wit;
    private static WindowMetricsCalculator wmc;

    public static native void onLayoutChanged(FoldableLayoutInfo resultInfo);
}
```

Let's create a storage class to hold the data received from the WindowManager library. An instance of this class will also be passed to the native code to transfer the data.
```java
public static class FoldableLayoutInfo {
    public static final int UNDEFINED = -1;

    // Hinge orientation
    public static final int HINGE_ORIENTATION_HORIZONTAL = 0;
    public static final int HINGE_ORIENTATION_VERTICAL = 1;

    // State
    public static final int STATE_FLAT = 0;
    public static final int STATE_HALF_OPENED = 1;

    // Occlusion type
    public static final int OCCLUSION_TYPE_NONE = 0;
    public static final int OCCLUSION_TYPE_FULL = 1;

    Rect currentMetrics = new Rect();
    Rect maxMetrics = new Rect();

    int hingeOrientation = UNDEFINED;
    int state = UNDEFINED;
    int occlusionType = UNDEFINED;
    boolean isSeparating = false;
    Rect bounds = new Rect();
}
```

Initialize the WindowInfoTracker

Since we are working in Java and the WindowManager library is written in Kotlin, we have to use the WindowInfoTrackerCallbackAdapter. This is an interface provided by Android to enable the use of the WindowInfoTracker from Java. The WindowInfoTracker is how we receive information about any foldable features inside the window's bounds. Next is to create the WindowMetricsCalculator, which lets us retrieve the window metrics of an activity. Window metrics consist of the window's current and maximum bounds. We also create a new LayoutStateChangeCallback object. This object is passed into the WindowInfoTracker as a listener object and is called every time the layout of the device changes (for our purposes, when the foldable state changes).

```java
public static void init(Activity activity) {
    //create WindowInfoTracker
    wit = new WindowInfoTrackerCallbackAdapter(WindowInfoTracker.Companion.getOrCreate(activity));

    //create WindowMetricsCalculator
    wmc = WindowMetricsCalculator.Companion.getOrCreate();

    //create callback object
    layoutStateChangeCallback = new LayoutStateChangeCallback(activity);
}
```

Set up and attach the callback listener

In this step, let's attach the LayoutStateChangeCallback to the WindowInfoTrackerCallbackAdapter as a listener.
The addWindowLayoutInfoListener function takes three parameters: the activity to attach the listener to, an executor, and a consumer of WindowLayoutInfo. We will set up the executor and consumer in a moment. Adding the listener is kept separate from the initialization, since the first WindowLayoutInfo is not emitted until Activity.onStart has been called. As such, we likely do not need to attach the listener until during or after onStart, but we can still set up the WindowInfoTracker and WindowMetricsCalculator ahead of time.

```java
public static void start(Activity activity) {
    wit.addWindowLayoutInfoListener(activity, runOnUiThreadExecutor(), layoutStateChangeCallback);
}
```

Now, let's create the executor for the listener. This executor is straightforward and simply runs the command on the main looper of our activity. It is possible to set this up to run on a custom thread; however, that is not covered in this blog. For more information, we recommend checking the official documentation for the Jetpack WindowManager.

```java
static Executor runOnUiThreadExecutor() {
    return new MyExecutor();
}

static class MyExecutor implements Executor {
    Handler handler = new Handler(Looper.getMainLooper());

    @Override
    public void execute(Runnable command) {
        handler.post(command);
    }
}
```

Next, we're going to create the basic layout of our LayoutStateChangeCallback. It consumes WindowLayoutInfo and implements Consumer<WindowLayoutInfo>. For now, let's simply lay out the class; we will give it some functionality a little later.

```java
static class LayoutStateChangeCallback implements Consumer<WindowLayoutInfo> {
    private final Activity activity;

    public LayoutStateChangeCallback(Activity activity) {
        this.activity = activity;
    }
}
```

If the listener is no longer needed, we want a way to remove it, and the WindowInfoTrackerCallbackAdapter contains a function to do just that.
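The listener plumbing above is just the standard Consumer/Executor pattern. To see the shape in isolation, here is a minimal plain-Java mock (all names hypothetical, no Android dependencies, and using java.util.function.Consumer as a stand-in for androidx.core.util.Consumer) of a tracker that delivers layout updates to a registered consumer through an executor:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

public class ListenerPatternDemo {
    // Hypothetical stand-in for the information the real tracker emits.
    record LayoutInfo(String state) {}

    // Minimal mock of the tracker: stores listeners, delivers via an executor.
    static class MockTracker {
        private final List<Consumer<LayoutInfo>> listeners = new ArrayList<>();
        private final Executor executor;

        MockTracker(Executor executor) { this.executor = executor; }

        void addListener(Consumer<LayoutInfo> listener) { listeners.add(listener); }

        void emit(LayoutInfo info) {
            // Each listener is invoked on the executor, just as the real
            // tracker posts to the main looper through MyExecutor.
            for (Consumer<LayoutInfo> l : listeners) {
                executor.execute(() -> l.accept(info));
            }
        }
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        MockTracker tracker = new MockTracker(Runnable::run); // direct executor
        tracker.addListener(info -> received.add(info.state()));
        tracker.emit(new LayoutInfo("HALF_OPENED"));
        System.out.println(received);
    }
}
```

The executor indirection is what lets the real library deliver updates on the UI thread (or any thread you choose) without the tracker knowing anything about threading itself.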
```java
public static void stop() {
    wit.removeWindowLayoutInfoListener(layoutStateChangeCallback);
}
```

This just tidies things up for us and ensures that the listener is cleaned up when we no longer need it. Next, we're going to add some functionality to the LayoutStateChangeCallback class. We are going to process WindowLayoutInfo into the FoldableLayoutInfo we created previously. Using the Java Native Interface (JNI), we then send that information over to the native side using the function onLayoutChanged. Note: this doesn't actually do anything yet, but we cover how to set this up in Unreal Engine and in Unity through the Code Lab tutorials.

```java
static class LayoutStateChangeCallback implements Consumer<WindowLayoutInfo> {
    @Override
    public void accept(WindowLayoutInfo windowLayoutInfo) {
        FoldableLayoutInfo resultInfo = updateLayout(windowLayoutInfo, activity);
        onLayoutChanged(resultInfo);
    }
}
```

Let's implement the updateLayout function to process WindowLayoutInfo and return a FoldableLayoutInfo. Firstly, create the FoldableLayoutInfo that will contain the processed information. Follow this up by getting the window metrics, both maximum and current.

```java
private static FoldableLayoutInfo updateLayout(WindowLayoutInfo windowLayoutInfo, Activity activity) {
    FoldableLayoutInfo retLayoutInfo = new FoldableLayoutInfo();

    WindowMetrics wm = wmc.computeCurrentWindowMetrics(activity);
    retLayoutInfo.currentMetrics = wm.getBounds();

    wm = wmc.computeMaximumWindowMetrics(activity);
    retLayoutInfo.maxMetrics = wm.getBounds();

    // function continued in the next step
}
```

Get the DisplayFeatures present in the current window bounds using WindowLayoutInfo.getDisplayFeatures. Currently, the API has only one type of DisplayFeature, the FoldingFeature; however, in the future there will likely be more as screen types evolve. At this point, let's use a for loop to iterate through the resulting list until it finds a FoldingFeature. Once it detects a folding feature, it starts processing its data: orientation, state, separation type, and its bounds.
Then, store these data in the FoldableLayoutInfo we created at the start of the function call. You can learn more about these data in the Jetpack WindowManager documentation.

```java
private static FoldableLayoutInfo updateLayout(WindowLayoutInfo windowLayoutInfo, Activity activity) {
    FoldableLayoutInfo retLayoutInfo = new FoldableLayoutInfo();

    WindowMetrics wm = wmc.computeCurrentWindowMetrics(activity);
    retLayoutInfo.currentMetrics = wm.getBounds();

    wm = wmc.computeMaximumWindowMetrics(activity);
    retLayoutInfo.maxMetrics = wm.getBounds();

    List<DisplayFeature> displayFeatures = windowLayoutInfo.getDisplayFeatures();

    if (!displayFeatures.isEmpty()) {
        for (DisplayFeature displayFeature : displayFeatures) {
            if (displayFeature instanceof FoldingFeature) {
                FoldingFeature foldingFeature = (FoldingFeature) displayFeature;

                if (foldingFeature.getOrientation() == FoldingFeature.Orientation.HORIZONTAL) {
                    retLayoutInfo.hingeOrientation = FoldableLayoutInfo.HINGE_ORIENTATION_HORIZONTAL;
                } else {
                    retLayoutInfo.hingeOrientation = FoldableLayoutInfo.HINGE_ORIENTATION_VERTICAL;
                }

                if (foldingFeature.getState() == FoldingFeature.State.FLAT) {
                    retLayoutInfo.state = FoldableLayoutInfo.STATE_FLAT;
                } else {
                    retLayoutInfo.state = FoldableLayoutInfo.STATE_HALF_OPENED;
                }

                if (foldingFeature.getOcclusionType() == FoldingFeature.OcclusionType.NONE) {
                    retLayoutInfo.occlusionType = FoldableLayoutInfo.OCCLUSION_TYPE_NONE;
                } else {
                    retLayoutInfo.occlusionType = FoldableLayoutInfo.OCCLUSION_TYPE_FULL;
                }

                retLayoutInfo.isSeparating = foldingFeature.isSeparating();
                retLayoutInfo.bounds = foldingFeature.getBounds();

                return retLayoutInfo;
            }
        }
    }

    return retLayoutInfo;
}
```

If no folding feature is detected, the function simply returns the FoldableLayoutInfo without setting its data, leaving its values as UNDEFINED (-1).

Conclusion

The Java file you have now created should be usable in new or existing Unity and Unreal Engine projects, providing access to information on the folding feature.
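To illustrate what the data in FoldableLayoutInfo enables, the sketch below splits a window into the two usable halves on either side of a horizontal hinge, which is the basis of a flex-mode layout. It is plain Java with a minimal stand-in for android.graphics.Rect so that it runs off-device, and the window and hinge dimensions are hypothetical:

```java
public class FoldRegions {
    // Minimal stand-in for android.graphics.Rect so this sketch runs off-device.
    static class Rect {
        final int left, top, right, bottom;
        Rect(int left, int top, int right, int bottom) {
            this.left = left; this.top = top; this.right = right; this.bottom = bottom;
        }
    }

    // For a HINGE_ORIENTATION_HORIZONTAL fold, return the regions above
    // and below the fold bounds reported by the folding feature.
    static Rect[] splitAroundHorizontalHinge(Rect window, Rect fold) {
        Rect upper = new Rect(window.left, window.top, window.right, fold.top);
        Rect lower = new Rect(window.left, fold.bottom, window.right, window.bottom);
        return new Rect[] { upper, lower };
    }

    public static void main(String[] args) {
        // Hypothetical half-opened device: 2208x1768 window, hinge band at y = 832..936.
        Rect window = new Rect(0, 0, 2208, 1768);
        Rect fold = new Rect(0, 832, 2208, 936);
        Rect[] halves = splitAroundHorizontalHinge(window, fold);
        System.out.println("upper half ends at y=" + halves[0].bottom
                + ", lower half starts at y=" + halves[1].top);
    }
}
```

In a game, these two rectangles could drive viewport placement, for example rendering gameplay on one half and controls on the other when the device reports STATE_HALF_OPENED with isSeparating set.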
Continue learning by going to the Code Lab tutorials, which show how to use the file created here to implement flex mode detection and usage in game applications.
Lochlann Henry Ramsay-Edwards