Adaptive Performance 1.0

Mobile devices have more physical limitations than PCs and consoles, which means they are more constrained when rendering complex games. The Adaptive Performance project was started to improve the performance of games under these stricter constraints on mobile devices. In version 1.0 we focused on reducing unnecessary power consumption on the device without impacting the game's performance, because battery consumption and performance management are big parts of a device's performance limitations. To manage these constraints, we implemented three features:

- Power Manager
- Bottleneck detection & Auto Performance Control
- Custom scaler using device thermal feedback

Details of each implementation are as follows.

Power Manager

The Power Manager (PM) was implemented to avoid thermal throttling, which suddenly drops performance when the device temperature is high, and to extend battery life. To achieve these goals, the PM predicts the optimal CPU/GPU levels for the game's needs and sets the hardware to those levels. The graph above shows the structure of the Power Manager system and its workflow. The operation order of the PM is as follows:

1. Compare and calculate previous frame information and real-time performance status information, such as the device temperature retrieved from GameSDK on the device.
2. Check the current status of the device through the calculated frame information and the performance status information. In this step, based on the information from GameSDK, it checks whether the device is achieving the target framerate and whether the game has a CPU or GPU bottleneck.
3. Find power levels that reduce power consumption without lowering performance (the Auto Performance Controller).
4. Transmit the CPU and GPU levels found by the Auto Performance Controller to the device.

While the game is running on the device, the Power Manager in Adaptive Performance repeats the steps above and adapts its behavior to changing device conditions.

Bottleneck detection & Auto Performance Control

Bottleneck detection is the process that identifies which component is delaying the overall rendering pipeline of a game. Adaptive Performance finds bottlenecks using frame time and CPU/GPU time information. With this information we can narrow down whether it is the CPU or the GPU that is stopping the game from reaching the target framerate. This is identified by calculating and comparing the CPU and GPU frame times with the overall frame time. For GPU time, more accurate information can be obtained through GameSDK, which makes it easier to find bottlenecks. Once you know whether the CPU or the GPU is causing the bottleneck, you can take appropriate action for each situation. For example, if the GPU is the bottleneck, we can reduce the CPU level, which in turn lowers the temperature; conversely, if the CPU is the bottleneck, you can lower the GPU level to reduce the temperature of that component. If the device temperature is already low, the PM can also let the game use the GPU as much as it needs. In addition, if the CPU and GPU are surpassing the target framerate, we can reduce both levels to lower power consumption, thereby increasing battery life.

Adaptive Performance provides this automatic power management system as Auto Performance Control. With it, the game automatically communicates with the device and optimizes its performance according to the device's status; the developer only has to turn the system on, without any additional work configuring CPU/GPU levels. When auto performance mode is turned on, it automatically activates bottleneck detection and sets the proper CPU/GPU levels through the Auto Performance Control system, which is the recommended usage. The example code below shows how to decrease heat and power consumption using Auto Performance Control. When the developer sets AutomaticPerformanceControl to true and sets the desired targetFrameRate, the system helps achieve the appropriate performance by controlling the power level.

```csharp
public void EnterMenu()
{
    // ap is an IAdaptivePerformance instance (Holder.Instance).
    if (!ap.Active)
        return;

    Application.targetFrameRate = 30;
    // Enable automatic regulation of CPU and GPU levels by Adaptive Performance.
    var ctrl = ap.DevicePerformanceControl;
    ctrl.AutomaticPerformanceControl = true;
}

public void EnterBenchmark()
{
    var ctrl = ap.DevicePerformanceControl;
    // Set higher CPU and GPU levels when benchmarking a level.
    ctrl.CpuLevel = ctrl.MaxCpuPerformanceLevel;
    ctrl.GpuLevel = ctrl.MaxGpuPerformanceLevel;
}
```

Developers can also set the CPU/GPU levels themselves using the Adaptive Performance APIs, but this is not recommended. What they adjust directly is an abstract level value, not the actual operating frequency, and this may cause unintended battery consumption and performance degradation depending on the performance differences between devices. For example, a level value tuned for device A may behave adversely on device B for different reasons. Therefore, rather than setting levels directly, it is recommended to use the Auto Performance Control tuned by the experienced developers at Unity and Samsung to achieve the best performance on any device. Nevertheless, if developers want to set CPU and GPU levels manually, they can set Instance.DevicePerformanceControl.AutomaticPerformanceControl to false and then change the levels through Instance.DevicePerformanceControl.CpuLevel and Instance.DevicePerformanceControl.GpuLevel. Again, this is not the recommended method. Note that even if you set the CPU/GPU level, the value is not guaranteed to be applied or maintained, due to device status or policy control.

Custom scaler with device thermal feedback

If game developers want not only Auto Performance Control with the power management service but also additional performance improvements, they can use a custom scaler. The Adaptive Performance API reports the device's current temperature through a warning level, and also provides a more detailed temperature level. Through these, the developer can track the device's temperature changes and the timing of throttling control. By scaling the quality of selected content using this timing information, developers can preemptively manage heat before the device's own thermal control kicks in, and allow the battery to last longer. The example below is an implementation sample that controls the global LOD bias as a custom scale factor using temperature information.

```csharp
using UnityEngine;
using UnityEngine.AdaptivePerformance;

public class AdaptiveLOD : MonoBehaviour
{
    private IAdaptivePerformance ap = null;

    void Start()
    {
        ap = Holder.Instance;
        if (!ap.Active)
            return;

        QualitySettings.lodBias = 1.0f;
        ap.ThermalStatus.ThermalEvent += OnThermalEvent;
    }

    void OnThermalEvent(ThermalMetrics ev)
    {
        switch (ev.WarningLevel)
        {
            case WarningLevel.NoWarning:
                QualitySettings.lodBias = 1;
                break;
            case WarningLevel.ThrottlingImminent:
                if (ev.TemperatureLevel > 0.8f)
                    QualitySettings.lodBias = 0.75f;
                else
                    QualitySettings.lodBias = 1.0f;
                break;
            case WarningLevel.Throttling:
                QualitySettings.lodBias = 0.5f;
                break;
        }
    }
}
```

Adaptive Performance 1.0 performance result

So far, we have looked at the key functions of Adaptive Performance 1.0. Below is the Adaptive Performance result we measured on Unity's MegaCity demo. The blue graph above shows the case using Adaptive Performance and its Auto Performance Control system, and the red one shows the case without Adaptive Performance. The target FPS in both cases is 30, but you can see that Adaptive Performance maintains 30 FPS for a longer period of time; Adaptive Performance is much more stable.

※ You can find the supported versions of Adaptive Performance 1.0, a more detailed user guide, and FAQs as follows:

Unity editor version | Adaptive Performance package version
Unity 2018 LTS+ | 1.1.9
Unity 2020.1+ / Unity 2019 LTS | 1.2.0

Detailed user guide from Unity
Unity Adaptive Performance FAQs
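The bottleneck detection and level adjustment policy described above reduces to a few lines of decision logic. The sketch below is illustrative only: the function names and thresholds are assumptions, not the Adaptive Performance or GameSDK API, which does this work inside the package.

```python
# Illustrative sketch of the bottleneck-detection policy described above.
# All names and thresholds are hypothetical; the real Power Manager lives
# inside the Adaptive Performance package and GameSDK.

def detect_bottleneck(frame_ms, cpu_ms, gpu_ms, target_ms):
    """Decide which component keeps the game from hitting the target framerate."""
    if frame_ms <= target_ms:
        return "none"            # target framerate achieved
    # Whichever side dominates the frame time is the limiting component.
    return "gpu" if gpu_ms >= cpu_ms else "cpu"

def adjust_levels(bottleneck, cpu_level, gpu_level, min_level=0):
    """Lower the level of the *other* component to reduce heat, or lower
    both when the device is already faster than the target."""
    if bottleneck == "gpu":      # GPU-bound: spare CPU power is wasted heat
        cpu_level = max(min_level, cpu_level - 1)
    elif bottleneck == "cpu":    # CPU-bound: lower the GPU level instead
        gpu_level = max(min_level, gpu_level - 1)
    else:                        # above target: save battery on both sides
        cpu_level = max(min_level, cpu_level - 1)
        gpu_level = max(min_level, gpu_level - 1)
    return cpu_level, gpu_level

# Example: a GPU-bound frame against a 30 FPS target (about 33.3 ms budget).
b = detect_bottleneck(frame_ms=40.0, cpu_ms=12.0, gpu_ms=38.0, target_ms=33.3)
print(b, adjust_levels(b, cpu_level=3, gpu_level=3))
```

The real system repeats this loop every frame and also folds in the thermal feedback, but the core "lower the idle side" idea is the same.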
Adaptive Performance 2.0

In version 2.0, released in August 2020, twelve months after Adaptive Performance 1.0 was first released, we focused on what we missed in 1.0. We improved three aspects of content control and enhanced usability for developers:

- Indexer and quality scalers
- Unity editor UX/UI
- Simulator

In version 1.0, developers had to implement a real-time content control module by themselves, but in 2.0 we implemented a quality manager system that provides various built-in scalers. We improved the UX/UI of the Unity editor to make Adaptive Performance easier to use, and there is also a simulator to help developers test Adaptive Performance without connecting devices. Details of each implementation are as follows.

Unity editor UX/UI

When you install the Adaptive Performance package and the Adaptive Performance Samsung Android package, you will find that [Adaptive Performance] has been added to Project Settings. To use it properly, you have to select providers first. To run on a device, select the Samsung Android provider; for the simulator, select the Device Simulator provider under the PC, Mac & Linux Standalone settings. Once the providers are selected, you can use Adaptive Performance simply through the Adaptive Performance section in Project Settings. Auto performance mode is part of the Power Manager provided since 1.0; it controls performance automatically to achieve the best performance and reduces battery consumption by changing the CPU/GPU levels. If you check auto performance mode in the settings and activate the Indexer settings, you can choose the scalers you want with simple checkboxes and scaling bars. In this case, the scaler settings apply to the entire project. If you want to configure scalers per scene, you can attach an Adaptive Performance setting to an object in the scene for more detailed control.

Simulator

The simulator was added in Adaptive Performance 2.0 so that we can test operation under different thermal settings, as well as the scaler values, all without a device. To use the simulator, download the Device Simulator package from the Package Manager. Then, as with the Unity editor UX/UI above, set each scaler at Project Settings > Adaptive Performance > Simulator. When you are done editing the scaler settings, open the simulator with Window > General > Device Simulator. Through the simulator, you can check how Adaptive Performance behaves under different temperature and bottleneck conditions.

Indexer and quality scalers

The Indexer is the quality management system of Adaptive Performance; it controls game quality depending on the temperature and the performance status of the device. A scaler is a tool that controls the quality of a scene based on the values below:

- Target
- Current bottleneck
- Lowest level
- Lowest visual impact

A scaler only operates when the Indexer is activated, which you can do in Project Settings.

Built-in scalers

Here are the built-in scalers that Adaptive Performance 2.0 provides.

General render scalers:

- Adaptive Framerate: automatically controls the application's framerate within the defined min/max range.
- Adaptive LOD: changes the LOD bias based on thermal and performance load.
- Adaptive Resolution: automatically controls the screen resolution of the application by the defined scale.

Universal Render Pipeline scalers. The scalers below require the Universal Render Pipeline (URP) and directly change URP settings, so they have no effect with any other render pipeline:

- Adaptive Batching: toggles dynamic batching based on thermal and performance load.
- Adaptive LUT: changes the LUT bias of URP.
- Adaptive MSAA: changes the anti-aliasing quality bias of URP. This scaler only affects the camera's post-processing subpixel morphological anti-aliasing (SMAA) quality level.
- Adaptive Shadow Cascades: changes the main light shadow cascades count bias of URP.
- Adaptive Shadow Distance: changes the max shadow distance multiplier of URP.
- Adaptive Shadow Quality: changes the shadow quality bias of URP.
- Adaptive Shadowmap Resolution: changes the main light shadowmap resolution multiplier of URP.
- Adaptive Sorting: skips the front-to-back sorting of URP.

We provide built-in scalers to make it easier for developers to control the quality of their content. Developers can use the built-in scalers easily and safely because they can choose which quality settings to control, and by how much, in a basic UI.

Custom scalers

Developers can also make their own custom scalers by creating a new class that inherits from AdaptivePerformanceScaler. Below is an example of a TextureQualityScaler.

```csharp
public class TextureQualityScaler : AdaptivePerformanceScaler
{
    public override ScalerVisualImpact VisualImpact => ScalerVisualImpact.High;
    public override ScalerTarget Target => ScalerTarget.GPU;
    public override int MaxLevel => 2;

    int m_DefaultTextureQuality;

    protected override void OnDisabled()
    {
        QualitySettings.masterTextureLimit = m_DefaultTextureQuality;
    }

    protected override void OnEnabled()
    {
        m_DefaultTextureQuality = QualitySettings.masterTextureLimit;
    }

    protected override void OnLevel()
    {
        switch (CurrentLevel)
        {
            case 0:
                QualitySettings.masterTextureLimit = 0;
                break;
            case 1:
                QualitySettings.masterTextureLimit = 1;
                break;
            case 2:
                QualitySettings.masterTextureLimit = 2;
                break;
        }
    }
}
```

Samsung provider scaler

When you select the Samsung provider scaler in Project Settings, you can use the Adaptive VRR scaler, a Samsung-specific feature related to VRR (variable refresh rate). Currently, among Galaxy devices, the Galaxy S20 supports VRR. You can use Automatic VRR with VRR-supported devices; it can be enabled in Project Settings, and it saves battery by setting the refresh rate automatically. With the Samsung provider, when you enable both Automatic VRR and Adaptive Framerate at the same time, the device operates at the refresh rate closest to the achievable framerate, meaning a framerate between the min/max values of the Automatic VRR and Adaptive Framerate settings. Through this, you can set a target framerate per scene and control the framerate within the range you want, so players are not disturbed by VRR adjustments. You can also check Adaptive VRR's operation through the Adaptive Framerate samples.

What will change with the quality scaler

Game developers are wary of quality control of content because it might degrade the quality of the game experience, but the quality manager in 2.0 provides additional options to minimize visual impact through scaler control. Each scaler has visual impact options that are used to retain the quality of a scene when performance is controlled. The images below are from Unity's Boat Attack demo; you can compare how much visual impact you see at each LOD level. If you look at the enlarged parts, you can see that some trees are missing at LOD level 3. As shown above, some scalers have an immediately recognizable visual impact, but some scalers barely have any. Below is an example of texture quality level differences: you might not see any differences with your eyes, but a RenderDoc capture shows that the real texture sizes are diminished. As these two examples show, visual impact varies depending on the type of scaler, and different visual impact can occur depending on how the range is set within the same scaler. Therefore, it is recommended that developers balance performance and quality for their own content.

※ You can find the supported versions of Adaptive Performance 2.0, a more detailed user guide, and FAQs as follows:

Unity editor version | Adaptive Performance package version
Unity 2020.2.0a7+ / Unity 2020.1.0b5+ / Unity 2019.3.11f1 | 2.0.0

Detailed user guide from Unity
Unity Adaptive Performance FAQs
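The scaler abstraction above (a maximum level per scaler, with the Indexer raising levels as thermal pressure grows) can be sketched in a few lines. The class and function names below are illustrative stand-ins, not the Unity C# API:

```python
# Minimal sketch of the scaler/indexer idea described above: each scaler
# exposes a maximum level, and the indexer raises levels as thermal
# pressure grows. Names are illustrative, not the Unity C# API.

class Scaler:
    def __init__(self, name, max_level):
        self.name = name
        self.max_level = max_level
        self.level = 0                      # 0 = full quality

    def apply(self, level):
        # Clamp to the scaler's own range, like MaxLevel in the C# example.
        self.level = min(level, self.max_level)

def index_levels(scalers, temperature_level):
    """Map a normalized temperature level (0.0-1.0) onto each scaler's
    level range: hotter device, more aggressive quality reduction."""
    for s in scalers:
        s.apply(round(temperature_level * s.max_level))

scalers = [Scaler("texture_quality", 2), Scaler("lod_bias", 3)]
index_levels(scalers, 0.5)
print({s.name: s.level for s in scalers})
```

The real Indexer also weighs each scaler's declared visual impact and the current bottleneck when deciding which scaler to push first; this sketch only shows the level-clamping mechanics.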
The Samsung Developer Conference 2023 (SDC23) happened on October 5, 2023, at Moscone North in San Francisco and online. Among the many exciting activities at the conference for developers and tech enthusiasts, Code Lab offered a unique opportunity to learn about the latest Samsung SDKs and tools. Code Lab is a hands-on learning experience, providing participants with a platform to explore the diverse world of Samsung development. Code Lab activities are accessible to developers of all skill levels and interests, ensuring that everyone, from beginners to experts, can find something exciting to explore. Covering a wide array of topics, Code Lab catered to the diverse interests of the participants. Here's a quick look at some of the SDC23 topics:

1. SmartThings. Participants had the chance to build a Matter IoT app using the SmartThings Home API and create virtual devices that they could control using the SmartThings app or their own IoT apps. They also learned how to develop a SmartThings Find-compatible device. These topics are all about connecting and enhancing the smart home experience.

2. Galaxy Z. Participants interested in foldable technology were able to develop a widget for the Flex Window. This topic opens new possibilities in app design and user interaction.

3. Samsung Wallet. Participants learned to integrate the "Add to Samsung Wallet" button into sample partner services. They also learned to implement in-app payment into a sample merchant app using the Samsung Pay SDK. These topics focus on enhancing the mobile wallet experience for Samsung users.

4. GameDev. Game developers and enthusiasts had the opportunity to optimize game performance with Adaptive Performance in Unity. They also learned to implement Flex mode into Unity games for foldable phones. These topics offer insights into the gaming industry's latest trends and technologies.

5. Watch Face Studio. Code Lab also provided an activity for participants to create a watch face design with customized styles using Watch Face Studio. Participants also learned how to convert the watch face design for the Galaxy Z Flip5's Flex Window display using the Good Lock plugin.

6. Samsung Health. The health-focused Code Lab topics covered measuring skin temperature on Galaxy Watch and transferring heart rate data from Galaxy Watch to a mobile device with the Samsung Privileged Health SDK. Participants also learned how to create health research apps using the Samsung Health Stack. These topics provide valuable insights into the health and fitness tech landscape.

From creating virtual devices to building health-related apps, participants left the conference with new knowledge they could apply to their development projects. The Samsung Developer Conference is a celebration of innovation and collaboration in the tech world. With a diverse range of topics in Code Lab, participants were equipped with the tools and knowledge to push the boundaries of what is possible in Samsung's ecosystem. Though SDC23 has ended, the innovation lives on! Whether you missed the event or just want to try other activities, you can visit the Code Lab page anytime, anywhere. We can't wait to see you and the innovations that will emerge from this conference in the coming years. See you at SDC24!
Christopher Marquez
Your impact made our SDC23 shine! Samsung's innovations include Bixby, Knox, SmartThings, and Tizen. See SDC23 for a connected ecosystem with multi-device experiences.

Samsung Developer Conference 2023. Thu, Oct 5, 2023, 10:00 AM PT, Moscone North in San Francisco and online.

Highlights: Though SDC23 has ended, the innovation lives on! Whether you missed the event or just want to revisit the highlights, you can watch the excitement on demand.

Keynote: Discover Samsung's broad ecosystem of powerful, next-level tech and hear how Samsung is building toward a smarter, safer, and more personally connected future.

Sessions: Dive into the future of connected customer experiences through tech sessions by developers offering further insight into the innovations introduced in the keynote.

- Gamepad on Tizen TV (mega session; screen experience, game, developer program, Tizen): valuable tips and techniques for game application developers and gamepad manufacturers.
- HDR10+ Gaming (mega session; screen experience, game): a panel discussion covering an overview of HDR10+ gaming and how game developers can support it.
- Games with Samsung Galaxy (mega session; mobile experience, game, Android, mobile): the latest in mobile gaming development technologies, responsive UI for Flex mode, and mobile cloud gaming.
- Exploring the Digital Health Ecosystem: Samsung Health as Digital Front Door (mega session; health experience, health, wearable, mobile): new Samsung Health features, the Samsung Privileged Health SDK, and collaboration for research with the Samsung Health Stack.
- SmartThings and Matter (tech session; platform innovation, IoT, open source, developer program): a brief introduction to Matter, new enhancements with SmartThings, and new developer tools that make it easy to integrate your devices.
- What's New and Next in Watch Face Studio 2023 (tech session; mobile experience, wearable, design, mobile): the main new features of Watch Face Studio 2023 and the new Watch Face Studio plugin experience.

Speakers: Check out the speakers who joined us at SDC23 to share their experience and expertise, and get a sense of what you can expect from next year's SDC event.

Code Labs: Get hands-on with the latest development features through new Code Lab topics and samples introduced for SDC23.

- SmartThings: Build a Matter IoT app with the SmartThings Home API (25 mins)
- SmartThings: Develop a SmartThings Find-compatible device (30 mins)
- Foldable: Develop a widget for the Flex Window (25 mins)
- Samsung Wallet: Integrate the "Add to Samsung Wallet" button into partner services (30 mins)
- GameDev / Galaxy Z: Implement Flex mode into a Unity game (30 mins)
- Watch Face Studio: Customize styles of a watch face with Watch Face Studio (30 mins)

Tech Square: Talk with product experts and experience innovations in Tech Square. Catch up on new updates from Samsung platforms and OSes like SmartThings, Knox, and Tizen, plus mobile & screen experience, home & health experience, and sustainability.

Samsung C-Lab: Meet six passionate entrepreneurs and start-ups accelerated by Samsung C-Lab, an in-house venture and start-up acceleration program. These start-ups are making waves in the healthcare and AI industries and are here to showcase their latest innovations.

Prior years: Watch highlights of selected sessions from past SDC events.

- SDC22: October 12, 2022, Moscone North and online, San Francisco, California
- SDC21: October 26, 2021, online
- SDC19: October 29-30, 2019, McEnery Convention Center, San Jose, California
- SDC18: November 8-9, 2018, Moscone West, San Francisco, California
- SDC17: October 18-19, 2017, Moscone West, San Francisco, California
- SDC16: April 27-28, 2016, Moscone West, San Francisco, California
Game performance can vary between different Android devices, because they use different chipsets and GPUs based on different Mali architectures. Your game may render at 60fps on a Galaxy S20+, but how will it perform on the Galaxy A51 5G? The A51 5G has good specs, but you may want to consider runtime changes based on the underlying hardware if your game is pushing the limits on flagship hardware. Similarly, you may need to optimize your game to run at lower frame rates on the Galaxy J and Galaxy M models, which are often found in emerging markets. Players quickly lose interest in a game if they experience drops in frame rate or slow load times. They won't play your game on long journeys if it drains the battery or overheats the device. To reliably deploy a game globally and ensure a great user experience, you need to performance test on a wide range of Android devices. The Arm Mobile Studio family of performance analysis tools provides games studios with a comprehensive game analysis workflow for Android, giving information and advice at appropriate levels of detail for technical artists, graphics developers, performance analysts, and project leaders.

Monitor CPU and GPU activity

Arm Streamline captures a comprehensive profile of your game running on an unrooted Android device and visualizes the CPU and GPU performance counter activity as you run your test scenario. You can see exactly how the CPU and GPU workloads are handled by the device, which helps you locate problem areas that might explain frame rate drops or thermal problems. However, spotting performance issues using Streamline can be time-consuming unless you know exactly what you're looking for. Your team may have varying levels of experience or expertise, so interpreting this data can be difficult.
Introducing Performance Advisor

Arm Performance Advisor is a lightweight reporting tool that transforms a Streamline capture into a simple report describing how your game performed on the device, and alerts you to problem areas that you should consider optimizing. It can be used by your whole team on a regular basis, to spot trends and diagnose problems early in the development cycle, when you're best placed to do something about it. Performance Advisor reports the application frame rate, CPU load, and GPU load, as well as content metrics about the workload running on the Mali GPU. If it detects problematic areas, Performance Advisor tells you whether it's the CPU or the GPU that is struggling to process your application, and links to optimization advice to help you rectify it. A frame rate analysis chart shows how the application performed over time; the background color of the chart indicates how the game performed. When you're hitting your target frame rate, the chart background is green. In this example, most of the chart is blue, telling us the GPU in the device is struggling to process fragment workloads. Performance Advisor can capture screenshots of your game at the point that FPS drops below a given threshold. This helps you identify which content might be causing the problem, provides valuable context when debugging, and can reveal common elements if repeated slowdowns occur. You can then investigate these frames more closely with Graphics Analyzer, to see exactly which graphics API calls were executing at that point. If you want to separately evaluate how different parts of the game performed (for example, loading screens, menu selection, and different levels or gameplay scenarios), you can annotate these regions in your game so that Performance Advisor can report data for them independently.
Performance budgeting

Because different devices have different performance expectations, it's a good idea to set your own performance budgets for each device. For example, if you know the top frequency of the GPU in the device and you have a target frame rate, you can calculate the absolute limit of GPU cost per frame:

GPU cost per frame = GPU top frequency / target frame rate

When you generate a report with Performance Advisor, you can pass in your performance budgets, which are then shown in the charts, so you can easily see if you've broken one. In the example below, we can see a correlation between high numbers of execution engine cycles and drops in FPS. Performance Advisor tells us that the GPU is busy with arithmetic operations and that the shaders could be too complex. The report provides a link to an advice page on the Arm developer website that explains how to reduce arithmetic load in shaders. More charts cover key performance metrics such as CPU cycles per frame and GPU bandwidth per frame, reported for read and write access. There are also charts showing the content workload (draw calls, primitives, and pixels per frame) and the level of overdraw per pixel. Download an example Performance Advisor report.

Automated performance analysis

It's far easier to fix problems as they arise than it is to patch problems later on. Performance Advisor's key application performance metrics are useful to monitor over daily runs, to see how changes to your application affect performance during development. Arm Mobile Studio Professional includes headless CI support, so you can easily deploy large-scale automated performance testing across multiple devices. Using a device farm with a CI workflow, you can generate performance data and optimization advice automatically, every night, for several Android devices.
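The budget formula above is simple arithmetic, so it is easy to wire into a build script. The frequency below is a made-up example value, not a real device spec:

```python
# GPU cost-per-frame budget from the formula above:
#   GPU cost per frame = GPU top frequency / target frame rate
# The 800 MHz figure is a hypothetical example, not a real device spec.

def gpu_cycles_per_frame_budget(top_frequency_hz, target_fps):
    """Absolute ceiling on GPU cycles available to render one frame."""
    return top_frequency_hz / target_fps

budget = gpu_cycles_per_frame_budget(800_000_000, 60)
print(f"{budget:,.0f} GPU cycles per frame")
```

Halving the target frame rate doubles the per-frame cycle budget, which is why the same content can be viable at 30 FPS on a mid-range device but not at 60.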
As you check in code, you can easily monitor how your content performs against your performance budgets over time, and raise alerts when you start to approach or break those budgets. The Professional edition also enables you to build bespoke data dashboards from the data that has been collected. Performance Advisor's machine-readable JSON reports can be imported into any JSON-compatible database and visualization platform, such as the ELK stack. Compare metrics between test runs to quickly determine which changes impacted performance and which type of workload is the likely cause of a regression. Query the data and compare performance against specific targets to identify the next optimization steps. Read more about how to integrate Arm Mobile Studio into a CI workflow on the Arm developer website.

Resources

Arm publishes various resources on their developer website to help you optimize performance:

- Optimization advice: a quick reference to help you avoid common problems.
- Mali Best Practices Guide: a comprehensive guide describing in detail how to ensure your content runs well on Mali GPUs.
- Developer guides, such as those for technical artists, covering best practices for geometry, textures, materials, and shaders.
- The Mali GPU datasheet, showing the different features and capabilities of Arm Mali GPUs, from the Midgard-based Mali-T720 to the latest Valhall-based Mali-G78.
- The Mali GPU counter reference, with detailed descriptions of all the performance counters you can analyze in each Mali GPU.

Get Arm Mobile Studio

Arm Mobile Studio is free to use for interactive performance analysis. To use it headlessly in your CI workflow, you need an Arm Mobile Studio Professional license. Download Arm Mobile Studio.
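A CI gate over a machine-readable report of this kind can be a short script. The report schema below is invented for illustration (consult the Performance Advisor documentation for the real field names); the budget-checking logic is the point:

```python
# Sketch of a CI gate over a Performance Advisor-style JSON report.
# The report schema here is hypothetical, invented for illustration;
# the real field names come from the Performance Advisor documentation.
import json

report_json = """
{
  "application": "MyGame",
  "metrics": {
    "fps_average": 54.2,
    "cpu_cycles_per_frame": 21000000,
    "gpu_bandwidth_mb_per_frame": 41.7
  }
}
"""

# Each budget is a (kind, limit) pair: "min" metrics must stay above the
# limit, "max" metrics must stay below it.
budgets = {
    "fps_average": ("min", 30.0),
    "cpu_cycles_per_frame": ("max", 25_000_000),
    "gpu_bandwidth_mb_per_frame": ("max", 50.0),
}

def check_budgets(report, budgets):
    """Return the list of metric names that break their budget."""
    broken = []
    for name, (kind, limit) in budgets.items():
        value = report["metrics"][name]
        if (kind == "min" and value < limit) or (kind == "max" and value > limit):
            broken.append(name)
    return broken

report = json.loads(report_json)
print(check_budgets(report, budgets))
```

In a nightly run, a non-empty result would fail the build or raise an alert, which is exactly the "approach or break those budgets" workflow described above.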
Arm Developer
The Samsung Developers team works with many companies in the mobile and gaming ecosystems. We're excited to support our partner, Arm, as they bring timely and relevant content to developers looking to build games and high-performance experiences. This Vulkan extensions series will help developers get the most out of the new and game-changing Vulkan extensions on Samsung mobile devices.

As I mentioned previously, Android is enabling a host of useful new Vulkan extensions for mobile. These new extensions are set to improve the state of graphics APIs for modern applications, enabling new use cases and changing how developers can design graphics renderers going forward. These extensions will be available across various Android smartphones, including the new Samsung Galaxy S21, which was recently launched on 14 January. Existing Samsung Galaxy S models, such as the Samsung Galaxy S20, also allow upgrades to Android R. I have already discussed two of these extensions in previous blogs: maintenance extensions and legacy support extensions. However, there are three further Vulkan extensions for Android that I believe are 'game changers'. In the first of three blogs, I explore these individual game-changer extensions: what they do, why they can be useful, and how to use them. The goal here is not to provide complete samples, but there should be enough to get you started. The first Vulkan extension is 'descriptor indexing'. Descriptor indexing can be available in handsets prior to the Android R release. To check which Android devices support descriptor indexing, check here. You can also directly view the Khronos Group Vulkan samples relevant to this blog here.

VK_EXT_descriptor_indexing introduction

In recent years, we have seen graphics APIs greatly evolve in their resource binding flexibility. All modern graphics APIs now have some answer to how we can access large swathes of resources in a shader.
Bindless

A common buzzword that is thrown around in modern rendering tech is "bindless". The core philosophy is that resources like textures and buffers are accessed through simple indices or pointers, and not singular "resource bindings". To pass down resources to our shaders, we do not really bind them like in the graphics APIs of old. Simply write a descriptor to some memory, and a shader can come in and read it later. This means the API machinery to drive this is kept to a minimum. This is a fundamental shift away from the older style, where our rendering loop looked something like:

    render_scene() {
        foreach(drawable) {
            command_buffer->update_descriptors(drawable);
            command_buffer->draw();
        }
    }

Now it looks more like:

    render_scene() {
        command_buffer->bind_large_descriptor_heap();
        large_descriptor_heap->write_global_descriptors(scene, lighting, shadowmaps);
        foreach(drawable) {
            offset = large_descriptor_heap->allocate_and_write_descriptors(drawable);
            command_buffer->push_descriptor_heap_offsets(offset);
            command_buffer->draw();
        }
    }

Since we have free-form access to resources now, it is much simpler to take advantage of features like multi-draw or other GPU-driven approaches. We no longer require the CPU to rebind descriptor sets between draw calls like we used to. Going forward, when we look at ray tracing, this style of design is going to be mandatory, since shooting a ray means we can hit anything, so all descriptors are potentially used. It is useful to start thinking about designing for this pattern going forward. The other side of the coin with this feature is that it is easier to shoot yourself in the foot. It is easy to access the wrong resource, but as I will get to later, there are tools available to help you along the way.

VK_EXT_descriptor_indexing features

This extension is a large one and landed in Vulkan 1.2 as a core feature. To enable bindless algorithms, there are two major features exposed by this extension.
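The `allocate_and_write_descriptors` step in the pseudocode above is typically just a bump allocator over one large descriptor array. A minimal sketch in C, with hypothetical names (this is not part of the Vulkan API, just an illustration of the heap-offset bookkeeping):

```c
#include <stdint.h>

/* Illustrative "large descriptor heap" bookkeeping: per-drawable descriptors
 * are written at a bump-allocated offset, and that offset is what gets pushed
 * to the command buffer before the draw. */
typedef struct {
    uint32_t capacity; /* total descriptor slots in the heap */
    uint32_t next;     /* next free slot */
} descriptor_heap;

/* Allocate 'count' contiguous descriptor slots; returns the base offset,
 * or UINT32_MAX if the heap is exhausted. */
uint32_t heap_alloc(descriptor_heap *h, uint32_t count)
{
    if (h->next + count > h->capacity)
        return UINT32_MAX; /* caller must grow or recycle the heap */

    uint32_t base = h->next;
    h->next += count;
    return base;
}
```

Resetting `next` to zero once the GPU has finished a frame is the usual recycling strategy for per-frame descriptors.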
Non-uniform indexing of resources

How resources are accessed has evolved quite a lot over the years. Hardware capabilities used to be quite limited, with a tiny bank of descriptors being visible to shaders at any one time. In more modern hardware, however, shaders can access descriptors freely from memory, and the limits are somewhat theoretical.

Constant indexing

Arrays of resources have been with us for a long time, but mostly as syntactic sugar, where we can only index into arrays with a constant index. This is equivalent to not using arrays at all from a compiler point of view.

    layout(set = 0, binding = 0) uniform sampler2D Textures[4];
    const int CONSTANT_VALUE = 2;
    color = texture(Textures[CONSTANT_VALUE], uv);

HLSL in D3D11 has this restriction as well, but it has been more relaxed about it, since it only requires that the index is constant after optimization passes are run.

Dynamic indexing

As an optional feature, dynamic indexing allows applications to perform dynamic indexing into arrays of resources. This allows for a very restricted form of bindless. Outside compute shaders, however, using this feature correctly is quite awkward, due to the requirement of the resource index being dynamically uniform. Dynamically uniform is a somewhat intricate subject, and the details are left to the accompanying sample in KhronosGroup/Vulkan-Samples.

Non-uniform indexing

Most hardware assumes that the resource index is dynamically uniform, as this has been the restriction in APIs for a long time. If you are not accessing resources with a dynamically uniform index, you must notify the compiler of your intent. The rationale here is that hardware is optimized for dynamically uniform (or subgroup uniform) indices, so there is often an internal loop emitted by either the compiler or the hardware to handle every unique index that is used. This means performance tends to depend a bit on how divergent resource indices are.
    #extension GL_EXT_nonuniform_qualifier : require
    layout(set = 0, binding = 0) uniform texture2D Textures[];
    layout(set = 1, binding = 0) uniform sampler Sampler;
    color = texture(nonuniformEXT(sampler2D(Textures[index], Sampler)), uv);

In HLSL, there is a similar mechanism where you use NonUniformResourceIndex, for example:

    Texture2D<float4> Textures[] : register(t0, space0);
    SamplerState Samp : register(s0, space0);
    float4 color = Textures[NonUniformResourceIndex(index)].Sample(Samp, uv);

All descriptor types can make use of this feature, not just textures, which is quite handy! The nonuniformEXT qualifier removes the requirement to use dynamically uniform indices. See the code sample for more detail.

Update-after-bind

A key component to make the bindless style work is that we do not have to … bind descriptor sets all the time. With the update-after-bind feature, we effectively block the driver from consuming descriptors at command recording time, which gives a lot of flexibility back to the application. The shader consumes descriptors as they are used, and the application can freely update descriptors, even from multiple threads. To enable update-after-bind, we modify the VkDescriptorSetLayout by adding new binding flags.
The way to do this is somewhat verbose, but at least update-after-bind is something that is generally used for just one or two descriptor set layouts throughout most applications:

    VkDescriptorSetLayoutCreateInfo info = { … };
    info.flags = VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT_EXT;

    const VkDescriptorBindingFlagsEXT flags =
        VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT_EXT |
        VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT_EXT |
        VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT_EXT |
        VK_DESCRIPTOR_BINDING_UPDATE_UNUSED_WHILE_PENDING_BIT_EXT;

    VkDescriptorSetLayoutBindingFlagsCreateInfoEXT binding_flags = { … };
    binding_flags.bindingCount = info.bindingCount;
    binding_flags.pBindingFlags = &flags;
    info.pNext = &binding_flags;

For each pBindings entry, we have a corresponding flags field where we can specify various flags. The descriptor_indexing extension has very fine-grained support, but UPDATE_AFTER_BIND_BIT and VARIABLE_DESCRIPTOR_COUNT_BIT are the most interesting ones to discuss. VARIABLE_DESCRIPTOR_COUNT deserves special attention, as it makes descriptor management far more flexible. Having to use a fixed array size can be somewhat awkward, since in a common usage pattern with a large descriptor heap, there is no natural upper limit to how many descriptors we want to use. We could settle for some arbitrarily high limit like 500k, but that means all descriptor sets we allocate have to be of that size, and all pipelines have to be tied to that specific number. This is not necessarily what we want, and VARIABLE_DESCRIPTOR_COUNT allows us to allocate just the number of descriptors we need per descriptor set. This makes it far more practical to use multiple bindless descriptor sets.
When allocating a descriptor set, we pass down the actual number of descriptors to allocate:

    VkDescriptorSetVariableDescriptorCountAllocateInfoEXT variable_info = { … };
    variable_info.sType =
        VK_STRUCTURE_TYPE_DESCRIPTOR_SET_VARIABLE_DESCRIPTOR_COUNT_ALLOCATE_INFO_EXT;
    variable_info.descriptorSetCount = 1;
    allocate_info.pNext = &variable_info;
    variable_info.pDescriptorCounts = &num_descriptors_streaming;
    VK_CHECK(vkAllocateDescriptorSets(get_device().get_handle(), &allocate_info,
                                      &descriptors.descriptor_set_update_after_bind));

GPU-assisted validation and debugging

When we enter the world of descriptor indexing, there is a flip side where debugging and validation are much more difficult. The major benefit of the older binding models is that it is fairly easy for validation layers and debuggers to know what is going on. This is because the number of resources available to a shader is small and focused. With UPDATE_AFTER_BIND in particular, we do not know anything at draw time, which makes this awkward. It is possible to enable GPU-assisted validation in the Khronos validation layers. This lets you catch issues like:

    UNASSIGNED-Descriptor uninitialized: Validation Error:
    [ UNASSIGNED-Descriptor uninitialized ] Object 0: handle = 0x55625acf5600,
    type = VK_OBJECT_TYPE_QUEUE; | MessageID = 0x893513c7 | Descriptor index 67
    is uninitialized. Command buffer (0x55625b184d60). Draw index 0x4.
    Pipeline (0x520000000052). Shader Module (0x510000000051).
    Shader Instruction Index = 59. Stage = Fragment.
    Fragment coord (x,y) = (944.5, 0.5).
    Unable to find SPIR-V OpLine for source information.
    Build shader with debug info to get source information.

or:

    UNASSIGNED-Descriptor uninitialized: Validation Error:
    [ UNASSIGNED-Descriptor uninitialized ] Object 0: handle = 0x55625acf5600,
    type = VK_OBJECT_TYPE_QUEUE; | MessageID = 0x893513c7 | Descriptor index 131
    is uninitialized. Command buffer (0x55625b1893c0). Draw index 0x4.
    Pipeline (0x520000000052). Shader Module (0x510000000051).
    Shader Instruction Index = 59. Stage = Fragment.
    Fragment coord (x,y) = (944.5, 0.5).
    Unable to find SPIR-V OpLine for source information.
    Build shader with debug info to get source information.

RenderDoc supports debugging descriptor indexing through shader instrumentation, and this allows you to inspect which resources were accessed. When you have several thousand resources bound to a pipeline, this feature is critical to making any sense of the inputs. If we are using the update-after-bind style, we can inspect the exact resources we used. In a non-uniform indexing style, we can inspect all the unique resources we used.

Conclusion

Descriptor indexing unlocks many design possibilities in your engine and is a real game changer for modern rendering techniques. Use it with care, and make sure to take advantage of all the debugging tools available to you. You need them. This blog has explored the first Vulkan extension game changer, with two more parts in this game changer blog series still to come. The next part will focus on 'buffer device address' and how developers can use this new feature to enhance their games.

Follow up

Thanks to Hans-Kristian Arntzen and the team at Arm for bringing this great content to the Samsung Developers community. We hope you find this information about Vulkan extensions useful for developing your upcoming mobile games. The original version of this article can be viewed at Arm Community. The Samsung Developers site has many resources for developers looking to build for and integrate with Samsung devices and services. Stay in touch with the latest news by creating a free account or by subscribing to our monthly newsletter. Visit the Marketing Resources page for information on promoting and distributing your apps and games. Finally, our Developer Forum is an excellent way to stay up to date on all things related to the Galaxy ecosystem.
The Samsung Developers team works with many companies in the mobile and gaming ecosystems. We're excited to support our friends at Arm as they bring timely and relevant content to developers looking to build games and high-performance experiences. This best practices series will help developers get the most out of the 3D hardware on Samsung mobile devices.

Developing games is a true cross-disciplinary experience for developers, requiring both technical and creative skills to bring their gaming project to life. But all too often, the performance and visual needs of a project can be at odds. Leading technology provider of processor IP, Arm has developed artists' best practices for mobile game development, where game developers learn tips on creating performance-focused 3D assets, 2D assets, and scenes for mobile applications. Before you cut those stunning visuals, get maximum benefit from Arm's best practices by reviewing these four topics: geometry, texturing, materials and shaders, and lighting.

Geometry

To get a project performing well on as many devices as possible, the geometry of a game should be taken seriously and optimized as much as possible. This section identifies what you need to know about using geometry properly on mobile devices. On mobile, how you use vertices matters more than on almost any other platform. Tips on how to avoid micro triangles and long thin triangles are great first steps in gaining performance. The next big step is to use levels of detail (LODs). An LOD system uses a lower-poly version of the model as an object moves further away from the camera. This helps keep the vertex count down and gives the artist control over how objects look far away. Otherwise, this would be left to the GPU, trying its best to render a high number of vertices in only a few pixels, costing the project performance. To learn more, check Real-time 3D Art Best Practices: Geometry.
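The LOD selection described above can be sketched as a simple distance test. The thresholds below are hypothetical; a real engine would tune them per asset (and often use screen-space coverage rather than raw distance):

```c
/* Minimal distance-based LOD selection sketch.
 * thresholds[i] is the camera distance at which LOD i+1 takes over;
 * lod_count is the number of mesh versions, LOD 0 being the highest detail. */
int select_lod(float distance, const float *thresholds, int lod_count)
{
    for (int i = 0; i < lod_count - 1; ++i)
        if (distance < thresholds[i])
            return i; /* close enough for this detail level */

    return lod_count - 1; /* furthest objects use the lowest-poly mesh */
}
```

With thresholds {10, 30} and three LODs, objects closer than 10 units render at full detail and objects beyond 30 units use the cheapest mesh.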
Texturing

Textures make up 2D UI and are also mapped to the surface of 3D objects. Learning about texturing best practices can bring big benefits to your game! Even a straightforward technique such as texture atlasing, where you build multiple smaller textures into one larger texture, can bring a major performance gain for a project. You should understand what happens to a texture when the application runs. When the texture is exported, the common texture format is a PNG, JPG, or TGA file. However, when the application is running, each texture is converted to specific compression formats that are designed to be read faster on the GPU. Using the ASTC texture compression option not only helps your project's performance, but also lets your textures look better. To learn other texturing best practices, such as texture filtering and channel packing, check Real-time 3D Art Best Practices: Texturing.

Materials and shaders

Materials and shaders determine how 3D objects and visual effects appear on the screen. Become familiar with what they do and how to optimize them. Pair materials with texture atlases, allowing multiple objects in the same scene to share textures and materials. The game engine batches these objects when drawing them to the screen, saving bandwidth and increasing performance. When choosing shaders, use the simplest shader possible (like Unlit) and avoid using unnecessary features. If you are authoring shaders, avoid complicated math operations (like sin, pow, cos, and noise). If you are in doubt about your shaders' performance, Arm provides tools to profile your shaders with the Mali Offline Shader Compiler. There is a lot more to learn, so check out Real-time 3D Art Best Practices: Materials and Shaders for more information.

Lighting

In most games, lighting can be one of the most critical parts of a visual style. Lighting can set the mood, lead gameplay, and identify threats and objectives. This can make or break the visuals of a game.
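One reason ASTC is attractive is that every compressed block occupies 128 bits regardless of its pixel footprint, so larger block sizes directly lower the bits-per-pixel cost (at some quality cost). A small helper makes the trade-off concrete:

```c
/* ASTC stores every block in 128 bits regardless of the block footprint,
 * so the effective bits-per-pixel rate falls as block dimensions grow. */
double astc_bits_per_pixel(int block_w, int block_h)
{
    return 128.0 / (double)(block_w * block_h);
}
```

A 4x4 block gives 8 bpp (comparable to older formats), while an 8x8 block compresses the same image down to 2 bpp, which is why artists pick block sizes per texture based on how much detail it actually contains.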
But lighting can quickly be at odds with the performance needs of the project. To help avoid this hard choice, learn about the difference between static and dynamic light, optimization of lights, how to fake lighting, and the benefits of the different types and settings of lights. On mobile, it is often worth faking as much as possible when it comes to shadows. Real-time shadows are expensive! Dynamic objects often use a 3D mesh, plane, or quad with a dark shadow texture for a shadow rather than resorting to dynamic lights. For dynamic game objects where you cannot fake lighting, use light probes. These have the same benefits as light maps and can be calculated offline. A light probe stores the light that passes through empty space in your scene. This data can then be used to light dynamic objects, which helps integrate them visually with lightmapped objects throughout your scene. Lighting is a large topic with lots of possible optimizations. Read more at Real-time 3D Art Best Practices in Unity: Lighting.

Arm and Samsung devices

Arm's Cortex-A CPUs and Mali GPUs power the world's smartphones, with Mali GPUs powering mobile graphics. This means you can find Arm GPUs in an extensive list of popular Samsung devices, including the Samsung Galaxy A51 and Galaxy S21. Arm provides practical tips and advice for teams developing real-time 3D or 2D content for Arm-based devices.

Mobile game performance analysis has never been more important

Every year, mobile gaming grows! It was worth 77.2 billion US dollars in revenue in 2020, and growth in this sector is expected to continue in 2021 and beyond. With more mobile devices coming out each year, it is important for your content to be able to run on as many devices as possible, while providing players with the best possible experience. The artist best practices are just one part of the educational materials from Arm.
Alongside these best practices, you can explore the Unity Learn course, Arm & Unity Presents: 3D Art Optimization for Mobile Applications. This course includes a downloadable project that shows off the many benefits of using the best practices. For more advanced users, check out Arm's Mali GPU Best Practices Guide and learn about performance analysis with Arm Mobile Studio.

Follow up

Thanks to Joe Rozek and the team at Arm for bringing these great ideas to the Samsung Developers community. We hope you put these best practices into effect in your upcoming mobile games. The Samsung Developers site has many resources for developers looking to build for and integrate with Samsung devices and services. Stay in touch with the latest news by creating a free account or by subscribing to our monthly newsletter. Visit the Marketing Resources page for information on promoting and distributing your apps and games. Finally, our Developer Forum is an excellent way to stay up to date on all things related to the Galaxy ecosystem.
The Samsung Developers team works with many companies in the mobile and gaming ecosystems. We're excited to support our partner, Arm, as they bring timely and relevant content to developers looking to build games and high-performance experiences. This Vulkan Extensions series will help developers get the most out of the new and game-changing Vulkan extensions on Samsung mobile devices.

Android is enabling a host of useful new Vulkan extensions for mobile. These new extensions are set to improve the state of graphics APIs for modern applications, enabling new use cases and changing how developers can design graphics renderers going forward. In particular, a whole set of Vulkan extensions has been added in Android R. These extensions will be available across various Android smartphones, including the Samsung Galaxy S21, which was recently launched on 14 January. Existing Samsung Galaxy S models, such as the Samsung Galaxy S20, also allow upgrades to Android R. One group of these new Vulkan extensions for mobile is the 'maintenance extensions'. These plug up various holes in the Vulkan specification. Mostly, a lack of these extensions can be worked around, but it is annoying for application developers to do so. Having these extensions means less friction overall, which is a very good thing.

VK_KHR_uniform_buffer_standard_layout

This extension is a quiet one, but I still feel it has a lot of impact, since it removes a fundamental restriction for applications. Getting to data efficiently is the lifeblood of GPU programming. One thing I have seen trip up developers again and again is the antiquated rules for how uniform buffers (UBOs) are laid out in memory. For whatever reason, UBOs have been stuck with annoying alignment rules which go back to ancient times, yet SSBOs have nice alignment rules. Why?
As an example, let us assume we want to send an array of floats to a shader:

    #version 450
    layout(set = 0, binding = 0, std140) uniform UBO
    {
        float values[1024];
    };
    layout(location = 0) out vec4 FragColor;
    layout(location = 0) flat in int vIndex;

    void main()
    {
        FragColor = vec4(values[vIndex]);
    }

If you are not used to graphics API idiosyncrasies, this looks fine, but danger lurks around the corner. Any array in a UBO will be padded out to have 16-byte elements, meaning the only way to have a tightly packed UBO is to use vec4 arrays. Somehow, legacy hardware was hardwired for this assumption. SSBOs never had this problem.

std140 vs std430

You might have run into these weird layout qualifiers in GLSL. They reference some rather old GLSL versions. std140 refers to GLSL 1.40, which was introduced in OpenGL 3.1, and it was the version in which uniform buffers were introduced to OpenGL. The std140 packing rules define how variables are packed into buffers. The main quirks of std140 are:

- Vectors are aligned to their size. Notoriously, a vec3 is aligned to 16 bytes, which has tripped up countless programmers over the years, but this is just the nature of vectors in general. Hardware tends to like aligned access to vectors.
- Array element sizes are aligned to 16 bytes. This one makes it very wasteful to use arrays of float and vec2.

The array quirk mirrors HLSL's cbuffer. After all, both OpenGL and D3D mapped to the same hardware. Essentially, the assumption I am making here is that hardware was only able to load 16 bytes at a time with 16-byte alignment. To extract scalars, you could always do that after the load. std430 was introduced in GLSL 4.30 in OpenGL 4.3 and was designed to be used with SSBOs.
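The array quirk above can be made concrete with a little stride arithmetic. This is a simplified sketch of just the array-stride part of the two packing rules for scalar and small-vector elements, not a full implementation of the layout specifications:

```c
/* Simplified array-stride rules for scalar/vector elements:
 * std140 rounds each array element's stride up to 16 bytes,
 * while std430 uses the element's natural alignment. */
unsigned array_stride_std140(unsigned elem_size)
{
    unsigned stride = elem_size;
    if (stride % 16)
        stride += 16 - stride % 16; /* round up to a 16-byte boundary */
    return stride;
}

unsigned array_stride_std430(unsigned elem_size, unsigned elem_align)
{
    unsigned stride = elem_size;
    if (stride % elem_align)
        stride += elem_align - stride % elem_align;
    return stride;
}
```

So a `float values[1024]` array occupies 16 KiB under std140 but only 4 KiB under std430, which is exactly the waste the extension lets UBOs avoid.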
std430 removed the array element alignment rule, which means that with std430 we can express this efficiently:

    #version 450
    layout(set = 0, binding = 0, std430) readonly buffer SSBO
    {
        float values[1024];
    };
    layout(location = 0) out vec4 FragColor;
    layout(location = 0) flat in int vIndex;

    void main()
    {
        FragColor = vec4(values[vIndex]);
    }

Basically, the new extension enables std430 layout for use with UBOs as well:

    #version 450
    #extension GL_EXT_scalar_block_layout : require
    layout(set = 0, binding = 0, std430) uniform UBO
    {
        float values[1024];
    };
    layout(location = 0) out vec4 FragColor;
    layout(location = 0) flat in int vIndex;

    void main()
    {
        FragColor = vec4(values[vIndex]);
    }

Why not just use SSBOs then? On some architectures, yes, that is a valid workaround. However, some architectures also have special caches which are designed specifically for UBOs. Improving memory layouts of UBOs is still valuable.

GL_EXT_scalar_block_layout?

The Vulkan GLSL extension which supports std430 UBOs goes a little further and supports the scalar layout as well. This is a completely relaxed layout scheme where alignment requirements are essentially gone; however, that requires a different Vulkan extension to work.

VK_KHR_separate_depth_stencil_layouts

Depth-stencil images are weird in general. It is natural to think of these two aspects as separate images. However, the reality is that some GPU architectures like to pack depth and stencil together into one image, especially with D24S8 formats. Expressing image layouts with depth and stencil formats has therefore been somewhat awkward in Vulkan, especially if you want to make one aspect read-only and keep the other aspect as read/write, for example. In Vulkan 1.0, both depth and stencil needed to be in the same image layout. This means that you are either doing read-only depth-stencil or read/write depth-stencil. This was quickly identified as not being good enough for certain use cases.
There are valid use cases where depth is read-only while stencil is read/write, in deferred rendering for example. Eventually, VK_KHR_maintenance2 added support for some mixed image layouts which let us express read-only depth with read/write stencil, and vice versa:

    VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_STENCIL_READ_ONLY_OPTIMAL_KHR
    VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_STENCIL_ATTACHMENT_OPTIMAL_KHR

Usually, this is good enough, but there is a significant caveat to this approach, which is that depth and stencil layouts must be specified and transitioned together. This means that it is not possible to render to a depth aspect while transitioning the stencil aspect concurrently, since changing image layouts is a write operation. If the engine is not designed to couple depth and stencil together, this causes a lot of friction in the implementation. What this extension does is completely decouple the image layouts for the depth and stencil aspects, making it possible to modify the depth or stencil image layout in complete isolation. For example:

    VkImageMemoryBarrier barrier = { … };
    // Normally, we would have to specify both depth and stencil aspects for
    // depth-stencil images. Now, we can completely ignore what stencil is
    // doing and only modify the depth image layout.
    barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_DEPTH_BIT;
    barrier.oldLayout = VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL_KHR;
    barrier.newLayout = VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_OPTIMAL;

Similarly, in VK_KHR_create_renderpass2, there are extension structures where you can specify stencil layouts separately from the depth layout if you wish.
    typedef struct VkAttachmentDescriptionStencilLayout {
        VkStructureType sType;
        void* pNext;
        VkImageLayout stencilInitialLayout;
        VkImageLayout stencilFinalLayout;
    } VkAttachmentDescriptionStencilLayout;

    typedef struct VkAttachmentReferenceStencilLayout {
        VkStructureType sType;
        void* pNext;
        VkImageLayout stencilLayout;
    } VkAttachmentReferenceStencilLayout;

Like image memory barriers, it is possible to express layout transitions that only occur in either the depth or the stencil attachment.

VK_KHR_spirv_1_4

Each core Vulkan version has targeted a specific SPIR-V version. For Vulkan 1.0, we have SPIR-V 1.0; for Vulkan 1.1, we have SPIR-V 1.3; and for Vulkan 1.2, we have SPIR-V 1.5. SPIR-V 1.4 was an interim version between Vulkan 1.1 and 1.2 which added some nice features, but this extension is largely meant for developers who like to target SPIR-V themselves. Developers using GLSL or HLSL might not find much use for it. Some highlights of SPIR-V 1.4 that I think are worth mentioning are listed here.

OpSelect between composite objects

OpSelect before SPIR-V 1.4 only supports selecting between scalars and vectors. SPIR-V 1.4 thus allows you to express this kind of code easily with a simple OpSelect:

    MyStruct s = cond ? MyStruct(1, 2, 3) : MyStruct(4, 5, 6);

OpCopyLogical

There are scenarios in high-level languages where you load a struct from a buffer and then place it in a function variable. If you have ever looked at SPIR-V code for this kind of scenario, glslang would copy each element of the struct one by one, which generates bloated SPIR-V code. This is because the struct type that lives in a buffer and a struct type for a function variable are not necessarily the same. Offset decorations are the major culprits here. Copying objects in SPIR-V only works when the types are exactly the same, not "almost the same". OpCopyLogical fixes this problem, as it lets you copy objects of types which are the same except for decorations.
Advanced loop control hints

SPIR-V 1.4 adds ways to express partial unrolling, how many iterations are expected, and similar advanced hints, which can help a driver optimize better using knowledge it otherwise would not have. There is no way to express these in normal shading languages yet, but it does not seem difficult to add support for them.

Explicit look-up tables

Describing look-up tables was a bit awkward in SPIR-V. The natural way to do this in SPIR-V 1.3 is to declare an array with Private storage scope with an initializer, access chain into it, and load from it. However, there was never a way to express that a global variable is const, which relies on compilers to be a little smart. As a case study, let us see what glslang emits when using a Vulkan 1.1 target environment:

    #version 450
    layout(location = 0) out float FragColor;
    layout(location = 0) flat in int vIndex;
    const float LUT[4] = float[](1.0, 2.0, 3.0, 4.0);

    void main()
    {
        FragColor = LUT[vIndex];
    }

    %float_1 = OpConstant %float 1
    %float_2 = OpConstant %float 2
    %float_3 = OpConstant %float 3
    %float_4 = OpConstant %float 4
    %16 = OpConstantComposite %_arr_float_uint_4 %float_1 %float_2 %float_3 %float_4

This is super weird code, but it is easy for compilers to promote to a LUT. If the compiler can prove there are no readers before the OpStore, and only one OpStore can statically happen, the compiler can optimize it to a const LUT.

    %indexable = OpVariable %_ptr_Function__arr_float_uint_4 Function
    OpStore %indexable %16
    %24 = OpAccessChain %_ptr_Function_float %indexable %index
    %25 = OpLoad %float %24

In SPIR-V 1.4, the NonWritable decoration can also be used with Private and Function storage variables. Add an initializer, and we get something that looks far more reasonable and obvious:

    OpDecorate %indexable NonWritable
    %16 = OpConstantComposite %_arr_float_uint_4 %float_1 %float_2 %float_3 %float_4
    // Initialize an array with a constant expression and mark it as NonWritable.
    // This is trivially a LUT.
    %indexable = OpVariable %_ptr_Function__arr_float_uint_4 Function %16
    %24 = OpAccessChain %_ptr_Function_float %indexable %index
    %25 = OpLoad %float %24

VK_KHR_shader_subgroup_extended_types

This extension fixes a hole in Vulkan subgroup support. When subgroups were introduced, it was only possible to use subgroup operations on 32-bit values. However, with 16-bit arithmetic getting more popular, especially float16, there are use cases where you would want to use subgroup operations on smaller arithmetic types, making this kind of shader possible:

    #version 450
    // For subgroupAdd:
    #extension GL_KHR_shader_subgroup_arithmetic : require
    // For FP16 arithmetic:
    #extension GL_EXT_shader_explicit_arithmetic_types_float16 : require
    // For subgroup operations on FP16:
    #extension GL_EXT_shader_subgroup_extended_types_float16 : require
    layout(location = 0) out f16vec4 FragColor;
    layout(location = 0) in f16vec4 vColor;

    void main()
    {
        FragColor = subgroupAdd(vColor);
    }

VK_KHR_imageless_framebuffer

In most engines, using VkFramebuffer objects can feel a bit awkward, since most engine abstractions are based around some idea of:

    MyRenderAPI::BindRenderTargets(colorAttachments, depthStencilAttachment)

In this model, VkFramebuffer objects introduce a lot of friction, since engines would almost certainly end up with one of two strategies:

- Create a VkFramebuffer for every render pass, and free it later.
- Maintain a hashmap of all observed attachment and render-pass combinations.

Unfortunately, there are some … reasons why VkFramebuffer exists in the first place, but VK_KHR_imageless_framebuffer at least removes the largest pain point: needing to know the exact VkImageViews that we are going to use before we actually start rendering. With imageless framebuffers, we can defer the exact VkImageViews we are going to render into until vkCmdBeginRenderPass. However, the framebuffer itself still needs to know about certain metadata ahead of time.
Some drivers need to know this information, unfortunately. First, we set the VK_FRAMEBUFFER_CREATE_IMAGELESS_BIT flag in vkCreateFramebuffer. This removes the need to set pAttachments. Instead, we specify some parameters for each attachment. We pass down this structure as a pNext:

    typedef struct VkFramebufferAttachmentsCreateInfo {
        VkStructureType sType;
        const void* pNext;
        uint32_t attachmentImageInfoCount;
        const VkFramebufferAttachmentImageInfo* pAttachmentImageInfos;
    } VkFramebufferAttachmentsCreateInfo;

    typedef struct VkFramebufferAttachmentImageInfo {
        VkStructureType sType;
        const void* pNext;
        VkImageCreateFlags flags;
        VkImageUsageFlags usage;
        uint32_t width;
        uint32_t height;
        uint32_t layerCount;
        uint32_t viewFormatCount;
        const VkFormat* pViewFormats;
    } VkFramebufferAttachmentImageInfo;

Essentially, we need to specify almost everything that vkCreateImage would specify. The only thing we avoid is having to know the exact image views we need to use. To begin a render pass which uses an imageless framebuffer, we pass down this struct in vkCmdBeginRenderPass instead:

    typedef struct VkRenderPassAttachmentBeginInfo {
        VkStructureType sType;
        const void* pNext;
        uint32_t attachmentCount;
        const VkImageView* pAttachments;
    } VkRenderPassAttachmentBeginInfo;

Conclusions

Overall, I feel like this extension does not really solve the problem of having to know images up front. Knowing the resolution and usage flags of all attachments up front is basically like having to know the image views up front either way. If your engine knows all this information up front, just not the exact image views, then this extension can be useful. The number of unique VkFramebuffer objects will likely go down as well, but otherwise, there is, in my personal view, room to greatly improve things. In the next blog on the new Vulkan extensions, I explore 'legacy support extensions'.

Follow up

Thanks to Hans-Kristian Arntzen and the team at Arm for bringing this great content to the Samsung Developers community.
We hope you find this information about Vulkan extensions useful for developing your upcoming mobile games. The Samsung Developers site has many resources for developers looking to build for and integrate with Samsung devices and services. Stay in touch with the latest news by creating a free account or by subscribing to our monthly newsletter. Visit the Marketing Resources page for information on promoting and distributing your apps and games. Finally, our developer forum is an excellent way to stay up-to-date on all things related to the Galaxy ecosystem.
Arm Developers
In this article I would like to introduce a hardware optimisation technique called variable rate shading (VRS) and how this technique can benefit games on mobile phones.

Introduction

Traditionally, each pixel in a rendered image is shaded individually, meaning we can shade very high detail anywhere in the image, which, in theory, is great. However, in practice, this can lead to wasteful GPU calculations in areas where detail is less important. In some cases, you do not need 1x1 shading of pixels to produce a high-quality image. For example, areas that represent unlit surfaces in shadow naturally contain less detail than brightly lit areas. Moreover, areas which are out of focus due to camera post-effects, or which are affected by motion blur, naturally do not contain high detail. In these cases we could benefit from letting multiple pixels be shaded by just a single calculation (such as a 2x2 or 4x4 area of pixels) without losing any noticeable visual quality. The high-resolution sky texture on the left looks very much like the lower-resolution sky texture on the right. This is due to the smooth colour gradients and lack of high-frequency colour variation. For these reasons, there is room for a lot of optimisation.

You could argue that optimisation is more essential for handheld devices, like mobile phones, than for stationary devices, like games consoles, for a couple of reasons. Firstly, the hardware in handheld devices is often less powerful than conventional hardware due to its smaller size and more limited electrical power supply. The compact size of handheld hardware is also the reason these devices are more likely to suffer from temperature issues, causing thermal throttling, where performance slows down significantly. Secondly, heavy graphics in games can quickly drain your phone's battery. So, it is crucial to keep GPU usage to a minimum when possible. Variable rate shading is a way to help do just that.
How does variable rate shading work?

In principle, variable rate shading is actually a very simple method which can be implemented without having to redesign an existing rendering pipeline. There are three ways to define the areas to be optimised using variable rate shading:

- Let an attachment in the form of an image serve as a mask.
- Execute the optimisation on a per-triangle basis.
- Apply the VRS optimisation per draw call.

Use an attachment as a mask

You can provide the GPU with an image that serves as a mask. The mask contains information about which areas need to be rendered in the traditional manner, by shading each pixel individually, and which areas can be optimised by shading a group of pixels at once. The image below visualises such a mask by colour-coding the different areas: the blue area has no optimisation applied (1x1), as this is where the player focuses while driving. The green area is optimised by covering four pixels (2x2) with only one shading calculation, as this area contains less detail due to motion blur. The red area can be optimised even more (4x4), as it is affected by a more aggressive motion blur. The yellow and purple areas are likewise shaded with fewer shading calculations.

The areas defined in the image above could be static, at least while the player is driving the boat at top speed, as the boat is positioned at the centre of the image at all times. However, the level of optimisation could be reduced when another boat is passing by, or when the boat slows down and the motion blur is therefore gradually reduced. There are times when a more dynamic approach is needed, as it can sometimes be difficult to know beforehand which areas should be optimised and which should be shaded in the traditional manner. In those cases, it could be beneficial to generate the mask more dynamically by rendering the geometry of the scene in an extra pass.
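To put rough numbers on the savings from a mask like the one described above, here is a small back-of-the-envelope sketch in plain Java. The 1920x1080 frame and the sizes of the 1x1, 2x2 and 4x4 regions are hypothetical, chosen only for illustration:

```java
public class VrsSavings {
    // One fragment shader invocation per rate x rate tile of pixels.
    // rate = 1 means traditional 1x1 shading, rate = 2 means 2x2, and so on.
    static long invocations(long pixels, int rate) {
        long tile = (long) rate * rate;
        return (pixels + tile - 1) / tile; // round up to whole tiles
    }

    public static void main(String[] args) {
        long frame = 1920L * 1080;        // 2,073,600 pixels
        long focusArea = frame / 2;       // 1x1: the area the player watches
        long blurArea = frame / 4;        // 2x2: mild motion blur
        long heavyBlurArea = frame / 4;   // 4x4: aggressive motion blur

        long withVrs = invocations(focusArea, 1)
                     + invocations(blurArea, 2)
                     + invocations(heavyBlurArea, 4);
        long withoutVrs = invocations(frame, 1);

        // With this (made-up) mask, roughly 42% of the shading work disappears.
        System.out.println("without VRS: " + withoutVrs + ", with VRS: " + withVrs);
    }
}
```

The exact benefit depends entirely on how much of the frame the coarser regions cover, which is why masks driven by motion blur or player focus tend to pay off most in fast-moving scenes.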
Simply colour the geometric elements in the scene and pass the result to the GPU as a mask for the variable rate shading optimisation. If the scene is rendered using deferred lighting, an extra pass may not be needed, as the mask could be based on the default geometry pass required for deferred shading.

Optimisation based on primitives

Another way of using variable rate shading is to take advantage of other extensions that allow you to define the geometric elements to be optimised, rather than using a mask. This can be done on a per-triangle basis or simply per draw call. Defining geometric elements can be a more efficient approach, as there is no need to generate a mask and it requires less memory bandwidth. With the per-triangle extension, you define the optimisation level in the vertex shader. With the per-draw-call method, the optimisation level is defined before the draw call takes place. Keep in mind that the three methods can be combined if needed.

The image below shows a rendering pass where all objects in a scene are shaded in different colours to define which areas should be shaded in the traditional manner (meaning no optimisation) and which areas contain less detail (and therefore need fewer GPU calculations). The areas shown can be defined by any of the three methods. In general, breaking a scene up into layers, where the elements nearest the camera get the least optimisation and layers in the background get the most, is an effective way to go about it. The image below shows the same scene, but this time we see the final output with VRS on and off. As you may have noticed, it is very hard to tell any difference when the VRS optimisation is turned on or off.

Experiences with variable rate shading so far

Some commercial games have already successfully implemented variable rate shading. The image below is from Wolfenstein: Youngblood.
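The layering idea above can be sketched as a tiny policy function: before each draw call, pick a shading-rate tile size from the object's distance to the camera. This is a hedged illustration in plain Java; the distance thresholds and tile sizes are hypothetical, and in a real renderer the returned value would be translated into the per-draw VRS call of your graphics API:

```java
public class LayeredShadingRate {
    // Return the shading tile width/height for a draw call:
    // 1 = full 1x1 shading, 2 = 2x2, 4 = 4x4.
    // Near objects keep full detail; background layers are coarsened.
    static int rateForDistance(float distanceToCamera) {
        if (distanceToCamera < 10.0f) return 1; // foreground layer
        if (distanceToCamera < 50.0f) return 2; // mid layer
        return 4;                               // background layer
    }

    public static void main(String[] args) {
        System.out.println("player boat:  " + rateForDistance(3.0f));
        System.out.println("passing boat: " + rateForDistance(25.0f));
        System.out.println("coastline:    " + rateForDistance(200.0f));
    }
}
```

Because the decision happens once per draw call rather than per pixel, this approach costs essentially nothing at runtime and needs no mask image in memory.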
As you may have noticed, there is barely any visual difference with VRS on or off, but you can tell a difference in frame rate. In fact, the game performs on average 10% or more better with VRS turned on. That may not sound like a lot, but considering that it is an easy optimisation to implement, causes barely any noticeable change in visual quality, and comes on top of other optimisation techniques, it is actually not a bad performance boost after all. Other games have shown an even higher performance boost. For example, Gears Tactics gains up to 30% when using variable rate shading. The image below is from that game.

Virtual reality

Variable rate shading can benefit virtual reality as well. Not only does virtual reality by nature require two rendered images (one for each eye), but the player wearing the headset naturally pays most attention to the central area of the rendered image. The areas seen from the corner of your eye naturally do not need the same amount of detail as the central area. That means that even though a static VRS mask can provide a reasonable overall optimisation, using an eye tracker could result in an even more efficient optimisation, and therefore a less noticeable quality reduction. A consistently high frame rate is crucial for virtual reality. If the frame rate is not relatively consistent, or rendering performance suffers from a consistently low frame rate, it quickly becomes uncomfortable to wear a VR headset, and the player might even get dizzy and feel physically sick. By reducing GPU calculations, variable rate shading not only boosts the frame rate, it also uses less battery on mobile devices. This is a huge win for systems like Samsung Gear VR, where long battery life is much appreciated as the graphics are running on a Galaxy mobile phone.
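The eye-tracking idea can be sketched as generating a radial mask around the gaze point: tiles near the gaze keep full detail, and the rate coarsens toward the periphery. A minimal plain-Java sketch; the tile grid dimensions and the two radius thresholds are hypothetical:

```java
public class FoveatedMask {
    // Build a per-tile shading-rate mask (1 = 1x1, 2 = 2x2, 4 = 4x4)
    // centred on the gaze position reported by an eye tracker.
    static int[][] buildMask(int tilesX, int tilesY, int gazeTileX, int gazeTileY) {
        int[][] mask = new int[tilesY][tilesX];
        for (int y = 0; y < tilesY; y++) {
            for (int x = 0; x < tilesX; x++) {
                double dist = Math.hypot(x - gazeTileX, y - gazeTileY);
                if (dist < 4.0) mask[y][x] = 1;       // foveal region: full rate
                else if (dist < 8.0) mask[y][x] = 2;  // near periphery
                else mask[y][x] = 4;                  // far periphery
            }
        }
        return mask;
    }
}
```

With an eye tracker, gazeTileX and gazeTileY would be updated every frame; without one, a fixed centre per eye gives the static mask described above.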
The image below shows a variable rate shading mask generated by eye-tracking technology for a virtual reality headset. The centres of the left and right images shade pixels in the traditional manner; the other colours represent areas with different degrees of optimisation.

Which Samsung devices support variable rate shading?

All hardware listed here supports variable rate shading.

- Mobile phones: Samsung Galaxy S22, S22+ and S22 Ultra
- Tablets: Samsung Tab S8, S8+ and S8 Ultra

Both Vulkan and OpenGL ES 2.0 (and higher) support variable rate shading. The OpenGL extensions for the three ways of using variable rate shading are the following:

- GL_EXT_fragment_shading_rate_attachment allows sending a mask to the GPU.
- GL_EXT_fragment_shading_rate_primitive works on a per-triangle basis, where writing a value to gl_PrimitiveShadingRateEXT in the vertex shader defines the level of optimisation.
- GL_EXT_fragment_shading_rate works per draw call, where glShadingRateEXT should be called to define the optimisation level.

The extension that enables variable rate shading for Vulkan is VK_KHR_fragment_shading_rate.

Conclusion

In this article, we have established the following:

- Variable rate shading is a hardware feature and is fairly easy to implement, as it does not require any redesign of existing rendering pipelines.
- Variable rate shading is an optimisation technique which reduces GPU calculations by allowing a group of pixels to be shaded with the same colour rather than each pixel individually.
- Variable rate shading is particularly useful for mobile gaming as well as Samsung Gear VR, as it boosts performance and prolongs battery life.
- The level of optimisation can be defined by passing the GPU a mask containing areas with different optimisation levels.
- Some implementations have been shown to boost the frame rate by 10% or more, while others increase the frame rate by up to 30%.

Note: some images in this post are courtesy of UL Solutions.
Søren Klit Lambæk
With the increasing popularity of foldable phones such as the Galaxy Z Fold3 and Galaxy Z Flip3, apps on these devices are adopting their foldable features. In this blog, you can get started on how to utilize these foldable features in Android game apps. We focus on creating a Java file containing an implementation of the Android Jetpack WindowManager library that can be imported into game engines like Unity or Unreal Engine. This creates an interface allowing developers to retrieve information about the folding feature on the device. At the end of this blog, you can go deeper by going to Code Lab.

Android Jetpack WindowManager

Android Jetpack, in their own words, is "a suite of libraries to help developers follow best practices, reduce boilerplate code, and write code that works consistently across Android versions and devices so that developers can focus on the code they care about." WindowManager is one of these libraries, and is intended to help application developers support new device form factors and multi-window environments. The library had its 1.0.0 release in January 2022, targeting foldable devices. According to its documentation, future versions will be extended to more display types and window features.

Creating the Android Jetpack WindowManager setup

As previously mentioned, we are creating a Java file that can be imported into either Unity or Unreal Engine 4 to create an interface for retrieving information on the folding feature and passing it over to the native or engine side of your application.

Set up the FoldableHelper class and data storage class

Create a file called FoldableHelper.java in Visual Studio or any source code editor.
Let's start off by giving it a package name:

```java
package com.samsung.android.gamedev.foldable;
```

Next, let's import all the necessary libraries and classes in this file:

```java
//Android imports
import android.app.Activity;
import android.graphics.Rect;
import android.os.Handler;
import android.os.Looper;
import android.util.Log;

//Android Jetpack WindowManager imports
import androidx.annotation.NonNull;
import androidx.core.util.Consumer;
import androidx.window.java.layout.WindowInfoTrackerCallbackAdapter;
import androidx.window.layout.DisplayFeature;
import androidx.window.layout.FoldingFeature;
import androidx.window.layout.WindowInfoTracker;
import androidx.window.layout.WindowLayoutInfo;
import androidx.window.layout.WindowMetrics;
import androidx.window.layout.WindowMetricsCalculator;

//Java imports
import java.util.List;
import java.util.concurrent.Executor;
```

Start by creating a class, FoldableHelper, that is going to contain all of our helper functions. Then create variables to store a callback object, as well as the WindowInfoTrackerCallbackAdapter and WindowMetricsCalculator. Let's also create a temporary declaration of the native function that will pass the data from Java to the native side of the application once we start working in the game engines.

```java
public class FoldableHelper {
    private static LayoutStateChangeCallback layoutStateChangeCallback;
    private static WindowInfoTrackerCallbackAdapter wit;
    private static WindowMetricsCalculator wmc;

    public static native void onLayoutChanged(FoldableLayoutInfo resultInfo);
}
```

Let's create a storage class to hold the data received from the WindowManager library. An instance of this class will also be passed to the native code to transfer the data.
```java
public static class FoldableLayoutInfo {
    public static final int UNDEFINED = -1;

    // Hinge orientation
    public static final int HINGE_ORIENTATION_HORIZONTAL = 0;
    public static final int HINGE_ORIENTATION_VERTICAL = 1;

    // State
    public static final int STATE_FLAT = 0;
    public static final int STATE_HALF_OPENED = 1;

    // Occlusion type
    public static final int OCCLUSION_TYPE_NONE = 0;
    public static final int OCCLUSION_TYPE_FULL = 1;

    Rect currentMetrics = new Rect();
    Rect maxMetrics = new Rect();

    int hingeOrientation = UNDEFINED;
    int state = UNDEFINED;
    int occlusionType = UNDEFINED;
    boolean isSeparating = false;
    Rect bounds = new Rect();
}
```

Initialize the WindowInfoTracker

Since we are working in Java and the WindowManager library is written in Kotlin, we have to use the WindowInfoTrackerCallbackAdapter. This is an interface provided by Android to enable the use of the WindowInfoTracker from Java. The WindowInfoTracker is how we receive information about any foldable features inside the window's bounds. Next is to create the WindowMetricsCalculator, which lets us retrieve the window metrics of an activity. Window metrics consist of the window's current and maximum bounds. We also create a new LayoutStateChangeCallback object. This object is passed into the WindowInfoTracker as a listener object and is called every time the layout of the device changes (for our purposes, when the foldable state changes).

```java
public static void init(Activity activity) {
    //create WindowInfoTracker
    wit = new WindowInfoTrackerCallbackAdapter(WindowInfoTracker.Companion.getOrCreate(activity));

    //create WindowMetricsCalculator
    wmc = WindowMetricsCalculator.Companion.getOrCreate();

    //create callback object
    layoutStateChangeCallback = new LayoutStateChangeCallback(activity);
}
```

Set up and attach the callback listener

In this step, let's attach the LayoutStateChangeCallback to the WindowInfoTrackerCallbackAdapter as a listener.
The addWindowLayoutInfoListener function takes three parameters: the activity to attach the listener to, an executor, and a consumer of WindowLayoutInfo. We will set up the executor and consumer in a moment. Adding the listener is kept separate from the initialization, since the first WindowLayoutInfo is not emitted until Activity.onStart has been called. As such, we likely do not need to attach the listener until during or after onStart, but we can still set up the WindowInfoTracker and WindowMetricsCalculator ahead of time.

```java
public static void start(Activity activity) {
    wit.addWindowLayoutInfoListener(activity, runOnUiThreadExecutor(), layoutStateChangeCallback);
}
```

Now, let's create the executor for the listener. This executor is straightforward and simply runs the command on the main looper of our activity. It is possible to set this up to run on a custom thread; however, that is not covered in this blog. For more information, we recommend checking the official documentation for the Jetpack WindowManager.

```java
static Executor runOnUiThreadExecutor() {
    return new MyExecutor();
}

static class MyExecutor implements Executor {
    Handler handler = new Handler(Looper.getMainLooper());

    @Override
    public void execute(Runnable command) {
        handler.post(command);
    }
}
```

Next, we're going to create the basic layout of our LayoutStateChangeCallback. It consumes WindowLayoutInfo and implements Consumer<WindowLayoutInfo>. For now, let's simply lay out the class; we will give it some functionality a little later.

```java
static class LayoutStateChangeCallback implements Consumer<WindowLayoutInfo> {
    private final Activity activity;

    public LayoutStateChangeCallback(Activity activity) {
        this.activity = activity;
    }
}
```

If the listener is no longer needed, we want a way to remove it, and the WindowInfoTrackerCallbackAdapter contains a function to do just that.
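The listener plumbing above is just the standard Consumer/Executor pattern. To see the shape in isolation, here is a minimal plain-Java mock (all names hypothetical, no Android dependencies, and using java.util.function.Consumer as a stand-in for androidx.core.util.Consumer) of a tracker that delivers layout updates to a registered consumer through an executor:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

public class ListenerPatternDemo {
    // Hypothetical stand-in for the information the real tracker emits.
    record LayoutInfo(String state) {}

    // Minimal mock of the tracker: stores listeners, delivers via an executor.
    static class MockTracker {
        private final List<Consumer<LayoutInfo>> listeners = new ArrayList<>();
        private final Executor executor;

        MockTracker(Executor executor) { this.executor = executor; }

        void addListener(Consumer<LayoutInfo> listener) { listeners.add(listener); }

        void emit(LayoutInfo info) {
            // Each listener is invoked on the executor, just as the real
            // tracker posts to the main looper through MyExecutor.
            for (Consumer<LayoutInfo> l : listeners) {
                executor.execute(() -> l.accept(info));
            }
        }
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        MockTracker tracker = new MockTracker(Runnable::run); // direct executor
        tracker.addListener(info -> received.add(info.state()));
        tracker.emit(new LayoutInfo("HALF_OPENED"));
        System.out.println(received);
    }
}
```

The executor indirection is what lets the real library deliver updates on the UI thread (or any thread you choose) without the tracker knowing anything about threading itself.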
```java
public static void stop() {
    wit.removeWindowLayoutInfoListener(layoutStateChangeCallback);
}
```

This just tidies things up for us and ensures that the listener is cleaned up when we no longer need it. Next, we're going to add some functionality to the LayoutStateChangeCallback class. We are going to process WindowLayoutInfo into the FoldableLayoutInfo we created previously. Using the Java Native Interface (JNI), we then send that information over to the native side using the function onLayoutChanged. Note: this doesn't actually do anything yet, but we cover how to set this up in Unreal Engine and in Unity through the Code Lab tutorials.

```java
static class LayoutStateChangeCallback implements Consumer<WindowLayoutInfo> {
    @Override
    public void accept(WindowLayoutInfo windowLayoutInfo) {
        FoldableLayoutInfo resultInfo = updateLayout(windowLayoutInfo, activity);
        onLayoutChanged(resultInfo);
    }
}
```

Let's implement the updateLayout function to process WindowLayoutInfo and return a FoldableLayoutInfo. Firstly, create the FoldableLayoutInfo that will contain the processed information. Follow this up by getting the window metrics, both maximum and current.

```java
private static FoldableLayoutInfo updateLayout(WindowLayoutInfo windowLayoutInfo, Activity activity) {
    FoldableLayoutInfo retLayoutInfo = new FoldableLayoutInfo();

    WindowMetrics wm = wmc.computeCurrentWindowMetrics(activity);
    retLayoutInfo.currentMetrics = wm.getBounds();

    wm = wmc.computeMaximumWindowMetrics(activity);
    retLayoutInfo.maxMetrics = wm.getBounds();

    // function continued in the next step
}
```

Get the DisplayFeatures present in the current window bounds using WindowLayoutInfo.getDisplayFeatures. Currently, the API has only one type of DisplayFeature, the FoldingFeature; however, in the future there will likely be more as screen types evolve. At this point, let's use a for loop to iterate through the resulting list until it finds a FoldingFeature. Once it detects a folding feature, it starts processing its data: orientation, state, separation type, and its bounds.
Then, store these data in the FoldableLayoutInfo we created at the start of the function call. You can learn more about these data in the Jetpack WindowManager documentation.

```java
private static FoldableLayoutInfo updateLayout(WindowLayoutInfo windowLayoutInfo, Activity activity) {
    FoldableLayoutInfo retLayoutInfo = new FoldableLayoutInfo();

    WindowMetrics wm = wmc.computeCurrentWindowMetrics(activity);
    retLayoutInfo.currentMetrics = wm.getBounds();

    wm = wmc.computeMaximumWindowMetrics(activity);
    retLayoutInfo.maxMetrics = wm.getBounds();

    List<DisplayFeature> displayFeatures = windowLayoutInfo.getDisplayFeatures();

    if (!displayFeatures.isEmpty()) {
        for (DisplayFeature displayFeature : displayFeatures) {
            if (displayFeature instanceof FoldingFeature) {
                FoldingFeature foldingFeature = (FoldingFeature) displayFeature;

                if (foldingFeature.getOrientation() == FoldingFeature.Orientation.HORIZONTAL) {
                    retLayoutInfo.hingeOrientation = FoldableLayoutInfo.HINGE_ORIENTATION_HORIZONTAL;
                } else {
                    retLayoutInfo.hingeOrientation = FoldableLayoutInfo.HINGE_ORIENTATION_VERTICAL;
                }

                if (foldingFeature.getState() == FoldingFeature.State.FLAT) {
                    retLayoutInfo.state = FoldableLayoutInfo.STATE_FLAT;
                } else {
                    retLayoutInfo.state = FoldableLayoutInfo.STATE_HALF_OPENED;
                }

                if (foldingFeature.getOcclusionType() == FoldingFeature.OcclusionType.NONE) {
                    retLayoutInfo.occlusionType = FoldableLayoutInfo.OCCLUSION_TYPE_NONE;
                } else {
                    retLayoutInfo.occlusionType = FoldableLayoutInfo.OCCLUSION_TYPE_FULL;
                }

                retLayoutInfo.isSeparating = foldingFeature.isSeparating();
                retLayoutInfo.bounds = foldingFeature.getBounds();

                return retLayoutInfo;
            }
        }
    }

    return retLayoutInfo;
}
```

If no folding feature is detected, the function simply returns the FoldableLayoutInfo without setting its data, leaving its values as UNDEFINED (-1).

Conclusion

The Java file you have now created should be usable in new or existing Unity and Unreal Engine projects, providing access to information on the folding feature.
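To illustrate what the data in FoldableLayoutInfo enables, the sketch below splits a window into the two usable halves on either side of a horizontal hinge, which is the basis of a flex-mode layout. It is plain Java with a minimal stand-in for android.graphics.Rect so that it runs off-device, and the window and hinge dimensions are hypothetical:

```java
public class FoldRegions {
    // Minimal stand-in for android.graphics.Rect so this sketch runs off-device.
    static class Rect {
        final int left, top, right, bottom;
        Rect(int left, int top, int right, int bottom) {
            this.left = left; this.top = top; this.right = right; this.bottom = bottom;
        }
    }

    // For a HINGE_ORIENTATION_HORIZONTAL fold, return the regions above
    // and below the fold bounds reported by the folding feature.
    static Rect[] splitAroundHorizontalHinge(Rect window, Rect fold) {
        Rect upper = new Rect(window.left, window.top, window.right, fold.top);
        Rect lower = new Rect(window.left, fold.bottom, window.right, window.bottom);
        return new Rect[] { upper, lower };
    }

    public static void main(String[] args) {
        // Hypothetical half-opened device: 2208x1768 window, hinge band at y = 832..936.
        Rect window = new Rect(0, 0, 2208, 1768);
        Rect fold = new Rect(0, 832, 2208, 936);
        Rect[] halves = splitAroundHorizontalHinge(window, fold);
        System.out.println("upper half ends at y=" + halves[0].bottom
                + ", lower half starts at y=" + halves[1].top);
    }
}
```

In a game, these two rectangles could drive viewport placement, for example rendering gameplay on one half and controls on the other when the device reports STATE_HALF_OPENED with isSeparating set.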
Continue learning by going to the Code Lab tutorials, which show how to use the file created here to implement flex mode detection and usage in game applications.
Lochlann Henry Ramsay-Edwards