Commit Graph

7027 Commits

Author SHA1 Message Date
Zephyron
40aef8bbc8 vulkan: Add Samsung driver workarounds
Add workarounds for Samsung Xclipse GPUs:

- Disable extendedDynamicState3ColorBlendEquation as it is broken in Samsung
  drivers, similar to AMD drivers
- Add Samsung's proprietary driver to the validated driver list for clock
  boosting
- Fix log message to indicate both AMD and Samsung drivers have broken
  color blend equation support

Remove stray logical OR operator from validated_driver condition.
2025-05-11 14:54:45 +01:00
Zephyron
dc918513cf video_core/vulkan: Improve texture format conversion handling
Refactors and improves the texture format conversion system in the Vulkan
renderer:

- Adds proper sRGB to linear conversion for depth formats
- Improves shader accuracy for ABGR8 SRGB to D24S8 conversion
- Adds gamma correction and proper depth range clamping
- Moves GetSupportedFormat implementation to header
- Cleans up format conversion switch statement
- Removes redundant format conversion paths

The changes improve accuracy when converting between color and depth
formats, particularly for sRGB sources. The shader improvements ensure
proper gamma correction and depth range handling.

Technical changes:
- Improves sRGB to linear conversion in fragment shader
- Adds proper depth value clamping
- Consolidates format conversion logic
- Removes duplicate GetSupportedFormat implementation
2025-05-11 14:54:45 +01:00
Zephyron
a7036e7dd9 vulkan: Implement native MSAA resolve in texture cache
Implements hardware-accelerated MSAA resolve functionality in the Vulkan
texture cache instead of relying on compute shaders. This change:

- Adds proper MSAA to non-MSAA image copy support using vkCmdResolveImage
- Creates temporary resolve images with appropriate memory allocation
- Handles format compatibility checks with proper fallback to compute
- Manages image layout transitions and memory barriers
- Preserves existing compute shader fallback for unsupported formats

The implementation follows Vulkan best practices for MSAA resolve
operations and should provide better performance for supported formats.
2025-05-11 14:54:45 +01:00
Zephyron
fdeaf764fe vulkan: Add 4KB memory alignment for AMD and Qualcomm drivers
Adds special handling for memory allocation size on AMD and Qualcomm (Adreno)
drivers by aligning allocations to 4KB boundaries. This fixes potential memory
allocation issues on these drivers where unaligned allocations may fail or
cause undefined behavior.

Affected drivers:
- AMD Proprietary (AMDVLK)
- AMD Open Source (RADV)
- Qualcomm Proprietary (Adreno)
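
The 4KB rounding described above can be sketched as follows. This is only an illustration, not the code from the commit: the helper names `AlignUp` and `AllocationSize` and the `needs_4k_alignment` flag are hypothetical, and the sketch assumes the alignment is a power of two.

```cpp
#include <cstdint>

// 4KB, the alignment named in the commit message.
constexpr std::uint64_t kAllocAlignment = 4096;

// Hypothetical helper: rounds `size` up to the next multiple of `alignment`.
// Assumes `alignment` is a power of two.
constexpr std::uint64_t AlignUp(std::uint64_t size, std::uint64_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical wrapper: only drivers flagged as needing it get the aligned size.
constexpr std::uint64_t AllocationSize(std::uint64_t requested, bool needs_4k_alignment) {
    return needs_4k_alignment ? AlignUp(requested, kAllocAlignment) : requested;
}
```

Sizes that are already a multiple of 4096 are returned unchanged, so the rounding costs nothing for allocations that were aligned to begin with.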
2025-05-11 14:54:45 +01:00
Zephyron
0b5a1c07b4 video_core: Add new shader format conversion pipelines
Adds several new shader-based format conversion pipelines to support additional
texture formats and operations:

- RGBA8 to BGRA8 conversion
- YUV420/RGB conversions
- BC7 to RGBA8 decompression
- ASTC HDR to RGBA16F decompression
- RGBA16F to RGBA8 conversion
- Temporal dithering
- Dynamic resolution scaling

Updates the texture cache runtime to handle these new conversion paths and adds
helper functions to check format compatibility for dithering and scaling
operations.

The changes include:
- New shader files and CMake entries
- Additional conversion pipeline setup in BlitImageHelper
- Extended format conversion logic in TextureCacheRuntime
- New format compatibility check helpers
2025-05-11 14:54:45 +01:00
Zephyron
a69dc81b34 video_core: Add sRGB to D24S8 depth-stencil conversion support
Implements conversion from sRGB color formats to D24S8 depth-stencil format
in the Vulkan renderer. This change includes:

- New fragment shader convert_abgr8_srgb_to_d24s8.frag that handles proper
  sRGB to linear conversion before depth calculation
- Added shader to CMake build system
- Extended BlitImageHelper with new conversion pipeline and methods
- Updated texture cache to handle sRGB to D24S8 format conversion paths

The conversion properly handles sRGB color space by first converting to
linear space before calculating luminance values for the depth component,
while preserving alpha channel data for the stencil component.
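
A CPU-side sketch of the conversion described above, for illustration only. The real work happens in the fragment shader, which is not shown here. The sRGB transfer function is the standard one; the Rec. 709 luminance weights, the rounding, and all function and struct names are assumptions, since the commit message does not specify them.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Standard sRGB-to-linear transfer function for one channel in [0, 1].
inline float SrgbToLinear(float c) {
    return c <= 0.04045f ? c / 12.92f : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// Assumption: Rec. 709 weights. The commit does not say which weights are used.
inline float Luminance(float r, float g, float b) {
    return 0.2126f * r + 0.7152f * g + 0.0722f * b;
}

// Hypothetical result type: 24-bit depth and 8-bit stencil kept separate.
struct DepthStencil {
    std::uint32_t depth24; // 0 .. 0xFFFFFF
    std::uint8_t stencil;  // 0 .. 0xFF
};

inline DepthStencil ConvertSrgbToD24S8(float r, float g, float b, float a) {
    // Linearize the color first, then derive depth from luminance.
    const float lum = Luminance(SrgbToLinear(r), SrgbToLinear(g), SrgbToLinear(b));
    // Clamp to the valid range; double avoids rounding past 24 bits.
    const double depth = std::min(std::max(static_cast<double>(lum), 0.0), 1.0);
    const double alpha = std::min(std::max(static_cast<double>(a), 0.0), 1.0);
    return {static_cast<std::uint32_t>(std::lround(depth * 16777215.0)),
            static_cast<std::uint8_t>(std::lround(alpha * 255.0))};
}
```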
2025-05-11 14:54:45 +01:00
Zephyron
f50aa799df vulkan: Relax VRAM allocation limits for better stability
Adjusts VRAM allocation strategy to be more conservative while maintaining
performance:

- Increases reserve memory from 1/8th to 1/4th (max 2GB) for discrete GPUs
- Increases base memory limit from 6GB to 8GB
- Doubles resolution scaling memory from 1GB to 2GB per scale factor
- Reduces system memory reservation from 8GB to 4GB for integrated GPUs
- Increases maximum memory limit from 4GB to 6GB for integrated GPUs

These changes help prevent memory leaks while still providing adequate
VRAM for optimal performance.
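
As a rough illustration of the discrete-GPU reserve rule listed above (one quarter of VRAM, capped at 2GB), a sketch could look like this. The function names are hypothetical, and the way the reserve is combined with the other limits in the actual code is not shown in the commit message.

```cpp
#include <algorithm>
#include <cstdint>

constexpr std::uint64_t GiB = std::uint64_t{1} << 30;

// Reserve a quarter of device-local memory, but never more than 2 GiB.
constexpr std::uint64_t DiscreteReserve(std::uint64_t total_vram) {
    return std::min(total_vram / 4, 2 * GiB);
}

// Hypothetical: what remains after subtracting the reserve.
constexpr std::uint64_t DiscreteUsable(std::uint64_t total_vram) {
    return total_vram - DiscreteReserve(total_vram);
}
```

With this rule the cap takes effect at 8 GiB of VRAM and above; smaller cards give up a proportional quarter.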
2025-05-11 14:29:04 +01:00
Zephyron
b78d35014a android: Fix compilation by adding missing log.h include
Adds missing include for common/logging/log.h in gpu.h which was causing
compilation failures on Android. This header is needed for logging
functionality used in GPU-related operations.

The include was previously indirectly available through other headers,
but making it explicit improves code clarity and prevents potential
future compilation issues.
2025-05-11 14:29:04 +01:00
Zephyron
9697037644 buffer_cache: Simplify storage buffer binding logic
Reverts overly restrictive storage buffer validation and size calculation
that was causing rendering issues in The Legend of Zelda: Tears of the
Kingdom, particularly in underground/depth areas. The simplified approach:

- Uses GetMemoryLayoutSize() instead of manual page probing
- Removes unnecessary 4GB memory bounds validation
- Streamlines address translation and alignment handling

This fixes numerous reported cases of missing or corrupted rendering in
TOTK's underground areas where storage buffer operations are heavily used
for depth-related effects.
2025-05-11 14:29:04 +01:00
Zephyron
1a160d651d service/nvdrv: Relax GPU validation and improve error handling
Relaxes validation checks in the NVDRV GPU service and improves error notifier
handling to prevent potential hangs. Key changes:

- Remove strict size validation in SetErrorNotifier
- Relax GPFIFO entry count validation to only check for non-zero values
- Add proper error notifier state tracking in GPU class
- Improve debug logging messages

The previous strict validation was causing issues with some games like ACNH.
These changes maintain necessary checks while being more permissive with
edge cases that don't impact functionality.

Technical changes:
- Store error notifier state in GPU class for future implementation
- Remove upper bound check on GPFIFO entries
- Simplify error notifier setup flow

This should resolve hanging issues while maintaining core functionality.
2025-05-11 12:18:43 +01:00
Zephyron
5117f6496a service/nvdrv: Implement stubbed GPU functions
Implements several previously stubbed functions in the NVDRV service:
- Initialize proper transfer memory handling
- Add error notifier configuration
- Implement channel timeout and timeslice management
- Add object context allocation and tracking
- Add GPU interface stubs for new functionality

The changes improve the accuracy of GPU-related operations while maintaining
compatibility with the existing codebase. All functions now properly validate
parameters and handle endianness correctly using _le types.
2025-05-11 12:18:33 +01:00
Zephyron
9ef8867bd4 arm/video: Fix shader extension and exception handling
Two main changes in this commit:

1. Replace NVIDIA-specific GL_NV_gpu_shader5 extension with the more widely
   supported GL_EXT_shader_explicit_arithmetic_types_float16 in the scaleforce
   shader. This improves compatibility across different GPU vendors.

2. Refactor ARM32 exception handling:
   - Restructure exception cases for better readability
   - Update exception handling to match current Dynarmic API
   - Fix indentation in switch statement
   - Remove AccessViolation case as it's no longer supported in current API

These changes improve shader compatibility and align the exception handling
with the current Dynarmic implementation.
2025-05-11 12:17:03 +01:00
Zephyron
c3a7aa41d8 vulkan: Fix crashes with bindless texture constant buffer handling
Previously, the code would unconditionally add a constant buffer descriptor
at index 0 whenever storage buffers were present, which could cause conflicts
and crashes. This change:

- Adds validation to check if constant buffer 0 already exists
- Only adds the descriptor if it's not already present
- Prevents potential descriptor conflicts in shaders

This should resolve crashes in Vulkan games related to invalid descriptor
layouts and resource binding conflicts.
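
The check-before-add logic described above might be sketched like this. The `ConstantBufferDescriptor` struct is a simplified, hypothetical stand-in for the shader descriptor type, and `EnsureConstantBufferZero` is an invented name. The count of 1 is taken from the earlier compute-pipeline commit.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a constant buffer descriptor.
struct ConstantBufferDescriptor {
    std::uint32_t index;
    std::uint32_t count;
};

// Adds a descriptor for constant buffer 0 only if none is present yet.
// Returns true if a descriptor was added.
bool EnsureConstantBufferZero(std::vector<ConstantBufferDescriptor>& descriptors) {
    const bool present =
        std::any_of(descriptors.begin(), descriptors.end(),
                    [](const ConstantBufferDescriptor& d) { return d.index == 0; });
    if (present) {
        return false; // Already declared; adding again would create a duplicate.
    }
    descriptors.push_back({0, 1});
    return true;
}
```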
2025-05-11 12:17:03 +01:00
Zephyron
d34cc695df vulkan: Add bindless texture constant buffer support in compute pipeline
Add support for bindless texture constant buffers in the compute pipeline
creation process. When storage buffer descriptors are present, create a
constant buffer descriptor to handle bindless textures. This fixes the
"Failed to track bindless texture constant buffer" error.

Changes:
- Add constant buffer descriptor with index 0 and count 1 when storage
  buffers are present
- Place descriptor creation before SPIR-V code generation to ensure proper
  shader compilation

This resolves issues with bindless texture access in compute shaders.
2025-05-11 12:17:03 +01:00
Zephyron
53be20f4ea buffer_cache: Fix storage buffer memory validation and size detection
Fixes the StorageBufferBinding function to properly handle memory validation
and size detection. Key changes include:

- Fix ReadBlock usage to properly handle void return values
- Implement safer memory validation using byte-level reads
- Improve size detection logic for storage buffers
- Fix NVN buffer size reading
- Add proper bounds checking for device memory addresses
- Add better error logging for invalid conditions

This addresses the "Failed to find storage buffer for cbuf index 0" errors
by implementing more robust memory validation and size detection. The changes
ensure proper handling of invalid memory addresses and prevent crashes from
accessing out-of-bounds memory.
2025-05-11 12:17:03 +01:00
Zephyron
8830df1b9f video_core: Enforce safe memory reads for compute dispatch
- Modify DmaPusher to use safe memory reads when handling compute
  operations at High GPU accuracy
- Prevent potential memory corruption issues that could lead to
  invalid dispatch parameters
- Previously, unsafe reads could result in corrupted launch_description
  data in KeplerCompute::ProcessLaunch, causing invalid vkCmdDispatch
  calls
- By enforcing safe reads specifically for compute operations, we
  maintain performance for other GPU tasks while ensuring compute
  dispatch stability

This change requires >= High GPU accuracy level to take effect.
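
The gating condition can be summarized in a small sketch. The enum and function names are hypothetical, and the accuracy levels are assumed to be ordered Normal < High < Extreme. The actual decision in DmaPusher may involve more state than shown here.

```cpp
// Assumed ordering of GPU accuracy levels for this sketch.
enum class GpuAccuracy { Normal = 0, High = 1, Extreme = 2 };

// Safe reads are used only for compute work, and only at High accuracy or above;
// all other work keeps the faster unsafe path.
constexpr bool UseSafeReads(GpuAccuracy accuracy, bool is_compute_work) {
    return is_compute_work && accuracy >= GpuAccuracy::High;
}
```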
2025-05-11 12:17:03 +01:00
EmulationEnjoyer
3f6fbc21a6 perf(VideoCore): Refactor DispatchIndirect
- Added automatic safe or unsafe processing for better emulation accuracy.

ref: 8cae0310e3
2024-12-20 15:24:58 +00:00
anon
f0ec85c514 dma_pusher.cpp: remove now unused variable that breaks the android build (#76)
Completes http://vub63vv26q6v27xzv2dtcd25xumubshogm67yrpaz2rculqxs7jlfqad.onion/torzu-emu/torzu/pulls/71

Discovered when building the Android version on Debian via the command line.

Reviewed-on: http://vub63vv26q6v27xzv2dtcd25xumubshogm67yrpaz2rculqxs7jlfqad.onion/torzu-emu/torzu/pulls/76
Co-authored-by: anon <anon@noreply.localhost>
Co-committed-by: anon <anon@noreply.localhost>
2024-12-20 15:24:58 +00:00
lui
b76a6a79bc externals: update fmt to 11.0.2 and vcpkg to 2024.09.30 (#68)
Updated to fmt 11 with the required source changes for it to work.

Also updated vcpkg for this. As an added benefit, this fixes the `Unable to find a valid Visual Studio instance` error, and the VS 2019 build tools are no longer required. If you have compiled before, delete the existing downloaded vcpkg tool and binaries in `externals/vcpkg`; otherwise the build will keep using the old version and report the error.

Reviewed-on: http://vub63vv26q6v27xzv2dtcd25xumubshogm67yrpaz2rculqxs7jlfqad.onion/torzu-emu/torzu/pulls/68
Co-authored-by: lui <lui@vub63vv26q6v27xzv2dtcd25xumubshogm67yrpaz2rculqxs7jlfqad.onion>
Co-committed-by: lui <lui@vub63vv26q6v27xzv2dtcd25xumubshogm67yrpaz2rculqxs7jlfqad.onion>
2024-12-20 15:24:58 +00:00
cow
6179cc0588 kepler_compute: use safe memory read
If an unsafe read is done, there can sometimes be corrupt data in the
KeplerCompute::ProcessLaunch qmd structure.

Fixes GPU crashes in 'Princess Peach: Showtime!' when using the Vulkan
renderer. Requires using "Accuracy Level High" (crashes will still
happen if using "Normal").

Tested on Radeon 6750XT, Linux 6.11.2, Mesa 24.2.5 (RADV driver).

Unsafe read was introduced in 115792158d
"VideoCore: Implement DispatchIndirect"

How I debugged this:
- Used VK_LAYER_KHRONOS_validation, which found an invalid vkCmdDispatch
  (along with a lot of other noise!)
- Instrumented all calls to Vulkan Dispatch() and set a breakpoint for
  grid_dim_x > 1024 (an obviously invalid value). Found that the dispatch
  came from RasterizerVulkan::DispatchCompute().
- Commented out DispatchCompute() entirely; the game ran with no crashes,
  but some graphics effects were missing.
- Went one layer up and observed a corrupted `launch_description` in
  KeplerCompute::ProcessLaunch().
- Attempted a safe ReadBlock (`which = VideoCommon::CacheType::All`)
  instead of ReadBlockUnsafe in KeplerCompute::ProcessLaunch(); it did not
  help.
- Went one layer up to DmaPusher and switched to safe_process(). No more
  corrupt `launch_description`.
2024-12-20 15:24:58 +00:00
Samuliak
80d83abcf3 mark format functions as const 2024-12-20 15:24:58 +00:00
lui
6dbbe4225b renderer: add area sampling scaling method (#57)
Adds Area Sampling to the list of scaling options. Works well to achieve a high-quality, smooth super-sampling effect. Dolphin has had this for a while, and now Ryujinx has recently added it too, so I decided to port it.

I am not sure whether adding the extra uniform to the OpenGL WindowAdaptPass, or using push constants under Vulkan, was the best way to get the window size for use in the shader, but other scaling methods still work fine. The implementation seems to work under both Vulkan and OpenGL, though the shader might still need some minor tweaks. It should definitely be tested before merging; I have tested on an Nvidia RTX 3080 under Windows.

Adapted from these two PRs:
https://github.com/Ryujinx/Ryujinx/pull/7304
https://github.com/dolphin-emu/dolphin/pull/11999

Reviewed-on: http://vub63vv26q6v27xzv2dtcd25xumubshogm67yrpaz2rculqxs7jlfqad.onion/torzu-emu/torzu/pulls/57
Co-authored-by: lui <lui@vub63vv26q6v27xzv2dtcd25xumubshogm67yrpaz2rculqxs7jlfqad.onion>
Co-committed-by: lui <lui@vub63vv26q6v27xzv2dtcd25xumubshogm67yrpaz2rculqxs7jlfqad.onion>
2024-12-20 15:24:58 +00:00
Herman Semenov
7bebaad1bf Use reserve() to optimize inserts, mark unused pair items, and minor code refactor 2024-12-20 15:24:58 +00:00
echosys
08439ae74e Add option to only optimize SPIRV during load (#13)
Adds a new option "On Load" to the "Optimize SPIRV output" option that turns on optimizations during the loading of the shader cache from disk, but turns it off after that.
The previous checkbox states have been named "Never" for unchecked and "Always" for checked.

The idea is that once the shader cache has most of the shaders in a game cached they can be optimized during initial game startup (where a performance hit matters less) and the few shaders that get compiled during runtime are not optimized to reduce performance hits.

Most of the commit is adding the setting to the Android app, the main logic is in the `gl_shader_cache.cpp` and `vk_pipeline_cache.cpp` files.
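
The three-state option described above can be sketched as a small decision function. The enum and function names here are hypothetical and are not taken from `gl_shader_cache.cpp` or `vk_pipeline_cache.cpp`. The sketch assumes the caller knows whether a shader is being built while the disk cache is loading.

```cpp
// Hypothetical names for the three states of "Optimize SPIRV output".
enum class SpirvOptimizeMode { Never, OnLoad, Always };

// Decides whether a shader should be run through the optimizer.
// `loading_disk_cache` is true while the shader cache is being loaded at startup.
constexpr bool ShouldOptimize(SpirvOptimizeMode mode, bool loading_disk_cache) {
    switch (mode) {
    case SpirvOptimizeMode::Never:
        return false;
    case SpirvOptimizeMode::OnLoad:
        return loading_disk_cache;
    case SpirvOptimizeMode::Always:
        return true;
    }
    return false; // Unreachable for valid enum values.
}
```

Under "On Load", shaders compiled at runtime skip the optimizer, which matches the stated goal of keeping the cost at startup.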

Reviewed-on: http://vub63vv26q6v27xzv2dtcd25xumubshogm67yrpaz2rculqxs7jlfqad.onion/torzu-emu/torzu/pulls/13
Co-authored-by: echosys <echosys@noreply.localhost>
Co-committed-by: echosys <echosys@noreply.localhost>
2024-12-20 15:24:58 +00:00
Jarrod Norwell
0078a5544e Implemented LogicOp fix 2024-12-20 15:24:57 +00:00
Mike Lothian
c39939fd8e Use fmt 11.0.0 2024-12-20 15:24:57 +00:00
spectranator
701f391d00 Disabled problematic MSVC warning-to-errors 2024-12-20 15:24:57 +00:00
darktux
8028770d4e Optionally optimize generated SPIRV with spirv-opt (#10)
Reviewed-on: http://y2nlvhmmk5jnsvechppxnbyzmmv3vbl7dvzn6ltwcdbpgxixp3clkgqd.onion/darktux/torzu/pulls/10
Co-authored-by: darktux <darktux@y2nlvhmmk5jnsvechppxnbyzmmv3vbl7dvzn6ltwcdbpgxixp3clkgqd.onion>
Co-committed-by: darktux <darktux@y2nlvhmmk5jnsvechppxnbyzmmv3vbl7dvzn6ltwcdbpgxixp3clkgqd.onion>
2024-12-20 15:24:57 +00:00
darktux
8b6b7ab647 Removed telemetry and anonymized SCM (git) strings 2024-12-20 15:24:57 +00:00
darktux
b250c2ff97 Fix GCC builds with Debug build type
When compiling with -DCMAKE_BUILD_TYPE=Debug, GCC would (correctly) fail to
compile intrinsics in stb and host1x due to lack of optimizations.

Sadly, the compilation error given is bogus and Clang completing the builds
without issues does raise some eyebrows.

Therefore, force optimizations for the offending files under GCC when
creating Debug builds.

Signed-off-by: voidanix <voidanix@keyedlimepie.org>
2024-12-20 15:24:57 +00:00
Lucas Clemente Vella
bf4c8952b0 Vulkan validation error fix.
Different image usage flags between image creation and image view
creation.
2024-12-20 15:24:57 +00:00
Alessio
805bcb19dc Radeon gpu profiler detection support 2024-12-20 15:24:57 +00:00
Exverge
41807019b6 fix: Fixes compiling for non-Apple OSes on arm64 2024-12-20 15:24:57 +00:00
Nick Majkic
75b68c9170 Macos moltenvk headers 2024-12-20 15:24:57 +00:00
Nick Majkic
b9769e59e9 Clean up CMAKE files for mac and xcode building 2024-12-20 15:24:57 +00:00
Alessio
409e0e558f Better surface logging 2024-12-20 15:24:57 +00:00
Alessio
5bf26b16f6 fix for amd video playback (green videos) 2024-12-20 15:24:57 +00:00
Liam
1a75fa37d3 renderer_vulkan: fallback to D32 for missing D24X8 2024-12-20 15:24:47 +00:00
niansa
9b4ef19ed9 Port changes from Early Access 2024-12-20 15:24:40 +00:00
liamwhite
f1b1530249
Merge pull request #13171 from liamwhite/fake-address
texture_cache: do not track invalid addresses
2024-02-27 09:42:46 -05:00
liamwhite
6948ac8c16
general: workarounds for SMMU syncing issues (#12749) 2024-02-27 15:42:15 +01:00
liamwhite
b2e129eaa5
vk_rasterizer: flip scissor y on lower left origin mode (#13122) 2024-02-27 15:40:33 +01:00
liamwhite
1de37306a5
buffer_cache: avoid overflow in usage tracker (#13166) 2024-02-27 15:39:11 +01:00
liamwhite
9bc85dda5f
texture_cache: use two-pass collection for costly load resources (#13096) 2024-02-27 15:38:14 +01:00
Narr the Reg
1bec420695
Merge pull request #13172 from liamwhite/gl-streams
renderer_opengl: declare geometry stream support in profile
2024-02-26 11:51:25 -06:00
Liam
a0e254e7c4 renderer_opengl: declare geometry stream support in profile 2024-02-26 11:18:30 -05:00
Liam
25c3bbba0e settings: remove global override for smash on amdvlk 2024-02-26 11:16:18 -05:00
Liam
d66ca8b731 video_core: make gpu context aware of rendering program 2024-02-26 11:16:14 -05:00
Liam
fd9ed54f27 texture_cache: do not track invalid addresses 2024-02-26 10:26:27 -05:00
Narr the Reg
984396a21a
Merge pull request #13001 from liamwhite/scaled-availability
vulkan_device: don't use fixed cap for memory limits
2024-02-22 11:31:17 -06:00