Commit Graph

267 Commits

Author SHA1 Message Date
Marcin Mikołajczyk
0add6b8c1f
Neo: packed math for unsigned/signed integers (#4407) 2026-05-13 20:33:50 -07:00
Marcin Mikołajczyk
89b886348e
Neo: three-operand min/max/med for 16bit floats (#4395) 2026-05-11 23:06:00 +02:00
squidbus
ac61f4aee2
shader_recompiler: Strip out manual bounds checking (#4380) 2026-05-09 10:05:18 -07:00
oltolm
442a07a707
Fix compilation with mingw-w64 (#4365)
* cmake: apply sirit unused-command-line-argument flag only with Clang

* shader_recompiler: move Opcode magic_enum range customization to opcodes.h

Define the magic_enum range for Shader::Gcn::Opcode as a proper enum_range specialization in opcodes.h instead of relying on translation-unit macros in translate.cpp.

This makes the customization visible where the enum is used and avoids the GCC linkage/build issue.

* thread: use Windows thread naming path for MinGW-w64

Switch the thread naming guard from _MSC_VER to _WIN32 so MinGW-w64 builds use the Windows SetThreadDescription implementation instead of falling through the POSIX branch.

This matches the platform rather than the compiler and avoids the MinGW-w64 build issue.
2026-05-08 22:36:55 -07:00
Marcin Mikołajczyk
34b35b526e
Neo: Float16 packed math (#4354) 2026-05-04 15:21:20 -07:00
Marcin Mikołajczyk
26eaa3e3af
Neo: 16bit shift instructions (#4351) 2026-05-04 10:13:42 -07:00
Marcin Mikołajczyk
a3e25efad5
Neo: V_MAD_MIX opcodes (#4338) 2026-04-30 16:56:27 -07:00
Stephen Miller
475696c542
Bump enum range to fix unknown opcode logging (#4333) 2026-04-29 07:11:41 +03:00
Marcin Mikołajczyk
76729835d7
Neo: bit and alu instructions (#4332) 2026-04-28 14:54:15 -07:00
Marcin Mikołajczyk
d47b0524ce
V_ADD3_U32 and V_OR3_B32 (#4326) 2026-04-27 13:35:37 -07:00
Marcin Mikołajczyk
fba374442c
file_sys: apply case-insensitive search to mods_path on GNU/Linux and macOS (#4312) (#4310)
The case-insensitive fallback search() in GetHostPath is only
invoked for patch_path and host_path, so mods whose file or folder
capitalization does not exactly match the guest path are silently
bypassed even when the files are present. Mirror the existing
search(patch_path) pass for mods_path, placed first to preserve
mod > patch > base precedence.

Co-authored-by: Matías Buzzo <matias@mbuzzo.com>
2026-04-27 18:07:05 +03:00
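The lookup order described in the commit above (mods, then patch, then base, each searched case-insensitively) can be sketched as follows. This is only an illustration under stated assumptions: `EqualsIgnoreCase`, `SearchIgnoreCase`, and `ResolveHostPath` are hypothetical names, not the project's `GetHostPath` or `search()`, and the sketch omits the exact-match attempt that the commit calls a fallback for.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: ASCII case-insensitive string comparison.
inline bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](unsigned char x, unsigned char y) {
               return std::tolower(x) == std::tolower(y);
           });
}

// Hypothetical helper: resolve `relative` under `root`, matching every path
// component against the directory listing without regard to case.
inline std::optional<fs::path> SearchIgnoreCase(const fs::path& root, const fs::path& relative) {
    fs::path current = root;
    for (const auto& part : relative) {
        std::error_code ec; // a missing directory yields an empty listing
        bool found = false;
        for (const auto& entry : fs::directory_iterator(current, ec)) {
            if (EqualsIgnoreCase(entry.path().filename().string(), part.string())) {
                current = entry.path();
                found = true;
                break;
            }
        }
        if (!found) {
            return std::nullopt;
        }
    }
    return current;
}

// Try each root in priority order; the first hit wins, so passing
// {mods, patch, base} keeps mod > patch > base precedence.
inline std::optional<fs::path> ResolveHostPath(const std::vector<fs::path>& roots_by_priority,
                                               const fs::path& relative) {
    assert(!roots_by_priority.empty());
    for (const auto& root : roots_by_priority) {
        if (auto hit = SearchIgnoreCase(root, relative)) {
            return hit;
        }
    }
    return std::nullopt;
}
```

With this ordering, a mod stored as `Data/Textures` still shadows the base game's `data/textures` even though the capitalization differs from the guest path.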
Marcin Mikołajczyk
99f2480e21
Neo: V_*_F16 arithmetic ops (#4311) 2026-04-25 12:51:02 +02:00
Marcin Mikołajczyk
c1e496efcd
SDWA (#4203) 2026-04-14 22:41:55 +03:00
TheTurtle
1f50aa3172
frontend: Add helper methods for thread bit getters and setters (#4243)
Co-authored-by: georgemoralis <giorgosmrls@gmail.com>
2026-04-09 23:32:21 +03:00
TheTurtle
0d3b6f7dd0
shader_recompiler: Minor improvements to buffer atomics (#4242)
* resource_tracking_pass: Adjust buffer type if host doesn't support float buffer atomic

* resource_tracking_pass: Implement data append/consume as buffer atomics in IR level

This was previously done in the SPIR-V backend; the implementation was exactly the same as the buffer atomics, so unify them

* ir: Bump instruction flag to 8 bytes

* frontend: Pass pc to buffer flags for better debugging when sharp tracking fails

* clang format

---------

Co-authored-by: georgemoralis <giorgosmrls@gmail.com>
2026-04-09 23:31:33 +03:00
Kravickas
5945c8719b
Implement BUFFER_ATOMIC_FCMPSWAP (#4200)
Implement BUFFER_ATOMIC_FCMPSWAP via descriptor aliasing + bitcast
2026-04-01 12:55:06 +03:00
georgemoralis
08168dc386
New config mode (part1 of 0.15.1 branch series) (#4145)
* using new emulator_settings

* the default user is now just player one

* transfer install, addon dirs

* fix load custom config issue

---------

Co-authored-by: kalaposfos13 <153381648+kalaposfos13@users.noreply.github.com>
2026-03-21 22:26:36 +02:00
Kravickas
2ca342970a
MIP fixes (#4141)
* int32-modifiers

GCN VOP3 abs/neg modifier bits always operate on the sign bit (bit 31)
regardless of instruction type. For integer operands this means:
	
abs = clear bit 31   (x & 0x7FFFFFFF)
neg = toggle bit 31  (x ^ 0x80000000)

* int64-modifiers

Previously GetSrc64<IR::U64> completely ignored input modifiers
for integer operands. Now unpacks to two U32s, modifies the high
dword's bit 31 (= bit 63 of the 64-bit value), and repacks.

* V_MUL_LEGACY_F32

GCN V_MUL_LEGACY_F32: if either source is zero, result is +0.0
regardless of the other operand (even NaN or Inf). Standard IEEE
multiply produces NaN for 0*Inf. The fix adds a zero-check select
before the multiply.
2026-03-18 10:05:20 +02:00
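The bit-level rules stated in the commit above can be written out as a small host-side sketch. The function names are hypothetical and this is not the recompiler's IR code; it only restates the described semantics in plain C++: integer abs/neg modifiers touch the sign bit only, the 64-bit case applies them to the high dword, and the legacy multiply returns +0.0 whenever either source is zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper names; semantics as described in the commit message.

// abs modifier on a 32-bit integer operand: clear bit 31.
constexpr std::uint32_t AbsModU32(std::uint32_t x) {
    return x & 0x7FFFFFFFu;
}

// neg modifier on a 32-bit integer operand: toggle bit 31.
constexpr std::uint32_t NegModU32(std::uint32_t x) {
    return x ^ 0x80000000u;
}

// 64-bit operands: unpack into two dwords, modify bit 31 of the high dword
// (bit 63 of the full value), then repack.
constexpr std::uint64_t AbsModU64(std::uint64_t x) {
    return (static_cast<std::uint64_t>(AbsModU32(static_cast<std::uint32_t>(x >> 32))) << 32) |
           static_cast<std::uint32_t>(x);
}

constexpr std::uint64_t NegModU64(std::uint64_t x) {
    return (static_cast<std::uint64_t>(NegModU32(static_cast<std::uint32_t>(x >> 32))) << 32) |
           static_cast<std::uint32_t>(x);
}

// Legacy multiply: a zero source forces +0.0, even against NaN or Inf.
inline float MulLegacyF32(float a, float b) {
    if (a == 0.0f || b == 0.0f) { // also true for -0.0f
        return 0.0f;
    }
    return a * b;
}

// Note that abs is not a two's-complement absolute value: -1 becomes INT32_MAX.
static_assert(AbsModU32(0xFFFFFFFFu) == 0x7FFFFFFFu, "abs clears only bit 31");
static_assert(NegModU32(1u) == 0x80000001u, "neg toggles only bit 31");
static_assert(NegModU64(1ull) == 0x8000000000000001ull, "neg toggles bit 63");

inline void CheckLegacyMultiply() {
    const float inf = std::numeric_limits<float>::infinity();
    assert(std::isnan(0.0f * inf)); // IEEE multiply gives NaN
    const float r = MulLegacyF32(0.0f, inf);
    assert(r == 0.0f && !std::signbit(r)); // legacy multiply gives +0.0
    (void)r;
}
```

The zero check runs before the multiply, which mirrors the "zero-check select before the multiply" wording in the commit.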
Stephen Miller
85476e55ea
Recompiler: Implement IMAGE_ATOMIC_CMPSWAP (#4109)
* To implement ImageAtomicCmpSwap

...but it doesn't work, so here it shall stay.

* a fix

* Clang

* Add to MayHaveSideEffects

I missed this while digging through IR code.
2026-03-09 18:02:22 +02:00
TheTurtle
e16ba06ab0
shader_recompiler: Support 32 thread sharing mode (#4110) 2026-03-09 17:33:58 +02:00
Pavel
89e74828e6
fixup r128 (#4100) 2026-03-05 22:44:37 +02:00
kalaposfos13
c2a47d2a99
Handle operand fields execlo and exechi for S_MOV (#4023)
Co-authored-by: TheTurtle <geoster3d@gmail.com>
2026-02-11 16:00:13 +02:00
Marcin Mikołajczyk
c81ebe6418
Implement V_FFBH_I32 (#3965)
2026-01-27 23:08:26 +02:00
Marcin Mikołajczyk
1473b2358a
Implement V_CMP_OP_F64 (#3962) 2026-01-27 20:18:05 +02:00
Marcin Mikołajczyk
1e059cac04
Implement V_LSHR_B64 (#3961) 2026-01-27 12:27:56 +02:00
TheTurtle
d3ad728ac0
vector_alu: Handle -1 as src1 in v_cmp_u64 (#3855) 2025-12-06 15:11:29 -08:00
psucien
a9f8eaf778
video_core: Initial implementation of pipeline cache (#3816)
* Initial implementation

* Fix for crash caused by stale stages data; cosmetics applied

* Someone mentioned the assert

* Async blob writer

* Fix for memory leak

* Remaining stuff

* Async changed to `packaged_task`
2025-11-29 11:52:08 +02:00
Stephen Miller
6295c32e5c
Render.Recompiler: Implement V_FLOOR_F64 (#3828)
* VectorFpRound64 decode table

Also fixed definition for V_TRUNC_F64, though I doubt that would change anything important.

* V_FLOOR_F64 implementation

Used by Just Cause 4

* Oops

Never forget your 64s
2025-11-24 23:51:06 -08:00
TheTurtle
f1a8b7d85e
vector_alu: Fix V_CMP_U64 (#3823)
* vector_alu: Fix V_CMP_U64

* vector_alu: Also handle vcc in V_CMP_U64
2025-11-23 17:26:34 -08:00
Stephen Miller
683e5f3b04
Core: Simulate write-only file access with read-write access (#3360)
* Swap write access mode for read write

Opening with access mode w will erase the opened file. We do not want this.

* Create mode

Opening with write access was previously the only way to create a file through open, so add a separate FileAccessMode that uses the write access mode to create files.

* Update file_system.cpp

Remove a hack added to posix_rename to bypass the file clearing behaviors of FileAccessMode::Write

* Check access mode in read functions

Write-only files cause the EBADF return on the various read functions. Now that we're opening files differently, properly handling this is necessary.

* Separate appends into proper modes

Fixes a potential regression from one of my prior PRs, and ensures the Write | Append flag combo also behaves properly in read-related functions.

* Move IsWriteOnly check after device/socket reads

file->f is only valid for files, so checking this before checking for sockets/devices will cause access violations.

* Fix issues

Now that Write is identical to ReadWrite, internal uses of Write need to be swapped to my new Create mode

* Fix remaining uses of FileAccessMode write to create files

Missed these before.

* Fix rebase

* Add stubbed get_authinfo (#3722)

* mostly stubbed get_authinfo

* Return value observed on console if get_authinfo was called for the current thread, esrch otherwise

---------

Co-authored-by: kalaposfos13 <153381648+kalaposfos13@users.noreply.github.com>
Co-authored-by: georgemoralis <giorgosmrls@gmail.com>
2025-11-04 10:57:26 +02:00
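The first bullet of the commit above rests on a property of C stdio mode strings: `"w"` truncates an existing file on open, while `"r+"` opens it for reading and writing and leaves its contents in place. A minimal sketch, assuming the emulator's access modes ultimately map onto such mode strings (the commit's "access mode w" wording suggests this); `WriteFront` and `FileSize` are hypothetical helpers, not functions from the codebase:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: open `path` with the given stdio mode and write
// `bytes` at the start of the file. Returns true on success.
inline bool WriteFront(const std::string& path, const std::string& bytes, const char* mode) {
    assert(mode != nullptr);
    std::FILE* f = std::fopen(path.c_str(), mode);
    if (!f) {
        return false;
    }
    const bool ok = std::fwrite(bytes.data(), 1, bytes.size(), f) == bytes.size();
    return std::fclose(f) == 0 && ok;
}

// Hypothetical helper: size of the file in bytes, or -1 if it cannot be opened.
inline long FileSize(const std::string& path) {
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) {
        return -1;
    }
    std::fseek(f, 0, SEEK_END);
    const long size = std::ftell(f);
    std::fclose(f);
    return size;
}
```

Writing two bytes into a ten-byte file with `"r+b"` leaves it at ten bytes, whereas `"wb"` leaves only the two new bytes. Because `"r+"` fails when the file does not exist, a separate create mode is needed, which matches the commit's second bullet.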
TheTurtle
8f37cfb739
amdgpu: Split liverpool registers and cleanup (#3707)
2025-10-05 13:42:40 -07:00
DanielSvoboda
eb18382396
Fix: V_MUL_I32_I24 | V_MUL_U32_U24 (#3632)
2025-09-19 19:05:22 -07:00
squidbus
0eff74223a
shader_recompiler: Implement fallback path for missing shaderFloat16 support. (#3604) 2025-09-18 00:43:47 -07:00
Stephen Miller
0bfde1fcde
video_core: Check DB_SHADER_CONTROL register before performing depth exports (#3588)
The DB_SHADER_CONTROL register has several enable flags which must be set before certain depth exports are enabled.
This commit adds logic to respect the values in this register when performing depth exports, which fixes the regression in earlier versions of KNACK.
I've also renamed DepthBufferControl to DepthShaderControl, since that's closer to the official name for the register.
2025-09-13 04:32:24 -07:00
TheTurtle
374c2194d4
video_core: Address various UE bugs (#3559)
* vk_rasterizer: Reorder image query in fast clear elimination

Fixes missing clears when a texture is being cleared using this method but never actually used for rendering purposes by ensuring the texture cache has at least a chance to register cmask

* shader_recompiler: Partial support for ANCILLARY_ENA

* pixel_format: Add number conversion of BC6 srgb format

* texture_cache: Support aliases of 3D and 2D array images

Used by UE to render its post-processing LUT

* pixel_format: Test BC6 srgb as unorm

Still not sure what is up with snorm/unorm; it can be useful to have both actions to compare for now

* video_core: Use attachment feedback layout instead of general if possible

UE games often do mipgen passes where the previous mip of the image being rendered to is bound for reading. This appears to cause corruption issues so use attachment feedback loop extension to ensure correct output

* renderer_vulkan: Improve feedback loop code

* Set proper usage flag for feedback loop usage
* Add dynamic state extension and enable it for color aspect when necessary
* Check if image is bound instead of force_general for better code consistency

* shader_recompiler: More proper depth export implementation

* shader_recompiler: Fix bug in output modifiers

* shader_recompiler: Fix sampling from MSAA images

This is not allowed by any graphics API, but the hardware seems to support it somehow and it can be encountered. To avoid glitched output, translate to a texelFetch call on sample 0

* clang format

* image: Add back missing code

* shader_recompiler: Better ancillary implementation

It is now implemented with a custom attribute that is constant propagated depending on which parts of it are extracted. It will assert if an unknown part is used or if the attribute itself is not removed by dead code elimination

* copy_shader: Ignore not enabled export channels

* constant_propagation: Invalidate ancillary after successful elimination

* spirv: Fix f11/f10 conversion to f32

---------

Co-authored-by: georgemoralis <giorgosmrls@gmail.com>
2025-09-12 19:29:16 +03:00
TheTurtle
eb9a7e8fbd
liverpool: Write valid queries on PixelPipeStatDump (#3553)
* liverpool: Write valid queries on PixelPipeStatDump

* export: Small assert swap

* liverpool: Advance zpass counter on every dump request
2025-09-07 18:08:26 -07:00
baggins183
df52585086
Allow vector and scalar offset in buffer address arg to LoadBuffer/StoreBuffer (#3439)
* Allow vector and scalar offset in buffer address arg to LoadBuffer/StoreBuffer

* remove is_ring check

* fix atomics and update pattern matching for tess factor stores

* remove old asserts about soffset

* small fixes

* copyright

* Handle sgpr initialization for 2 special hull shader values, including tess factor buffer offset
2025-09-03 20:54:23 -07:00
Stephen Miller
c26f56ab02
Handle offsets and format overrides in fetch shaders (#3486)
Co-authored-by: TheTurtle <47210458+raphaelthegreat@users.noreply.github.com>
2025-08-30 14:20:23 -07:00
squidbus
32244c097c
shader_recompiler: Implement V_ADD_F64 and loading 64-bit float from SGPR. (#3483) 2025-08-29 18:11:21 -07:00
Stephen Miller
56626111ab
Properly use float type for float buffer atomics (#3480)
Co-authored-by: TheTurtle <47210458+raphaelthegreat@users.noreply.github.com>
2025-08-29 17:18:10 -07:00
Stephen Miller
6f26f66d77
Handle ExecLo source in S_FF1_I32_B64 (#3481)
Co-authored-by: TheTurtle <47210458+raphaelthegreat@users.noreply.github.com>
2025-08-29 16:59:11 -07:00
TheTurtle
8ae45559fc
vector_interpolation: Address some assertions (#3473) 2025-08-29 13:44:45 -07:00
TheTurtle
6d98a5ab60
vk_pipeline_cache: Cleanup graphics key refresh (#3449)
* vk_pipeline_cache: Cleanup graphics key refresh

* position: Don't assert on None mapping

Also check outputs in runtime info so shader is recompiled if they change
2025-08-23 17:16:58 -07:00
TheTurtle
6dd2b3090c
shader_recompiler: Improve shader exports accuracy (part 1) (#3447)
* video_core: support for RT layer outputs

- support for RT layer outputs
- refactor for handling of export attributes
- move output->attribute mapping to a separate header

* export: Rework render target exports

- Centralize all code related to MRT exports into a single function to make it easier to follow
- Apply swizzle to output RGBA colors instead of the render target channel.
  This fixes swizzles on formats with < 4 channels

For example with render target format R8_UNORM and COMP_SWAP ALT_REV the previous code would output

frag_color.a = color.r;

instead of

frag_color.r = color.a;

which would result in incorrect output in some cases

* vk_pipeline_cache: Apply swizzle to write masks

---------

Co-authored-by: polyproxy <47796739+polybiusproxy@users.noreply.github.com>
2025-08-23 14:39:59 -07:00
kalaposfos13
8a84f1b778
Implement V_CMP_GT_U64 (#3352)
* Implement V_CMP_GT_U64

* Add GroupAny

* Use GroupAny

* Add assert

* clang
2025-08-20 19:53:54 -07:00
baggins183
670067c001
Improve heuristic for attributes passed via ring buffers. (#3426)
Previously a buffer load in a vertex shader could be treated like a ring access, dropping offen vgpr and possibly asserting during resource tracking because of mismatch between types (u32x2 vs U32), caused by inconsistencies in flags (index_enable and offset_enable)
2025-08-17 23:22:40 +03:00
TheTurtle
93767ae31b
shader_recompiler: Rework sharp tracking for robustness (#3327)
* shader_recompiler: Remove remnants of old discard

Also constant propagate conditional discard if condition is constant

* resource_tracking_pass: Rework sharp tracking for robustness

* resource_tracking_pass: Add source dominance analysis

When reachability is not enough to prune source list, check if a source dominates all other sources

* resource_tracking_pass: Fix immediate check

How did this work before

* resource_tracking_pass: Remove unused template type

* readlane_elimination_pass: Don't add phi when all args are the same

New sharp tracking exposed some bad sources on sampler sharps with the aniso disable pattern that were also part of the readlane pattern; fix tracking by removing the unnecessary phis in between

* resource_tracking_pass: Allow phi in disable aniso pattern

* resource_tracking_pass: Handle not valid buffer sharp and more phi in aniso pattern
2025-07-28 13:32:16 -07:00
baggins183
116554e425
V_ALIGNBYTE_B32 and V_ALIGNBIT_B32 (#3316)
* implement V_ALIGNBYTE_B32 and V_ALIGNBIT_B32

* fix mask

* uncomment alignbit
2025-07-28 00:27:13 -07:00
TheTurtle
93b06ba2da
translate: Correct instance id fetch in local shader (#3309) 2025-07-23 23:27:11 +03:00
TheTurtle
19c3d05ac1
shader_recompiler: Use VM bit for conditional discard (#3306) 2025-07-23 20:58:09 +03:00