* cmake: apply sirit unused-command-line-argument flag only with Clang
* shader_recompiler: move Opcode magic_enum range customization to opcodes.h
Define the magic_enum range for Shader::Gcn::Opcode as a proper enum_range specialization in opcodes.h instead of relying on translation-unit macros in translate.cpp.
This makes the customization visible where the enum is used and avoids the GCC linkage/build issue.
* thread: use Windows thread naming path for MinGW-w64
Switch the thread naming guard from _MSC_VER to _WIN32 so MinGW-w64 builds use the Windows SetThreadDescription implementation instead of falling through the POSIX branch.
This matches the platform rather than the compiler and avoids the MinGW-w64 build issue.
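A minimal sketch of why the guard change matters. The constant names are hypothetical and not taken from the codebase. MinGW-w64 defines `_WIN32` but not `_MSC_VER`, so only the platform guard selects the Windows path on that toolchain:

```cpp
// Hypothetical constants that record which branch each guard would take
// on the toolchain compiling this file.
#ifdef _MSC_VER
constexpr bool kCompilerGuardTakesWindowsPath = true;
#else
constexpr bool kCompilerGuardTakesWindowsPath = false;
#endif

#ifdef _WIN32
constexpr bool kPlatformGuardTakesWindowsPath = true;
#else
constexpr bool kPlatformGuardTakesWindowsPath = false;
#endif

// Any toolchain that passes the compiler guard also passes the platform
// guard, so switching to _WIN32 only widens the set of Windows builds.
static_assert(!kCompilerGuardTakesWindowsPath || kPlatformGuardTakesWindowsPath,
              "_MSC_VER implies _WIN32");
```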
The case-insensitive fallback search() in GetHostPath is only
invoked for patch_path and host_path, so mods whose file or folder
capitalization does not exactly match the guest path are silently
bypassed even when the files are present. Mirror the existing
search(patch_path) pass for mods_path, placed first to preserve
mod > patch > base precedence.
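The precedence described above can be sketched with an in-memory model. `Root`, `EqualsIgnoreCase`, and `ResolveIgnoreCase` are hypothetical names for illustration only. The real GetHostPath works on host directories and its exact control flow is not shown here:

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a directory tree: a root name plus the
// relative paths stored under it, with their on-disk capitalization.
struct Root {
    std::string name;
    std::vector<std::string> files;
};

// ASCII-only case-insensitive comparison (an assumption of this sketch).
inline bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}

// Walks the roots in the order given and returns "<root>/<file>" for the
// first case-insensitive match. Passing the roots as mods, patch, base
// yields the mod > patch > base precedence.
inline std::optional<std::string> ResolveIgnoreCase(
    const std::vector<Root>& roots_in_priority_order,
    const std::string& guest_path) {
    for (const Root& root : roots_in_priority_order) {
        for (const std::string& file : root.files) {
            if (EqualsIgnoreCase(file, guest_path)) {
                return root.name + "/" + file;
            }
        }
    }
    return std::nullopt;
}
```

Because the mods root is searched first, a mod file whose capitalization differs from the guest path still wins over the patch and base copies.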
Co-authored-by: Matías Buzzo <matias@mbuzzo.com>
* resource_tracking_pass: Adjust buffer type if host doesn't support float buffer atomic
* resource_tracking_pass: Implement data append/consume as buffer atomics in IR level
This was previously done in the SPIR-V backend. The implementation was exactly the same as for buffer atomics, so unify them.
* ir: Bump instruction flag to 8 bytes
* frontend: Pass pc to buffer flags for better debugging when sharp tracking fails
* clang format
---------
Co-authored-by: georgemoralis <giorgosmrls@gmail.com>
* using new emulator_settings
* the default user is now just player one
* transfer install, addon dirs
* fix load custom config issue
---------
Co-authored-by: kalaposfos13 <153381648+kalaposfos13@users.noreply.github.com>
* int32-modifiers
GCN VOP3 abs/neg modifier bits always operate on the sign bit (bit 31)
regardless of instruction type. For integer operands this means:
abs = clear bit 31 (x & 0x7FFFFFFF)
neg = toggle bit 31 (x ^ 0x80000000)
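A minimal sketch of the two bit operations on a 32-bit integer operand. The helper names are hypothetical and this is not the recompiler's IR code:

```cpp
#include <cstdint>

// Hypothetical helpers illustrating the sign-bit semantics described above.
constexpr std::uint32_t kSignBit32 = 0x80000000u;

// abs modifier: clear bit 31.
constexpr std::uint32_t ApplyAbs32(std::uint32_t x) {
    return x & ~kSignBit32; // equivalent to x & 0x7FFFFFFF
}

// neg modifier: toggle bit 31.
constexpr std::uint32_t ApplyNeg32(std::uint32_t x) {
    return x ^ kSignBit32;
}
```

Note that for integers this is not two's-complement negation: toggling bit 31 of 1 gives 0x80000001, not 0xFFFFFFFF.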
* int64-modifiers
Previously GetSrc64<IR::U64> completely ignored input modifiers
for integer operands. Now unpacks to two U32s, modifies the high
dword's bit 31 (= bit 63 of the 64-bit value), and repacks.
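The unpack, modify, repack sequence can be sketched on plain integers as follows. `ApplyModifiers64` is a hypothetical helper, and applying abs before neg is an assumption of this sketch rather than something stated above:

```cpp
#include <cstdint>

// Hypothetical helper: applies abs/neg to a 64-bit integer operand by
// touching only bit 31 of the high dword, which is bit 63 of the value.
constexpr std::uint64_t ApplyModifiers64(std::uint64_t value, bool abs, bool neg) {
    // Unpack into two 32-bit halves.
    std::uint32_t lo = static_cast<std::uint32_t>(value);
    std::uint32_t hi = static_cast<std::uint32_t>(value >> 32);
    // Assumed order: abs first, then neg.
    if (abs) {
        hi &= 0x7FFFFFFFu;
    }
    if (neg) {
        hi ^= 0x80000000u;
    }
    // Repack; the low dword is left untouched.
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```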
* V_MUL_LEGACY_F32
GCN V_MUL_LEGACY_F32: if either source is zero, result is +0.0
regardless of the other operand (even NaN or Inf). Standard IEEE
multiply produces NaN for 0*Inf. The fix adds a zero-check select
before the multiply.
* To implement ImageAtomicCmpSwap
...but it doesn't work, so here it shall stay.
* a fix
* Clang
* Add to MayHaveSideEffects
I missed this while digging through IR code.
* VectorFpRound64 decode table
Also fixed definition for V_TRUNC_F64, though I doubt that would change anything important.
* V_FLOOR_F64 implementation
Used by Just Cause 4
* Oops
Never forget your 64s
* Swap write access mode for read write
Opening with access mode w will erase the opened file. We do not want this.
* Create mode
Opening with write access was previously the only way to create a file through open, so add a separate FileAccessMode that uses the write access mode to create files.
* Update file_system.cpp
Remove a hack added to posix_rename to bypass the file clearing behaviors of FileAccessMode::Write
* Check access mode in read functions
Write-only files cause the EBADF return on the various read functions. Now that we're opening files differently, properly handling this is necessary.
* Separate appends into proper modes
Fixes a potential regression from one of my prior PRs, and ensures the Write | Append flag combo also behaves properly in read-related functions.
* Move IsWriteOnly check after device/socket reads
file->f is only valid for files, so checking this before checking for sockets/devices will cause access violations.
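The check ordering can be sketched with a simplified handle model. `HandleType`, `Handle`, and `ReadSketch` are hypothetical, and the host `EBADF` value stands in for whatever error code the emulator actually returns:

```cpp
#include <cerrno>
#include <cstddef>

// Hypothetical, simplified model of a guest file descriptor entry.
enum class HandleType { Regular, Device, Socket };

struct Handle {
    HandleType type;
    bool write_only; // only meaningful for regular files
};

// Returns the number of bytes "read", or a negative errno value.
inline long ReadSketch(const Handle& handle, std::size_t nbytes) {
    // Devices and sockets are dispatched first: they carry no host file
    // state, so the write-only check must not be reached for them.
    if (handle.type != HandleType::Regular) {
        return static_cast<long>(nbytes);
    }
    // Reading from a write-only regular file fails with EBADF.
    if (handle.write_only) {
        return -EBADF;
    }
    return static_cast<long>(nbytes);
}
```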
* Fix issues
Now that Write is identical to ReadWrite, internal uses of Write need to be swapped to my new Create mode
* Fix remaining uses of FileAccessMode write to create files
Missed these before.
* Fix rebase
* Add stubbed get_authinfo (#3722)
* mostly stubbed get_authinfo
* Return value observed on console if get_authinfo was called for the current thread, esrch otherwise
---------
Co-authored-by: kalaposfos13 <153381648+kalaposfos13@users.noreply.github.com>
Co-authored-by: georgemoralis <giorgosmrls@gmail.com>
The DB_SHADER_CONTROL register has several enable flags which must be set before certain depth exports are enabled.
This commit adds logic to respect the values in this register when performing depth exports, which fixes the regression in earlier versions of KNACK.
I've also renamed DepthBufferControl to DepthShaderControl, since that's closer to the official name for the register.
* vk_rasterizer: Reorder image query in fast clear elimination
Fixes missing clears when a texture is cleared using this method but never actually used for rendering, by ensuring the texture cache has at least a chance to register the cmask.
* shader_recompiler: Partial support for ANCILLARY_ENA
* pixel_format: Add number conversion of BC6 srgb format
* texture_cache: Support aliases of 3D and 2D array images
Used by UE to render its post-processing LUT
* pixel_format: Test BC6 srgb as unorm
Still not sure what is up with snorm/unorm. It can be useful to have both actions to compare for now.
* video_core: Use attachment feedback layout instead of general if possible
UE games often do mipgen passes where the previous mip of the image being rendered to is bound for reading. This appears to cause corruption issues, so use the attachment feedback loop extension to ensure correct output.
* renderer_vulkan: Improve feedback loop code
* Set proper usage flag for feedback loop usage
* Add dynamic state extension and enable it for color aspect when necessary
* Check if image is bound instead of force_general for better code consistency
* shader_recompiler: More proper depth export implementation
* shader_recompiler: Fix bug in output modifiers
* shader_recompiler: Fix sampling from MSAA images
No graphics API allows this, but the hardware seems to support it somehow and it can be encountered. To avoid glitched output, translate to a texelFetch call on sample 0.
* clang format
* image: Add back missing code
* shader_recompiler: Better ancillary implementation
It is now implemented with a custom attribute that is constant propagated depending on which parts of it are extracted. It will assert if an unknown part is used or if the attribute itself is not removed by dead code elimination.
* copy_shader: Ignore not enabled export channels
* constant_propagation: Invalidate ancillary after successful elimination
* spirv: Fix f11/f10 conversion to f32
---------
Co-authored-by: georgemoralis <giorgosmrls@gmail.com>
* Allow vector and scalar offset in buffer address arg to
LoadBuffer/StoreBuffer
* remove is_ring check
* fix atomics and update pattern matching for tess factor stores
* remove old asserts about soffset
* small fixes
* copyright
* Handle sgpr initialization for 2 special hull shader values, including tess factor buffer offset
* vk_pipeline_cache: Cleanup graphics key refresh
* position: Don't assert on None mapping
Also check outputs in runtime info so shader is recompiled if they change
* video_core: support for RT layer outputs
- support for RT layer outputs
- refactor for handling of export attributes
- move output->attribute mapping to a separate header
* export: Rework render target exports
- Centralize all code related to MRT exports into a single function to make it easier to follow
- Apply swizzle to output RGBA colors instead of the render target channel.
  This fixes swizzles on formats with < 4 channels.
  For example, with render target format R8_UNORM and COMP_SWAP ALT_REV the previous code would output
      frag_color.a = color.r;
  instead of
      frag_color.r = color.a;
  which would result in incorrect output in some cases.
* vk_pipeline_cache: Apply swizzle to write masks
---------
Co-authored-by: polyproxy <47796739+polybiusproxy@users.noreply.github.com>
Previously a buffer load in a vertex shader could be treated like a ring access, dropping the offen VGPR and possibly asserting during resource tracking because of a type mismatch (u32x2 vs U32) caused by inconsistent flags (index_enable and offset_enable).
* shader_recompiler: Remove remnants of old discard
Also constant propagate conditional discard if condition is constant
* resource_tracking_pass: Rework sharp tracking for robustness
* resource_tracking_pass: Add source dominance analysis
When reachability is not enough to prune source list, check if a source dominates all other sources
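A minimal sketch of the dominance check, assuming each source is identified by the index of the block it lives in and that immediate dominators are already computed. `Dominates` and `FindDominatingSource` are hypothetical names, not the pass's actual code:

```cpp
#include <optional>
#include <vector>

// Hypothetical representation: blocks are indices, idom[b] is the
// immediate dominator of block b, and the entry block is its own idom.
inline bool Dominates(const std::vector<int>& idom, int a, int b) {
    // Walk up the dominator tree from b until a or the entry is reached.
    while (true) {
        if (a == b) {
            return true;
        }
        if (idom[b] == b) {
            return false; // reached the entry without meeting a
        }
        b = idom[b];
    }
}

// Returns the source block that dominates every other source, if any.
inline std::optional<int> FindDominatingSource(const std::vector<int>& idom,
                                               const std::vector<int>& sources) {
    for (int candidate : sources) {
        bool dominates_all = true;
        for (int other : sources) {
            if (!Dominates(idom, candidate, other)) {
                dominates_all = false;
                break;
            }
        }
        if (dominates_all) {
            return candidate;
        }
    }
    return std::nullopt;
}
```

In a diamond-shaped CFG (entry 0, branches 1 and 2, merge 3), sources in blocks 1 and 2 have no dominating source, while sources in blocks 0 and 3 resolve to block 0.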
* resource_tracking_pass: Fix immediate check
How did this work before?
* resource_tracking_pass: Remove unused template type
* readlane_elimination_pass: Don't add phi when all args are the same
New sharp tracking exposed some bad sources on sampler sharps with the aniso disable pattern that were also part of the readlane pattern. Fix tracking by removing the unnecessary phis in between.
* resource_tracking_pass: Allow phi in disable aniso pattern
* resource_tracking_pass: Handle not valid buffer sharp and more phi in aniso pattern