* using new emulator_settings
* the default user is now just player one
* transfer install, addon dirs
* fix load custom config issue
---------
Co-authored-by: kalaposfos13 <153381648+kalaposfos13@users.noreply.github.com>
* no
no
* Adjust locking strategy
Use a separate mutex for the initial error checks + GPU unmap instead of using the reader lock. Make sure all writers lock this separate mutex, and for those that don't perform GPU unmaps, lock the writer lock immediately too.
This gets around every race condition I've envisioned so far, and hopefully does the trick?
* Clang
* Always GPU unmap
GPU unmaps have logic built-in to only run on mapped areas.
Not sure if userfaultfd would work with this, but since that's already broken anyway, I'll let reviewers decide that.
Without doing this, I'd need to do an extra pass through VMAs to find what all needs to be GPU modified before I can unmap from GPU, then perform remaining unmap work. Especially for places like MapMemory, that's a lot of code bloat.
* Fixups
* Update memory.cpp
* Rename mutex
It's really just a mutex for the sole purpose of dealing with GPU unmaps, so unmap_mutex is a bit more fitting than transition_mutex
* Optimizations
Microsoft allows you to coalesce multiple free placeholders in one VirtualFreeEx call, so we can perform the VirtualFreeEx after coalescing with neighboring regions to eliminate a VirtualFreeEx call in some situations.
* Remove unnecessary VirtualProtect call
As far as I can tell, this call wastes a bunch of time, and is completely unnecessary.
With our current codebase, simply supplying prot to MapViewOfFile3 works properly.
* Properly handle file mmaps with offsets
Pretty easy fix to perform while I'm here, so I might as well include it.
* Oops
Leftover stuff from local things + clang
* Disable tracy memory tracking
Tracy's memory tracking is built around a typical malloc/free API, so each individual alloc must correspond to a free.
Moving these to address space would fix issues on Windows, but Linux/Mac would have the same issues with our current code.
Disabling VMA merging is technically a fix, but since that's hardware-accurate behavior, I'd rather not disable it.
I'm sure there's a simple solution I'm missing, but unless other devs have a better idea of how this should be handled, the best I can do is disable it so we can keep using Tracy to trace performance.
* Update address_space.cpp
* Debug logging
Should give a decent idea of how nasty these AddressSpace calls are in games that lost perf.
* test removing thread safety
Just for testing, will revert afterwards.
* Check name before merging
Fixes a regression in Apex Legends
* Revert "test removing thread safety"
This reverts commit ab897f4b1c.
* Move mutex locks before IsValidMapping calls
These aren't thread safe, this fixes a rare race condition that I ran into with Apex Legends.
* Revert "Debug logging"
This reverts commit eb2b12a46c.
* Proper VMA splitting in ProtectBytes, SetDirectMemoryType, and NameVirtualRange
Also slight optimization by eliminating AddressSpace protect calls when requested prot matches the previous prot.
Fixes a regression in God of War: Ragnarok
* Clang
* Fixes to SetDirectMemoryType logic
Fixes some regressions in Marvel's Spider-Man that occurred with my previous commits to this PR.
* Fix Genshin Impact again
* Assert on out-of-bounds protect calls
Our page tracking code is prone to causing this.
* test mutex again
This time, remember all mutex stuff
* Revert hack
I'll work on a better way to deal with mutexes in a bit, first I'm pushing up some extra fixes
* Proper logic for checked ReleaseDirectMemory, added bounds checks
Should help some games.
* Better logging for ReleaseDirectMemory errors.
* Only perform region coalescing after all unmap operations.
A small optimization for unmapping multiple regions. Since Microsoft lets you coalesce multiple placeholders at once, we can save doing any VirtualFreeEx calls for coalescing until after we unmap everything in the requested range.
* Separate VMA creation logic into a separate method, update MapFile to use it
MapFile is technically another "emulation" of MapMemory, both should follow similar logic.
To avoid duplicating code, move shared logic to a different function that both MapMemory and MapFile can call.
This fixes memory asserts in a couple of online-only apps I have.
* Clang
* Fix TryWriteBacking
This fixes a lot of regressions that got misattributed
Co-Authored-By: TheTurtle <47210458+raphaelthegreat@users.noreply.github.com>
* Fix again
Fixes device lost crashes with some games after my last commit.
* Oops
* Mutex cleanup
Avoided changing anything in MapMemory, UnmapMemory, PoolCommit, or PoolDecommit since those all need a little extra granularity to prevent GPU deadlocking.
Everything else now uses standard library locks to make things a little simpler.
* Swap MapMemory and PoolCommit to use scoped lock
GPU maps are safe, so this is fine. Unmaps are the primary issue.
---------
Co-authored-by: TheTurtle <47210458+raphaelthegreat@users.noreply.github.com>
* Initial work
* Bug fixing
deadlocks and broken unmaps
* Fix more bugs
broken memory pools
* More bug fixing
Still plenty more to fix though
* Even more bug fixing
Finally got Final Fantasy XV back to running, haven't found anymore bugs yet.
* More bugfixing
* Update memory.cpp
* Rewrite start
* Fix for oversized unmaps
* Oops
* Update address_space.cpp
* Clang
* Mac fix?
* Track VMA physical areas based on start in VMA
Allows me to simplify some logic, and should (finally) allow merging VMAs in memory code.
* Merge VMAs, fix some bugs
Finally possible thanks to address space + phys tracking changes
* Clang
* Oops
* Oops2
* Oops3
* Bugfixing
* SDK check for coalescing
Just to rule out any issues from games that wouldn't see coalescing in the first place.
* More ReleaseDirectMemory fixes
I really suck at logic some days
* Merge physical areas within VMAs
In games that perform a lot of similar mappings, you can wind up with 1000+ phys areas in one vma.
This should reduce some of the overhead that might cause.
* Hopefully fix Mac compile
Why must their uint64_t be different?
* Mac pt.2
Oops
* Earlier initialization of elf info.
Everything used for elf info initialization comes from the param.sfo, so we can initialize this earlier to have this information accessible during memory init.
* Extract compiled SDK version from pubtoolinfo string
Up until now, we've been using the game's reported "firmware version" as our compiled SDK version. This behavior is inaccurate, and is something that has come up in my hardware tests before.
For the actual compiled SDK version, we should use the SDK version in the PUBTOOLINFO string of the param.sfo, only falling back on the firmware version when that the sdk_ver component isn't present.
* Store compiled SDK version in ElfInfo
* Limit address space for compiled SDK version at or above FW 3
Sony placed a hard cap at 0xfc00000000, with a slight extension for stack mappings. For now, though stack mappings aren't implemented, there's no harm in keeping a slightly extended address space (since this cap is lower than our old user max).
Limiting the max through address space is necessary for Windows due to performance issues, in the future I plan to properly implement checks in memory manager code to properly handle this behavior for all platforms.
* Use compiled SDK version for sceKernelGetCompiledSdkVersion
I think this is pretty self explanatory.
* Log SDK version
Since this value is what most internal firmware version checks are against, logging the value will help with debugging.
* Update address_space.cpp
* Update emulator.cpp
* Backwards compatible logging
Because that's apparently an issue now
For now, I've included up to Windows 11 22H2 in the workaround.
I've only personally seen reports of issues on Windows 11 21H2, but better safe than sorry (considering Windows 10 22H2 has issues).
* SearchFree adjustments
* Robust address validation
I've adjusted IsValidAddress to take in a size, and check whether the whole range is contained in vma map.
If no size is provided, the function reverts to the old form of address validation instead.
* Map around gaps
As is, this should work mostly.
Only remaining issue is adding logic to pass the "mapped regions" to the guest vma map (and make such logic cross-platform).
* Initialize vma_map using gaps
This should allow memory code to catch any issues from address space gaps, and prevent non-fixed mappings from jumping to a location that isn't actually available.
* Clang
* Fix compile
* Clang
* Fix compile again
* Set system_managed_base and system_managed_size based on
Many places in our code use system_managed_base as the minimum mappable address, ensure this fact remains the same on Windows to prevent potential bugs.
* Reduce address validation in SearchFree
Allows SearchFree to function when a certain Windows GPU driver goes and reserves the whole system managed area.
Since SearchFree is only called on flexible addresses, allowing this particular case, where addresses are in bounds, but there's not enough space to map, should be safe enough.
* Bump address space size further
To handle Madden NFL 16 (and any games like it)
* More thorough logging of available memory regions
Should help with spotting weirdness.
* Formatting fixes
* Clang
* Slight reduction of user space
Still large enough to handle EA's shenanigans, but small enough that Linux doesn't break.
* Assert on VirtualQuery failure
* Extra debugging information
* Further reduce user space
This will unfix most of EA's titles, but UFC will still work.
Older windows versions support the high addresses, but trying to actually use them causes significant performance issues.
* Extra debugging info
Just in case other testers still run into issues.
* Remove debug logging
* Revert user space increases
Technically this constant is still higher than before, but weird side effects of our old logic resulted in a max address somewhere around this in main.
* address_space: Support expanded virtual memory space on macOS.
Co-Authored-By: squidbus <175574877+squidbus@users.noreply.github.com>
* Move address space constants to address_space.cpp
This ensures that all code must use the calculated address space memory values instead of the constants, since the calculated values can differ based on the platform.
This does require slight modification to thread state and gnmdriver code, since both were already using these constants directly.
* Workaround Windows 10 limitations
If a Windows 10 device is detected, use a lower value for USER_MAX to prevent system-wide hangs in VirtualAlloc2 calls.
* Fix compile for Windows-Qt
* Move tessellation_factors_ring_addr initialization to sceGnmGetTheTessellationFactorRingBufferBaseAddress
* Set image base address on Windows
This seems to work fine on Windows 11, needs testing from Windows 10 due to the previously discussed bugs.
* Set Linux executable base to 0x700000000000
This allows Linux to map the full user space without any workarounds.
Co-Authored-By: Marcin Mikołajczyk <2052578+mikusp@users.noreply.github.com>
* Basic formatting changes
* Reduce USER_MAX on Linux
Seems like finding a reliable way to move shadPS4's position in memory is difficult, for now limit the user size so we aren't trying to overwrite ourselves.
* Move memory and address_space variables.
---------
Co-authored-by: squidbus <175574877+squidbus@users.noreply.github.com>
Co-authored-by: Marcin Mikołajczyk <2052578+mikusp@users.noreply.github.com>
* Add configurable extra memory
* lowercase getter and setter
* Refactor memory setup to configure maximum memory limits at runtime
* sir clang offnir, the all-formatting
* Correctly update BackingSize on W*ndows too
* small format change
* remove total_memory_to_use from the header
* i have no idea how to name this commit
"addressing review comments" is a good name i guess
* Do not include extraDmem in the general config
* Fix flag handling on Windows
Fixes a weird homebrew kalaposfos made
* Fix backing protects
Windows requires that protections on areas committed through MapViewOfFile functions are less than the original mapping.
The best way to make sure everything works is to VirtualProtect the code area with the requested protection instead of applying prot directly.
* Fix error code for sceKernelMapDirectMemory2
Real hardware returns EINVAL instead of EACCES here
* Fix prot setting in ProtectBytes
* Handle some extra protection-related edge cases.
Real hardware treats read and write as separate perms, but appends read if you call with write-only (this is visible in VirtualQuery calls)
Additionally, execute permissions are ignored when protecting dmem mappings.
* Properly handle exec permission behavior for memory pools
Calling sceKernelMemoryPoolCommit with executable permissions returns EINVAL, mprotect on pooled mappings ignores the exec protection.
* Clang
* Allow execution protection for direct memory
Further hardware tests show that the dmem area is actually executable, this permission is just hidden from the end user.
* Clang
* More descriptive assert message
* Align address and size in mmap
Like most POSIX functions, mmap aligns address down to the nearest page boundary, and aligns address up to the nearest page boundary.
Since mmap is the only memory mapping function that doesn't error early on misaligned length or size, handle the alignment in the libkernel code.
* Clang
* Fix valid flags
After changing the value, games that specify just CpuWrite would hit the error return.
* Fix prot conversion functions
The True(bool) function returns true whenever value is greater than 0. While this rarely manifested before because of our wrongly defined CpuReadWrite prot, it's now causing trouble with the corrected values.
Technically this could've also caused trouble with games mapping GpuRead permissions, but that seems to be a rare enough use case that I guess it never happened?
I've also added a warning for the case where `write & !read`, since we don't properly handle write-only permissions, and I'm not entirely sure what it would take to deal with that.
* Fix some lingering dmem issues
ReleaseDirectMemory was always unmapping with the size parameter, which could cause it to unmap too much. Since multiple mappings can reference the same dmem area, I've calculated how much of each VMA we're supposed to unmap.
Additionally, I've adjusted the logic for carving out the free dmem area to properly work if ReleaseDirectMemory is called over multiple dmem areas.
Finally, I've patched a bug with my code in UnmapMemory.
* Remove mapped dmem type
Since physical addresses can be mapped multiple times, tracking mapped pages is not necessary.
This also allows me to significantly simplify the MapMemory physical address validation logic.
* Proper implementation for sceKernelMtypeprotect
I've rewritten SetDirectMemoryType to use virtual addresses instead of physical addresses, allowing it to be used in sceKernelMtypeprotect.
To accommodate this change, I've also moved address and size alignment out of MemoryManager::Protect
* Apply memory type in sceKernelMemoryPoolCommit
* Organization
Some potentially important missing mutexes, removed some unnecessary mutexes, moved some mutexes after early error returns, and updated copyright dates
* Iterator logic cleanup
Missing end check in ClampRangeSize, and adjusted VirtualQuery and DirectMemoryQuery.
* Clang
* Adjustments
* Properly account for behavior differences in MapDirectMemory2
Undid the changes to direct memory areas, added more robust logic for changing dma types, and fixed DirectMemoryQuery to return hardware-accurate direct memory information in cases where dmas split here, but not on real hardware.
I've also changed MapMemory's is_exec flag to a validate_dmem flag, used to handle alternate behavior in MapDirectMemory2. is_exec is now determined by the use of MemoryProt::CpuExec instead.
* Clang
* Add execute permissions to physical backing
Needed for executable mappings to work properly on Windows, fixes regression in RE2 with prior commit.
* Minor variable cleanup
* Update memory.h
* Prohibit direct memory mappings with exec protections
Did a quick hardware test to confirm, only seems to be prohibited for dmem mappings though.
* Update memory.cpp
* Fix isDevKit
Previously, isDevKit could increase the physical memory used above the length we reserve in the backing file.
* Physical backing for flexible allocations
I took the simple approach here, creating a separate map for flexible allocations and pretty much just copying over the logic used in the direct memory map.
* Various fixups
* Fix mistake #1
* Assert + clang
* Fix 2
* Clang
* Fix CanMergeWith
Validate physical base for flexible mappings
* Clang
* Physical backing for pooled memory
* Allow VMA splitting in NameVirtualRange
This should be safe, since with the changes in this PR, the only issues that come from discrepancies between address space and vma_map are issues related to vmas being larger than address space mappings. NameVirtualRange will only ever shrink VMAs by naming part of one.
* Fix
* Fix NameVirtualRange
* Revert NameVirtualRange changes
Seems like it doesn't play nice for Windows
* Clean up isDevKit logic
We already log both isNeo and isDevKit in Emulator::Run, so the additional logging in MemoryManager::SetupMemoryRegions isn't really necessary.
I've also added a separate constant for non-pro devkit memory, as suggested.
Finally I've changed a couple constants to use the ORBIS prefix we generally follow here, instead of the SCE prefix.
* Erase flexible memory contents from physical memory on unmap
Flexible memory should not be preserved on unmap, so erase flexible contents from the physical backing when unmapping.
* Expand flexible memory map
Some games will end up fragmenting the physical backing space used for flexible memory. To reduce the frequency of this happening under normal circumstances, allocate the entirety of the remaining physical backing to the flexible memory map.
This is effectively a workaround to the problem, but at the moment I think this should suffice.
* Clang
Should fix some `Region coalescing failed: Attempt to access invalid address.` crashes.
Co-authored-by: TheTurtle <47210458+raphaelthegreat@users.noreply.github.com>
* libkernel: Cleanup some function places
* kernel: Refactor thread functions
* kernel: It builds
* kernel: Fix a bunch of bugs, kernel thread heap
* kernel: File cleanup pt1
* File cleanup pt2
* File cleanup pt3
* File cleanup pt4
* kernel: Add missing funcs
* kernel: Add basic exceptions for linux
* gnmdriver: Add workload functions
* kernel: Fix new pthreads code on macOS. (#1441)
* kernel: Downgrade edeadlk to log
* gnmdriver: Add sceGnmSubmitCommandBuffersForWorkload
* exception: Add context register population for macOS. (#1444)
* kernel: Pthread rewrite touchups for Windows
* kernel: Multiplatform thread implementation
* mutex: Remove spamming log
* pthread_spec: Make assert into a log
* pthread_spec: Zero initialize array
* Attempt to fix non-Windows builds
* hotfix: change incorrect NID for scePthreadAttrSetaffinity
* scePthreadAttrSetaffinity implementation
* Attempt to fix Linux
* windows: Address a bunch of address space problems
* address_space: Fix unmap of region surrounded by placeholders
* libs: Reduce logging
* pthread: Implement condvar with waitable atomics and sleepqueue
* sleepq: Separate and make faster
* time: Remove delay execution
* Causes high cpu usage in Tohou Luna Nights
* kernel: Cleanup files again
* pthread: Add missing include
* semaphore: Use binary_semaphore instead of condvar
* Seems more reliable
* libraries/sysmodule: log module on `sceSysmoduleIsLoaded`
* libraries/kernel: implement `scePthreadSetPrio`
---------
Co-authored-by: squidbus <175574877+squidbus@users.noreply.github.com>
Co-authored-by: Daniel R. <47796739+polybiusproxy@users.noreply.github.com>
* Implement `sceKernelFtruncate` and `sceKernelUnlink`.
* Remove unused variable.
* Implement `sceKernelReserveVirtualRange`, misc fixes
* Fix `sceKernelReserveVirtualRange`.
* Add TODO on reserve
* Replace comment with assert.
* Add missing copyright header
* Add `UNREACHABLE` for `IOFile::Unlink`.
* Move NT API initialization out of the header
* Fix bug where files were always mapped as read only.
* `clang-format`
* tls: Actaully fix TLS on linux
* emulator: Remove nptoolkit
* Not quite supported yet, makes games misbehave
* kernel: Back to SCHED_OTHER
* kernel: Remove unused signal function
* address_space: Fix Unmap call on linux
* clang format
* video_core: Add a few missed things
* libkernel: More proper memory mapped files
* memory: Fix tessellation buffer mapping
* Cuphead work
* sceKernelPollSema fix
* clang format
* fixed ngs2 lle loading and rtc lib
* draft pthreads keys implementation
* fixed return codes
* return error code if sceKernelLoadStartModule module is invalid
* re-enabled system modules and disable debug in libs.h
* Improve linux support
* fix windows build
* kernel: Rework keys
---------
Co-authored-by: georgemoralis <giorgosmrls@gmail.com>
* core: Split module code from linker
* linker: Properly implement thread local storage
* kernel: Fix a few memory functions
* kernel: Implement module loading
* Now it's easy to do anyway with new module rework