Most systems that Dolphin runs on have a page size of 4 KiB, which
conveniently matches the page size of the GameCube and Wii. But there
are also systems that use larger page sizes, notably Apple CPUs with
16 KiB page sizes. To let us create host mappings on such systems, this
commit implements combining guest mappings into host page sized mappings
wherever possible.
For this to work for a given mapping, not only do four (in the case of
16 KiB) guest mappings have to exist adjacent to each other, but the
corresponding translated addresses also have to be adjacent, and the
lowest bits of the addresses have to match. When I tested a few games,
the following percentages of guest mappings met these criteria:
Spider-Man 2: 0%-12%
Rogue Squadron 2: 39%-42%
Rogue Squadron 3: 28%-41%
So while 16 KiB systems don't get as much of a performance improvement
as 4 KiB systems, they do still get some improvement.
Removing and readding every page table mapping every time something
changes in the page table is very slow. Instead, let's generate a diff
and ask Memmap to update only the diff.
Previously we've only been setting up fastmem mappings for block address
translation, but now we also do it for page address translation. This
increases performance when games access memory using page tables, but
decreases performance when games set up page tables.
The tlbie instruction is used as an indication that the mappings need to
be updated.
There are some accuracy downsides:
* The TLB is now effectively infinitely large, which matters if games
don't use tlbie when modifying page tables.
* The R and C bits for page table entries get set pessimistically rather
than when the page is actually accessed.
No games are known to be broken by these inaccuracies, but unfortunately
the second inaccuracy causes a large performance regression in Rogue
Squadron 3. You still get the old, more accurate behavior if Enable
Write-Back Cache is on.
During my quest to rid Dolphin of the memory-unsafe GetPointer function,
I didn't realize that DSPHLE had its own set of memory access functions
that were essentially duplicating the standard ones. This commit gets
rid of them and replaces calls to them with calls to MemoryManager.
Instead of creating many 128 KiB mappings, we can create a few large
mappings. On my Windows PC, this speeds up GameCube (FakeVMEM) game boot
times by about 200 ms and Wii game boot times by about 60 ms. Loading
savestates is also faster, by about 45 ms for GameCube (FakeVMEM) games
and 5 ms for Wii games. The impact is presumably smaller on other OSes
because Windows is particularly slow at creating mappings.
To ensure memory safety, callers of GetPointer have to perform a bounds
check. But how is this bounds check supposed to be performed?
GetPointerForRange contained one implementation of a bounds check, but
it was cumbersome, and it also isn't obvious why it's correct.
To make doing the right thing easier, this commit changes GetPointer to
return a span that tells the caller how many bytes it's allowed to
access.
IOS::HLE::IOCtlVRequest::Dump sometimes tries to call GetPointerForRange
with an address of 0 and a size of 0. Address 0 is valid, but we were
mistakenly also trying to check that address 3FFFFFFF is valid, which it
isn't.
Fixes https://bugs.dolphin-emu.org/issues/13514.
Typically when someone uses GetPointer, it's because they want to read
from a range of memory. GetPointer is unsafe to use for this. While it
does check that the passed-in address is valid, it doesn't know the size
of the range that will be accessed, so it can't check that the end
address is valid. The safer alternative GetPointerForRange should be
used instead.
Note that there is still the problem of many callers not checking for
nullptr.
This is part 2 of a series of changes removing the use of GetPointer
throughout the code base. After this, VideoCommon is the one major part
of Dolphin that remains.
Previously we would only backpatch overflowed address calculations
if the overflow was 0x1000 or less. Now we can handle the full 2 GiB
of overflow in both directions.
I'm also making equivalent changes to JitArm64's code. This isn't because
it needs it – JitArm64 address calculations should never overflow – but
because I wanted to get rid of the 0x100001000 inherited from Jit64 that
makes even less sense for JitArm64 than for Jit64.
See the comment added by this commit. We were previously guarding against
overshooting in address calculations, but not against undershooting.
Perhaps someone assumed that the displacement of an x86 loadstore was
treated as unsigned?
Note: While the comment says we can undershoot by up to 2 GiB, in
practice Jit64 as it currently behaves won't actually undershoot by more
than 0x8000 if my analysis is correct. But address space is cheap, so
let's guard the full 2 GiB.
This existed in the initial megacommit (though I don't know why) as IO_SIZE. It was used in Memmap's Init() to compute totalMemSize, but I don't know if it actually did anything then. That use was removed in 2d0f714546, but the constant persisted until cc858c63b8, when it became a static variable.
This is used when fastmem isn't available. Instead of always falling
back to the C++ code in MMU.cpp, the JIT translates addresses on its
own by looking them up in a table that Dolphin constructs. This is
slower than fastmem, but faster than the old non-fastmem code.
This is primarily useful for iOS, since that's the only major platform
nowadays where you can't reliably get fastmem. I think it would make
sense to merge this feature to master despite this, since there's
nothing actually iOS-specific about the feature. It would be of use
for me when I have to disable fastmem to stop Android Studio from
constantly breaking on segfaults, for instance.
Co-authored-by: OatmealDome <julian@oatmealdome.me>
These GetPointer calls could cause crashes, in part because the
callers didn't do null checks and in part because GetPointer
inherently is unsafe to use for accesses larger than 1 byte.
SPDX standardizes how source code conveys its copyright and licensing
information. See https://spdx.github.io/spdx-spec/1-rationale/ . SPDX
tags are adopted in many large projects, including things like the Linux
kernel.
Changed several enums from Memmap.h to be static vars and implemented Get functions to query them. This seems to have boosted speed a bit in some titles? The new variables and some previously statically initialized items are now initialized via Memory::Init() and the new AddressSpace::Init(). s_ram_size_real and the new s_exram_size_real in particular are initialized from new OnionConfig values "MAIN_MEM1_SIZE" and "MAIN_MEM2_SIZE", only if "MAIN_RAM_OVERRIDE_ENABLE" is true.
GUI features have been added to Config > Advanced to adjust the new OnionConfig values.
A check has been added to State::doState to ensure savestates with memory configurations different from the current settings aren't loaded. The STATE_VERSION is now 115.
FIFO Files have been updated from version 4 to version 5, now including the MEM1 and MEM2 sizes from the time of DFF creation. FIFO Logs not using the new features (OnionConfig MAIN_RAM_OVERRIDE_ENABLE is false) are still backwards compatible. FIFO Logs that do use the new features have a MIN_LOADER_VERSION of 5. Thanks to the order of function calls, FIFO logs are able to automatically configure the new OnionConfig settings to match what is needed. This is a bit hacky, though, so I also threw in a failsafe for if the conditions that allow this to work ever go away.
I took the liberty of adding a log message to explain why the core fails to initialize if the MIN_LOADER_VERSION is too great.
Some IOS code has had the function "RAMOverrideForIOSMemoryValues" appended to it to recalculate IOS Memory Values from retail IOSes/apploaders to fit the extended memory sizes. Worry not, if MAIN_RAM_OVERRIDE_ENABLE is false, this function does absolutely nothing.
A hotfix in DolphinQt/MenuBar.cpp has been implemented for RAM Override.