Commit Graph

2569 Commits

Author SHA1 Message Date
Mai M
c227023375
Merge pull request #10050 from JosJuice/jitarm64-disable-psq-x
JitArm64: Disable indexed paired loadstore instructions
2021-08-22 19:11:16 -04:00
JosJuice
f70ddebb40 JitArm64: Disable indexed paired loadstore instructions
Since the merge of b24b79e, we've gotten reports that the
following games are broken on JitArm64:

* Sonic Heroes
* The SpongeBob SquarePants Movie
* Astérix & Obélix XXL
* The Incredibles: Rise of the Underminer

Disabling the register cache avoids the issue, so the cause
of the bug might not actually have anything to do with the
newly implemented instructions. Nevertheless, I don't want
to ship a beta with this problem present, so I would like to
disable these instructions for the time being.
2021-08-22 22:16:02 +02:00
Admiral H. Curtiss
f5cd17925a PowerPC: Fix for calling InvalidateICacheLines() with a count of 1 causing a (harmless) second invalidation. 2021-08-19 22:54:34 +02:00
Mai M
6863b7ae9e
Merge pull request #10036 from JosJuice/jitarm64-psq-x
JitArm64: Implement indexed paired loadstore instructions
2021-08-17 23:00:57 -04:00
JosJuice
b24b79e373 JitArm64: Implement indexed paired loadstore instructions
After writing 23b81ef without realizing that we hadn't actually
implemented the indexed paired loadstore instructions yet,
I am now implementing them.
2021-08-17 11:29:38 +02:00
JMC47
a36855c983
Merge pull request #9818 from JosJuice/jits-cdts-double
Jits: Don't use fast double-to-single when input is double precision
2021-08-17 05:21:08 -04:00
JMC47
d162015112
Merge pull request #10007 from AdmiralCurtiss/x64-dcbx-in-loop
Jit64: dcbx loop detection for improved performance when invalidating large memory regions.
2021-08-16 21:27:16 -04:00
Admiral H. Curtiss
8b2f5d5006 Jit64: Optimize dcbx being called in a loop over a large memory region. 2021-08-17 02:38:00 +02:00
JosJuice
23b81ef495 JitArm64: Fix paired loadstore instruction decoding
One of those fun "how did this ever work" bugs.
2021-08-16 22:08:30 +02:00
Admiral H. Curtiss
df1e59409b PowerPC: Handle translation if range given to InvalidateICache spans multiple BAT or Page Table pages. 2021-08-13 21:23:12 +02:00
Admiral H. Curtiss
57037a69f9 PowerPC: Call InvalidateICacheLine() in InstructionCache::Invalidate() for clarity. 2021-08-12 19:27:25 +02:00
Admiral H. Curtiss
4afbd87188 PowerPC: Fast path in InvalidateICache is only valid if the address is 32-byte aligned. 2021-08-12 19:27:25 +02:00
Admiral H. Curtiss
116d1361d5 PowerPC: Let callers of JitCache_TranslateAddress determine whether the address was translated. 2021-08-09 18:25:35 +02:00
Admiral H. Curtiss
95fbd09691 PowerPC: Update variable name conventions and const-ness around calls to JitCache_TranslateAddress(). 2021-08-09 01:25:04 +02:00
Admiral H. Curtiss
3296d2fc1f PowerPC: Reorder members of TranslateAddressResult to reduce struct size. 2021-08-09 01:25:04 +02:00
Admiral H. Curtiss
e3a784ffba PowerPC: Convert enum in TranslateAddressResult to enum class. 2021-08-07 00:07:46 +02:00
JosJuice
37b218b344 PowerPC: Fix bad copypaste in LookupTLBPageAddress
Fixes https://bugs.dolphin-emu.org/issues/12611.
2021-08-06 18:03:54 +02:00
JosJuice
125af42e4b Jit: Re-add dcbx masking
When making 92d1d60, I checked whether the ~0x1f masking in dcbx
actually was necessary. I came to the conclusion that it wasn't,
so I removed it. However, I hadn't checked the second half of
InvalidateICache closely enough - the masking is actually needed.

This commit re-adds the masking, but this time in C++ code instead
of in jitted code in order to save icache. Though I suppose the
difference doesn't matter all that much, since this is in farcode
and all...

Hopefully fixes https://bugs.dolphin-emu.org/issues/12612.
2021-08-06 14:55:07 +02:00
JosJuice
e753455abb JitArm64: Fix W8 slowmem store
Regression from 12629be.
2021-08-05 10:57:41 +02:00
Tilka
942545b7fc
Merge pull request #9964 from JosJuice/uncached-unaligned-writes
PowerPC: Implement broken masking for uncached unaligned writes
2021-08-04 22:23:07 +01:00
JosJuice
f333c0949f PowerPC: Implement PI interrupt for uncached unaligned writes 2021-08-04 23:09:43 +02:00
JosJuice
543ed8a97c PowerPC: Implement broken masking for uncached unaligned writes
This implements the behavior described in
https://bugs.dolphin-emu.org/issues/12565.

Thank you to eigenform, delroth, phire, marcan, segher, and Extrems
for all helping in one way or another with the efforts to reverse
engineer this behavior, and to Rylie for reporting the issue.
2021-08-04 23:04:02 +02:00
JosJuice
12629beff8 JitArm64: Call swap variants of memory write functions
Write_U16_Swap leaves the upper 32 bits alone. Reimplementing this
correctly in the JIT would require more than one instruction,
so let's just call Write_U16_Swap instead, like Jit64 does.
2021-08-04 23:04:02 +02:00
JosJuice
ecbce0a204 PowerPC: Pass on full 32-bit register contents for 8/16-bit writes 2021-08-04 23:04:02 +02:00
JosJuice
c56526d5f8 PowerPC: Keep track of write-through/cache-inhibited
One of the following commits will add emulation of a quirk
that only happens when writing to memory which is mapped as
write-through or cache-inhibited, so let's keep track of
which memory is mapped in this way.
2021-08-04 23:04:02 +02:00
Admiral H. Curtiss
0e76dabbbb Jit64: Always pass effective address to InvalidateICache() in dcbx. 2021-08-04 22:17:42 +02:00
JosJuice
5bd188d40d Jit64: Fix BATAddressLookup bit test
BT sets the carry flag, not the zero flag.
2021-08-03 17:32:04 +02:00
Mai M
627832355e
Merge pull request #9973 from JosJuice/jit-fma-negation-order
Jit: Use accurate negation order for FMA instructions
2021-07-31 20:18:49 -04:00
Mai M
7c365349ee
Merge pull request #9977 from JosJuice/jitarm64-mtfsfx
JitArm64: Implement mtfsfx
2021-07-31 17:54:48 -04:00
JosJuice
a90b0a1c93 JitArm64: Implement mtfsfx
The sixth and final part of implementing the FPSCR system register
instructions.
2021-07-31 23:50:20 +02:00
JosJuice
5c5de35568 Jit: Use ibat_table for dcbf/dcbi/dcbst address check
Minor mistake in 92d1d60. We should be using ibat_table instead
of dbat_table, since we're dealing with invalidating icache.
2021-07-31 14:30:03 +02:00
Mai M
bef1fdb4cb
Merge pull request #9974 from JosJuice/jitarm64-mtfsfix
JitArm64: Implement mtfsfix
2021-07-31 03:56:51 -04:00
Léo Lam
a208ff5aab
Merge pull request #9957 from JosJuice/dcbx-faster
Jit: Perform BAT lookup in dcbf/dcbi/dcbst
2021-07-31 03:27:24 +02:00
JosJuice
0e62dac4bb JitArm64: Implement mtfsfix
Part 5 of 6 of implementing the FPSCR system register instructions.
2021-07-30 10:24:41 +02:00
JosJuice
08b358a829 Jit64: Fix minor fmaddXX inefficiencies 2021-07-29 23:34:20 +02:00
JosJuice
93e636abc3 Jit: Use accurate negation order for FMA instructions
It was believed that this only mattered when the rounding mode was
set to round to infinity, which games generally don't do, but it
can also affect the sign of the output when the inputs are all zero.
2021-07-29 23:33:35 +02:00
Mai M
c86c02e46b
Merge pull request #9960 from JosJuice/jitarm64-mtfsb1x
JitArm64: Implement mtfsb1x
2021-07-28 20:46:09 -04:00
JosJuice
3bb4a4e344 Jit64: Fix fmaddXX with accurate NaNs
So it turns out you have to pass XMM0 as the clobber register
to HandleNaNs, because HandleNaNs uses BLENDVPD and BLENDVPD
implicitly uses XMM0, and nobody noticed when I broke this in
2c38d64 because nobody plays the one game that needs accurate NaNs.
2021-07-28 23:03:03 +02:00
JosJuice
ca55d599e8 Jit: Mark ValidBlockBitSet::Test as const 2021-07-27 11:11:30 +02:00
JosJuice
c9a4021537 JitArm64: Implement mtfsb1x
Part 4 of implementing the FPSCR system register instructions.
2021-07-25 19:18:43 +02:00
JosJuice
92d1d60ff1 Jit: Perform BAT lookup in dcbf/dcbi/dcbst
When 66b992c fixed https://bugs.dolphin-emu.org/issues/12133,
it did so by removing the broken address calculation entirely and
always using the slow path. This caused a performance regression,
https://bugs.dolphin-emu.org/issues/12477.

This commit instead replaces the broken address calculation with
a BAT lookup. If the BAT lookup succeeds, we can use the old fast
path. Otherwise we use the slow path.

Intends to improve https://bugs.dolphin-emu.org/issues/12477.
2021-07-25 15:15:15 +02:00
JosJuice
b84a0704cd Revert "Jit: Fix correctness issue in dcbf/dcbi/dcbst"
This reverts commit 66b992cfe4.

A new (additional) correctness issue was revealed in the old
AArch64 code when applying it on top of modern JitArm64:
LSR was being used when LSRV was intended. This commit uses LSRV.
2021-07-25 15:13:57 +02:00
Mai M
f380c23fda
Merge pull request #9890 from JosJuice/jitarm64-mtfsb0x
JitArm64: Implement mtfsb0x
2021-07-22 21:41:01 -04:00
Markus Wick
5af5656383
Merge pull request #9932 from JosJuice/jitarm64-dcbz-backpatch
JitArm64: Fix dcbz backpatch
2021-07-23 01:58:59 +02:00
JosJuice
fdcea8566d JitArm64: Improve Arm64FPRCache::GetCallerSavedUsed
If we're only using the lower 64 bits of a callee-saved
register, GetCallerSavedUsed can return false for it.
2021-07-22 10:42:44 +02:00
JosJuice
1df3456267 JitArm64: Remove a comment in dcbz implementation
This implementation is pretty efficient in my opinion. And "As
long as we aren't falling back to interpreter we're winning a lot"
applies to basically every instruction to some degree anyway.
2021-07-21 19:24:41 +02:00
JosJuice
d91d6fcdc5 JitArm64: Fix dcbz backpatch
The dcbz instruction needs to lock W30 so that the slowmem code will
push and pop it when calling into C++. Also, the slowmem code expects
that the address is present in W0, so replace the use of W0 as a scratch
register in the fastmem code with the now locked W30.
2021-07-21 19:19:52 +02:00
JosJuice
302b47f5e6 JitArm64: Add temp reg parameter to Arm64RegCache::Flush
We currently have a bug when calling Arm64GPRCache::Flush with
FlushMode::MaintainState, zero free host registers, and at least
one guest register containing an immediate. We end up grabbing
a temporary register from the register cache in order to be
able to write the immediate to memory, but grabbing a temporary
register when there are zero free registers causes the least
recently used register to be flushed in a way which does not
maintain the state of the register cache.

To get around this, require callers to pass in a temporary
register in the GPR MaintainState case. In other cases,
passing in a temporary register is not required but can help
avoid spilling a register (if the caller already had a
temporary register at hand anyway, which in particular will
be the case in my upcoming memcheck pull request).
2021-07-21 16:28:19 +02:00
CrystalGamma
c991904e04 PowerPC: Add reservation monitor to save state 2021-07-21 12:14:07 +02:00
CrystalGamma
d763d693e8 PowerPC: Move lwarx/stwcxd. reservation into PowerPCState 2021-07-21 12:12:19 +02:00