8a36bbcf | 19-Oct-2023 | Paolo Bonzini <pbonzini@redhat.com>
target/i386: add X86_SPECIALs for MOVSX and MOVZX
Usually the registers are just moved into s->T0 without much care for their operand size. However, in some cases we can get more efficient code if the operand fetching logic syncs with the emission function on what is nicer.
All the current uses are mostly demonstrative and only reduce the code in the emission functions, because the instructions do not support memory operands. However the logic is generic and applies to several more instructions such as MOVSXD (aka movslq), one-byte shift instructions, multiplications, XLAT, and indirect calls/jumps.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
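
As a rough illustration of the idea (a standalone C sketch with made-up names, not the QEMU code): when the decoder already knows the instruction is MOVSX or MOVZX, the operand-fetch step can produce a correctly extended value itself, leaving the emission function nothing extra to do.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical fetch-time extension modes, in the spirit of the new
     * X86_SPECIAL flags: the fetch already yields an extended 64-bit value. */
    enum fetch_ext { FETCH_NONE, FETCH_SEXT, FETCH_ZEXT };

    static uint64_t fetch_op16(uint16_t reg, enum fetch_ext ext)
    {
        switch (ext) {
        case FETCH_SEXT: return (uint64_t)(int64_t)(int16_t)reg; /* MOVSX */
        case FETCH_ZEXT: return reg;                             /* MOVZX */
        default:         return reg;                             /* plain move */
        }
    }

    int main(void)
    {
        uint16_t r = 0x8001;
        printf("movsx: %#llx\n", (unsigned long long)fetch_op16(r, FETCH_SEXT));
        printf("movzx: %#llx\n", (unsigned long long)fetch_op16(r, FETCH_ZEXT));
        return 0;
    }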

5baf5641 | 19-Oct-2023 | Paolo Bonzini <pbonzini@redhat.com>
target/i386: rename zext0/zext2 and make them closer to the manual
X86_SPECIAL_ZExtOp0 and X86_SPECIAL_ZExtOp2 are poorly named; they are a hack that is needed by scalar insertion and extraction instructions, and not really related to zero extension: for PEXTR the zero extension is done by the generation functions, for PINSR the high bits are not used at all and in fact are *not* filled with zeroes when loaded into s->T1.
Rename the values to match the effect described in the manual, and explain better in the comments.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
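
A small standalone C illustration of the PINSR point above (not QEMU code): only the low bits of the source value reach the selected lane, so whatever happens to sit in the upper bits of the loaded value never matters.

    #include <stdint.h>
    #include <stdio.h>

    /* PINSRW-like semantics on a 128-bit register viewed as 8 x 16-bit lanes:
     * bits 16..63 of the source are simply dropped. */
    static void pinsrw(uint16_t dst[8], uint64_t src, unsigned imm)
    {
        dst[imm & 7] = (uint16_t)src;
    }

    int main(void)
    {
        uint16_t xmm[8] = { 0 };
        pinsrw(xmm, 0xdeadbeefcafe1234ull, 2);   /* only 0x1234 lands in lane 2 */
        printf("lane 2 = %#x\n", xmm[2]);
        return 0;
    }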

b609db94 | 19-Oct-2023 | Paolo Bonzini <pbonzini@redhat.com>
target/i386: reimplement check for validity of LOCK prefix
The previous check erroneously allowed CMP to be modified with LOCK. Instead, tag explicitly the instructions that do support LOCK.
Acked-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
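
A minimal sketch of the tagging approach (hypothetical flag and table names, not the actual decoder): each table entry carries a "supports LOCK" bit, so the prefix check becomes a single test instead of an error-prone opcode-range condition.

    #include <stdbool.h>
    #include <stdio.h>

    #define INSN_LOCKABLE 0x01   /* hypothetical per-entry flag */

    struct insn_entry {
        const char *name;
        unsigned flags;
    };

    /* LOCK is only legal on read-modify-write instructions with a memory
     * destination; CMP only reads its operands, so it is not tagged. */
    static const struct insn_entry insns[] = {
        { "add", INSN_LOCKABLE },
        { "xchg", INSN_LOCKABLE },
        { "cmp", 0 },
    };

    static bool lock_ok(const struct insn_entry *e, bool lock_prefix, bool mem_dest)
    {
        return !lock_prefix || ((e->flags & INSN_LOCKABLE) && mem_dest);
    }

    int main(void)
    {
        printf("lock add m32, r32: %s\n", lock_ok(&insns[0], true, true) ? "ok" : "#UD");
        printf("lock cmp m32, r32: %s\n", lock_ok(&insns[2], true, true) ? "ok" : "#UD");
        return 0;
    }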

3c95fd4e | 27-Oct-2023 | Stefan Hajnoczi <stefanha@redhat.com>
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* target/i386: implement SHA instructions
* target/i386: check CPUID_PAE to determine 36 bit processor address space
* target/i386: improve validation of AVX instructions
* require Linux 4.4 for KVM
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (24 commits)
  kvm: i8254: require KVM_CAP_PIT2 and KVM_CAP_PIT_STATE2
  kvm: i386: require KVM_CAP_SET_IDENTITY_MAP_ADDR
  kvm: i386: require KVM_CAP_ADJUST_CLOCK
  kvm: i386: require KVM_CAP_MCE
  kvm: i386: require KVM_CAP_SET_VCPU_EVENTS and KVM_CAP_X86_ROBUST_SINGLESTEP
  kvm: i386: require KVM_CAP_XSAVE
  kvm: i386: require KVM_CAP_DEBUGREGS
  kvm: i386: move KVM_CAP_IRQ_ROUTING detection to kvm_arch_required_capabilities
  kvm: unify listeners for PIO address space
  kvm: require KVM_CAP_IOEVENTFD and KVM_CAP_IOEVENTFD_ANY_LENGTH
  kvm: assume that many ioeventfds can be created
  kvm: drop reference to KVM_CAP_PCI_2_3
  kvm: require KVM_IRQFD for kernel irqchip
  kvm: require KVM_IRQFD for kernel irqchip
  kvm: require KVM_CAP_SIGNAL_MSI
  kvm: require KVM_CAP_INTERNAL_ERROR_DATA
  kvm: remove unnecessary stub
  target/i386: check CPUID_PAE to determine 36 bit processor address space
  target/i386: validate VEX.W for AVX instructions
  target/i386: group common checks in the decoding phase
  ...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

e000687f | 09-Oct-2023 | Paolo Bonzini <pbonzini@redhat.com>
target/i386: validate VEX.W for AVX instructions
Instructions in VEX exception class 6 generally look at the value of VEX.W. Note that the manual places some instructions incorrectly in class 4, for example VPERMQ, which has no non-VEX encoding and no legacy SSE analogue. AMD makes a mess of its own, as documented in the comment that this patch adds.
Most of them are checked for VEX.W=0, and are listed in the manual (though with an omission) in table 2-16; VPERMQ and VPERMPD check for VEX.W=1, which is only listed in the instruction description. Others, such as VPSRLV, VPSLLV and the FMA3 instructions, use VEX.W to switch between a 32-bit and 64-bit operation.
Fix more of the class 4/class 6 mismatches, and implement the check for VEX.W in TCG.
Acked-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
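
A standalone sketch of a per-instruction VEX.W constraint (names are made up; the manual's W0/W1/WIG notation is the model): the decode table records what the instruction expects, and a single check rejects everything else.

    #include <stdbool.h>
    #include <stdio.h>

    enum vexw_req { VEXW_IGNORE, VEXW_0, VEXW_1 };   /* WIG, W0, W1 */

    static bool vexw_ok(enum vexw_req req, bool vex_w)
    {
        switch (req) {
        case VEXW_0: return !vex_w;   /* most class 6 instructions (table 2-16) */
        case VEXW_1: return vex_w;    /* e.g. VPERMQ, VPERMPD */
        default:     return true;     /* ignored, or W selects 32- vs 64-bit op */
        }
    }

    int main(void)
    {
        printf("VPERMQ, VEX.W=0: %s\n", vexw_ok(VEXW_1, false) ? "ok" : "#UD");
        printf("VPERMQ, VEX.W=1: %s\n", vexw_ok(VEXW_1, true) ? "ok" : "#UD");
        return 0;
    }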

183e6679 | 09-Oct-2023 | Paolo Bonzini <pbonzini@redhat.com>
target/i386: group common checks in the decoding phase
In preparation for adding more similar checks, move the VEX.L=0 check and several X86_SPECIAL_* checks to a new field, where each bit represents a common check on unused bits, or a restriction on the processor mode.
Likewise, many SVM intercepts can be checked during the decoding phase, the main exception being the selective CR0 write, MSR and IOIO intercepts.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
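
A rough sketch of the "one bit per common check" idea (flag names invented for the example): the decode-table entry accumulates check bits, and a single routine in the decode phase evaluates them all.

    #include <stdio.h>

    #define CHK_VEX_L0    (1u << 0)   /* VEX.L must be 0 */
    #define CHK_PROT_MODE (1u << 1)   /* only valid in protected mode */

    struct decoded {
        unsigned check;               /* bits taken from the decode table */
    };

    static const char *run_checks(const struct decoded *d, int vex_l, int prot_mode)
    {
        if ((d->check & CHK_VEX_L0) && vex_l) {
            return "#UD";
        }
        if ((d->check & CHK_PROT_MODE) && !prot_mode) {
            return "#UD";
        }
        return "ok";
    }

    int main(void)
    {
        struct decoded d = { .check = CHK_VEX_L0 };
        printf("VEX.L=1 on an L0-only insn: %s\n", run_checks(&d, 1, 1));
        return 0;
    }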

e582b629 | 10-Oct-2023 | Paolo Bonzini <pbonzini@redhat.com>
target/i386: implement SHA instructions
The implementation was validated with OpenSSL and with the test vectors in https://github.com/rust-lang/stdarch/blob/master/crates/core_arch/src/x86/sha.rs.
The instructions provide a ~25% improvement on hashing a 64 MiB file: runtime goes down from 1.8 seconds to 1.4 seconds; instruction count on the host goes down from 5.8 billion to 4.8 billion with slightly better IPC too. Good job Intel. ;)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
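
For orientation only, the SHA-256 small-sigma functions from FIPS 180-4 that the message-schedule instructions (SHA256MSG1/SHA256MSG2) are built around; this is the standard definition, not QEMU's helper code.

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t ror32(uint32_t x, unsigned n)
    {
        return (x >> n) | (x << (32 - n));   /* n is 1..31 here */
    }

    /* FIPS 180-4, section 4.1.2: sigma0 feeds SHA256MSG1, sigma1 feeds SHA256MSG2. */
    static uint32_t sigma0(uint32_t x) { return ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3); }
    static uint32_t sigma1(uint32_t x) { return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10); }

    int main(void)
    {
        printf("sigma0(1) = %#x, sigma1(1) = %#x\n", sigma0(1), sigma1(1));
        return 0;
    }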

03a3a62f | 07-Sep-2023 | Stefan Hajnoczi <stefanha@redhat.com>
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* only build util/async-teardown.c when system build is requested
* target/i386: fix BQL handling of the legacy FERR interrupts
* target/i386: fix memory operand size for CVTPS2PD
* target/i386: Add support for AMX-COMPLEX in CPUID enumeration
* compile plugins on Darwin
* configure and meson cleanups
* drop mkvenv support for Python 3.7 and Debian10
* add wrap file for libblkio
* tweak KVM stubs
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (51 commits)
  docs/system/replay: do not show removed command line option
  subprojects: add wrap file for libblkio
  sysemu/kvm: Restrict kvm_pc_setup_irq_routing() to x86 targets
  sysemu/kvm: Restrict kvm_has_pit_state2() to x86 targets
  sysemu/kvm: Restrict kvm_get_apic_state() to x86 targets
  sysemu/kvm: Restrict kvm_arch_get_supported_cpuid/msr() to x86 targets
  target/i386: Restrict declarations specific to CONFIG_KVM
  target/i386: Allow elision of kvm_hv_vpindex_settable()
  target/i386: Allow elision of kvm_enable_x2apic()
  target/i386: Remove unused KVM stubs
  target/i386/cpu-sysemu: Inline kvm_apic_in_kernel()
  target/i386/helper: Restrict KVM declarations to system emulation
  hw/i386/fw_cfg: Include missing 'cpu.h' header
  hw/i386/pc: Include missing 'cpu.h' header
  hw/i386/pc: Include missing 'sysemu/tcg.h' header
  Revert "mkvenv: work around broken pip installations on Debian 10"
  mkvenv: assume presence of importlib.metadata
  Python: Drop support for Python 3.7
  configure: remove dead code
  meson: list leftover CONFIG_* symbols
  ...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

a48b2697 | 29-Aug-2023 | Paolo Bonzini <pbonzini@redhat.com>
target/i386: generalize operand size "ph" for use in CVTPS2PD
CVTPS2PD only loads a half-register for memory, like CVTPH2PS. It can reuse the "ph" packed half-precision size to load a half-register, but rename it to "xh" because it is now a variation of "x" (it is not used only for half-precision values).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
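
A standalone sketch of why the memory operand is only half a register here (plain C, not the QEMU implementation): CVTPS2PD widens each 32-bit float to a 64-bit double, so an XMM-sized result consumes just 8 bytes, i.e. two floats, of input.

    #include <stdio.h>
    #include <string.h>

    /* XMM-width CVTPS2PD: 8 bytes in (two floats), 16 bytes out (two doubles),
     * which is what the half-register "xh" memory operand captures. */
    static void cvtps2pd_xmm(const void *src8, double dst[2])
    {
        float in[2];
        memcpy(in, src8, sizeof(in));
        dst[0] = in[0];
        dst[1] = in[1];
    }

    int main(void)
    {
        float mem[2] = { 1.5f, -2.25f };
        double out[2];
        cvtps2pd_xmm(mem, out);
        printf("%g %g\n", out[0], out[1]);
        return 0;
    }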

e52d57c8 | 24-Oct-2022 | Stefan Hajnoczi <stefanha@redhat.com>
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* target/i386: new decoder bugfix
* target/i386: complete x86-v3 support for TCG
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu:
  target/i386: implement FMA instructions
  target/i386: implement F16C instructions
  target/i386: introduce function to set rounding mode from FPCW or MXCSR bits
  target/i386: decode-new: avoid out-of-bounds access to xmm_regs[-1]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

2872b0f3 | 19-Oct-2022 | Paolo Bonzini <pbonzini@redhat.com>
target/i386: implement FMA instructions
The only issue with FMA instructions is that there are _a lot_ of them (30 opcodes, each of which comes in up to 4 versions depending on VEX.W and VEX.L; a total of 96 possibilities). However, they can be implemented with only 6 helpers, two for scalar operations and four for packed operations. (Scalar versions do not do any merging; they only affect the bottom 32 or 64 bits of the output operand. Therefore, there are no separate XMM and YMM versions of the scalar helpers.)
First, we can reduce the number of helpers to one third by passing four operands (one output and three inputs); the reordering of which operands go to the multiply and which go to the add is done in emit.c.
Second, the different instructions also dispatch to the same softfloat function, so the flags for float32_muladd and float64_muladd are passed in the helper as int arguments, with a little extra complication to handle FMADDSUB and FMSUBADD.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
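
A standalone sketch of the "one helper plus flag arguments" idea, using C's fma() from <math.h> rather than softfloat; the flag names are invented for the example and only cover the plain negate cases, not FMADDSUB/FMSUBADD.

    #include <math.h>
    #include <stdio.h>

    #define NEG_PRODUCT 1   /* compute -(a*b) + c  -> FNMADD family */
    #define NEG_ADDEND  2   /* compute  (a*b) - c  -> FMSUB family  */

    static double fma_flags(double a, double b, double c, int flags)
    {
        if (flags & NEG_PRODUCT) {
            a = -a;
        }
        if (flags & NEG_ADDEND) {
            c = -c;
        }
        return fma(a, b, c);
    }

    int main(void)   /* link with -lm */
    {
        printf("fmadd:  %g\n", fma_flags(2.0, 3.0, 1.0, 0));
        printf("fmsub:  %g\n", fma_flags(2.0, 3.0, 1.0, NEG_ADDEND));
        printf("fnmadd: %g\n", fma_flags(2.0, 3.0, 1.0, NEG_PRODUCT));
        return 0;
    }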

cf5ec664 | 19-Oct-2022 | Paolo Bonzini <pbonzini@redhat.com>
target/i386: implement F16C instructions
F16C only consists of two instructions, which are a bit peculiar nevertheless.
First, they access only the low half of a YMM or XMM register for the packed-half operand; the exact size still depends on the VEX.L flag. This is similar to the existing avx_movx flag, but not exactly because avx_movx is hardcoded to affect operand 2. To this end I added a "ph" format name; it's possible to reuse this approach for the VPMOVSX and VPMOVZX instructions, though that would also require adding two more formats for the low-quarter and low-eighth of an operand.
Second, VCVTPS2PH is somewhat weird because it *stores* the result of the instruction into memory rather than loading it.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

214a8da2 | 18-Oct-2022 | Stefan Hajnoczi <stefanha@redhat.com>
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* configure: don't enable firmware for targets that are not built
* configure: don't use strings(1)
* scsi, target/i386: switch from device_legacy_reset() to device_cold_reset()
* target/i386: AVX support for TCG
* target/i386: fix SynIC SINT assertion failure on guest reset
* target/i386: Use atomic operations for pte updates and other cleanups
* tests/tcg: extend SSE tests to AVX
* virtio-scsi: send "REPORTED LUNS CHANGED" sense data upon disk hotplug events
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (53 commits)
  target/i386: remove old SSE decoder
  target/i386: move 3DNow to the new decoder
  tests/tcg: extend SSE tests to AVX
  target/i386: Enable AVX cpuid bits when using TCG
  target/i386: implement VLDMXCSR/VSTMXCSR
  target/i386: implement XSAVE and XRSTOR of AVX registers
  target/i386: reimplement 0x0f 0x28-0x2f, add AVX
  target/i386: reimplement 0x0f 0x10-0x17, add AVX
  target/i386: reimplement 0x0f 0xc2, 0xc4-0xc6, add AVX
  target/i386: reimplement 0x0f 0x38, add AVX
  target/i386: Use tcg gvec ops for pmovmskb
  target/i386: reimplement 0x0f 0x3a, add AVX
  target/i386: clarify (un)signedness of immediates from 0F3Ah opcodes
  target/i386: reimplement 0x0f 0xd0-0xd7, 0xe0-0xe7, 0xf0-0xf7, add AVX
  target/i386: reimplement 0x0f 0x70-0x77, add AVX
  target/i386: reimplement 0x0f 0x78-0x7f, add AVX
  target/i386: reimplement 0x0f 0x50-0x5f, add AVX
  target/i386: reimplement 0x0f 0xd8-0xdf, 0xe8-0xef, 0xf8-0xff, add AVX
  target/i386: reimplement 0x0f 0x60-0x6f, add AVX
  target/i386: Introduce 256-bit vector helpers
  ...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

71a0891d | 05-Sep-2022 | Paolo Bonzini <pbonzini@redhat.com>
target/i386: move 3DNow to the new decoder
This adds another kind of weirdness when you thought you had seen it all: an opcode byte that comes _after_ the address, not before. It's not worth adding a new X86_SPECIAL_* constant for it, but it's actually not unlike VCMP; so, forgive me for exploiting the similarity and just deciding to dispatch to the right gen_helper_* call in a single code generation function.
In fact, the old decoder had a bug where s->rip_offset should have been set to 1 for 3DNow! instructions, and it's fixed now.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

16fc5726 | 14-Sep-2022 | Paolo Bonzini <pbonzini@redhat.com>
target/i386: reimplement 0x0f 0x38, add AVX
There are several special cases here:
1) extending moves have different widths for the helpers vs. for the memory loads, and the width for memory loads depends on VEX.L too. This is represented by X86_SPECIAL_AVXExtMov.
2) some instructions, such as variable-width shifts, select the vector element size via REX.W (see the sketch after this entry).
3) VSIB instructions (VGATHERxPy, VPGATHERxy) are also part of this group, and they have (among other things) two output operands.
4) the macros for 4-operand blends (which are under 0x0f 0x3a) have to be extended to support 2-operand blends. The 2-operand variant actually came a few years earlier, but it is clearer to implement them in the opposite order.
X86_TYPE_WM, introduced earlier for unaligned loads, is reused for helpers that accept a Reg* but have a M argument.
These three-byte opcodes also include AVX new instructions, for which the helpers were originally implemented by Paul Brook <paul@nowt.org>.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
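
As promised above for point 2, a standalone sketch (not QEMU code) of a variable-width shift in the spirit of VPSRLVD/VPSRLVQ, where VEX.W/REX.W picks the lane width and out-of-range counts produce zero.

    #include <stdint.h>
    #include <stdio.h>

    static void vpsrlv128(void *dst, const void *src, const void *count, int vex_w)
    {
        if (vex_w) {                       /* VPSRLVQ: 2 x 64-bit lanes */
            uint64_t *d = dst; const uint64_t *s = src, *c = count;
            for (int i = 0; i < 2; i++) {
                d[i] = c[i] < 64 ? s[i] >> c[i] : 0;
            }
        } else {                           /* VPSRLVD: 4 x 32-bit lanes */
            uint32_t *d = dst; const uint32_t *s = src, *c = count;
            for (int i = 0; i < 4; i++) {
                d[i] = c[i] < 32 ? s[i] >> c[i] : 0;
            }
        }
    }

    int main(void)
    {
        uint64_t s[2] = { 0x100, 0x100 }, c[2] = { 4, 100 }, d[2];
        vpsrlv128(d, s, c, 1);
        printf("%llx %llx\n", (unsigned long long)d[0], (unsigned long long)d[1]);
        return 0;
    }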

6bbeb98d | 01-Sep-2022 | Paolo Bonzini <pbonzini@redhat.com>
target/i386: reimplement 0x0f 0xd0-0xd7, 0xe0-0xe7, 0xf0-0xf7, add AVX
The more complicated ones here are d6-d7, e6-e7, f7. The others are trivial.
For LDDQU, using gen_load_sse directly might corrupt the register if the second part of the load fails. Therefore, add a custom X86_TYPE_WM value; like X86_TYPE_W it does call gen_load(), but it also rejects a value of 11 in the mod field of ModRM, like X86_TYPE_M.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
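
A tiny standalone illustration of the memory-only restriction described above: a ModRM byte whose mod field is 11b names a register, which an X86_TYPE_WM-style operand must refuse. (Hypothetical helper, not the QEMU code.)

    #include <stdbool.h>
    #include <stdio.h>

    /* mod == 11b means the r/m field encodes a register, not memory. */
    static bool modrm_is_memory(unsigned char modrm)
    {
        return (modrm >> 6) != 3;
    }

    int main(void)
    {
        printf("modrm 0x06: %s\n", modrm_is_memory(0x06) ? "memory ok" : "#UD for LDDQU");
        printf("modrm 0xc0: %s\n", modrm_is_memory(0xc0) ? "memory ok" : "#UD for LDDQU");
        return 0;
    }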

55a33286 | 05-Sep-2022 | Paolo Bonzini <pbonzini@redhat.com>
target/i386: validate SSE prefixes directly in the decoding table
Many SSE and AVX instructions are only valid with specific prefixes (none, 66, F3, F2). Introduce a direct way to encode this in the decoding table to avoid using decode groups too much.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
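
A rough sketch of encoding the legal prefixes as a per-entry bitmask (names invented for the example): the check is then a single AND against whichever mandatory prefix was seen.

    #include <stdbool.h>
    #include <stdio.h>

    #define P_NONE (1u << 0)
    #define P_66   (1u << 1)
    #define P_F3   (1u << 2)
    #define P_F2   (1u << 3)

    struct entry {
        const char *name;
        unsigned valid_prefixes;
    };

    static bool prefix_ok(const struct entry *e, unsigned seen)
    {
        return (e->valid_prefixes & seen) != 0;
    }

    int main(void)
    {
        /* 0F 6F: MOVQ (no prefix), MOVDQA (66), MOVDQU (F3); F2 is invalid. */
        struct entry op6f = { "0F 6F", P_NONE | P_66 | P_F3 };
        printf("F3 0F 6F: %s\n", prefix_ok(&op6f, P_F3) ? "ok" : "#UD");
        printf("F2 0F 6F: %s\n", prefix_ok(&op6f, P_F2) ? "ok" : "#UD");
        return 0;
    }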

20581aad | 17-Sep-2022 | Paolo Bonzini <pbonzini@redhat.com>
target/i386: validate VEX prefixes via the instructions' exception classes
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

caa01fad | 01-Sep-2022 | Paolo Bonzini <pbonzini@redhat.com>
target/i386: add CPUID feature checks to new decoder
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6ba13999 | 23-Aug-2022 | Paolo Bonzini <pbonzini@redhat.com>
target/i386: add ALU load/writeback core
Add generic code generation that takes care of preparing operands around calls to decode.e.gen in a table-driven manner, so that ALU operations need not take care of that.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
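
A standalone sketch of the shape of that common path (plain C with made-up names; the real code emits TCG ops instead of computing values): shared code loads the operands, the per-instruction generator runs as T0 = f(T0, T1), and shared code writes the result back.

    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t (*gen_fn)(uint64_t t0, uint64_t t1);

    static uint64_t gen_add(uint64_t t0, uint64_t t1) { return t0 + t1; }
    static uint64_t gen_xor(uint64_t t0, uint64_t t1) { return t0 ^ t1; }

    struct op_entry { const char *name; gen_fn gen; };

    static void alu_common(const struct op_entry *e, uint64_t *dst, uint64_t src)
    {
        uint64_t t0 = *dst, t1 = src;   /* operand load */
        t0 = e->gen(t0, t1);            /* T0 = f(T0, T1) */
        *dst = t0;                      /* writeback */
    }

    int main(void)
    {
        struct op_entry add_op = { "add", gen_add }, xor_op = { "xor", gen_xor };
        uint64_t r = 5;
        alu_common(&add_op, &r, 7);     /* r = 12 */
        alu_common(&xor_op, &r, 3);     /* r = 15 */
        printf("%llu\n", (unsigned long long)r);
        return 0;
    }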

b3e22b23 | 23-Aug-2022 | Paolo Bonzini <pbonzini@redhat.com>
target/i386: add core of new i386 decoder
The new decoder is based on three principles:
- use mostly table-driven decoding, using tables derived as much as possible from the Intel manual. Centralizing the decoding of the operands makes it more homogeneous, for example all immediates are signed. All modrm handling is in one function, and can be shared between SSE and ALU instructions (including XMM<->GPR instructions). The SSE/AVX decoder will also not have duplicated code between the 0F, 0F38 and 0F3A tables.
- keep the code as "non-branchy" as possible. Generally, the code for the new decoder is more verbose, but the control flow is simpler. Conditionals are not nested and have small bodies. All instruction groups are resolved even before operands are decoded, and code generation is separated as much as possible within small functions that only handle one instruction each.
- keep address generation and (for ALU operands) memory loads and writeback as much in common code as possible. All ALU operations for example are implemented as T0=f(T0,T1). For non-ALU instructions, read-modify-write memory operations are rare, but registers do not have TCGv equivalents: therefore, the common logic sets up pointer temporaries with the operands, while load and writeback are handled by gvec or by helpers.
These principles make future code review and extensibility simpler, at the cost of having a relatively large amount of code in the form of this patch. Even EVEX should not be _too_ hard to implement (it's just a crazy large amount of possibilities).
This patch introduces the main decoder flow, and integrates the old decoder with the new one. The old decoder takes care of parsing prefixes and then optionally drops to the new one. The changes to the old decoder are minimal and allow it to be replaced incrementally with the new one.
There is a debugging mechanism through a "LIMIT" environment variable. In user-mode emulation, the variable is the number of instructions decoded by the new decoder before permanently switching to the old one. In system emulation, the variable is the highest opcode that is decoded by the new decoder (this is less friendly, but it's the best that can be done without requiring deterministic execution).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
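
A skeleton of the table-driven shape plus the LIMIT escape hatch described above (all names and the fallback policy are illustrative, not the QEMU ones; this loosely mirrors the system-emulation behaviour, where LIMIT acts as an opcode cutoff).

    #include <stdio.h>
    #include <stdlib.h>

    struct decode_entry {
        const char *mnemonic;
        void (*gen)(void);
    };

    static void gen_nop(void) { puts("emit nop"); }
    static void gen_hlt(void) { puts("emit hlt"); }

    static const struct decode_entry table[256] = {
        [0x90] = { "nop", gen_nop },
        [0xf4] = { "hlt", gen_hlt },
    };

    static void old_decoder(unsigned op) { printf("old decoder handles %#x\n", op); }

    static void decode(unsigned op, long limit)
    {
        const struct decode_entry *e = &table[op];
        if ((long)op > limit || !e->gen) {   /* above the cutoff, or not converted yet */
            old_decoder(op);
            return;
        }
        printf("%s: ", e->mnemonic);
        e->gen();
    }

    int main(void)
    {
        const char *s = getenv("LIMIT");
        long limit = s ? strtol(s, NULL, 0) : 0xff;
        decode(0x90, limit);
        decode(0xf4, limit);
        return 0;
    }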