#
7170a17e |
| 17-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: reimplement 0x0f 0x10-0x17, add AVX
These are mostly moves, and yet are a total pain. The main issue is that:
1) some instructions are selected by mod==11 (register operand) vs. mod=00/01/10 (memory operand)
2) stores to memory are two-operand operations, while the 3-register and load-from-memory versions operate on the entire contents of the destination; this makes it easier to separate the gen_* function for the store case
3) it's inefficient to load into xmm_T0 only to move the value out again, so the gen_* function for the load case is separated too
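The register-versus-memory selection in item 1 can be sketched as follows. This is a minimal illustration and not the QEMU code: the helper names and the `insn_0f12` enum are hypothetical, and only the unprefixed 0F 12 opcode (MOVLPS load from memory, MOVHLPS between registers) is shown.

```c
#include <stdint.h>

/* ModRM layout: mod = bits 7-6, reg = bits 5-3, rm = bits 2-0. */
static inline unsigned modrm_mod(uint8_t modrm) { return modrm >> 6; }
static inline unsigned modrm_reg(uint8_t modrm) { return (modrm >> 3) & 7; }
static inline unsigned modrm_rm(uint8_t modrm)  { return modrm & 7; }

/* Hypothetical selector for opcode 0F 12 with no mandatory prefix. */
enum insn_0f12 { INSN_MOVLPS_LOAD, INSN_MOVHLPS };

static enum insn_0f12 select_0f12(uint8_t modrm)
{
    /* mod == 11b means a register operand; 00/01/10 mean memory. */
    return modrm_mod(modrm) == 3 ? INSN_MOVHLPS : INSN_MOVLPS_LOAD;
}
```

Because the same opcode byte names two different instructions, a decoder has to look at the ModRM byte before it knows which table entry applies.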
The manual also has various mistakes in the operands here, for example the store case of MOVHPS operates on a 128-bit source (albeit discarding the bottom 64 bits) and therefore should be Mq,Vdq rather than Mq,Vq. Likewise for the destination and source of MOVHLPS.
VUNPCK?PS and VUNPCK?PD are the same as VUNPCK?DQ and VUNPCK?QDQ, but encoded as prefixes rather than separate operands. The helpers can be reused however.
For MOVSLDUP, MOVSHDUP and MOVDDUP I chose to reimplement them as helpers. I named the helper for MOVDDUP "movdldup" in preparation for possible future introduction of MOVDHDUP and to clarify the similarity with MOVSLDUP.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
aba2b8ec |
| 06-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: reimplement 0x0f 0xc2, 0xc4-0xc6, add AVX
Nothing special going on here, for once.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
16fc5726 |
| 14-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: reimplement 0x0f 0x38, add AVX
There are several special cases here:
1) extending moves have different widths for the helpers vs. for the memory loads, and the width for memory loads depends on VEX.L too. This is represented by X86_SPECIAL_AVXExtMov.
2) some instructions, such as variable-width shifts, select the vector element size via REX.W.
3) VSIB instructions (VGATHERxPy, VPGATHERxy) are also part of this group, and they have (among other things) two output operands.
4) the macros for 4-operand blends (which are under 0x0f 0x3a) have to be extended to support 2-operand blends. The 2-operand variant actually came a few years earlier, but it is clearer to implement them in the opposite order.
X86_TYPE_WM, introduced earlier for unaligned loads, is reused for helpers that accept a Reg* but have a M argument.
These three-byte opcodes also include AVX new instructions, for which the helpers were originally implemented by Paul Brook <paul@nowt.org>.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
79068477 |
| 06-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: reimplement 0x0f 0x3a, add AVX
The more complicated operations here are insertions and extractions. Otherwise, there are just more entries than usual because the PS/PD/SS/SD variations are encoded in the opcode rather than in the prefixes.
These three-byte opcodes also include AVX new instructions, whose implementation in the helpers was originally done by Paul Brook <paul@nowt.org>.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6bbeb98d |
| 01-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: reimplement 0x0f 0xd0-0xd7, 0xe0-0xe7, 0xf0-0xf7, add AVX
The more complicated ones here are d6-d7, e6-e7, f7. The others are trivial.
For LDDQU, using gen_load_sse directly might corrupt the register if the second part of the load fails. Therefore, add a custom X86_TYPE_WM value; like X86_TYPE_W it does call gen_load(), but it also rejects a value of 11 in the ModRM field like X86_TYPE_M.
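The hazard described for LDDQU can be illustrated with a toy model. This sketch is not QEMU's implementation: `Vec128`, `toy_load64` and `load128_commit` are hypothetical names, and a bounds check stands in for a page fault. The point is only that loading into a temporary and committing afterwards leaves the destination register untouched when the second half of the load fails.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint64_t q[2]; } Vec128;

/* Hypothetical fallible 64-bit load from a toy memory buffer. */
static bool toy_load64(const uint8_t *mem, size_t mem_size,
                       size_t addr, uint64_t *out)
{
    if (addr > mem_size || mem_size - addr < 8) {
        return false;               /* stands in for a fault */
    }
    memcpy(out, mem + addr, 8);
    return true;
}

/* Load both halves into a temporary; write the register only on success. */
static bool load128_commit(Vec128 *reg, const uint8_t *mem,
                           size_t mem_size, size_t addr)
{
    Vec128 tmp;

    if (!toy_load64(mem, mem_size, addr, &tmp.q[0]) ||
        !toy_load64(mem, mem_size, addr + 8, &tmp.q[1])) {
        return false;               /* reg is left unmodified */
    }
    *reg = tmp;
    return true;
}
```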
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ce4fcb94 |
| 02-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: reimplement 0x0f 0x70-0x77, add AVX
This includes shifts by immediate, which use bits 3-5 of the ModRM byte as an opcode extension. With the exception of 128-bit shifts, they are implemented using gvec.
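As a rough illustration of the opcode extension, the sketch below dispatches opcode 0F 71 (word shifts by immediate) on ModRM bits 3-5. The enum and function names are hypothetical; only the /2, /4 and /6 assignments come from the instruction set itself.

```c
#include <stdint.h>

enum shift_op { SHIFT_INVALID = 0, SHIFT_PSRLW, SHIFT_PSRAW, SHIFT_PSLLW };

/* The opcode extension lives in ModRM bits 3-5 (the "reg" field). */
static unsigned modrm_op_ext(uint8_t modrm) { return (modrm >> 3) & 7; }

/* 0F 71 /2 = PSRLW, /4 = PSRAW, /6 = PSLLW; other slots are undefined. */
static const enum shift_op group_0f71[8] = {
    [2] = SHIFT_PSRLW,
    [4] = SHIFT_PSRAW,
    [6] = SHIFT_PSLLW,
};

static enum shift_op decode_0f71(uint8_t modrm)
{
    return group_0f71[modrm_op_ext(modrm)];
}
```

Unlisted array slots are zero-initialized, so undefined extensions map to `SHIFT_INVALID` without extra code.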
This also covers VZEROALL and VZEROUPPER, which use the same opcode as EMMS. If we wanted to optimize out gen_clear_ymmh, this would be one of the starting points. The implementation of the VZEROALL and VZEROUPPER helpers is by Paul Brook.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d1c1a422 |
| 01-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: reimplement 0x0f 0x78-0x7f, add AVX
These are a mixed batch, including the first two horizontal (66 and F2 only) operations, more moves, and SSE4a extract/insert.
Because SSE4a is pretty rare, I chose to leave the helpers as they are, but it is possible to unify them by loading index and length from the source XMM register and generating deposit or extract TCG ops.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
03b45880 |
| 01-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: reimplement 0x0f 0x50-0x5f, add AVX
These are mostly floating-point SSE operations. The odd ones out are MOVMSK and CVTxx2yy, the others are straightforward.
Unary operations are a bit special in AVX because they have 2 operands for PD/PS operands (VEX.vvvv must be 1111b), and 3 operands for SD/SS. They are handled using X86_OP_GROUP3 for compactness.
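The VEX.vvvv constraint can be sketched for the two-byte (C5) VEX prefix, whose second byte holds the inverted R bit, the inverted vvvv field, L and pp. The function names below are hypothetical and this is only an illustration of the field layout, not the check QEMU performs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encoded vvvv field: bits 6-3 of the byte that follows C5. */
static unsigned vex_vvvv_encoded(uint8_t vex_byte1)
{
    return (vex_byte1 >> 3) & 0xf;
}

/* The field is stored inverted, so 1111b encodes register 0. */
static unsigned vex_v_register(uint8_t vex_byte1)
{
    return ~vex_vvvv_encoded(vex_byte1) & 0xf;
}

/* Two-operand forms (e.g. packed unary ops) require vvvv == 1111b. */
static bool vex_vvvv_ok_for_2op(uint8_t vex_byte1)
{
    return vex_vvvv_encoded(vex_byte1) == 0xf;
}
```

For example, `C5 F8 51 C1` (VSQRTPS xmm0, xmm1) has vvvv = 1111b, while `C5 F2 51 C1` (VSQRTSS xmm0, xmm1, xmm1) uses vvvv = 1110b to name xmm1 as the extra source.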
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1d0efbdb |
| 05-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: reimplement 0x0f 0xd8-0xdf, 0xe8-0xef, 0xf8-0xff, add AVX
These are more simple integer instructions present in both MMX and SSE/AVX, with no holes that were later occupied by newer instructions.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
92ec056a |
| 20-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: reimplement 0x0f 0x60-0x6f, add AVX
These are both MMX and SSE/AVX instructions, except for vmovdqu. In both cases the inputs and output are in s->ptr{0,1,2}, so the only difference between MMX, SSE, and AVX is which helper to call.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
620f7556 |
| 09-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: provide 3-operand versions of unary scalar helpers
Compared to Paul's implementation, the new decoder will use a different approach to implement AVX's merging of dst with src1 on scalar operations. Adjust the old SSE decoder to be compatible with new-style helpers.
The affected instructions are CVTSx2Sx, ROUNDSx, RSQRTSx, SQRTSx, RCPSx.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f05f9789 |
| 26-Aug-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: extend helpers to support VEX.V 3- and 4- operand encodings
Add to the helpers all the operands that are needed to implement AVX.
Extracted from a patch by Paul Brook <paul@nowt.org>.
Message-Id: <20220424220204.2493824-26-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1d0b9261 |
| 24-Aug-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: move scalar 0F 38 and 0F 3A instruction to new decoder
Because these are the only VEX instructions that QEMU supports, the new decoder is entered on the first byte of a valid VEX prefix, and VEX decoding only needs to be done in decode-new.c.inc.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
20581aad |
| 17-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: validate VEX prefixes via the instructions' exception classes
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
268dc464 |
| 10-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: add CPUID[EAX=7,ECX=0].ECX to DisasContext
TCG will shortly implement VAES instructions, so add the relevant feature word to the DisasContext.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6ba13999 |
| 23-Aug-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: add ALU load/writeback core
Add generic code generation that takes care of preparing operands around calls to decode.e.gen in a table-driven manner, so that ALU operations need not take care of that.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b3e22b23 |
| 23-Aug-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: add core of new i386 decoder
The new decoder is based on three principles:
- use mostly table-driven decoding, using tables derived as much as possible from the Intel manual. Centralizing the decoding of the operands makes it more homogeneous, for example all immediates are signed. All modrm handling is in one function, and can be shared between SSE and ALU instructions (including XMM<->GPR instructions). The SSE/AVX decoder will also not have duplicated code between the 0F, 0F38 and 0F3A tables.
- keep the code as "non-branchy" as possible. Generally, the code for the new decoder is more verbose, but the control flow is simpler. Conditionals are not nested and have small bodies. All instruction groups are resolved even before operands are decoded, and code generation is separated as much as possible within small functions that only handle one instruction each.
- keep address generation and (for ALU operands) memory loads and writeback as much in common code as possible. All ALU operations for example are implemented as T0=f(T0,T1). For non-ALU instructions, read-modify-write memory operations are rare, but registers do not have TCGv equivalents: therefore, the common logic sets up pointer temporaries with the operands, while load and writeback are handled by gvec or by helpers.
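The three principles above can be sketched with a toy model. It is an interpreter, not a TCG code generator, and every name in it (`DecodedInsn`, `GenFunc`, `toy_table`, `toy_execute`) is hypothetical. It shows only the shape of the design: a table picks a small per-instruction function, common code loads the operands and writes back the result, and each operation is written as T0 = f(T0, T1).

```c
#include <stddef.h>
#include <stdint.h>

typedef struct DecodedInsn DecodedInsn;
typedef void (*GenFunc)(DecodedInsn *d);

struct DecodedInsn {
    uint32_t t0, t1;    /* operand temporaries: T0 = f(T0, T1) */
    GenFunc gen;        /* per-instruction function from the table */
};

/* One tiny function per instruction, with no operand handling inside. */
static void gen_add(DecodedInsn *d) { d->t0 += d->t1; }
static void gen_or(DecodedInsn *d)  { d->t0 |= d->t1; }
static void gen_and(DecodedInsn *d) { d->t0 &= d->t1; }
static void gen_sub(DecodedInsn *d) { d->t0 -= d->t1; }
static void gen_xor(DecodedInsn *d) { d->t0 ^= d->t1; }

/* Toy opcode table indexed by the opcode byte (the "Ev, Gv" ALU forms). */
static const GenFunc toy_table[256] = {
    [0x01] = gen_add,
    [0x09] = gen_or,
    [0x21] = gen_and,
    [0x29] = gen_sub,
    [0x31] = gen_xor,
};

/* Common code: load operands, call the table entry, write back. */
static int toy_execute(uint8_t opcode, uint32_t *dst, uint32_t src)
{
    DecodedInsn d = { .t0 = *dst, .t1 = src, .gen = toy_table[opcode] };

    if (d.gen == NULL) {
        return -1;      /* not handled by this table */
    }
    d.gen(&d);
    *dst = d.t0;
    return 0;
}
```

Adding an instruction in this model means adding one table entry and one short function; the load and writeback path does not change.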
These principles make future code review and extensibility simpler, at the cost of having a relatively large amount of code in the form of this patch. Even EVEX should not be _too_ hard to implement (it's just a crazy large amount of possibilities).
This patch introduces the main decoder flow, and integrates the old decoder with the new one. The old decoder takes care of parsing prefixes and then optionally drops to the new one. The changes to the old decoder are minimal and allow it to be replaced incrementally with the new one.
There is a debugging mechanism through a "LIMIT" environment variable. In user-mode emulation, the variable is the number of instructions decoded by the new decoder before permanently switching to the old one. In system emulation, the variable is the highest opcode that is decoded by the new decoder (this is less friendly, but it's the best that can be done without requiring deterministic execution).
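The user-mode behaviour of "LIMIT" might look roughly like the sketch below. The type and function names are hypothetical, and treating an unset variable as "no limit" is an assumption, since the message does not say what happens in that case.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    long remaining;     /* instructions still allowed in the new decoder */
    bool unlimited;     /* assumption: no LIMIT set means no cap */
} DecoderLimit;

static DecoderLimit limit_from_string(const char *value)
{
    DecoderLimit l = { .remaining = 0, .unlimited = true };

    if (value != NULL) {
        l.remaining = strtol(value, NULL, 0);
        l.unlimited = false;
    }
    return l;
}

/* Reads the "LIMIT" environment variable named in the commit message. */
static inline DecoderLimit limit_from_env(void)
{
    return limit_from_string(getenv("LIMIT"));
}

/* Returns true while the new decoder may still be used. */
static bool limit_use_new_decoder(DecoderLimit *l)
{
    if (l->unlimited) {
        return true;
    }
    if (l->remaining <= 0) {
        return false;   /* permanently fall back to the old decoder */
    }
    l->remaining--;
    return true;
}
```

A counter like this lends itself to bisection: if a guest program misbehaves, halving the limit narrows down the first instruction that the new decoder translates incorrectly.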
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a61ef762 |
| 18-Oct-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: make rex_w available even in 32-bit mode
REX.W can be used even in 32-bit mode by AVX instructions, where it is retroactively renamed to VEX.W. Make the field available even in 32-bit mode but keep the REX_W() macro as it was; this way, the handling of dflag does not use it by mistake and the AVX code more clearly points at the special VEX behavior of the bit.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
12a2c9c7 |
| 02-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
target/i386: make ldo/sto operations consistent with ldq
ldq takes a pointer to the first byte to load the 64-bit word in; ldo takes a pointer to the first byte of the ZMMReg. Make them consistent, which will be useful in the new SSE decoder's load/writeback routines.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
913f0836 |
| 16-Oct-2022 |
Richard Henderson <richard.henderson@linaro.org> |
target/i386: Save and restore pc_save before tcg_remove_ops_after
Restore pc_save while undoing any state change that may have happened while decoding the instruction. Leave a TODO about removing all of that when the table-based decoder is complete.
Cc: Paolo Bonzini <pbonzini@redhat.com> Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221016222303.288551-1-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e3a79e0e |
| 01-Oct-2022 |
Richard Henderson <richard.henderson@linaro.org> |
target/i386: Enable TARGET_TB_PCREL
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221001140935.465607-27-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7db973be |
| 01-Oct-2022 |
Richard Henderson <richard.henderson@linaro.org> |
target/i386: Inline gen_jmp_im
Expand this function at each of its callers.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221001140935.465607-26-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f771ca6a |
| 01-Oct-2022 |
Richard Henderson <richard.henderson@linaro.org> |
target/i386: Add cpu_eip
Create a tcg global temp for this, and use it instead of explicit stores.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221001140935.465607-25-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
75ec746a |
| 01-Oct-2022 |
Richard Henderson <richard.henderson@linaro.org> |
target/i386: Create eip_cur_tl
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221001140935.465607-24-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
900cc7e5 |
| 01-Oct-2022 |
Richard Henderson <richard.henderson@linaro.org> |
target/i386: Merge gen_jmp_tb and gen_goto_tb into gen_jmp_rel
These functions have only one caller, and the logic is more obvious this way.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221001140935.465607-23-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|