mve_helper.c - OpenGrok history log for /qemu/target/arm/tcg/mve

Revision	Date	Author	Comments
# 1b15a97d	13-Aug-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Implement MVE shift-by-scalar Implement the MVE instructions which perform shifts by a scalar. These are VSHL T2, VRSHL T2, VQSHL T1 and VQRSHL T2. They take the shift amount in a gener target/arm: Implement MVE shift-by-scalar Implement the MVE instructions which perform shifts by a scalar. These are VSHL T2, VRSHL T2, VQSHL T1 and VQRSHL T2. They take the shift amount in a general purpose register and shift every element in the vector by that amount. Mostly we can reuse the helper functions for shift-by-immediate; we do need two new helpers for VQRSHL. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> show more ...
# 6b895bf8	13-Aug-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Implement MVE VMLAS Implement the MVE VMLAS insn, which multiplies a vector by a vector and adds a scalar. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard H target/arm: Implement MVE VMLAS Implement the MVE VMLAS insn, which multiplies a vector by a vector and adds a scalar. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> show more ...
# c386443b	13-Aug-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Implement MVE VPSEL Implement the MVE VPSEL insn, which sets each byte of the destination vector Qd to the byte from either Qn or Qm depending on the value of the corresponding bit in VP target/arm: Implement MVE VPSEL Implement the MVE VPSEL insn, which sets each byte of the destination vector Qd to the byte from either Qn or Qm depending on the value of the corresponding bit in VPR.P0. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> show more ...
# cce81873	13-Aug-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Implement MVE integer vector-vs-scalar comparisons Implement the MVE integer vector comparison instructions that compare each element against a scalar from a general purpose register. T target/arm: Implement MVE integer vector-vs-scalar comparisons Implement the MVE integer vector comparison instructions that compare each element against a scalar from a general purpose register. These are "VCMP (vector)" encodings T4, T5 and T6 and "VPT (vector)" encodings T4, T5 and T6. We have to move the decodetree pattern for VPST, because it overlaps with VCMP T4 with size = 0b11. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> show more ...
# eff5d9a9	13-Aug-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Implement MVE integer vector comparisons Implement the MVE integer vector comparison instructions. These are "VCMP (vector)" encodings T1, T2 and T3, and "VPT (vector)" encodings T1, T2 target/arm: Implement MVE integer vector comparisons Implement the MVE integer vector comparison instructions. These are "VCMP (vector)" encodings T1, T2 and T3, and "VPT (vector)" encodings T1, T2 and T3. These insns compare corresponding elements in each vector, and update the VPR.P0 predicate bits with the results of the comparison. VPT also sets the VPR.MASK01 and VPR.MASK23 fields -- it is effectively "VCMP then VPST". Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> show more ...
# 395b92d5	13-Aug-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Implement MVE incrementing/decrementing dup insns Implement the MVE incrementing/decrementing dup insns VIDUP, VDDUP, VIWDUP and VDWDUP. These fill the elements of a vector with success target/arm: Implement MVE incrementing/decrementing dup insns Implement the MVE incrementing/decrementing dup insns VIDUP, VDDUP, VIWDUP and VDWDUP. These fill the elements of a vector with successively incrementing values, starting at the offset specified in a general purpose register. The final value of the offset is written back to this register. The wrapping variants take a second general purpose register which specifies the point where the count should wrap back to 0. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> show more ...
# c1bd78cb	13-Aug-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Implement MVE VMULL (polynomial) Implement the MVE VMULL (polynomial) insn. Unlike Neon, this comes in two flavours: 8x8->16 and a 16x16->32. Also unlike Neon, the inputs are in either target/arm: Implement MVE VMULL (polynomial) Implement the MVE VMULL (polynomial) insn. Unlike Neon, this comes in two flavours: 8x8->16 and a 16x16->32. Also unlike Neon, the inputs are in either the low or the high half of each double-width element. The assembler for this insn indicates the size with "P8" or "P16", encoded into bit 28 as size = 0 or 1. We choose to follow the same encoding as VQDMULL and decode this into a->size as MO_16 or MO_32 indicating the size of the result elements. This then carries through to the helper function names where it then matches up with the existing pmull_h() which does an 8x8->16 operation and a new pmull_w() which does the 16x16->32. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> show more ...
# 41704cc2	13-Aug-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Fix VLDRB/H/W for predicated elements For vector loads, predicated elements are zeroed, instead of retaining their previous values (as happens for most data processing operations). This target/arm: Fix VLDRB/H/W for predicated elements For vector loads, predicated elements are zeroed, instead of retaining their previous values (as happens for most data processing operations). This means we need to distinguish "beat not executed due to ECI" (don't touch destination element) from "beat executed but predicated out" (zero destination element). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> show more ...
# e3152d02	13-Aug-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Fix VPT advance when ECI is non-zero We were not paying attention to the ECI state when advancing the VPT state. Architecturally, VPT state advance happens for every beat (see the pseud target/arm: Fix VPT advance when ECI is non-zero We were not paying attention to the ECI state when advancing the VPT state. Architecturally, VPT state advance happens for every beat (see the pseudocode VPTAdvance()), so on every beat the 4 bits of VPR.P0 corresponding to the current beat are inverted if required, and at the end of beats 1 and 3 the VPR MASK fields are updated. This means that if the ECI state says we should not be executing all 4 beats then we need to skip some of the updating of the VPR that we currently do in mve_advance_vpt(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> show more ...
# e0d40070	13-Aug-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Factor out mve_eci_mask() In some situations we need a mask telling us which parts of the vector correspond to beats that are not being executed because of ECI, separately from the combi target/arm: Factor out mve_eci_mask() In some situations we need a mask telling us which parts of the vector correspond to beats that are not being executed because of ECI, separately from the combined "which bytes are predicated away" mask. Factor this mask calculation out of mve_element_mask() into its own function. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> show more ...
# 3f4f1880	13-Aug-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Fix calculation of LTP mask when LR is 0 In mve_element_mask(), we calculate a mask for tail predication which should have a number of 1 bits based on the value of LR. However, our MAKE target/arm: Fix calculation of LTP mask when LR is 0 In mve_element_mask(), we calculate a mask for tail predication which should have a number of 1 bits based on the value of LR. However, our MAKE_64BIT_MASK() macro has undefined behaviour when passed a zero length. Special case this to give the all-zeroes mask we require. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> show more ...
# fdcf2269	13-Aug-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Fix MVE 48-bit SQRSHRL for small right shifts We got an edge case wrong in the 48-bit SQRSHRL implementation: if the shift is to the right, although it always makes the result smaller th target/arm: Fix MVE 48-bit SQRSHRL for small right shifts We got an edge case wrong in the 48-bit SQRSHRL implementation: if the shift is to the right, although it always makes the result smaller than the input value it might not be within the 48-bit range the result is supposed to be if the input had some bits in [63..48] set and the shift didn't bring all of those within the [47..0] range. Handle this similarly to the way we already do for this case in do_uqrshl48_d(): extend the calculated result from 48 bits, and return that if not saturating or if it doesn't change the result; otherwise fall through to return a saturated value. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> show more ...
# 95351aa7	13-Aug-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Fix 48-bit saturating shifts In do_sqrshl48_d() and do_uqrshl48_d() we got some of the edge cases wrong and failed to saturate correctly: (1) In do_sqrshl48_d() we used the same code th target/arm: Fix 48-bit saturating shifts In do_sqrshl48_d() and do_uqrshl48_d() we got some of the edge cases wrong and failed to saturate correctly: (1) In do_sqrshl48_d() we used the same code that do_shrshl_bhs() does to obtain the saturated most-negative and most-positive 48-bit signed values for the large-shift-left case. This gives (1 << 47) for saturate-to-most-negative, but we weren't sign-extending this value to the 64-bit output as the pseudocode requires. (2) For left shifts by less than 48, we copied the "8/16 bit" code from do_sqrshl_bhs() and do_uqrshl_bhs(). This doesn't do the right thing because it assumes the C type we're working with is at least twice the number of bits we're saturating to (so that a shift left by bits-1 can't shift anything off the top of the value). This isn't true for bits == 48, so we would incorrectly return 0 rather than the most-positive value for situations like "shift (1 << 44) right by 20". Instead check for saturation by doing the shift and signextend and then testing whether shifting back left again gives the original value. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> show more ...
# a5e59e8d	13-Aug-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Fix mask handling for MVE narrowing operations In the MVE helpers for the narrowing operations (DO_VSHRN and DO_VSHRN_SAT) we were using the wrong bits of the predicate mask for the 'top target/arm: Fix mask handling for MVE narrowing operations In the MVE helpers for the narrowing operations (DO_VSHRN and DO_VSHRN_SAT) we were using the wrong bits of the predicate mask for the 'top' versions of the insn. This is because the loop works over the double-sized input elements and shifts the predicate mask by that many bits each time, but when we write out the half-sized output we must look at the mask bits for whichever half of the element we are writing to. Correct this by shifting the whole mask right by ESIZE bits for the 'top' insns. This allows us also to simplify the saturation bit checking (where we had noticed that we needed to look at a different mask bit for the 'top' insn.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> show more ...
# ed5a59d6	13-Aug-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Fix signed VADDV A cut-and-paste error meant we handled signed VADDV like unsigned VADDV; fix the type used. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard target/arm: Fix signed VADDV A cut-and-paste error meant we handled signed VADDV like unsigned VADDV; fix the type used. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> show more ...
# c88ff884	13-Aug-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Fix MVE VSLI by 0 and VSRI by <dt> In the MVE shift-and-insert insns, we special case VSLI by 0 and VSRI by <dt>. VSRI by <dt> means "don't update the destination", which is what we've i target/arm: Fix MVE VSLI by 0 and VSRI by <dt> In the MVE shift-and-insert insns, we special case VSLI by 0 and VSRI by <dt>. VSRI by <dt> means "don't update the destination", which is what we've implemented. However VSLI by 0 is "set destination to the input", so we don't want to use the same special-casing that we do for VSRI by <dt>. Since the generic logic gives the right answer for a shift by 0, just use that. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> show more ...
# 04ea4d3c	28-Jun-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Implement MVE shifts by register Implement the MVE shifts by register, which perform shifts on a single general-purpose register. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> target/arm: Implement MVE shifts by register Implement the MVE shifts by register, which perform shifts on a single general-purpose register. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-19-peter.maydell@linaro.org show more ...
# 46321d47	28-Jun-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Implement MVE shifts by immediate Implement the MVE shifts by immediate, which perform shifts on a single general-purpose register. These patterns overlap with the long-shift-by-immedia target/arm: Implement MVE shifts by immediate Implement the MVE shifts by immediate, which perform shifts on a single general-purpose register. These patterns overlap with the long-shift-by-immediates, so we have to rearrange the grouping a little here. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-18-peter.maydell@linaro.org show more ...
# 0aa4b4c3	28-Jun-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Implement MVE long shifts by register Implement the MVE long shifts by register, which perform shifts on a pair of general-purpose registers treated as a 64-bit quantity, with the shift target/arm: Implement MVE long shifts by register Implement the MVE long shifts by register, which perform shifts on a pair of general-purpose registers treated as a 64-bit quantity, with the shift count in another general-purpose register, which might be either positive or negative. Like the long-shifts-by-immediate, these encodings sit in the space that was previously the UNPREDICTABLE MOVS/ORRS with Rm==13,15. Because LSLL_rr and ASRL_rr overlap with both MOV_rxri/ORR_rrri and also with CSEL (as one of the previously-UNPREDICTABLE Rm==13 cases), we have to move the CSEL pattern into the same decodetree group. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-17-peter.maydell@linaro.org show more ...
# f4ae6c8c	28-Jun-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Implement MVE long shifts by immediate The MVE extension to v8.1M includes some new shift instructions which sit entirely within the non-coprocessor part of the encoding space and which target/arm: Implement MVE long shifts by immediate The MVE extension to v8.1M includes some new shift instructions which sit entirely within the non-coprocessor part of the encoding space and which operate only on general-purpose registers. They take up the space which was previously UNPREDICTABLE MOVS and ORRS encodings with Rm == 13 or 15. Implement the long shifts by immediate, which perform shifts on a pair of general-purpose registers treated as a 64-bit quantity, with an immediate shift count between 1 and 32. Awkwardly, because the MOVS and ORRS trans functions do not UNDEF for the Rm==13,15 case, we need to explicitly emit code to UNDEF for the cases where v8.1M now requires that. (Trying to change MOVS and ORRS is too difficult, because the functions that generate the code are shared between a dozen different kinds of arithmetic or logical instruction for all A32, T16 and T32 encodings, and for some insns and some encodings Rm==13,15 are valid.) We make the helper functions we need for UQSHLL and SQSHLL take a 32-bit value which the helper casts to int8_t because we'll need these helpers also for the shift-by-register insns, where the shift count might be < 0 or > 32. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-16-peter.maydell@linaro.org show more ...
# d43ebd9d	28-Jun-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Implement MVE VADDLV Implement the MVE VADDLV insn; this is similar to VADDV, except that it accumulates 32-bit elements into a 64-bit accumulator stored in a pair of general-purpose reg target/arm: Implement MVE VADDLV Implement the MVE VADDLV insn; this is similar to VADDV, except that it accumulates 32-bit elements into a 64-bit accumulator stored in a pair of general-purpose registers. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-15-peter.maydell@linaro.org show more ...
# 2e6a4ce0	28-Jun-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Implement MVE VSHLC Implement the MVE VSHLC insn, which performs a shift left of the entire vector with carry in bits provided from a general purpose register and carry out bits written target/arm: Implement MVE VSHLC Implement the MVE VSHLC insn, which performs a shift left of the entire vector with carry in bits provided from a general purpose register and carry out bits written back to that register. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-14-peter.maydell@linaro.org show more ...
# d6f9e011	28-Jun-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Implement MVE saturating narrowing shifts Implement the MVE saturating shift-right-and-narrow insns VQSHRN, VQSHRUN, VQRSHRN and VQRSHRUN. do_srshr() is borrowed from sve_helper.c. Sig target/arm: Implement MVE saturating narrowing shifts Implement the MVE saturating shift-right-and-narrow insns VQSHRN, VQSHRUN, VQRSHRN and VQRSHRUN. do_srshr() is borrowed from sve_helper.c. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-13-peter.maydell@linaro.org show more ...
# 162e2655	28-Jun-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Implement MVE VSHRN, VRSHRN Implement the MVE shift-right-and-narrow insn VSHRN and VRSHRN. do_urshr() is borrowed from sve_helper.c. Signed-off-by: Peter Maydell <peter.maydell@linaro target/arm: Implement MVE VSHRN, VRSHRN Implement the MVE shift-right-and-narrow insn VSHRN and VRSHRN. do_urshr() is borrowed from sve_helper.c. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-12-peter.maydell@linaro.org show more ...
# a78b25fa	28-Jun-2021	Peter Maydell <peter.maydell@linaro.org>	target/arm: Implement MVE VSRI, VSLI Implement the MVE VSRI and VSLI insns, which perform a shift-and-insert operation. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard target/arm: Implement MVE VSRI, VSLI Implement the MVE VSRI and VSLI insns, which perform a shift-and-insert operation. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-11-peter.maydell@linaro.org show more ...
1 234 5