.. _decodetree:

========================
Decodetree Specification
========================

A *decodetree* is built from instruction *patterns*.  A pattern may
represent a single architectural instruction or a group of such
instructions, depending on what is convenient for further processing.

Each pattern has both *fixedbits* and *fixedmask*, the combination of which
describes the condition under which the pattern is matched::

  (insn & fixedmask) == fixedbits

Each pattern may have *fields*, which are extracted from the insn and
passed along to the translator.  Examples of such are registers,
immediates, and sub-opcodes.

In support of patterns, one may declare *fields*, *argument sets*, and
*formats*, each of which may be re-used to simplify further definitions.

Fields
======

Syntax::

  field_def     := '%' identifier ( field )* ( !function=identifier )?
  field         := unnamed_field | named_field
  unnamed_field := number ':' ( 's' )? number
  named_field   := identifier ':' ( 's' )? number

For *unnamed_field*, the first number is the least-significant bit position
of the field and the second number is the length of the field.  If the 's' is
present, the field is considered signed.

A *named_field* refers to some other field in the instruction pattern
or format.  Regardless of the length of the other field where it is
defined, it will be inserted into this field with the specified
signedness and bit width.

Field definitions that involve loops (i.e. where a field is defined
directly or indirectly in terms of itself) are errors.

A format can include fields that refer to named fields that are
defined in the instruction pattern(s) that use the format.
Conversely, an instruction pattern can include fields that refer to
named fields that are defined in the format it uses.  However, you
cannot currently do both at once (i.e. pattern P uses format F; F has
a field A that refers to a named field B that is defined in P, and P
has a field C that refers to a named field D that is defined in F).

If multiple ``fields`` are present, they are concatenated.
In this way one can define disjoint fields.

If ``!function`` is specified, the concatenated result is passed through the
named function, taking and returning an integral value.

One may use ``!function`` with zero ``fields``.  This case is called
a *parameter*, and the named function is only passed the ``DisasContext``
and returns an integral value extracted from there.

A field with no ``fields`` and no ``!function`` is an error.

Field examples:

+---------------------------+---------------------------------------------+
| Input                     | Generated code                              |
+===========================+=============================================+
| %disp   0:s16             | sextract(i, 0, 16)                          |
+---------------------------+---------------------------------------------+
| %imm9   16:6 10:3         | extract(i, 16, 6) << 3 | extract(i, 10, 3)  |
+---------------------------+---------------------------------------------+
| %disp12 0:s1 1:1 2:10     | sextract(i, 0, 1) << 11 |                   |
|                           |    extract(i, 1, 1) << 10 |                 |
|                           |    extract(i, 2, 10)                        |
+---------------------------+---------------------------------------------+
| %shimm8 5:s8 13:1         | expand_shimm8(sextract(i, 5, 8) << 1 |      |
|   !function=expand_shimm8 |               extract(i, 13, 1))            |
+---------------------------+---------------------------------------------+
| %sz_imm 10:2 sz:3         | expand_sz_imm(extract(i, 10, 2) << 3 |      |
|   !function=expand_sz_imm |               extract(a->sz, 0, 3))         |
+---------------------------+---------------------------------------------+

Argument Sets
=============

Syntax::

  args_def := '&' identifier ( args_elt )+ ( !extern )?
  args_elt := identifier (':' identifier)?

Each *args_elt* defines an argument within the argument set.
If the form of the *args_elt* contains a colon, the first
identifier is the argument name and the second identifier is
the argument type.  If the colon is missing, the argument
type will be ``int``.

Each argument set will be rendered as a C structure "arg_$name"
with each of the fields being one of the member arguments.

If ``!extern`` is specified, the backing structure is assumed
to have been already declared, typically via a second decoder.

Argument sets are useful when one wants to define helper functions
for the translator functions that can perform operations on a common
set of arguments.  This can ensure, for instance, that the ``AND``
pattern and the ``OR`` pattern put their operands into the same named
structure, so that a common ``gen_logic_insn`` may be able to handle
the operations common between the two.

Argument set examples::

  &reg3       ra rb rc
  &loadstore  reg base offset
  &longldst   reg base offset:int64_t


Formats
=======

Syntax::

  fmt_def      := '@' identifier ( fmt_elt )+
  fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref
  fixedbit_elt := [01.-]+
  field_elt    := identifier ':' 's'? number
  field_ref    := '%' identifier | identifier '=' '%' identifier
  args_ref     := '&' identifier

Defining a format is a handy way to avoid replicating groups of fields
across many instruction patterns.

A *fixedbit_elt* describes a contiguous sequence of bits that must
be 1, 0, or don't care.  The difference between '.' and '-'
is that '.' means that the bit will be covered with a field or a
final 0 or 1 from the pattern, and '-' means that the bit is really
ignored by the CPU and will not be specified.

A *field_elt* describes a simple field only given a width; the position of
the field is implied by its position with respect to other *fixedbit_elt*
and *field_elt*.
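
The implied positions can be worked out by walking the format body from the
most-significant bit downwards.  The following is only a minimal sketch under
stated assumptions, not the generator's implementation: it assumes 32-bit
instructions, handles only *fixedbit_elt* and *field_elt* tokens (no
*field_ref* or *args_ref*), and the names ``FormatLayout``, ``FieldPos`` and
``layout_format`` are hypothetical.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define INSN_WIDTH 32   /* assumption: 32-bit instructions */
    #define MAX_FIELDS 8    /* assumption: enough for this sketch */
    #define NAME_LEN   16

    typedef struct {
        char name[NAME_LEN];
        int pos;            /* least-significant bit position */
        int len;            /* width in bits */
        int is_signed;      /* nonzero if declared with 's' */
    } FieldPos;

    typedef struct {
        uint32_t fixedbits;
        uint32_t fixedmask;
        int nfields;
        FieldPos fields[MAX_FIELDS];
        int bits_used;      /* total number of bits described */
    } FormatLayout;

    /* Hypothetical helper: lay out one format body, MSB first. */
    FormatLayout layout_format(const char *spec)
    {
        FormatLayout out;
        int bit = INSN_WIDTH;   /* the next bit to assign is bit - 1 */
        const char *p = spec;

        memset(&out, 0, sizeof(out));
        while (*p != '\0') {
            if (*p == ' ') {
                p++;
            } else if (strchr("01.-", *p) != NULL) {
                /* fixedbit_elt: one character per bit */
                assert(bit > 0);
                bit--;
                if (*p == '0' || *p == '1') {
                    out.fixedmask |= UINT32_C(1) << bit;
                    if (*p == '1') {
                        out.fixedbits |= UINT32_C(1) << bit;
                    }
                }
                p++;
            } else {
                /* field_elt: identifier ':' 's'? number */
                FieldPos *f;
                size_t n = strcspn(p, ":");
                char *end;

                assert(p[n] == ':' && n < NAME_LEN);
                assert(out.nfields < MAX_FIELDS);
                f = &out.fields[out.nfields++];
                memcpy(f->name, p, n);
                f->name[n] = '\0';
                p += n + 1;
                if (*p == 's') {
                    f->is_signed = 1;
                    p++;
                }
                f->len = (int)strtol(p, &end, 10);
                assert(end != p && f->len > 0 && f->len <= bit);
                p = end;
                bit -= f->len;  /* the field occupies the next len bits */
                f->pos = bit;
            }
        }
        out.bits_used = INSN_WIDTH - bit;
        return out;
    }

Given the body ``...... ra:5 rb:5 ... 0 ....... rc:5``, this sketch places
``ra`` at bit 21, ``rb`` at bit 16 and ``rc`` at bit 0, with a *fixedmask* of
``0x00001000`` and *fixedbits* of zero, and reports that all 32 bits are
described.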

If any *fixedbit_elt* or *field_elt* appear, then all bits must be defined.
Padding with a *fixedbit_elt* of all '.' is an easy way to accomplish that.

A *field_ref* incorporates a field by reference.  This is the only way to
add a complex field to a format.  A field may be renamed in the process
via assignment to another identifier.  This is intended to allow the
same argument set to be used with disjoint named fields.

A single *args_ref* may specify an argument set to use for the format.
The set of fields in the format must be a subset of the arguments in
the argument set.  If an argument set is not specified, one will be
inferred from the set of fields.

It is recommended, but not required, that all *field_ref* and *args_ref*
appear at the end of the line, not interleaving with *fixedbit_elt* or
*field_elt*.

Format examples::

  @opr    ...... ra:5 rb:5 ... 0 ....... rc:5
  @opi    ...... ra:5 lit:8    1 ....... rc:5

Patterns
========

Syntax::

  pat_def   := identifier ( pat_elt )+
  pat_elt   := fixedbit_elt | field_elt | field_ref | args_ref | fmt_ref | const_elt
  fmt_ref   := '@' identifier
  const_elt := identifier '=' number

The *fixedbit_elt* and *field_elt* specifiers are unchanged from formats.
A pattern that does not specify a named format will have one inferred
from a referenced argument set (if present) and the set of fields.

A *const_elt* allows an argument to be set to a constant value.  This may
come in handy when fields overlap between patterns and one has to
include the values in the *fixedbit_elt* instead.

The decoder will call a translator function for each pattern matched.

Pattern examples::

  addl_r   010000 ..... ..... .... 0000000 ..... @opr
  addl_i   010000 ..... ..... .... 0000000 ..... @opi

which will, in part, invoke::

  trans_addl_r(ctx, &arg_opr, insn)

and::

  trans_addl_i(ctx, &arg_opi, insn)

Pattern Groups
==============

Syntax::

  group            := overlap_group | no_overlap_group
  overlap_group    := '{' ( pat_def | group )+ '}'
  no_overlap_group := '[' ( pat_def | group )+ ']'

A *group* begins with a lone open-brace or open-bracket, with all
subsequent lines indented two spaces, and ending with a lone
close-brace or close-bracket.  Groups may be nested, increasing the
required indentation of the lines within the nested group to two
spaces per nesting level.

Patterns within overlap groups are allowed to overlap.  Conflicts are
resolved by selecting the patterns in order.  If all of the fixedbits
for a pattern match, its translate function will be called.  If the
translate function returns false, then subsequent patterns within the
group will be matched.

Patterns within no-overlap groups are not allowed to overlap, just
the same as ungrouped patterns.  Thus no-overlap groups are intended
to be nested inside overlap groups.

The following example from PA-RISC shows specialization of the *or*
instruction::

  {
    {
      nop   000010 ----- ----- 0000 001001 0 00000
      copy  000010 00000 r1:5  0000 001001 0 rt:5
    }
    or      000010 rt2:5 r1:5  cf:4 001001 0 rt:5
  }

When the *cf* field is zero, the instruction has no side effects,
and may be specialized.  When the *rt* field is zero, the output
is discarded and so the instruction has no effect.  When the *rt2*
field is zero, the operation is ``reg[r1] | 0`` and so encodes
the canonical register copy operation.

The output from the generator might look like::

  switch (insn & 0xfc000fe0) {
  case 0x08000240:
    /* 000010.. ........ ....0010 010..... */
    if ((insn & 0x0000f000) == 0x00000000) {
        /* 000010.. ........ 00000010 010..... */
        if ((insn & 0x0000001f) == 0x00000000) {
            /* 000010.. ........ 00000010 01000000 */
            extract_decode_Fmt_0(&u.f_decode0, insn);
            if (trans_nop(ctx, &u.f_decode0)) return true;
        }
        if ((insn & 0x03e00000) == 0x00000000) {
            /* 00001000 000..... 00000010 010..... */
            extract_decode_Fmt_1(&u.f_decode1, insn);
            if (trans_copy(ctx, &u.f_decode1)) return true;
        }
    }
    extract_decode_Fmt_2(&u.f_decode2, insn);
    if (trans_or(ctx, &u.f_decode2)) return true;
    return false;
  }
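
The ordering rule for overlap groups can also be modelled as a linear search.
The sketch below is an illustration only, not what the generator emits (it
produces a ``switch``/``if`` tree as above).  The masks and values were
derived by hand from the three PA-RISC patterns, the nested group is
flattened into one ordered table, and ``Pattern``, ``decode_group``,
``or_group`` and the stub translators (which take only ``insn`` rather than
a context and an argument structure) are hypothetical.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef bool (*TransFn)(uint32_t insn);

    typedef struct {
        const char *name;
        uint32_t fixedmask;
        uint32_t fixedbits;
        TransFn trans;
    } Pattern;

    /* Hypothetical stub translators: each one accepts the instruction. */
    static bool trans_nop(uint32_t insn)  { (void)insn; return true; }
    static bool trans_copy(uint32_t insn) { (void)insn; return true; }
    static bool trans_or(uint32_t insn)   { (void)insn; return true; }

    /* Hypothetical stub modelling a translator that declines. */
    bool trans_decline(uint32_t insn) { (void)insn; return false; }

    /* The overlap group in source order; '-' bits are left out of the mask. */
    const Pattern or_group[] = {
        { "nop",  0xfc00ffffu, 0x08000240u, trans_nop  },
        { "copy", 0xffe0ffe0u, 0x08000240u, trans_copy },
        { "or",   0xfc000fe0u, 0x08000240u, trans_or   },
    };

    /*
     * Try each pattern in order.  A pattern is selected only if its
     * fixedbits match and its translator returns true; otherwise the
     * search continues with the next pattern.  Returns the name of the
     * selected pattern, or NULL if none was selected.
     */
    const char *decode_group(const Pattern *pats, size_t n, uint32_t insn)
    {
        assert(pats != NULL);
        for (size_t i = 0; i < n; i++) {
            if ((insn & pats[i].fixedmask) == pats[i].fixedbits
                && pats[i].trans(insn)) {
                return pats[i].name;
            }
        }
        return NULL;
    }

With this table, ``0x08640240`` (*rt2* = 3, *r1* = 4, *cf* = 0, *rt* = 0)
selects ``nop``, ``0x08040247`` (*rt2* = 0, *rt* = 7) selects ``copy``, and
``0x08640247`` falls through to ``or``.  If the first entry's translator is
replaced by ``trans_decline``, ``0x08640240`` falls through to ``or`` as
well, which is the "returns false" case described above.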