.. _decodetree:

========================
Decodetree Specification
========================

A *decodetree* is built from instruction *patterns*.  A pattern may
represent a single architectural instruction or a group of related
instructions, depending on what is convenient for further processing.

Each pattern has both *fixedbits* and *fixedmask*, the combination of which
describes the condition under which the pattern is matched::

  (insn & fixedmask) == fixedbits
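
As a minimal sketch, the match condition can be written as a C predicate.  The name ``pattern_matches`` and the 32-bit instruction width are illustrative assumptions, not part of the generated decoder's interface.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a pattern matches when every bit selected by
 * fixedmask has the value given in fixedbits; all other bits are ignored.
 */
static bool pattern_matches(uint32_t insn, uint32_t fixedmask,
                            uint32_t fixedbits)
{
    return (insn & fixedmask) == fixedbits;
}
```

For example, with fixedmask ``0xfc000000`` and fixedbits ``0x40000000``, any instruction whose top six bits are ``010000`` matches, regardless of its remaining bits.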

Each pattern may have *fields*, which are extracted from the insn and
passed along to the translator.  Examples of such are registers,
immediates, and sub-opcodes.

In support of patterns, one may declare *fields*, *argument sets*, and
*formats*, each of which may be re-used to simplify further definitions.

Fields
======

Syntax::

  field_def     := '%' identifier ( field )* ( !function=identifier )?
  field         := unnamed_field | named_field
  unnamed_field := number ':' ( 's' )? number
  named_field   := identifier ':' ( 's' )? number

For *unnamed_field*, the first number is the least-significant bit position
of the field and the second number is the length of the field.  If the 's' is
present, the field is considered signed.
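
The *Generated code* column of the field examples table uses ``extract`` for unsigned fields and ``sextract`` for signed ones.  A minimal stand-alone sketch of such helpers follows.  The names ``extract32_sketch`` and ``sextract32_sketch`` are hypothetical.  They are modeled on, but not copied from, QEMU's ``extract32``/``sextract32`` in ``include/qemu/bitops.h``.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned field: return `len` bits of `value` starting at bit `pos`. */
static uint32_t extract32_sketch(uint32_t value, int pos, int len)
{
    assert(pos >= 0 && len > 0 && len <= 32 - pos);
    return (value >> pos) & (UINT32_MAX >> (32 - len));
}

/* Signed field: as above, but sign-extend from the field's top bit. */
static int32_t sextract32_sketch(uint32_t value, int pos, int len)
{
    uint32_t field = extract32_sketch(value, pos, len);
    uint32_t sign = UINT32_C(1) << (len - 1);

    /* (field ^ sign) - sign maps values with the top bit set to negatives. */
    return (int32_t)((field ^ sign) - sign);
}
```

For example, ``sextract32_sketch(0x0000ffff, 0, 16)`` yields -1, while ``extract32_sketch(0x0000ffff, 0, 16)`` yields 65535.  This matches the signed and unsigned readings of a field such as ``%disp 0:s16``.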

A *named_field* refers to some other field in the instruction pattern
or format. Regardless of the length of the other field where it is
defined, it will be inserted into this field with the specified
signedness and bit width.
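
The ``%sz_imm`` row of the field examples table suggests how this works: the named field is read back out of the already-decoded argument (``extract(a->sz, 0, 3)``) with the requested width and signedness.  A minimal sketch follows.  It assumes a hypothetical argument structure ``arg_example`` whose ``sz`` member was decoded elsewhere.  The helper names are also invented.

```c
#include <stdint.h>

/* Hypothetical argument set holding an already-decoded field "sz". */
typedef struct {
    int sz;
} arg_example;

/* Named field "sz:3": keep only the low 3 bits of a->sz, unsigned. */
static int named_field_u3(const arg_example *a)
{
    return (int)((uint32_t)a->sz & 0x7u);
}

/* Named field "sz:s3": keep the low 3 bits of a->sz and sign-extend them. */
static int named_field_s3(const arg_example *a)
{
    int v = (int)((uint32_t)a->sz & 0x7u);

    return (v & 0x4) ? v - 8 : v;
}
```

Suppose the other definition of ``sz`` had decoded the value 13 (binary ``1101``).  Then ``sz:3`` would yield 5 and ``sz:s3`` would yield -3, because only the low three bits are kept, whatever the original field's length.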

Field definitions that involve loops (i.e. where a field is defined
directly or indirectly in terms of itself) are errors.

A format can include fields that refer to named fields that are
defined in the instruction pattern(s) that use the format.
Conversely, an instruction pattern can include fields that refer to
named fields that are defined in the format it uses. However, you
cannot currently do both at once (i.e. pattern P uses format F; F has
a field A that refers to a named field B that is defined in P, and P
has a field C that refers to a named field D that is defined in F).

If multiple ``fields`` are present, they are concatenated, with the
first field listed forming the most-significant bits of the result.
In this way one can define disjoint fields.
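
Concatenation shifts each earlier piece left by the combined width of the pieces after it.  As a sketch, the ``%imm9 16:6 10:3`` example from the field examples table could be computed by hand as follows.  ``decode_imm9`` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical hand-written equivalent of "%imm9 16:6 10:3". */
static uint32_t decode_imm9(uint32_t insn)
{
    uint32_t hi = (insn >> 16) & 0x3f;  /* 6 bits at positions 21..16 */
    uint32_t lo = (insn >> 10) & 0x7;   /* 3 bits at positions 12..10 */

    /* The first piece becomes the high part; the second fills the low 3 bits. */
    return (hi << 3) | lo;
}
```

With bits 21..16 set to ``101010`` and bits 12..10 set to ``011`` (instruction ``0x002a0c00``), the result is binary ``101010011``, i.e. 339.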

If ``!function`` is specified, the concatenated result is passed through the
named function, which takes and returns an integral value.
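
A minimal sketch of such a function follows, for a hypothetical field ``%off 0:s8 !function=times_4`` that scales an encoded word offset to bytes.  The field, the function name and the ``DisasContext`` stand-in are assumptions.  The sketch also assumes the function receives the ``DisasContext`` in addition to the value, as the *parameter* case described next suggests.

```c
#include <stdint.h>

/* Stand-in for the translator's context; its contents do not matter here. */
typedef struct DisasContext {
    int unused;
} DisasContext;

/*
 * Hypothetical !function for "%off 0:s8 !function=times_4": the extracted,
 * already sign-extended value is scaled to a byte offset.
 */
static int times_4(DisasContext *ctx, int x)
{
    (void)ctx;  /* not needed for a pure scaling function */
    return x * 4;
}
```

Under these assumptions, the decoder would store ``times_4(ctx, sextract(i, 0, 8))`` in the argument, so an encoded ``-3`` becomes ``-12``.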

One may use ``!function`` with zero ``fields``.  This case is called
a *parameter*, and the named function is only passed the ``DisasContext``
and returns an integral value extracted from there.
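
A parameter lets an argument carry translation-time state alongside the decoded bits.  A minimal sketch for a hypothetical ``%mem_idx !function=get_mem_idx`` follows.  The ``DisasContext`` layout, its ``mem_idx`` member and ``get_mem_idx`` are all invented for illustration.

```c
/* Stand-in context carrying some hypothetical translation-time state. */
typedef struct DisasContext {
    int mem_idx;
} DisasContext;

/*
 * Hypothetical parameter "%mem_idx !function=get_mem_idx": no instruction
 * bits are consumed; the value comes entirely from the context.
 */
static int get_mem_idx(DisasContext *ctx)
{
    return ctx->mem_idx;
}
```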

A field with no ``fields`` and no ``!function`` is an error.

Field examples:

+---------------------------+---------------------------------------------+
| Input                     | Generated code                              |
+===========================+=============================================+
| %disp   0:s16             | sextract(i, 0, 16)                          |
+---------------------------+---------------------------------------------+
| %imm9   16:6 10:3         | extract(i, 16, 6) << 3 | extract(i, 10, 3)  |
+---------------------------+---------------------------------------------+
| %disp12 0:s1 1:1 2:10     | sextract(i, 0, 1) << 11 |                   |
|                           |    extract(i, 1, 1) << 10 |                 |
|                           |    extract(i, 2, 10)                        |
+---------------------------+---------------------------------------------+
| %shimm8 5:s8 13:1         | expand_shimm8(sextract(i, 5, 8) << 1 |      |
|   !function=expand_shimm8 |               extract(i, 13, 1))            |
+---------------------------+---------------------------------------------+
| %sz_imm 10:2 sz:3         | expand_sz_imm(extract(i, 10, 2) << 3 |      |
|   !function=expand_sz_imm |               extract(a->sz, 0, 3))         |
+---------------------------+---------------------------------------------+

Argument Sets
=============

Syntax::

  args_def    := '&' identifier ( args_elt )+ ( !extern )?
  args_elt    := identifier (':' identifier)?

Each *args_elt* defines an argument within the argument set.
If the form of the *args_elt* contains a colon, the first
identifier is the argument name and the second identifier is
the argument type.  If the colon is missing, the argument
type will be ``int``.

Each argument set will be rendered as a C structure named ``arg_$name``,
where ``$name`` is the set's identifier, with one member for each argument.

If ``!extern`` is specified, the backing structure is assumed
to have been already declared, typically via a second decoder.

Argument sets are useful when one wants to define helper functions
for the translator functions that can perform operations on a common
set of arguments.  This can ensure, for instance, that the ``AND``
pattern and the ``OR`` pattern put their operands into the same named
structure, so that a common ``gen_logic_insn`` may be able to handle
the operations common between the two.
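
The following is a minimal sketch of that idea, assuming both patterns use ``&reg3``.  The structure layout, the toy ``DisasContext`` and the bodies of ``gen_logic_insn``, ``trans_AND`` and ``trans_OR`` are hypothetical.  A real translator would emit TCG operations instead of computing results directly; it does so here only so the sketch runs stand-alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Structure as might be generated for "&reg3 ra rb rc" (all int by default). */
typedef struct {
    int ra;
    int rb;
    int rc;
} arg_reg3;

/* Toy stand-in for the translator state: just a register file. */
typedef struct DisasContext {
    uint64_t regs[32];
} DisasContext;

/* Hypothetical shared helper: both AND and OR funnel through here. */
static bool gen_logic_insn(DisasContext *ctx, arg_reg3 *a,
                           uint64_t (*op)(uint64_t, uint64_t))
{
    ctx->regs[a->rc] = op(ctx->regs[a->ra], ctx->regs[a->rb]);
    return true;
}

static uint64_t do_and(uint64_t x, uint64_t y) { return x & y; }
static uint64_t do_or(uint64_t x, uint64_t y)  { return x | y; }

/* Translator functions for hypothetical AND and OR patterns using &reg3. */
static bool trans_AND(DisasContext *ctx, arg_reg3 *a)
{
    return gen_logic_insn(ctx, a, do_and);
}

static bool trans_OR(DisasContext *ctx, arg_reg3 *a)
{
    return gen_logic_insn(ctx, a, do_or);
}
```

Because both patterns decode into the same ``arg_reg3``, each translator reduces to a single call that differs only in the operation passed.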

Argument set examples::

  &reg3       ra rb rc
  &loadstore  reg base offset
  &longldst   reg base offset:int64_t
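
Following the rules above, the last two examples would be rendered roughly as the structures below.  This is a sketch.  The exact text the generator emits, for instance whether it uses a ``typedef``, is an assumption here.

```c
#include <stdint.h>

/* "&loadstore reg base offset": every argument defaults to int. */
typedef struct {
    int reg;
    int base;
    int offset;
} arg_loadstore;

/* "&longldst reg base offset:int64_t": offset takes the named type. */
typedef struct {
    int reg;
    int base;
    int64_t offset;
} arg_longldst;
```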

Formats
=======

Syntax::

  fmt_def      := '@' identifier ( fmt_elt )+
  fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref
  fixedbit_elt := [01.-]+
  field_elt    := identifier ':' 's'? number
  field_ref    := '%' identifier | identifier '=' '%' identifier
  args_ref     := '&' identifier

Defining a format is a handy way to avoid replicating groups of fields
across many instruction patterns.

A *fixedbit_elt* describes a contiguous sequence of bits that must
be 1, 0, or don't care.  The difference between '.' and '-'
is that '.' means that the bit will be covered with a field or a
final 0 or 1 from the pattern, and '-' means that the bit is really
ignored by the CPU and will not be specified.

A *field_elt* describes a simple field given only its width; the position of
the field is implied by its position with respect to other *fixedbit_elt*
and *field_elt*.

If any *fixedbit_elt* or *field_elt* appear, then all bits must be defined.
Padding with a *fixedbit_elt* of all '.' is an easy way to accomplish that.
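
A minimal sketch of how a string of *fixedbit_elt* characters folds into a *fixedmask*/*fixedbits* pair, with the all-bits-defined check, follows.  ``parse_fixedbits`` is a hypothetical helper.  It handles only ``0``, ``1``, ``.`` and ``-``, so a *field_elt* such as ``ra:5`` would have to be expanded to dots first.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: fold fixedbit_elt characters (spaces between
 * elements are skipped) into a fixedmask/fixedbits pair, most significant
 * bit first.  Both '.' and '-' leave the bit out of fixedmask; the
 * distinction between them matters to the generator's checks, not to the
 * matching condition.  Returns false unless exactly `width` bits appear.
 */
static bool parse_fixedbits(const char *s, int width,
                            uint32_t *fixedmask, uint32_t *fixedbits)
{
    uint32_t mask = 0, bits = 0;
    int n = 0;

    for (; *s; s++) {
        if (*s == ' ') {
            continue;
        }
        if (*s != '0' && *s != '1' && *s != '.' && *s != '-') {
            return false;           /* not a fixedbit_elt character */
        }
        if (++n > width) {
            return false;           /* too many bits */
        }
        mask <<= 1;
        bits <<= 1;
        if (*s == '0' || *s == '1') {
            mask |= 1;
            bits |= (uint32_t)(*s == '1');
        }
    }
    if (n != width) {
        return false;               /* every bit must be defined */
    }
    *fixedmask = mask;
    *fixedbits = bits;
    return true;
}
```

For the bits ``010000 ..... ..... .... 0000000 .....`` this yields fixedmask ``0xfc000fe0`` and fixedbits ``0x40000000``.  A string describing fewer than 32 bits is rejected.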

A *field_ref* incorporates a field by reference.  This is the only way to
add a complex field to a format.  A field may be renamed in the process
via assignment to another identifier.  This is intended to allow the
same argument set to be used with disjoint named fields.

A single *args_ref* may specify an argument set to use for the format.
The set of fields in the format must be a subset of the arguments in
the argument set.  If an argument set is not specified, one will be
inferred from the set of fields.

It is recommended, but not required, that all *field_ref* and *args_ref*
appear at the end of the line, not interleaved with *fixedbit_elt* or
*field_elt*.

Format examples::

  @opr    ...... ra:5 rb:5 ... 0 ....... rc:5
  @opi    ...... ra:5 lit:8    1 ....... rc:5
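
Reading each format from the most-significant bit down, ``@opr`` places ``ra`` at bits 25..21, ``rb`` at 20..16 and ``rc`` at 4..0, with bit 12 fixed to 0.  ``@opi`` instead has an 8-bit ``lit`` at bits 20..13, with bit 12 fixed to 1.  A sketch of the corresponding extraction follows, assuming 32-bit instructions.  The structure and function names are illustrative only.

```c
#include <stdint.h>

/* Hypothetical holders for the fields of the two example formats. */
typedef struct { int ra, rb, rc; } fmt_opr_fields;
typedef struct { int ra, lit, rc; } fmt_opi_fields;

/* @opr: ra = bits 25..21, rb = bits 20..16, bit 12 fixed to 0, rc = bits 4..0 */
static void extract_opr(fmt_opr_fields *f, uint32_t insn)
{
    f->ra = (int)((insn >> 21) & 0x1f);
    f->rb = (int)((insn >> 16) & 0x1f);
    f->rc = (int)(insn & 0x1f);
}

/* @opi: ra = bits 25..21, lit = bits 20..13, bit 12 fixed to 1, rc = bits 4..0 */
static void extract_opi(fmt_opi_fields *f, uint32_t insn)
{
    f->ra = (int)((insn >> 21) & 0x1f);
    f->lit = (int)((insn >> 13) & 0xff);
    f->rc = (int)(insn & 0x1f);
}
```

Bit 12 is the only fixed bit in either format: fixedmask is ``0x1000``, and fixedbits is ``0`` for ``@opr`` and ``0x1000`` for ``@opi``.  A pattern using one of them must therefore supply its remaining fixed bits itself.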

Patterns
========

Syntax::

  pat_def      := identifier ( pat_elt )+
  pat_elt      := fixedbit_elt | field_elt | field_ref | args_ref | fmt_ref | const_elt
  fmt_ref      := '@' identifier
  const_elt    := identifier '=' number

The *fixedbit_elt* and *field_elt* specifiers are unchanged from formats.
A pattern that does not specify a named format will have one inferred
from a referenced argument set (if present) and the set of fields.

A *const_elt* allows an argument to be set to a constant value.  This may
come in handy when fields overlap between patterns and one has to
include the values in the *fixedbit_elt* instead.
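
As a hypothetical illustration, suppose two load patterns share an argument set but fix the size bits differently, so the size cannot be extracted as a field.  The decodetree lines in the comment and all C names are invented for this sketch, and only the ``ldw`` case is shown.

```c
#include <stdint.h>

/*
 * Hypothetical patterns sharing one argument set:
 *   &ldst   rd rs sz
 *   ldb     1000 00 rd:5 rs:5 ................  sz=0  &ldst
 *   ldw     1000 01 rd:5 rs:5 ................  sz=1  &ldst
 * Bits 27..26 are fixed by each pattern, so sz comes from the const_elt.
 */
typedef struct {
    int rd;
    int rs;
    int sz;
} arg_ldst;

/* Decoding ldw: rd and rs come from the instruction, sz from "sz=1". */
static void decode_ldw(arg_ldst *a, uint32_t insn)
{
    a->rd = (int)((insn >> 21) & 0x1f);
    a->rs = (int)((insn >> 16) & 0x1f);
    a->sz = 1;   /* const_elt: bits 27..26 are already fixed to 01 */
}
```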

The decoder will call a translator function for each pattern matched.

Pattern examples::

  addl_r   010000 ..... ..... .... 0000000 ..... @opr
  addl_i   010000 ..... ..... .... 0000000 ..... @opi

which will, in part, invoke::

  trans_addl_r(ctx, &arg_opr)

and::

  trans_addl_i(ctx, &arg_opi)
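
Combining each pattern's own fixed bits with bit 12 from its format, ``addl_r`` matches when ``(insn & 0xfc001fe0) == 0x40000000`` and ``addl_i`` when ``(insn & 0xfc001fe0) == 0x40001000``.  The sketch below hand-writes such a dispatch.  The ``DisasContext`` stand-in, the argument structures and the translator bodies are hypothetical, and a generated decoder may organize the tests differently (for example as a ``switch`` on shared bits).

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct DisasContext { int last; } DisasContext;  /* records which trans_* ran */
typedef struct { int ra, rb, rc; } arg_opr;
typedef struct { int ra, lit, rc; } arg_opi;

static bool trans_addl_r(DisasContext *ctx, arg_opr *a)
{
    (void)a;
    ctx->last = 1;
    return true;
}

static bool trans_addl_i(DisasContext *ctx, arg_opi *a)
{
    (void)a;
    ctx->last = 2;
    return true;
}

/* Try each pattern: test its fixed bits, extract its fields, call trans_*. */
static bool decode(DisasContext *ctx, uint32_t insn)
{
    if ((insn & 0xfc001fe0) == 0x40000000) {        /* addl_r, @opr */
        arg_opr a = { (int)((insn >> 21) & 0x1f), (int)((insn >> 16) & 0x1f),
                      (int)(insn & 0x1f) };
        return trans_addl_r(ctx, &a);
    }
    if ((insn & 0xfc001fe0) == 0x40001000) {        /* addl_i, @opi */
        arg_opi a = { (int)((insn >> 21) & 0x1f), (int)((insn >> 13) & 0xff),
                      (int)(insn & 0x1f) };
        return trans_addl_i(ctx, &a);
    }
    return false;                                   /* no pattern matched */
}
```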

Pattern Groups
==============

Syntax::

  group            := overlap_group | no_overlap_group
  overlap_group    := '{' ( pat_def | group )+ '}'
  no_overlap_group := '[' ( pat_def | group )+ ']'

A *group* begins with a lone open-brace or open-bracket, with all
subsequent lines indented two spaces, and ends with a lone
close-brace or close-bracket.  Groups may be nested, increasing the
required indentation of the lines within the nested group to two
spaces per nesting level.

Patterns within overlap groups are allowed to overlap.  Conflicts are
resolved by selecting the patterns in order.  If all of the fixedbits
for a pattern match, its translate function will be called.  If the
translate function returns false, then subsequent patterns within the
group will be tried.

Patterns within no-overlap groups are not allowed to overlap, just
the same as ungrouped patterns.  Thus no-overlap groups are intended
to be nested inside overlap groups.

The following example from PA-RISC shows specialization of the *or*
instruction::

  {
    {
      nop   000010 ----- ----- 0000 001001 0 00000
      copy  000010 00000 r1:5  0000 001001 0 rt:5
    }
    or      000010 rt2:5 r1:5  cf:4 001001 0 rt:5
  }

When the *cf* field is zero, the instruction has no side effects,
and may be specialized.  When the *rt* field is zero, the output
is discarded and so the instruction has no effect.  When the *rt2*
field is zero, the operation is ``reg[r1] | 0`` and so encodes
the canonical register copy operation.

The output from the generator might look like::

  switch (insn & 0xfc000fe0) {
  case 0x08000240:
    /* 000010.. ........ ....0010 010..... */
    if ((insn & 0x0000f000) == 0x00000000) {
        /* 000010.. ........ 00000010 010..... */
        if ((insn & 0x0000001f) == 0x00000000) {
            /* 000010.. ........ 00000010 01000000 */
            extract_decode_Fmt_0(&u.f_decode0, insn);
            if (trans_nop(ctx, &u.f_decode0)) return true;
        }
        if ((insn & 0x03e00000) == 0x00000000) {
            /* 00001000 000..... 00000010 010..... */
            extract_decode_Fmt_1(&u.f_decode1, insn);
            if (trans_copy(ctx, &u.f_decode1)) return true;
        }
    }
    extract_decode_Fmt_2(&u.f_decode2, insn);
    if (trans_or(ctx, &u.f_decode2)) return true;
    return false;
  }

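To make the fall-through behaviour concrete, here is a self-contained approximation of the same decision tree.  It is a hand-written sketch, not generator output.  The translators are hypothetical stubs that record which one accepted the instruction, and fields are passed as plain integers instead of argument structures.  ``allow_nop`` is an invented flag that lets ``trans_nop`` decline, so that the next candidate is tried.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { H_NONE, H_NOP, H_COPY, H_OR } Handler;

/* Stand-in context: records which translator accepted the instruction. */
typedef struct DisasContext {
    Handler handled_by;
    bool allow_nop;   /* hypothetical switch to make trans_nop decline */
} DisasContext;

/* Hypothetical translators; returning false lets later patterns be tried. */
static bool trans_nop(DisasContext *ctx)
{
    if (!ctx->allow_nop) {
        return false;
    }
    ctx->handled_by = H_NOP;
    return true;
}

static bool trans_copy(DisasContext *ctx, int r1, int rt)
{
    (void)r1; (void)rt;
    ctx->handled_by = H_COPY;
    return true;
}

static bool trans_or(DisasContext *ctx, int rt2, int r1, int cf, int rt)
{
    (void)rt2; (void)r1; (void)cf; (void)rt;
    ctx->handled_by = H_OR;
    return true;
}

static bool decode(DisasContext *ctx, uint32_t insn)
{
    int rt2 = (int)((insn >> 21) & 0x1f);
    int r1 = (int)((insn >> 16) & 0x1f);
    int cf = (int)((insn >> 12) & 0xf);
    int rt = (int)(insn & 0x1f);

    if ((insn & 0xfc000fe0) != 0x08000240) {
        return false;                          /* not an "or" encoding */
    }
    if (cf == 0) {                             /* inner group: nop, copy */
        if (rt == 0 && trans_nop(ctx)) {
            return true;
        }
        if (rt2 == 0 && trans_copy(ctx, r1, rt)) {
            return true;
        }
    }
    return trans_or(ctx, rt2, r1, cf, rt);     /* general case, tried last */
}
```

When ``trans_nop`` declines for an all-zero ``or``, the ``copy`` pattern is tried next, and any other ``or`` encoding reaches ``trans_or``.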