.. _code-provenance:

Code provenance
===============

Certifying patch submissions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The QEMU community **mandates** that all contributors certify the provenance
of the patch submissions they make to the project. To put it another way,
contributors must indicate that they are legally permitted to contribute to
the project.

Certification is achieved with low overhead by adding a single line to the
bottom of every git commit::

   Signed-off-by: YOUR NAME <YOUR@EMAIL>

The addition of this line asserts that the author of the patch is contributing
in accordance with the clauses specified in the
`Developer's Certificate of Origin <https://developercertificate.org>`__:

.. _dco:

  Developer's Certificate of Origin 1.1

  By making a contribution to this project, I certify that:

  (a) The contribution was created in whole or in part by me and I
      have the right to submit it under the open source license
      indicated in the file; or

  (b) The contribution is based upon previous work that, to the best
      of my knowledge, is covered under an appropriate open source
      license and I have the right under that license to submit that
      work with modifications, whether created in whole or in part
      by me, under the same open source license (unless I am
      permitted to submit under a different license), as indicated
      in the file; or

  (c) The contribution was provided directly to me by some other
      person who certified (a), (b) or (c) and I have not modified
      it.

  (d) I understand and agree that this project and the contribution
      are public and that a record of the contribution (including all
      personal information I submit with it, including my sign-off) is
      maintained indefinitely and may be redistributed consistent with
      this project or the open source license(s) involved.

The name used with "Signed-off-by" does not need to be your legal name, nor
birth name, nor appear on any government ID. It is the identity you choose to
be known by in the community, but should not be anonymous, nor misrepresent
who you are.

It is generally expected that the name and email address used in one of the
``Signed-off-by`` lines match those of the git commit ``Author`` field.
It's okay if you subscribe or contribute to the list via more than one
address, but using multiple addresses in one commit just confuses
things.

If the person sending the mail is not one of the patch authors, they are
nonetheless expected to add their own ``Signed-off-by`` to comply with
DCO clause (c).
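
As a minimal sketch of the expectation that one ``Signed-off-by`` line matches
the commit ``Author``, the shell function below checks a commit message read
from standard input. The name ``check_signoff`` is hypothetical and is not
part of QEMU or git; it relies only on standard ``grep`` options.

.. code:: shell

  # check_signoff IDENTITY < commit-message
  # Succeeds if some line of the message is exactly
  # "Signed-off-by: IDENTITY", where IDENTITY has the form "Name <email>".
  check_signoff() {
      grep -Fxq "Signed-off-by: $1"
  }

One possible way to apply it to the most recent commit::

  git log -1 --format=%B | check_signoff "$(git log -1 --format='%an <%ae>')"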

Multiple authorship
~~~~~~~~~~~~~~~~~~~

It is not uncommon for a patch to have contributions from multiple authors. In
this scenario, git commits will usually be expected to have a ``Signed-off-by``
line for each contributor involved in creation of the patch. Some edge cases:

  * The non-primary author's contributions were so trivial that they can be
    considered not subject to copyright. In this case the secondary authors
    need not include a ``Signed-off-by``.

    This case most commonly applies where QEMU reviewers give short snippets
    of code as suggested fixes to a patch. The reviewers don't need to have
    their own ``Signed-off-by`` added unless their code suggestion was
    unusually large, but it is common to add ``Suggested-by`` as a credit
    for non-trivial code.

  * Both contributors work for the same employer and the employer requires
    copyright assignment.

    It can be said that in this case a ``Signed-off-by`` is indicating that
    the person has permission to contribute from their employer who is the
    copyright holder. It is nonetheless still preferable to include a
    ``Signed-off-by`` for each contributor, as in some countries employees are
    not able to assign copyright to their employer, and it also covers any
    time invested outside working hours.

When multiple ``Signed-off-by`` tags are present, they should be strictly kept
in order of authorship, from oldest to newest.

Other commit tags
~~~~~~~~~~~~~~~~~

While the ``Signed-off-by`` tag is mandatory, there are a number of other tags
that are commonly used during QEMU development:

 * **``Reviewed-by``**: when a QEMU community member reviews a patch on the
   mailing list, if they consider the patch acceptable, they should send an
   email reply containing a ``Reviewed-by`` tag. Subsystem maintainers who
   review a patch should add this even if they are also adding their
   ``Signed-off-by`` to the same commit.

 * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch that
   touches their subsystem, but intends to allow a different maintainer to
   queue it and send a pull request, they would send a mail containing an
   ``Acked-by`` tag. Where a patch touches multiple subsystems, ``Acked-by``
   only implies review of the maintainers' own areas of responsibility. If a
   maintainer wants to indicate they have done a full review they should use
   a ``Reviewed-by`` tag.

 * **``Tested-by``**: when a QEMU community member has functionally tested the
   behaviour of the patch in some manner, they should send an email reply
   containing a ``Tested-by`` tag.

 * **``Reported-by``**: when a QEMU community member reports a problem via the
   mailing list, or some other informal channel that is not the issue tracker,
   it is good practice to credit them by including a ``Reported-by`` tag on
   any patch fixing the issue. When the problem is reported via the GitLab
   issue tracker, however, it is sufficient to just include a link to the
   issue.

 * **``Suggested-by``**: when a reviewer or other third party makes non-trivial
   suggestions for how to change a patch, it is good practice to credit them
   by including a ``Suggested-by`` tag.

Subsystem maintainer requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When a subsystem maintainer accepts a patch from a contributor, in addition to
the normal code review points, they are expected to validate the presence of
suitable ``Signed-off-by`` tags.

At the time they queue the patch in their subsystem tree, the maintainer
**must** also then add their own ``Signed-off-by`` to indicate that they have
done the aforementioned validation. This is in addition to any of their own
``Reviewed-by`` tags the subsystem maintainer may wish to include.

When the maintainer modifies the patch after pulling into their tree, they
should record their contribution.  This is typically done via a note in the
commit message, just prior to the maintainer's ``Signed-off-by``::

    Signed-off-by: Cory Contributor <cory.contributor@example.com>
    [Comment rephrased for clarity]
    Signed-off-by: Mary Maintainer <mary.maintainer@mycorp.test>


Tools for adding ``Signed-off-by``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are a variety of ways tools can support adding ``Signed-off-by`` tags
for patches, avoiding the need for contributors to manually type in this
repetitive text each time.

git commands
^^^^^^^^^^^^

When creating, or amending, a commit, the ``-s`` flag to ``git commit`` will
append a suitable line matching the configured git author details.

If preparing patches using the ``git format-patch`` tool, the ``-s`` flag can
be used to append a suitable line in the emails it creates, without modifying
the local commits. Alternatively, to modify all the local commits on a branch::

  git rebase master -x 'git commit --amend --no-edit -s'
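
The difference between the two ``-s`` flags can be tried out in a scratch
repository. The following is a sketch only: the identity, file names and
commit subjects are made up for illustration, and it assumes ``git`` is
installed.

.. code:: shell

  set -e
  # Work in a throwaway repository so no real checkout is touched.
  repo=$(mktemp -d)
  cd "$repo"
  git init -q
  git config user.name 'Cory Contributor'
  git config user.email 'cory.contributor@example.com'

  # "git commit -s" records the sign-off in the commit itself.
  echo one > a.txt
  git add a.txt
  git commit -q -s -m 'demo: signed off when committing'

  # "git format-patch -s" adds the sign-off only to the generated mail;
  # the local commit made here stays without one.
  echo two > b.txt
  git add b.txt
  git commit -q -m 'demo: signed off when formatting'
  git format-patch -s -1 --stdout > "$repo/patch.mbox"

Afterwards, ``git log`` should show a ``Signed-off-by`` line on the first
commit only, while ``patch.mbox`` carries one for the second.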

emacs
^^^^^

In the file ``$HOME/.emacs.d/abbrev_defs`` add:

.. code:: elisp

  (define-abbrev-table 'global-abbrev-table
    '(
      ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1)
      ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1)
      ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1)
      ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1)
     ))

With this change, if you type (for example) ``8rev`` followed by ``<space>``
or ``<enter>`` it will expand to the whole phrase.

vim
^^^

In the file ``$HOME/.vimrc`` add::

  iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr>
  iabbrev 8ack Acked-by: YOUR NAME <your@email.addr>
  iabbrev 8test Tested-by: YOUR NAME <your@email.addr>
  iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr>

With this change, if you type (for example) ``8rev`` followed by ``<space>``
or ``<enter>`` it will expand to the whole phrase.

Re-starting abandoned work
~~~~~~~~~~~~~~~~~~~~~~~~~~

For a variety of reasons there are some patches that get submitted to QEMU but
never merged. An unrelated contributor may decide (months or years later) to
continue working from the abandoned patch and re-submit it with extra changes.

The general principles when picking up abandoned work are:

 * Continue to credit the original author for their work, by maintaining their
   original ``Signed-off-by``
 * Indicate where the original patch was obtained from (mailing list, bug
   tracker, author's git repo, etc) when sending it for review
 * Acknowledge the extra work of the new contributor by including their
   ``Signed-off-by`` in the patch in addition to the original author's
 * Indicate who is responsible for what parts of the patch. This is typically
   done via a note in the commit message, just prior to the new contributor's
   ``Signed-off-by``::

    Signed-off-by: Some Person <some.person@example.com>
    [Rebased and added support for 'foo']
    Signed-off-by: New Person <new.person@mycorp.test>

In complicated cases, or if otherwise unsure, ask for advice on the project
mailing list.

It is also recommended to attempt to contact the original author to let them
know you are interested in taking over their work, in case they still intended
to return to the work, or had any suggestions about the best way to continue.

Inclusion of generated files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Files in patches contributed to QEMU are generally expected to be provided
only in the preferred format for making modifications. The implication of
this is that the output of code generators or compilers is usually not
appropriate to contribute to QEMU.

For reasons of practicality there are some exceptions to this rule, where
generated code is permitted, provided it is also accompanied by the
corresponding preferred source format. This is done where it is impractical
to expect those building QEMU to run the code generation or compilation
process. A non-exhaustive list of examples is:

 * Images: where a bitmap image is created from a vector file it is common
   to include the rendered bitmaps at desired resolution(s), since subtle
   changes in the rasterization process / tools may affect quality. The
   original vector file is expected to accompany any generated bitmaps.

 * Firmware: QEMU includes pre-compiled binary ROMs for a variety of guest
   firmwares. When such binary ROMs are contributed, the corresponding source
   must also be provided, either directly, or through a git submodule link.

 * Dockerfiles: the majority of the dockerfiles are automatically generated
   from a canonical list of build dependencies maintained in tree, together
   with the libvirt-ci git submodule link. The generated dockerfiles are
   included in tree because it is desirable to be able to directly build
   container images from a clean git checkout.

 * eBPF: QEMU includes some generated eBPF machine code, since the required
   eBPF compilation tools are not broadly available on all targeted OS
   distributions. The corresponding eBPF C code for the binary is also
   provided. This is a time-limited exception until the eBPF toolchain is
   sufficiently broadly available in distros.

In all cases above, the existence of generated files must be acknowledged
and justified in the commit that introduces them.

Tools which perform changes to existing code with deterministic algorithmic
manipulation, driven by user specified inputs, are not generally considered
to be "generators".

For instance, using Coccinelle to convert code from one pattern to another
pattern, or fixing documentation typos with a spell checker, or transforming
code using sed / awk / etc, are not considered to be acts of code
generation. Where an automated manipulation is performed on code, however,
this should be declared in the commit message.

At times contributors may use or create scripts/tools to generate an initial
boilerplate code template which is then filled in to produce the final patch.
The output of such a tool would still be considered the "preferred format",
since it is intended to be a foundation for further human authored changes.
Such tools are acceptable to use, provided there is clearly defined copyright
and licensing for their output. Note in particular the caveats applying to AI
content generators below.

Use of AI content generators
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TL;DR:

  **Current QEMU project policy is to DECLINE any contributions which are
  believed to include or derive from AI generated content. This includes
  ChatGPT, Claude, Copilot, Llama and similar tools.**

The increasing prevalence of AI-assisted software development results in a
number of difficult legal questions and risks for software projects, including
QEMU.  Of particular concern is content generated by `Large Language Models
<https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).

The QEMU community requires that contributors certify their patch submissions
are made in accordance with the rules of the `Developer's Certificate of
Origin (DCO) <dco>`.

To satisfy the DCO, the patch contributor has to fully understand the
copyright and license status of content they are contributing to QEMU. With AI
content generators, the copyright and license status of the output is
ill-defined with no generally accepted, settled legal foundation.

Where the training material is known, it is common for it to include large
volumes of material under restrictive licensing/copyright terms. Even where
the training material is all known to be under open source licenses, it is
likely to be under a variety of terms, not all of which will be compatible
with QEMU's licensing requirements.

How contributors could comply with DCO terms (b) or (c) for the output of AI
content generators commonly available today is unclear.  The QEMU project is
not willing or able to accept the legal risks of non-compliance.

The QEMU project thus requires that contributors refrain from using AI content
generators on patches intended to be submitted to the project, and will
decline any contribution if use of AI is either known or suspected.

This policy does not apply to other uses of AI, such as researching APIs or
algorithms, static analysis, or debugging, provided their output is not to be
included in contributions.

Examples of tools impacted by this policy include GitHub's Copilot, OpenAI's
ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
generation agents which are built on top of such tools.

This policy may evolve as AI tools mature and the legal situation is
clarified. In the meantime, requests for exceptions to this policy will be
evaluated by the QEMU project on a case-by-case basis. To be granted an
exception, a contributor will need to demonstrate clarity of the license and
copyright status for the tool's output in relation to its training model and
code, to the satisfaction of the project maintainers.