.. _code-provenance:

Code provenance
===============

Certifying patch submissions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The QEMU community **mandates** that all contributors certify the provenance
of patch submissions they make to the project. To put it another way,
contributors must indicate that they are legally permitted to contribute to
the project.

Certification is achieved with low overhead by adding a single line to the
bottom of every git commit::

  Signed-off-by: YOUR NAME <YOUR@EMAIL>

The addition of this line asserts that the author of the patch is contributing
in accordance with the clauses specified in the
`Developer's Certificate of Origin <https://developercertificate.org>`__:

.. _dco:

  Developer's Certificate of Origin 1.1

  By making a contribution to this project, I certify that:

  (a) The contribution was created in whole or in part by me and I
      have the right to submit it under the open source license
      indicated in the file; or

  (b) The contribution is based upon previous work that, to the best
      of my knowledge, is covered under an appropriate open source
      license and I have the right under that license to submit that
      work with modifications, whether created in whole or in part
      by me, under the same open source license (unless I am
      permitted to submit under a different license), as indicated
      in the file; or

  (c) The contribution was provided directly to me by some other
      person who certified (a), (b) or (c) and I have not modified
      it.

  (d) I understand and agree that this project and the contribution
      are public and that a record of the contribution (including all
      personal information I submit with it, including my sign-off) is
      maintained indefinitely and may be redistributed consistent with
      this project or the open source license(s) involved.
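
To make the trailer format above concrete, here is a minimal Python sketch of
how a commit message could be scanned for ``Signed-off-by`` lines. It is not
part of QEMU's tooling. The helper names ``signoffs`` and ``has_signoff`` are
hypothetical, and the pattern assumes the exact
``Signed-off-by: NAME <EMAIL>`` layout shown above, with one tag per line.

.. code:: python

   import re
   from typing import List, Tuple

   # Hypothetical pattern for one "Signed-off-by: NAME <EMAIL>" trailer. The
   # tag must start at the beginning of a line (re.MULTILINE makes ^ and $
   # match at line boundaries) and the email must be in angle brackets.
   SOB_RE = re.compile(
       r"^Signed-off-by:[ \t]*(?P<name>[^<>\n]+?)[ \t]*<(?P<email>[^<>\s]+)>[ \t]*$",
       re.MULTILINE,
   )


   def signoffs(message: str) -> List[Tuple[str, str]]:
       """Return (name, email) pairs for each Signed-off-by line, in order."""
       return [(m.group("name"), m.group("email")) for m in SOB_RE.finditer(message)]


   def has_signoff(message: str) -> bool:
       """Return True if the message carries at least one well-formed tag."""
       return bool(signoffs(message))


   if __name__ == "__main__":
       # Example commit message; the subject line is made up for illustration.
       example = (
           "hw/foo: fix frobnication\n"
           "\n"
           "Signed-off-by: Cory Contributor <cory.contributor@example.com>\n"
           "[Comment rephrased for clarity]\n"
           "Signed-off-by: Mary Maintainer <mary.maintainer@mycorp.test>\n"
       )
       for name, email in signoffs(example):
           print(f"{name} <{email}>")

Because ``re.finditer`` scans the message from top to bottom, the pairs come
back in the order the tags appear in the commit. Anchoring the pattern to the
start of a line means that a ``Signed-off-by`` mentioned in the middle of a
sentence in the commit body is not counted.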

The name used with ``Signed-off-by`` does not need to be your legal name, nor
your birth name, nor appear on any government ID. It is the identity you choose
to be known by in the community, but it should not be anonymous, nor
misrepresent who you are.

It is generally expected that the name and email address used in one of the
``Signed-off-by`` lines match those of the git commit ``Author`` field.
It's okay if you subscribe or contribute to the list via more than one
address, but using multiple addresses in one commit just confuses
things.

If the person sending the mail is not one of the patch authors, they are
nonetheless expected to add their own ``Signed-off-by`` to comply with the
DCO clause (c).

Multiple authorship
~~~~~~~~~~~~~~~~~~~

It is not uncommon for a patch to have contributions from multiple authors. In
this scenario, git commits will usually be expected to have a ``Signed-off-by``
line for each contributor involved in the creation of the patch. Some edge
cases:

 * The non-primary author's contributions were so trivial that they can be
   considered not subject to copyright. In this case the secondary authors
   need not include a ``Signed-off-by``.

   This case most commonly applies where QEMU reviewers give short snippets
   of code as suggested fixes to a patch. The reviewers don't need to have
   their own ``Signed-off-by`` added unless their code suggestion was
   unusually large, but it is common to add ``Suggested-by`` as a credit
   for non-trivial code.

 * Both contributors work for the same employer and the employer requires
   copyright assignment.

   In this case, a ``Signed-off-by`` can be taken to indicate that the
   person has permission to contribute from their employer, who is the
   copyright holder.
   It is nonetheless still preferable to include a
   ``Signed-off-by`` for each contributor, as in some countries employees are
   not able to assign copyright to their employer, and it also covers any
   time invested outside working hours.

When multiple ``Signed-off-by`` tags are present, they should be strictly kept
in order of authorship, from oldest to newest.

Other commit tags
~~~~~~~~~~~~~~~~~

While the ``Signed-off-by`` tag is mandatory, there are a number of other tags
that are commonly used during QEMU development:

 * ``Reviewed-by``: when a QEMU community member reviews a patch on the
   mailing list, if they consider the patch acceptable, they should send an
   email reply containing a ``Reviewed-by`` tag. Subsystem maintainers who
   review a patch should add this even if they are also adding their
   ``Signed-off-by`` to the same commit.

 * ``Acked-by``: when a QEMU subsystem maintainer approves a patch that
   touches their subsystem, but intends to allow a different maintainer to
   queue it and send a pull request, they would send an email containing an
   ``Acked-by`` tag. Where a patch touches multiple subsystems, an
   ``Acked-by`` only implies review of the acking maintainer's own areas of
   responsibility. If a maintainer wants to indicate they have done a full
   review they should use a ``Reviewed-by`` tag.

 * ``Tested-by``: when a QEMU community member has functionally tested the
   behaviour of the patch in some manner, they should send an email reply
   containing a ``Tested-by`` tag.

 * ``Reported-by``: when a QEMU community member reports a problem via the
   mailing list, or some other informal channel that is not the issue tracker,
   it is good practice to credit them by including a ``Reported-by`` tag on
   any patch fixing the issue. When the problem is reported via the GitLab
   issue tracker, however, it is sufficient to just include a link to the
   issue.

 * ``Suggested-by``: when a reviewer or other third party makes non-trivial
   suggestions for how to change a patch, it is good practice to credit them
   by including a ``Suggested-by`` tag.

Subsystem maintainer requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When a subsystem maintainer accepts a patch from a contributor, in addition to
the normal code review points, they are expected to validate the presence of
suitable ``Signed-off-by`` tags.

At the time they queue the patch in their subsystem tree, the maintainer
**must** also then add their own ``Signed-off-by`` to indicate that they have
done the aforementioned validation. This is in addition to any of their own
``Reviewed-by`` tags the subsystem maintainer may wish to include.

When the maintainer modifies the patch after pulling it into their tree, they
should record their contribution. This is typically done via a note in the
commit message, just prior to the maintainer's ``Signed-off-by``::

  Signed-off-by: Cory Contributor <cory.contributor@example.com>
  [Comment rephrased for clarity]
  Signed-off-by: Mary Maintainer <mary.maintainer@mycorp.test>


Tools for adding ``Signed-off-by``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are a variety of ways tools can support adding ``Signed-off-by`` tags
for patches, avoiding the need for contributors to manually type in this
repetitive text each time.

git commands
^^^^^^^^^^^^

When creating or amending a commit, the ``-s`` flag to ``git commit`` will
append a suitable line matching the configured git author details.

If preparing patches using the ``git format-patch`` tool, the ``-s`` flag can
be used to append a suitable line in the emails it creates, without modifying
the local commits.
Alternatively, to modify all the local commits on a branch::

  git rebase master -x 'git commit --amend --no-edit -s'

emacs
^^^^^

In the file ``$HOME/.emacs.d/abbrev_defs`` add:

.. code:: elisp

  (define-abbrev-table 'global-abbrev-table
    '(
      ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1)
      ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1)
      ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1)
      ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1)
      ))

With this change, if you type (for example) ``8rev`` followed by ``<space>``
or ``<enter>`` it will expand to the whole phrase.

vim
^^^

In the file ``$HOME/.vimrc`` add::

  iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr>
  iabbrev 8ack Acked-by: YOUR NAME <your@email.addr>
  iabbrev 8test Tested-by: YOUR NAME <your@email.addr>
  iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr>

With this change, if you type (for example) ``8rev`` followed by ``<space>``
or ``<enter>`` it will expand to the whole phrase.

Re-starting abandoned work
~~~~~~~~~~~~~~~~~~~~~~~~~~

For a variety of reasons there are some patches that get submitted to QEMU but
never merged. An unrelated contributor may decide (months or years later) to
continue working from the abandoned patch and re-submit it with extra changes.

The general principles when picking up abandoned work are:

 * Continue to credit the original author for their work, by maintaining their
   original ``Signed-off-by``
 * Indicate where the original patch was obtained from (mailing list, bug
   tracker, author's git repo, etc.) when sending it for review
 * Acknowledge the extra work of the new contributor by including their
   ``Signed-off-by`` in the patch in addition to the original author's
 * Indicate who is responsible for what parts of the patch.
   This is typically done via a note in the commit message, just prior to
   the new contributor's ``Signed-off-by``::

     Signed-off-by: Some Person <some.person@example.com>
     [Rebased and added support for 'foo']
     Signed-off-by: New Person <new.person@mycorp.test>

In complicated cases, or if otherwise unsure, ask for advice on the project
mailing list.

It is also recommended to attempt to contact the original author to let them
know you are interested in taking over their work, in case they still intend
to return to it, or have any suggestions about the best way to continue.

Inclusion of generated files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Files in patches contributed to QEMU are generally expected to be provided
only in the preferred format for making modifications. The implication of
this is that the output of code generators or compilers is usually not
appropriate to contribute to QEMU.

For reasons of practicality there are some exceptions to this rule, where
generated code is permitted, provided it is also accompanied by the
corresponding preferred source format. This is done where it is impractical
to expect those building QEMU to run the code generation or compilation
process. A non-exhaustive list of examples is:

 * Images: where a bitmap image is created from a vector file it is common
   to include the rendered bitmaps at desired resolution(s), since subtle
   changes in the rasterization process / tools may affect quality. The
   original vector file is expected to accompany any generated bitmaps.

 * Firmware: QEMU includes pre-compiled binary ROMs for a variety of guest
   firmwares. When such binary ROMs are contributed, the corresponding source
   must also be provided, either directly, or through a git submodule link.

 * Dockerfiles: the majority of the dockerfiles are automatically generated
   from a canonical list of build dependencies maintained in tree, together
   with the libvirt-ci git submodule link. The generated dockerfiles are
   included in tree because it is desirable to be able to directly build
   container images from a clean git checkout.

 * eBPF: QEMU includes some generated eBPF machine code, since the required
   eBPF compilation tools are not broadly available on all targeted OS
   distributions. The corresponding eBPF C code for the binary is also
   provided. This is a time-limited exception until the eBPF toolchain is
   sufficiently broadly available in distros.

In all cases above, the existence of generated files must be acknowledged
and justified in the commit that introduces them.

Tools which perform changes to existing code with deterministic algorithmic
manipulation, driven by user-specified inputs, are not generally considered
to be "generators".

For instance, using Coccinelle to convert code from one pattern to another
pattern, or fixing documentation typos with a spell checker, or transforming
code using sed / awk / etc, are not considered to be acts of code
generation. Where an automated manipulation is performed on code, however,
this should be declared in the commit message.

At times contributors may use or create scripts/tools to generate an initial
boilerplate code template which is then filled in to produce the final patch.
The output of such a tool would still be considered the "preferred format",
since it is intended to be a foundation for further human-authored changes.
Such tools are acceptable to use, provided there is clearly defined copyright
and licensing for their output. Note in particular the caveats applying to AI
content generators below.

Use of AI content generators
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TL;DR:

  **Current QEMU project policy is to DECLINE any contributions which are
  believed to include or derive from AI generated content. This includes
  ChatGPT, Claude, Copilot, Llama and similar tools.**

The increasing prevalence of AI-assisted software development results in a
number of difficult legal questions and risks for software projects, including
QEMU. Of particular concern is content generated by `Large Language Models
<https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).

The QEMU community requires that contributors certify their patch submissions
are made in accordance with the rules of the `Developer's Certificate of
Origin (DCO) <dco>`.

To satisfy the DCO, the patch contributor has to fully understand the
copyright and license status of content they are contributing to QEMU. With AI
content generators, the copyright and license status of the output is
ill-defined with no generally accepted, settled legal foundation.

Where the training material is known, it is common for it to include large
volumes of material under restrictive licensing/copyright terms. Even where
the training material is all known to be under open source licenses, it is
likely to be under a variety of terms, not all of which will be compatible
with QEMU's licensing requirements.

How contributors could comply with DCO terms (b) or (c) for the output of AI
content generators commonly available today is unclear. The QEMU project is
not willing or able to accept the legal risks of non-compliance.

The QEMU project thus requires that contributors refrain from using AI content
generators on patches intended to be submitted to the project, and will
decline any contribution if use of AI is either known or suspected.

This policy does not apply to other uses of AI, such as researching APIs or
algorithms, static analysis, or debugging, provided their output is not to be
included in contributions.

Examples of tools impacted by this policy include GitHub's Copilot, OpenAI's
ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
generation agents which are built on top of such tools.

This policy may evolve as AI tools mature and the legal situation is
clarified. In the meantime, requests for exceptions to this policy will be
evaluated by the QEMU project on a case-by-case basis. To be granted an
exception, a contributor will need to demonstrate clarity of the license and
copyright status for the tool's output in relation to its training model and
code, to the satisfaction of the project maintainers.