.. _code-provenance:

Code provenance
===============

Certifying patch submissions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The QEMU community **mandates** that all contributors certify the provenance
of the patches they submit to the project. In other words, contributors must
indicate that they are legally permitted to contribute to the project.

Certification is achieved with little overhead, by adding a single line to the
bottom of every git commit::

   Signed-off-by: YOUR NAME <YOUR@EMAIL>

The addition of this line asserts that the author of the patch is contributing
in accordance with the clauses specified in the
`Developer's Certificate of Origin <https://developercertificate.org>`__:

.. _dco:

  Developer's Certificate of Origin 1.1

  By making a contribution to this project, I certify that:

  (a) The contribution was created in whole or in part by me and I
      have the right to submit it under the open source license
      indicated in the file; or

  (b) The contribution is based upon previous work that, to the best
      of my knowledge, is covered under an appropriate open source
      license and I have the right under that license to submit that
      work with modifications, whether created in whole or in part
      by me, under the same open source license (unless I am
      permitted to submit under a different license), as indicated
      in the file; or

  (c) The contribution was provided directly to me by some other
      person who certified (a), (b) or (c) and I have not modified
      it.

  (d) I understand and agree that this project and the contribution
      are public and that a record of the contribution (including all
      personal information I submit with it, including my sign-off) is
      maintained indefinitely and may be redistributed consistent with
      this project or the open source license(s) involved.

The name used with ``Signed-off-by`` does not need to be your legal name or
birth name, nor does it need to appear on any government ID. It is the
identity you choose to be known by in the community, but it should not be
anonymous, nor misrepresent who you are.

It is generally expected that the name and email address used in one of the
``Signed-off-by`` lines match those of the git commit's ``Author`` field.
It is fine to subscribe or contribute to the list via more than one address,
but using multiple addresses in one commit just causes confusion.

If the person sending the mail is not one of the patch authors, they are
nonetheless expected to add their own ``Signed-off-by`` to comply with
clause (c) of the DCO.

Multiple authorship
~~~~~~~~~~~~~~~~~~~

It is not uncommon for a patch to have contributions from multiple authors. In
this scenario, git commits will usually be expected to have a ``Signed-off-by``
line for each contributor involved in the creation of the patch. Some edge
cases:

  * The non-primary authors' contributions were so trivial that they can be
    considered not subject to copyright. In this case the secondary authors
    need not include a ``Signed-off-by``.

    This case most commonly applies where QEMU reviewers give short snippets
    of code as suggested fixes to a patch. The reviewers don't need to have
    their own ``Signed-off-by`` added unless their code suggestion was
    unusually large, but it is common to add ``Suggested-by`` as a credit
    for non-trivial code.

  * Both contributors work for the same employer and the employer requires
    copyright assignment.

    In this case, a ``Signed-off-by`` can be said to indicate that the person
    has permission to contribute from their employer, who is the copyright
    holder. It is nonetheless preferable to include a ``Signed-off-by`` for
    each contributor, as in some countries employees are not able to assign
    copyright to their employer, and it also covers any time invested outside
    working hours.

When multiple ``Signed-off-by`` tags are present, they should be strictly kept
in order of authorship, from oldest to newest.

Other commit tags
~~~~~~~~~~~~~~~~~

While the ``Signed-off-by`` tag is mandatory, there are a number of other tags
that are commonly used during QEMU development:

 * ``Reviewed-by``: when a QEMU community member reviews a patch on the
   mailing list, if they consider the patch acceptable, they should send an
   email reply containing a ``Reviewed-by`` tag. Subsystem maintainers who
   review a patch should add this even if they are also adding their
   ``Signed-off-by`` to the same commit.

 * ``Acked-by``: when a QEMU subsystem maintainer approves a patch that
   touches their subsystem, but intends to allow a different maintainer to
   queue it and send a pull request, they would send a mail containing an
   ``Acked-by`` tag. Where a patch touches multiple subsystems, ``Acked-by``
   only implies review of each maintainer's own area of responsibility. If a
   maintainer wants to indicate they have done a full review, they should use
   a ``Reviewed-by`` tag.

 * ``Tested-by``: when a QEMU community member has functionally tested the
   behaviour of the patch in some manner, they should send an email reply
   containing a ``Tested-by`` tag.

 * ``Reported-by``: when a QEMU community member reports a problem via the
   mailing list, or some other informal channel that is not the issue tracker,
   it is good practice to credit them by including a ``Reported-by`` tag on
   any patch fixing the issue. When the problem is reported via the GitLab
   issue tracker, however, it is sufficient to just include a link to the
   issue.

 * ``Suggested-by``: when a reviewer or other third party makes non-trivial
   suggestions for how to change a patch, it is good practice to credit them
   by including a ``Suggested-by`` tag.

Subsystem maintainer requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When a subsystem maintainer accepts a patch from a contributor, in addition to
the normal code review points, they are expected to validate the presence of
suitable ``Signed-off-by`` tags.

When they queue the patch in their subsystem tree, the maintainer **must**
then also add their own ``Signed-off-by`` to indicate that they have done the
aforementioned validation. This is in addition to any ``Reviewed-by`` tags of
their own that the subsystem maintainer may wish to include.

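The presence check can be partly automated. The following is a minimal
sketch, assuming a POSIX shell and git 2.22 or later (for the
``%(trailers:key=...,valueonly)`` format placeholder). The
``check_author_signoffs`` function is a hypothetical helper, not part of
QEMU's tooling. It only verifies that each commit's ``Author`` identity
appears verbatim in one of its ``Signed-off-by`` lines, so it complements
rather than replaces reading the tags during review:

.. code:: shell

  # Sketch (hypothetical helper): report commits in a revision range, such
  # as "master..HEAD", whose Author has no matching Signed-off-by trailer.
  # Assumes git >= 2.22 for %(trailers:key=...,valueonly).
  check_author_signoffs() {
      range="$1"
      status=0
      for rev in $(git rev-list --reverse "$range"); do
          author=$(git log -1 --format='%an <%ae>' "$rev")
          # One Signed-off-by value per line, e.g. "Name <email>"
          sobs=$(git log -1 --format='%(trailers:key=Signed-off-by,valueonly)' "$rev")
          # -x: whole-line match; -F: fixed string, so '.' in emails is literal
          if ! printf '%s\n' "$sobs" | grep -qxF -- "$author"; then
              printf '%s: no Signed-off-by from %s\n' \
                  "$(git log -1 --format='%h %s' "$rev")" "$author"
              status=1
          fi
      done
      return "$status"
  }

For example, ``check_author_signoffs master..HEAD`` prints one line per commit
lacking its author's ``Signed-off-by`` and returns a non-zero status if any
were found.
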
If the maintainer modifies the patch after pulling it into their tree, they
should record their contribution. This is typically done via a note in the
commit message, just prior to the maintainer's ``Signed-off-by``::

    Signed-off-by: Cory Contributor <cory.contributor@example.com>
    [Comment rephrased for clarity]
    Signed-off-by: Mary Maintainer <mary.maintainer@mycorp.test>


Tools for adding ``Signed-off-by``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A variety of tools can add ``Signed-off-by`` tags to patches, avoiding the
need for contributors to manually type this repetitive text each time.

git commands
^^^^^^^^^^^^

When creating or amending a commit, the ``-s`` flag to ``git commit``
appends a suitable line matching the configured git identity (``user.name``
and ``user.email``).

If preparing patches using the ``git format-patch`` tool, the ``-s`` flag can
be used to append a suitable line to the emails it creates, without modifying
the local commits. Alternatively, to modify all the local commits on a branch::

  git rebase master -x 'git commit --amend --no-edit -s'

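The same ``-x`` technique works for other tags, for instance when a reviewer
has sent a ``Reviewed-by`` covering a whole series. The following minimal
sketch assumes git 2.32 or later, which added the ``--trailer`` option to
``git commit``. The ``add_trailer_to_branch`` name is a hypothetical helper:

.. code:: shell

  # Sketch (hypothetical helper): append a trailer to every commit on the
  # current branch that is not in the given base branch.
  # Assumes git >= 2.32 for "git commit --trailer".
  add_trailer_to_branch() {
      base="$1"      # e.g. master
      trailer="$2"   # e.g. "Reviewed-by: YOUR NAME <your@email.addr>"
      # The trailer is double-quoted inside the -x command, so it must not
      # itself contain double quotes, backticks or '$'.
      git rebase "$base" -x "git commit --amend --no-edit --trailer \"$trailer\""
  }

For example, ``add_trailer_to_branch master "Reviewed-by: YOUR NAME <your@email.addr>"``.
By default, git places the new tag after the existing trailers, such as the
author's ``Signed-off-by``.
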
emacs
^^^^^

In the file ``$HOME/.emacs.d/abbrev_defs`` add:

.. code:: elisp

  (define-abbrev-table 'global-abbrev-table
    '(
      ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1)
      ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1)
      ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1)
      ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1)
     ))

With this change, and with ``abbrev-mode`` enabled, typing (for example)
``8rev`` followed by ``<space>`` or ``<enter>`` will expand it to the whole
phrase.

vim
^^^

In the file ``$HOME/.vimrc`` add::

  iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr>
  iabbrev 8ack Acked-by: YOUR NAME <your@email.addr>
  iabbrev 8test Tested-by: YOUR NAME <your@email.addr>
  iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr>

With this change, typing (for example) ``8rev`` followed by ``<space>`` or
``<enter>`` in insert mode will expand it to the whole phrase.

Re-starting abandoned work
~~~~~~~~~~~~~~~~~~~~~~~~~~

For a variety of reasons, some patches that are submitted to QEMU are never
merged. An unrelated contributor may decide (months or years later) to
continue working from the abandoned patch and re-submit it with extra changes.

The general principles when picking up abandoned work are:

 * Continue to credit the original author for their work, by maintaining their
   original ``Signed-off-by``
 * Indicate where the original patch was obtained from (mailing list, bug
   tracker, author's git repo, etc.) when sending it for review
 * Acknowledge the extra work of the new contributor by including their
   ``Signed-off-by`` in the patch in addition to the original author's
 * Indicate who is responsible for what parts of the patch. This is typically
   done via a note in the commit message, just prior to the new contributor's
   ``Signed-off-by``::

    Signed-off-by: Some Person <some.person@example.com>
    [Rebased and added support for 'foo']
    Signed-off-by: New Person <new.person@mycorp.test>

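Where the original patch has been applied with ``git am`` (which keeps its
``Author`` and existing ``Signed-off-by``), the note and the new contributor's
``Signed-off-by`` can be appended in the layout shown above. This is a
minimal sketch. ``append_takeover_note`` is a hypothetical helper, and it
assumes ``user.name`` and ``user.email`` are configured for the new
contributor:

.. code:: shell

  # Sketch (hypothetical helper): amend the current commit, appending a
  # bracketed note followed by the committer's own Signed-off-by.
  append_takeover_note() {
      note="$1"   # e.g. "Rebased and added support for 'foo'"
      me="$(git config user.name) <$(git config user.email)>"
      # $(...) strips trailing newlines, so the note lands directly after
      # the last line of the existing message (normally a Signed-off-by).
      msg=$(git log -1 --format=%B)
      # --amend keeps the original Author; -F - reads the message from stdin
      printf '%s\n[%s]\nSigned-off-by: %s\n' "$msg" "$note" "$me" |
          git commit --quiet --amend -F -
  }

Run it once your own changes are committed, for example
``append_takeover_note "Rebased and added support for 'foo'"``.
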
In complicated cases, or if otherwise unsure, ask for advice on the project
mailing list.

It is also recommended to try to contact the original author to let them
know you are interested in taking over their work, in case they still intend
to return to it or have suggestions about the best way to continue.

Inclusion of generated files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Files in patches contributed to QEMU are generally expected to be provided
only in the preferred format for making modifications. The implication of
this is that the output of code generators or compilers is usually not
appropriate to contribute to QEMU.

For reasons of practicality there are some exceptions to this rule, where
generated code is permitted, provided it is also accompanied by the
corresponding preferred source format. This is done where it is impractical
to expect those building QEMU to run the code generation or compilation
process. A non-exhaustive list of examples is:

 * Images: where a bitmap image is created from a vector file, it is common
   to include the rendered bitmaps at the desired resolution(s), since subtle
   changes in the rasterization process or tools may affect quality. The
   original vector file is expected to accompany any generated bitmaps.

 * Firmware: QEMU includes pre-compiled binary ROMs for a variety of guest
   firmwares. When such binary ROMs are contributed, the corresponding source
   must also be provided, either directly, or through a git submodule link.

 * Dockerfiles: the majority of the dockerfiles are automatically generated
   from a canonical list of build dependencies maintained in tree, together
   with the libvirt-ci git submodule link. The generated dockerfiles are
   included in tree because it is desirable to be able to directly build
   container images from a clean git checkout.

 * eBPF: QEMU includes some generated eBPF machine code, since the required
   eBPF compilation tools are not broadly available on all targeted OS
   distributions. The corresponding eBPF C code for the binary is also
   provided. This is a time-limited exception until the eBPF toolchain is
   sufficiently broadly available in distros.

In all cases above, the existence of generated files must be acknowledged
and justified in the commit that introduces them.

Tools which perform changes to existing code with deterministic algorithmic
manipulation, driven by user-specified inputs, are not generally considered
to be "generators".

For instance, using Coccinelle to convert code from one pattern to another
pattern, or fixing documentation typos with a spell checker, or transforming
code using sed / awk / etc, are not considered to be acts of code
generation. Where an automated manipulation is performed on code, however,
this should be declared in the commit message.

At times contributors may use or create scripts/tools to generate an initial
boilerplate code template which is then filled in to produce the final patch.
The output of such a tool would still be considered the "preferred format",
since it is intended to be a foundation for further human authored changes.
Such tools are acceptable to use, provided there is clearly defined copyright
and licensing for their output. Note in particular the caveats applying to AI
content generators below.

Use of AI content generators
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TL;DR:

  **Current QEMU project policy is to DECLINE any contributions which are
  believed to include or derive from AI generated content. This includes
  ChatGPT, Claude, Copilot, Llama and similar tools.**

The increasing prevalence of AI-assisted software development results in a
number of difficult legal questions and risks for software projects, including
QEMU.  Of particular concern is content generated by `Large Language Models
<https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).

The QEMU community requires that contributors certify their patch submissions
are made in accordance with the rules of the :ref:`Developer's Certificate of
Origin (DCO) <dco>`.

To satisfy the DCO, the patch contributor has to fully understand the
copyright and license status of content they are contributing to QEMU. With AI
content generators, the copyright and license status of the output is
ill-defined with no generally accepted, settled legal foundation.

Where the training material is known, it is common for it to include large
volumes of material under restrictive licensing/copyright terms. Even where
the training material is all known to be under open source licenses, it is
likely to be under a variety of terms, not all of which will be compatible
with QEMU's licensing requirements.

How contributors could comply with DCO terms (b) or (c) for the output of AI
content generators commonly available today is unclear.  The QEMU project is
not willing or able to accept the legal risks of non-compliance.

The QEMU project thus requires that contributors refrain from using AI content
generators on patches intended to be submitted to the project, and will
decline any contribution if use of AI is either known or suspected.

This policy does not apply to other uses of AI, such as researching APIs or
algorithms, static analysis, or debugging, provided their output is not to be
included in contributions.

Examples of tools impacted by this policy include GitHub's Copilot, OpenAI's
ChatGPT, Anthropic's Claude and Meta's Code Llama, as well as code/content
generation agents built on top of such tools.

This policy may evolve as AI tools mature and the legal situation is
clarified. In the meantime, requests for exceptions to this policy will be
evaluated by the QEMU project on a case-by-case basis. To be granted an
exception, a contributor will need to demonstrate clarity of the license and
copyright status for the tool's output in relation to its training model and
code, to the satisfaction of the project maintainers.