So you want to use AI-generated code in your software, or maybe your developers are already using it. Is it too risky? Large language model technology is progressing rapidly, and policymakers are ill-equipped to keep pace. Anything resembling legal clarity may take years to arrive. Some organizations have decided not to use AI for code generation at all, while others are using it cautiously, but everyone has questions.
While this is a global issue, this discussion will focus on activity in the United States, since that is where most companies providing generative AI are based and therefore where most legal challenges are taking place. However U.S. law settles, other jurisdictions may settle very differently. We are not lawyers, in the U.S. or anywhere else, and none of this should be construed as legal advice. Likewise, this blog is only a snapshot of where things stand at this specific moment in time. The policies and legal precedents surrounding generative AI are still evolving and subject to all kinds of changes, including unexpected twists and upsets.
With that out of the way, let’s get to the heart of the matter: two simple questions users of AI-generated code are asking and two complex and possibly unsatisfying answers.
If your developers are using AI-generated code, and that AI was trained on open source software, you are likely concerned that the generated code is sufficiently similar to open source software to require compliance with its license. If so, the worst-case scenario puts your project at risk of being subsumed under a GPL license. However, the best-case scenario of simply requiring an attribution might not be much better. If it means tracking AI-generated code and identifying the open source code it’s similar to, that task has not yet been solved by commercial software, aside from cumbersome plagiarism detectors with high false positive rates. It’s a tricky situation, and there isn’t much guidance on best practices yet.
Good news for users of GitHub Copilot, though. In late September, Microsoft, GitHub’s parent company, made an announcement regarding copyright and code generated by their models. If you get sued over code generated by Copilot, Microsoft promises to pick up the bill, provided you used Copilot with the appropriate filters (like duplicate detection) turned on.
According to Microsoft, “As customers ask whether they can use Microsoft’s Copilot services and the output they generate without worrying about copyright claims, we are providing a straightforward answer: yes, you can, and if you are challenged on copyright grounds, we will assume responsibility for the potential legal risks involved.”
That doesn’t mean the law is guaranteed to settle on Microsoft’s side, but it does signal loudly that they’re confident they have a strong legal case. A lawsuit alleging Microsoft, GitHub, and OpenAI infringed on open source licenses and copyrights when training their models is working its way through the U.S. legal system and likely will be for some time. Microsoft argues that anyone has a right to look over public code on GitHub to understand and learn from it, and even to write similar, but not outright copied, code, and that this right extends to their models. OpenAI hasn’t promised to pay legal fees for its users, but if Microsoft’s argument holds up, it will be good news for OpenAI and its users too.
But what about your project as a whole? Can you copyright the work as a whole if you used AI-generated code within it? The U.S. Copyright Office has issued some guidance on the topic of AI-generated materials but cites no examples of AI-written (or partially written) software. At some point, a copyright application that involves software and AI-generated code will surely come to light, but for now we must consider general guidance and examples of other art forms and infer how their treatment might translate.
Whether a work can receive copyright protection seems to hinge on the degree to which AI-generated code is used and how much human involvement shapes the work.
The guidance contends there are cases in which “a work containing AI-generated material will also contain sufficient human authorship to support a copyright claim. For example, a human may select or arrange AI-generated material in a sufficiently creative way that ‘the resulting work as a whole constitutes an original work of authorship.’”
Moreover, a human can originally modify AI-generated material “to such a degree that the modifications meet the standard for copyright protection.”
The guidance goes on to say that “in these cases, copyright will only protect the human-authored aspects of the work, which are ‘independent of’ and do ‘not affect’ the copyright status of the AI-generated material itself.”
It’s clear that an entirely AI-generated program or library could not qualify for copyright protection, but what percentage of the work needs to be human-authored or modified in order to qualify is a subjective line that has yet to be tested. This brings us to one last important point from the U.S. Copyright Office guidance: an application needs to disclose that AI-generated content was used. If you don’t disclose the use of AI, and it’s discovered later, the Office may cancel the copyright registration in its entirety.
For some applications, the U.S. Copyright Office is requiring disclosure of exactly which parts are AI-generated. For others, it seems an applicant can simply fill out the “author” field to say what content was created by the author and what was generated by AI.
Whether a copyright application for software would require explicitly outlining which parts are AI-generated will probably depend on the project and how large the AI-generated components are. It’s a low-tech and imperfect solution, but it might be a good idea to tell your developers that if they use AI-generated code any larger than what would amount to a simple autocomplete, they should add a searchable comment or tag identifying that code. That information may come in handy later, whether for submitting a copyright registration application or for removing and replacing AI-generated code to limit risk in the future.
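To make the tagging idea concrete, here is a minimal sketch. The tag name `AI-GENERATED`, its fields, and the file paths are our own invention, not any standard; the only requirement is that the marker be consistent and greppable so a later audit is a one-liner.

```shell
# Create a sample source tree containing one AI-assisted function.
mkdir -p demo/src
cat > demo/src/parser.py <<'EOF'
# AI-GENERATED: tool=copilot date=2023-10-12 reviewer=jdoe
def parse_header(line):
    return dict(item.split("=", 1) for item in line.split(";"))
EOF

# Later, for a copyright filing or a risk review, list every tagged
# block with its file and line number:
grep -rn "AI-GENERATED" demo/src
```

The same marker makes removal straightforward too: if the legal landscape shifts, the tagged blocks are exactly the ones to rewrite.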
So, should you use AI-generated code? It depends entirely on your risk posture. Happy coding and may the courts be forever in your favor.