Copyright law protects original works of authorship, consisting mostly of written, artistic, musical, and dramatic works, but also including software. But what's considered an original work? And how is authorship defined? In recent years, works generated by artificial intelligence (AI) programs have been forcing courts to revisit these questions.
As owners of AI are profiting, the owners of copyrighted works that these AI models are trained on are left to wonder what their rights are. Do they own a portion of the copyright interest in the work produced by the AI if the AI uses their work as training data? Should they be compensated by the owners of the AI?
A copyright is an intellectual property right that protects original works of authorship that have been "fixed in a tangible medium." It protects written, artistic, musical, and dramatic works, as well as software.
As soon as you create a work and put it in a tangible medium (for example, by recording a song or writing down a poem), that work is copyrighted. But you should still register your copyright. Generally, you can sue others for copyright infringement only if your copyrighted work is registered with the U.S. Copyright Office.
Copyright owners have the exclusive right to reproduce, distribute, perform, and display their copyrighted work. They can also create derivatives of their copyrighted work. A derivative work is a work based on preexisting copyrighted work that's been changed somehow. For example, the Harry Potter movies are derivative works of the Harry Potter books because the books have been adapted into motion pictures.
To qualify for copyright protection, your work needs to be an original work of authorship, fixed in a tangible medium, and created by a human.
(17 U.S.C. § 102 (2023).)
Copyrights can only last for a limited period. How long a copyright lasts depends on when it's created as well as a few other factors. (For more information, read about how long copyright protection lasts.)
AI models are computer programs that are trained on data to mimic an aspect of human behavior (like language). With larger AI, the data is usually scraped from the internet. The AI is given data from humans, and it learns patterns and rules from that set of training data. When it produces its work—a mimic of human behavior—the AI uses the information it learned from the human-created data.
For example, suppose you want an AI to create a song like Mozart—and thus mimic a particular human's behavior. The training data you'd give the AI could be all of the songs Mozart has written. The AI would then try to predict Mozart's compositions using the training data. As it gets more data, it becomes better at predicting how the songs sound until it can compose a song like Mozart himself.
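The idea of learning patterns from training data and then generating something similar can be illustrated with a toy program. The following is only a minimal sketch, assuming a very simple technique (counting which note tends to follow which) that is far cruder than what real AI models use. The function names `train` and `compose` and the three short melodies are hypothetical placeholders made up for illustration; they aren't Mozart's music or any real product's code.

```python
import random
from collections import defaultdict


def train(melodies):
    """Learn patterns: record which notes follow each note in the training data."""
    transitions = defaultdict(list)
    for melody in melodies:
        for current, following in zip(melody, melody[1:]):
            transitions[current].append(following)
    return transitions


def compose(transitions, start, length, seed=0):
    """Generate a new melody by repeatedly picking a note that followed
    the previous note somewhere in the training data."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    melody = [start]
    while len(melody) < length:
        options = transitions.get(melody[-1])
        if not options:  # no known continuation, so stop early
            break
        melody.append(rng.choice(options))
    return melody


# Hypothetical training data: three short, made-up note sequences.
training_data = [
    ["C", "D", "E", "C"],
    ["E", "F", "G", "E"],
    ["C", "E", "G", "C"],
]

model = train(training_data)
new_melody = compose(model, start="C", length=8)
print(new_melody)
```

Every step in the generated melody comes from a note-to-note pattern found in the training data, which is why the training data matters as much as the code itself.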
AI is usually built by computer programmers and researchers who are employed by a company or university. The programmers and researchers take time to write the computer code and algorithms that run the AI.
Meanwhile, the training data that the AI is using to get better at its task usually includes copyrighted works that were grabbed from websites. The AI needs the training data just like it needs its computer code to complete its task. Both what the researchers and programmers provide and what the copyright owners provide are essential.
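So, who owns the copyright of a work that AI generates? It would seem like there are three choices: the programmers and researchers who built the AI, their employer, or the copyright owners of the training data.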
(To learn more about copyright ownership, read our article about who owns the rights to a copyright.)
If this were a regular case of copyright, then we could likely eliminate the programmers and researchers. Their work was probably commissioned by their employer as part of their normal work as an employee. In that case, their work was made for hire, and copyright law would recognize the employer as the true owner of any copyrighted work. (The employer would probably own the computer code that the programmers and researchers created.)
So, that leaves the employer and the copyright owners of the training data. Perhaps they both own the AI-generated work?
The answer to who owns the copyright in your typical AI-generated work is: none of the above. In fact, no one owns the AI-generated work. The work that the AI generates simply can't be copyrighted.
Let's go back to the requirements for a work to be copyrighted. You'll recall that one of the requirements is that the work must be created by a human. AI isn't human, so any work it produces can't be copyrighted. If its work can't be copyrighted, then no one—not the researchers, the researchers' employer, nor the copyright owners of the training data—can own the copyright.
The "human authorship" requirement isn't actually fleshed out in the U.S. Copyright Act alongside the other copyright requirements. The original drafters of the Copyright Act might not have seen a need to make the distinction. But as technology developed and questions were raised over the years, this qualification needed to be addressed.
In court cases dating back to the late 1800s, copyright law was determined to protect works "founded in the creative powers of the mind." (In re Trade-Mark Cases, 100 U.S. 82 (1879).)
More recently, courts and the U.S. Copyright Office have used prior courts' reasoning to define and expand on the Copyright Act. Specifically, the Copyright Office has determined that an original work of authorship means, among other things, that the work must be created by a human author. The Copyright Office won't register works produced by nonhuman authors, such as works produced by nature, animals, or plants.
In the same way, the Copyright Office has determined that works produced by computer programs (like AI) can't be copyrighted.
One of the most famous court cases about copyrighting a nonhuman's work involved the rights to a selfie taken by a monkey. In 2011, photographer David Slater gave a news agency some photographs that a macaque had taken of itself using Slater's camera.
In the years that followed, Slater argued (mostly unsuccessfully) that he had a copyright claim to the selfies because his work in gaining the trust of the monkeys and setting up the camera contributed to the creation of the selfie image. In response, the U.S. Copyright Office published an opinion that re-asserted that the Office wouldn't register works not created by a human author.
In an ironic twist, the organization People for the Ethical Treatment of Animals (PETA) sued Slater and said that he was infringing on the monkey's copyright to the selfie images by using the images in a book he'd published. The court backed the Copyright Office's opinion and said that works created by animals can't be copyrighted. This time, the law benefited Slater, and the case was dismissed.
The court case received international attention and would go on to serve as one of the more popular examples used against AI creating copyrighted works.
We've established that the works generated by AI aren't copyrightable, meaning no one owns the copyrights to those works. But that doesn't mean that there aren't any copyrights involved in AI-generated works. In fact, numerous copyrighted works are used to train AI models. The AI's use of these copyrighted works raises two questions: Can copyrighted works be used to train AI without the copyright owners' permission? And should the copyright owners be compensated when their works are used?
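Unfortunately, the answer is that it's still up for debate. Broadly, there are two sides to the argument: those who say that training AI on copyrighted works is a fair use, and those who say that it's copyright infringement unless the copyright owners give permission.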
To answer whether copyrighted works can be used to train AI without running afoul of intellectual property rights, we need to explore what the fair use doctrine is and what it says.
When a work is copyrighted, you can't use it without infringing on the owner's rights unless you have the owner's permission to use it or you have a defense against a claim of copyright infringement. One of the most common defenses is the fair use defense. Under the fair use doctrine, you can use a copyrighted work in certain circumstances.
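A court will weigh four factors to determine whether your use is fair use: the purpose and character of the use, the nature of the copyrighted work, the amount of the copyrighted work used, and the effect of the use on the copyrighted work's value.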
Let's look at how each factor of the fair use doctrine can weigh in favor of and against fair use when it comes to training AI with copyrighted works.
The purpose and character of the use. Some AI models are used by students and researchers for educational purposes, such as for a research paper in an academic journal. Other AI models are marketed and sold to the public. If the AI is being used for educational purposes, then this factor weighs in favor of fair use. If the AI is being used commercially to make a profit, then this factor weighs against a fair use defense.
The nature of the copyrighted work. Some of the data used to train AI programs is factual and informational and comes from news sources, biographies, and research articles. Other training data is imaginative and fictional, such as poems, songs, and stories. If the training data is factual or informational, then this factor weighs in favor of fair use. If the data is imaginative or fictional, then this factor weighs against a fair use defense.
The amount of copyrighted work used. Programmers might use an entire book or song to train AI. Other programmers might instead use only a passage or line from a book or a few notes from a song. If the programmers are using only a short passage or line from a book or a couple of notes from a song, then this factor weighs in favor of fair use. If the programmers are using large portions of a copyrighted work, such as most or all of a book or song, then this factor weighs against a fair use defense.
The effect of the use on the copyrighted work's value. How this factor applies depends on whether someone's original work or talent is being replaced by the AI. If an AI can be trained to generate a song or image in the same style as a specific artist, then the value of that artist's copyrighted work would be harmed. If an AI is being trained to do a task unrelated to the copyright owner's work, then the copyrighted work probably isn't harmed. For instance, an AI trained to speak French as a chatbot on a website probably wouldn't hurt the value of a copyrighted book written in French that it used as training data.
Courts that consider and weigh these factors can determine whether the fair use defense applies. If it does, then the owners of the AI can use the copyrighted works. If it doesn't, then they can't without the copyright owners' permission.
In most cases, the creators of AI models haven't received permission from the copyright owners to use their work to create training data. Many copyright owners don't even know their work was used at all. The question then is whether the AI owners have a fair use defense against a copyright infringement claim.
If the AI creators have a fair use defense against copyright infringement, then there's no need to compensate the copyright owners of the training data. If the AI creators don't have a fair use defense, then the copyright owners should be compensated. Just as any other copyrighted work can be licensed (usually for a fee), the owners of the training data should be able to license out their copyrighted works.
As mentioned earlier, you can use an AI-generated work without infringing on the AI's or the AI owner's rights because the work can't be copyrighted. But what if the AI work in question has copied someone else's work?
For instance, suppose Ben publishes an AI-generated article that effectively copies (or "plagiarizes") an article written by Gwen Tennyson. Ben puts his name as the author of the AI-generated article without getting any permission from Gwen. Does Gwen, as the author of the article that the AI plagiarized, have any rights? Is Ben responsible for the AI's plagiarism?
If the work that's been copied is copyrighted, then the author of the copyrighted work has enforceable legal rights in their work just as if you had directly copied from them. You're responsible—whether you wrote the article or not—for making sure that the works that you put your name on don't violate anyone else's copyrights.
Going back to our example, if Gwen's work is copyrighted, then she would have copyright protections. Ben can't use Gwen's work without her permission regardless of whether he produced the work that copied her. As long as he's distributing the work without her permission, he's infringing on her copyright.
If you believe your copyrighted work has been used illegally, you should talk to a copyright lawyer. They can help you determine whether you have a valid claim of copyright infringement and what your next steps should be. If you've built an AI program using data that's copyrighted, you should consider speaking with a copyright attorney as well. They can help you determine whether your use counts as fair use and how you should legally treat the training data.
Be cautious about taking AI-generated work and putting your name on it. The AI program could've copied parts of its work from others. While you can't infringe on the AI owner's copyright, you can infringe on others' copyrights. If you're unsure about what you can use and how you can use it, consult a copyright lawyer. They can give you guidance on how to avoid copyright infringement claims.