How Copyright Law Treats AI

You might’ve heard of ChatGPT or seen an image online created by AI. But who gets credit for the work AI creates? Learn whether AI-generated work can be copyrighted and what rights copyright owners have when their work is used to train AI.


Copyright law protects original works of authorship, consisting mostly of written, artistic, musical, and dramatic works, but also including software. But what's considered an original work? And how is authorship defined? In recent years, works generated by artificial intelligence (AI) programs have been forcing courts to revisit these questions.

As owners of AI are profiting, the owners of copyrighted works that these AI models are trained on are left to wonder what their rights are. Do they own a portion of the copyright interest in the work produced by the AI if the AI uses their work as training data? Should they be compensated by the owners of the AI?

A copyright is an intellectual property right that protects original works of authorship that have been "fixed in a tangible medium." It protects works such as written, artistic, musical, and dramatic works, as well as software.

As soon as you create a work and put it in a tangible medium (for example, by recording a song or writing down a poem), that work is copyrighted. But you should still register your copyright. You can only sue others for copyright infringement if your copyrighted work is registered with the U.S. Copyright Office.

Copyright owners have the exclusive right to reproduce, distribute, perform, and display their copyrighted work. They can also create derivatives of their copyrighted work. A derivative work is a work based on preexisting copyrighted work that's been changed somehow. For example, the Harry Potter movies are derivative works of the Harry Potter books because the books have been adapted into motion pictures.

To qualify for copyright protection, your work needs to be:

  • Fixed in a tangible medium. Your work needs to be in a medium that can be reproduced.
  • Authored by a human. A human—not an animal, machine, or supernatural being—must have created the work.
  • Copyrightable subject matter. It must fall into a category of work that can be copyrighted. These categories include artistic, dramatic, and musical works as well as computer code.
  • Independently created. You need to have created the work independently, without copying another's work.
  • Minimally creative. Your work must show at least a minimal degree of creativity.

(17 U.S.C. § 102 (2023).)

Copyrights can only last for a limited period. How long a copyright lasts depends on when it's created as well as a few other factors. (For more information, read about how long copyright protection lasts.)

How Does AI Generate Works?

AI models are computer programs that are trained on data to mimic an aspect of human behavior (like language). With larger AI, the data is usually scraped from the internet. The AI is given data from humans, and it learns patterns and rules from that set of training data. When it produces its work—a mimic of human behavior—the AI uses the information it learned from the human-created data.

For example, suppose you want an AI to create a song like Mozart—and thus mimic a particular human's behavior. The training data you'd give the AI could be all of the songs Mozart has written. The AI would then try to predict Mozart's compositions using the training data. As it gets more data, it becomes better at predicting how the songs sound until it can compose a song like Mozart himself.
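To make this concrete, below is a minimal, purely illustrative sketch (written in Python) of a toy "next-note" predictor. It isn't how any real AI product works, and the melodies are made up, but it shows the basic idea the disputes turn on: the program's output is built entirely from patterns it extracted from its training data.

    import random
    from collections import defaultdict, Counter

    # Hypothetical training data: short note sequences standing in for the
    # works (Mozart's songs, in our example) that an AI might be trained on.
    training_melodies = [
        ["C", "E", "G", "E", "C"],
        ["C", "E", "G", "A", "G", "E"],
        ["G", "E", "C", "E", "G"],
    ]

    # "Training": count how often each note follows another across the data.
    transitions = defaultdict(Counter)
    for melody in training_melodies:
        for current_note, next_note in zip(melody, melody[1:]):
            transitions[current_note][next_note] += 1

    def generate(start="C", length=8, seed=0):
        """Generate a new melody by sampling from the learned note patterns."""
        rng = random.Random(seed)
        melody = [start]
        for _ in range(length - 1):
            counts = transitions.get(melody[-1])
            if not counts:  # no pattern learned for this note, so stop early
                break
            notes, weights = zip(*counts.items())
            melody.append(rng.choices(notes, weights=weights)[0])
        return melody

    print(generate())  # prints a new sequence, for example ['C', 'E', 'G', 'E', 'C', ...]

Real generative AI models are vastly larger and more sophisticated than this toy, but the relationship is the same: whatever the program produces is shaped by the training data it was given, which is why ownership of that data matters so much.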

AI is usually built by computer programmers and researchers who are employed by a company or university. The programmers and researchers take time to write the computer code and algorithms that run the AI.

Meanwhile, the training data that the AI is using to get better at its task usually includes copyrighted works that were grabbed from websites. The AI needs the training data just like it needs its computer code to complete its task. Both what the researchers and programmers provide and what the copyright owners provide are essential.

So, who owns the copyright of a work that AI generates? It would seem like there are three choices:

  • the programmers and researchers
  • the employer of the programmers and researchers, and
  • the copyright owners of the works used as training data.

(To learn more about copyright ownership, read our article about who owns the rights to a copyright.)

If this were a regular case of copyright, then we could likely eliminate the programmers and researchers. Their work was probably commissioned by their employer as part of their normal work as an employee. In that case, their work was made for hire, and copyright law would recognize the employer as the true owner of any copyrighted work. (The employer would probably own the computer code that the programmers and researchers created.)

So, that leaves the employer and the copyright owners of the training data. Perhaps they both own the AI-generated work?

The answer to who owns the copyright in your typical AI-generated work is: none of the above. In fact, no one owns the AI-generated work. The work that the AI generates simply can't be copyrighted.

No One Owns the Copyright to the AI-Generated Work

Let's go back to the requirements for a work to be copyrighted. You'll recall that one of the requirements is that the work must be created by a human. AI isn't human, so any work it produces can't be copyrighted. If its work can't be copyrighted, then no one—not the researchers, the researchers' employer, nor the copyright owners of the training data—can own the copyright.

The "human authorship" requirement isn't actually fleshed out in the U.S. Copyright Act alongside the other copyright requirements. The original drafters of the Copyright Act might not have seen a need to make the distinction. But as technology developed and questions were raised over the years, this qualification needed to be addressed.

Why Nonhumans Can't Create Copyrighted Works

In court cases dating back to the late 1800s, copyright law was determined to protect works "founded in the creative powers of the mind." (In re Trade-Mark Cases, 100 U.S. 82 (1879).)

More recently, courts and the U.S. Copyright Office have used prior courts' reasoning to define and expand on the Copyright Act. Specifically, the Copyright Office has determined that an original work of authorship must, among other things, be created by a human author. The Copyright Office has refused to register works created by nonhuman authors, including:

  • a photograph taken by a monkey
  • a mural painted by an elephant
  • driftwood shaped by the ocean, and
  • a song naming the Holy Spirit as the author.

In the same way, the Copyright Office has determined that works produced by computer programs (like AI) can't be copyrighted.

The Monkey Selfie Court Case

One of the most famous court cases about copyrighting a nonhuman's work is the court case revolving around the rights in a selfie picture taken by a monkey. In 2011, photographer David Slater gave a news agency some photographs that a macaque had taken of itself using Slater's camera.

In the years that followed, Slater argued (mostly unsuccessfully) that he had a copyright claim to the selfies because his work in gaining the trust of the monkeys and setting up the camera contributed to the creation of the selfie image. In response, the U.S. Copyright Office published an opinion that re-asserted that the Office wouldn't register works not created by a human author.

In an ironic twist, the organization People for the Ethical Treatment of Animals (PETA) sued Slater and said that he was infringing on the monkey's copyright to the selfie images by using the images in a book he'd published. The court backed the Copyright Office's opinion and said that works created by animals can't be copyrighted. This time, the law benefited Slater, and the case was dismissed.

The court case received international attention and has since become one of the more commonly cited examples in arguments against copyright protection for AI-created works.

We've established that the works generated by AI aren't copyrightable, meaning no one owns the copyrights to those works. But that doesn't mean that there aren't any copyrights involved in AI-generated works. In fact, numerous copyrighted works are used to train AI models. The questions posed by the AI's use of these copyrighted works are:

  • Are the researchers and programmers (and their employers) allowed to use the copyrighted works to train the AI?
  • Should the copyright owners be compensated?

Generally, copyright owners take the position that their works can be used only if they consent and are compensated. But creators of AI models usually don't get the copyright owners' permission and don't pay the owners for their works.

Can You Train AI on Copyrighted Material?

Unfortunately, the answer is that it's still up for debate. Broadly, there are two sides to the argument:

  • The owners of the AI are allowed to use the copyrighted works to train the AI under copyright law's fair use doctrine.
  • The owners of the AI aren't allowed to use the copyrighted works to train the AI because the use does not qualify under the fair use doctrine.

To answer whether copyrighted works can be used to train AI without running afoul of intellectual property rights, we need to explore what the fair use doctrine is and what it says.

What Is the Fair Use Doctrine?

When a work is copyrighted, you can't use it without infringing on the owner's rights unless you have the owner's permission to use it or you have a defense against a claim of copyright infringement. One of the most common defenses is the fair use defense. Under the fair use doctrine, you can use a copyrighted work in certain circumstances.

A court will weigh four factors to determine whether your use is fair use:

  1. the purpose and character of the use (such as whether your use is commercial or educational)
  2. the nature of the copyrighted work (including whether the copyrighted work is creative or informational)
  3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole (for example, whether you copied one line from a book or a whole chapter), and
  4. the effect of the use upon the potential market for or value of the copyrighted work (in other words, whether the copyright owner was financially harmed or hindered by your use of their work).

How Does the Fair Use Doctrine Apply to AI Training Data?

Let's look at how each factor of the fair use doctrine can weigh in favor of and against fair use when it comes to training AI with copyrighted works.

The purpose and character of the use. Some AI models are used by students and researchers for educational purposes, such as for a research paper in an academic journal. Other AI models are marketed and sold to the public. If the AI is being used for educational purposes, then this factor weighs in favor of fair use. If the AI is being used for commercial purposes and sold for profit, then this factor weighs against a fair use defense.

The nature of the copyrighted work. Some of the data used to train AI programs is factual and informational and comes from news sources, biographies, and research articles. Other training data is imaginative and fictional, such as poems, songs, and stories. If the training data is factual or informational, then this factor weighs in favor of fair use. If the data is imaginative or fictional, then this factor weighs against a fair use defense.

The amount of copyrighted work used. Programmers might use an entire book or song to train AI. Other programmers might instead use only a passage or line from a book or a few notes from a song. If the programmers are using only a short passage or line from a book or a couple of notes from a song, then this factor weighs in favor of fair use. If the programmers are using large portions of a copyrighted work, such as most or all of a book or song, then this factor weighs against a fair use defense.

The effect of the use on the copyrighted work's value. How this factor applies depends on whether someone's original work or talent is being replaced by the AI. If an AI can be trained to generate a song or image in the same style as a specific artist, then the value of that artist's copyrighted work would be harmed. If an AI is being trained to do a task unrelated to the copyright owner's work, then the copyrighted work probably isn't harmed. For instance, an AI trained to speak French as a chatbot on a website probably wouldn't hurt the value of a copyrighted book written in French that it used as training data.

Courts consider and weigh these factors to determine whether the fair use defense applies. If it does, then the owners of the AI can use the copyrighted works. If it doesn't, then they can't use the works without the copyright owners' permission.

Should the Copyright Owners of the Training Data Be Compensated?

In most cases, the creators of AI models haven't received permission from the copyright owners to use their work to create training data. Many copyright owners don't even know their work was used at all. The question then is whether the AI owners have a fair use defense against a copyright infringement claim.

If the AI creators have a fair use defense against copyright infringement, then there's no need to compensate the copyright owners of the training data. If the AI creators don't have a fair use defense, then the copyright owners should be compensated. Just as any other copyrighted work can be licensed (usually for a fee), the owners of the training data should be able to license out their copyrighted works.

What If AI Copies Someone Else's Work?

As mentioned earlier, you can use an AI-generated work without infringing on the AI's or the AI owner's rights because the work can't be copyrighted. But what if the AI work in question has copied someone else's work?

For instance, suppose Ben publishes an AI-generated article that effectively copies (or "plagiarizes") an article written by Gwen Tennyson. Ben puts his name as the author of the AI-generated article without getting any permission from Gwen. Does Gwen, as the author of the article that the AI plagiarized, have any rights? Is Ben responsible for the AI's plagiarism?

If the work that's been copied is copyrighted, then the author of the copyrighted work has enforceable legal rights in their work just as if you had directly copied from them. You're responsible—whether you wrote the article or not—for making sure that the works that you put your name on don't violate anyone else's copyrights.

Going back to our example, if Gwen's work is copyrighted, then she has copyright protections. Ben can't use Gwen's work without her permission, regardless of whether he personally produced the work that copied hers. As long as he's distributing the work without her permission, he's infringing on her copyright.

Artists, authors, and other copyright owners have started to make their cases against AI companies in court. In January 2023, a group of artists sued Stability AI, DeviantArt, and Midjourney. These companies use AI models to create images from text prompts. The artists claim that these AI image generators use artists' works to train AI to create derivative works. In October 2023, however, the judge dismissed most of the claims in the lawsuit, though the case is still ongoing.

Two other high-profile cases were filed in 2023 over the alleged use of copyrighted works in AI. Sarah Silverman and other authors sued OpenAI and Meta in July, and the New York Times sued OpenAI and Microsoft in December.

The final decisions from these lawsuits could provide a foundation for AI copyright law and affect how AI tools are trained and developed.

Sarah Silverman and Other Authors Sue OpenAI and Meta for Copyright Infringement

In July 2023, comedian Sarah Silverman and authors Christopher Golden and Richard Kadrey joined together to file two copyright infringement lawsuits: one against OpenAI and one against Meta. OpenAI owns ChatGPT, an AI model that's trained on massive amounts of text to simulate human-like conversation. Meta owns LLaMA, a set of AI models generally described as a still-developing competitor to ChatGPT.

The lawsuits allege, among other things, that OpenAI and Meta trained their AI models on copyrighted works without the copyright owners' consent and without compensation. The alleged copyright violations in the lawsuit are that:

  • The tech companies copied copyrighted works to train their models, creating AI programs that are, themselves, infringing derivative works.
  • The output the AI produces (for example, a response to a prompt) is an infringing derivative work because the output is based on expressive text taken directly from the copyrighted works.

As copyright owners, Silverman, Golden, and Kadrey have the exclusive right to copy, distribute, and create derivatives of their copyrighted work. The lawsuits allege that OpenAI and Meta violated those rights and more when training and releasing their models. The suits claim that the plaintiffs—and potentially thousands of copyright owners whose works were similarly used—are losing out financially because of this unauthorized use.

In November 2023, the judge threw out part of the authors' case against Meta. The judge determined that the authors didn't argue that their works were substantially similar to LLaMA's output, which was necessary to prove copyright infringement. In dismissing the claims, the judge left room for Silverman and the others to amend their claims to address the gap. In December 2023, the plaintiffs filed an amended complaint dropping most of their previously dismissed claims. Importantly, however, the judge—and Meta—haven't addressed the plaintiffs' claim that Meta violated the authors' copyrights by using their works to train its AI model.

The New York Times Sues OpenAI and Microsoft Over ChatGPT

In December 2023—in the wake of the partial dismissal in the Silverman case—the New York Times launched a copyright lawsuit against OpenAI and Microsoft. The Times alleges that the companies used millions of New York Times articles to train ChatGPT and other chatbots. As a result, the chatbots generate content that directly competes with the media outlet. The case comes after failed private talks between the New York Times and the defendants over intellectual property.

On January 8, 2024, OpenAI publicly responded to the Times lawsuit. OpenAI denied that it violated copyright laws. In a blog post, OpenAI said it used publicly available material to train its AI models and that its use is protected by the fair use doctrine. OpenAI and Microsoft have 21 days to respond to the complaint.

If you believe your copyrighted work has been used illegally, you should talk to a copyright lawyer. They can help you determine whether you have a valid claim of copyright infringement and what your next steps should be. If you've built an AI program using data that's copyrighted, you should consider speaking with a copyright attorney as well. They can help you determine whether your use counts as fair use and how you should legally treat the training data.

Be cautious about taking AI-generated work and putting your name on it. The AI program could've copied parts of its work from others. While you can't infringe on the AI owner's copyright, you can infringe on others' copyrights. If you're unsure about what you can use and how you can use it, consult a copyright lawyer. They can give you guidance on how to avoid copyright infringement claims.
