Data privacy is one of the biggest concerns with emerging artificial intelligence technology. The most prominent AI models—large language models (LLMs) like OpenAI's ChatGPT, Google's Bard, and Meta's LLaMa—are trained on vast quantities of data. The more data these AI models consume, the better they become at simulating human thought and conversation.
People naturally wonder how much data these AI models have access to. How much should they have access to? And what are the risks if they have our personal information?
Lawmakers in and outside the U.S. have started to regulate how the data—including, importantly, personal information—used to train AI is collected, stored, processed, and delivered.
AI models are trained on large swaths of text, mostly from websites, books, and newspapers. But where exactly does that data come from? It's a straightforward question that usually gets a less-than-forthcoming answer. Generally, the owners of popular LLMs vaguely declare that their data is from public sources. For example:
Publicly sourced data usually contains personal information like names, email addresses, and birthdates. This information can be taken from databases, articles, blogs, forums, and social media. And people whose personal data is being fed to AI models often don't know that what they've shared online is being used in these training sets. Again, AI developers haven't given details about what kinds of personal data have been collected and from whom.
Should the fact that data is publicly available mean that anyone is allowed to use it for any reason? Many say no, worrying that training data can and inevitably will be revealed to anyone who asks the AI the right questions.
Data privacy is an area of law that has to do with access to our personal information. Lawmakers try to protect people's personal information through consumer protection and privacy laws. The idea is to stop businesses from using consumer data in ways that would be unfair, deceptive, or harmful.
In data privacy laws, "personal information" or "personal data" usually refers to information that can directly or indirectly identify someone. Typically, personal information or data can include:
In the U.S., the Federal Trade Commission (FTC) is tasked with protecting consumers' privacy and security. Under Section 5 of the FTC Act, the FTC is responsible for preventing people and businesses from using "unfair or deceptive acts or practices" while conducting business in the U.S. (15 U.S.C. § 45 (2023).)
The FTC can specify the kinds of business practices that are considered unfair or deceptive, such as the unreasonable collection and processing of personal information. The federal agency can also launch investigations and charge businesses with violating consumer protection laws. Businesses that violate the law can be forced to pay civil penalties and restitution to consumers.
Although it has the FTC, the U.S. doesn't have a comprehensive data privacy law. But some lawmakers are trying to change that.
In June 2022, the House Energy and Commerce Committee introduced the American Data Privacy and Protection Act (ADPPA), a bill that would provide rights and protections to consumers. These rights would include the rights to access, correct, and delete personal data and to consent to its collection and processing. As of August 2023, the ADPPA hasn't been passed in either the House or Senate.
Whereas U.S. laws apply throughout the country, state laws apply to businesses that operate within those states and to consumers who reside there. States have approached data privacy in varying ways. Some have no consumer data privacy laws. A handful have comprehensive privacy laws.
For example, California has the California Privacy Rights Act (CPRA), a law that took effect on January 1, 2023 and that expands on the California Consumer Privacy Act of 2018. The CPRA is one of the most protective state measures for consumer privacy. It includes the rights to:
(Cal. Civ. Code §§ 1798.140 and following (2023).)
Despite California's stricter regulations and the FTC's investigation into ChatGPT, the U.S., in general, is considered behind other nations when it comes to consumer protection and data privacy laws.
Perhaps the most widely known data protection law is the General Data Protection Regulation (GDPR). The GDPR is a relatively strict European Union (EU) law that protects personal data and privacy. (It went into effect in May 2018.) While the law directly binds only EU member states, it also reaches companies outside the EU that process EU residents' data, and many countries have used it as a model and put similar regulations into place.
The GDPR applies to most businesses that process personal data. Under the GDPR, companies can collect and process personal data only under limited circumstances and have to follow strict protocols for collecting, storing, and processing that data.
The GDPR allows companies to process—for example, collect, record, store, organize, or use—personal data only if one of the following is true:
Most of these situations, except for the last one, are relatively easy to identify. However, proving you have a legitimate interest in processing personal data is tricky. To determine whether you have a legitimate interest, you must:
(Article 6 of the GDPR (2023).)
When it comes to Bard, Google has cited "legitimate interests" as its basis for collecting personal data from EU users. In its Privacy Help Hub for Bard, Google says that it's collecting personal information so it can "provide, maintain, improve, and develop Google products, services, and machine learning technologies."
Whatever the justification for processing personal data, the GDPR requires that companies make sure the data is accurate, up to date, and secure. Companies also need to be transparent with the data subject about the processing of their personal data. For example, the company generally must let the person know why their personal data was collected and how long it will be stored.
Some EU member countries have taken action against AI companies to enforce consumer rights under the GDPR. Here are a couple of examples.
Italy temporarily banned ChatGPT. In March 2023, Italy banned ChatGPT due to concerns about the chatbot's potential GDPR violations. Italy took issue with how OpenAI was collecting its training data from Italian consumers and the fact that inappropriate data could reach underage users. Italy gave OpenAI 20 days to develop an action plan to address these issues and to fully comply with the GDPR. By the end of April 2023, OpenAI had made changes such as verifying users' ages when they sign up and providing a way for people to remove their personal information from ChatGPT's training data. In response to OpenAI's improvements, Italy lifted its ban.
Ireland stalled Google's EU launch of Bard. Before launching its AI model in the EU in July 2023, Google worked to create stricter privacy policies to satisfy the demands of the Irish Data Protection Commissioner. In an attempt to comply with GDPR rules and to provide more transparency, Google made various changes to Bard pre-launch, including requiring users to create a Google account to use Bard and to verify that they're 18 years of age or older.
The United Kingdom, on the other hand, is taking a more relaxed approach to AI regulation. Even though the UK is no longer an EU member state, it incorporates the GDPR into its Data Protection Act. The UK has said that it doesn't plan to create new data privacy laws geared toward AI but will instead offer voluntary guidance on existing laws. For example, the UK Information Commissioner's Office has provided companies with best practices and principles to consider when adopting AI in their industries.
In the U.S., the FTC's investigation of OpenAI and ChatGPT could be an indicator of how serious the government will get in regulating the way that AI companies gather, use, and share our personal information. If lawmakers decide to get serious about the issue, they could create data protection laws that provide ways for Americans to better control their personal information. If the U.S. and other countries follow the EU's lead, companies will have to reconsider how they use personal information to train AI.