ChatGPT is a data privacy nightmare



Opinion

ChatGPT is a data privacy nightmare. If you've ever posted online, you ought to be concerned

8 February 2023

ChatGPT has taken the world by storm, fuelled by our personal data

Professor Uri Gal from the University of Sydney Business School explains why ChatGPT's language model and data policy should sound a warning to consumers.

ChatGPT has taken the world by storm. Within two months of its release it reached 100 million active users, making it the fastest-growing consumer application ever launched. Users are attracted to the tool’s advanced capabilities – and concerned by its potential to cause disruption in various sectors.

A much less discussed implication is the privacy risk ChatGPT poses to each and every one of us. Just yesterday, Google unveiled its own conversational AI called Bard, and others will surely follow. Technology companies working on AI have well and truly entered an arms race.

The problem is it’s fuelled by our personal data.

300 billion words. How many are yours?

ChatGPT is underpinned by a large language model that requires massive amounts of data to function and improve. The more data the model is trained on, the better it gets at detecting patterns, anticipating what will come next and generating plausible text.

OpenAI, the company behind ChatGPT, fed the tool some 300 billion words systematically scraped from the internet: books, articles, websites and posts – including personal information obtained without consent.

If you’ve ever written a blog post or product review, or commented on an article online, there’s a good chance this information was consumed by ChatGPT.

So why is that an issue?

The data collection used to train ChatGPT is problematic for several reasons.

First, none of us were asked whether OpenAI could use our data. This is a clear violation of privacy, especially when data are sensitive and can be used to identify us, our family members, or our location.

Even when data are publicly available, their use can breach what we call contextual integrity. This is a fundamental principle in legal discussions of privacy. It requires that individuals’ information is not revealed outside of the context in which it was originally produced.

Also, OpenAI offers no procedures for individuals to check whether the company stores their personal information, or to request it be deleted. This is a guaranteed right in accordance with the European General Data Protection Regulation (GDPR) – although it’s still under debate whether ChatGPT is compliant.

This “right to be forgotten” is particularly important in cases where the information is inaccurate or misleading, which seems to be a regular occurrence with ChatGPT.

Moreover, the scraped data ChatGPT was trained on can be proprietary or copyrighted. For instance, when I prompted it, the tool produced the first few passages from Joseph Heller’s book Catch-22 – a copyrighted text.

Image: a ChatGPT prompt and response demonstrating the program plagiarising a novel. ChatGPT doesn’t consider copyright protection when generating outputs. Anyone using the outputs elsewhere could be inadvertently plagiarising. (Image: Provided)

Finally, OpenAI did not pay for the data it scraped from the internet. The individuals, website owners and companies that produced it were not compensated. This is particularly noteworthy considering OpenAI was recently valued at more than double its earlier valuation.

OpenAI has also just announced ChatGPT Plus, a paid subscription plan that will offer customers ongoing access to the tool, faster response times and priority access to new features. This plan will contribute to the company’s expected revenue.

None of this would have been possible without data – our data – collected and used without our permission.

A flimsy privacy policy

Another privacy risk involves the data provided to ChatGPT in the form of user prompts. When we ask the tool to answer questions or perform tasks, we may inadvertently hand over sensitive information and put it in the public domain.

For instance, an attorney may prompt the tool to review a draft divorce agreement, or a programmer may ask it to check a piece of code. The agreement and code, in addition to the outputted essays, are now part of ChatGPT’s database. This means they can be used to further train the tool, and be included in responses to other people’s prompts.

Beyond this, OpenAI gathers a broad scope of other user information. According to the company’s privacy policy, it collects users’ IP address, browser type and settings, and data on users’ interactions with the site – including the type of content users engage with, features they use and actions they take.

It also collects information about users’ browsing activities over time and across websites. Alarmingly, OpenAI states it may share users’ personal information with unspecified third parties, without informing users, to meet its business objectives.

Time to rein it in?

Some experts believe ChatGPT is a tipping point for AI – a realisation of technological development that can revolutionise the way we work, learn, write and even think. Its potential benefits notwithstanding, we must remember OpenAI is a private, for-profit company whose interests and commercial imperatives do not necessarily align with greater societal needs.

The privacy risks that come attached to ChatGPT should sound a warning. And as consumers of a growing number of AI technologies, we should be extremely careful about what information we share with such tools.

This article was first published in The Conversation. Uri Gal is a Professor of Business Information Systems at the University of Sydney Business School.

