
Opinion

OpenAI's data hunger raises privacy concerns

23 September 2024
It is not difficult to imagine a scenario in which centralised control over many kinds of data would let OpenAI exert significant influence over people, writes Professor Uri Gal in The Conversation.
Professor Uri Gal

Last month, OpenAI came out against a yet-to-be enacted Californian law that aims to set basic safety standards for developers of large artificial intelligence (AI) models. This was a change of posture for the company, whose chief executive Sam Altman has previously spoken in favour of AI regulation.

The former nonprofit organisation, which shot to prominence in 2022 with the release of ChatGPT, is now one of the world's most highly valued AI companies. It remains at the forefront of AI development, with the release last week of a new model designed to tackle more complex tasks.

The company has made several moves in recent months suggesting a growing appetite for data acquisition. This isn't just the text or images used for training current generative AI tools, but may also include intimate data related to online behaviour, personal interactions and health.

There is no evidence OpenAI plans to bring these different streams of data together, but doing so would offer strong commercial benefits. Even the possibility of access to such wide-ranging information raises significant questions about privacy and the ethical implications of centralised data control.

Media deals

This year, OpenAI has signed deals with media companies including Time magazine, the Financial Times, Axel Springer, Le Monde, Prisa Media, and most recently Condé Nast, owner of the likes of Vogue, The New Yorker, Vanity Fair and Wired.

The partnerships grant OpenAI access to large amounts of content. OpenAI's products may also be used to analyse user behaviour and interaction metrics such as reading habits, preferences, and engagement patterns across platforms.

If OpenAI gained access to this data, it could develop a comprehensive understanding of how users engage with various types of content, which could be used for in-depth user profiling and tracking.

Video, biometrics and health

OpenAI has also invested in a webcam company. The aim is to enhance the cameras with advanced AI capabilities.

Video footage collected by AI-powered webcams could translate to more sensitive biometric data, such as facial expressions and inferred psychological states.

In July, OpenAI and Thrive Global launched Thrive AI Health. The company says it will use AI to personalise and scale behaviour change in health.

While Thrive AI Health says it will have "robust privacy and security guardrails", it is unclear what these will look like.

Previous AI health projects have involved extensive sharing of personal data, such as a partnership between Microsoft and Providence Health in the United States and another between Google DeepMind and the Royal Free London NHS Foundation Trust in the United Kingdom. In the latter case, DeepMind faced criticism for its use of private health data.

Sam Altman's eyeball-scanning side project

Altman also has investments in other data-hungry ventures, most notably a controversial cryptocurrency project called Worldcoin (which he cofounded). Worldcoin aims to create a global financial network and identity system based on biometric identification, specifically iris scans.

The company claims it has already scanned the irises of people across almost 40 countries. Meanwhile, more than a dozen jurisdictions have either suspended its operations or scrutinised its data processing.

Bavarian authorities are currently deliberating on whether Worldcoin's data handling complies with European data protection rules. A negative ruling could see the company barred from operating in Europe.

The main concerns being investigated include the collection and storage of sensitive biometric data.

Why does this matter?

Existing AI models such as OpenAI's flagship GPT-4o have largely been trained on publicly available data from the internet. However, future models will need even more data, and it is becoming increasingly scarce.

Last year, the company said it wanted AI models "to deeply understand all subject matters, industries, cultures, and languages", which would require "as broad a training dataset as possible".

In this context, OpenAI's pursuit of media partnerships, investments in biometric and health data collection technologies, and the CEO's links to controversial projects such as Worldcoin begin to paint a concerning picture.

By gaining access to vast amounts of user data, OpenAI is positioning itself to build the next wave of AI models, but privacy may be a casualty.

The risks are multifaceted. Large collections of personal data are vulnerable to breaches and misuse, such as the 2022 Medibank data breach in which almost half of Australians had their personal and medical data stolen.

The potential for large-scale data consolidation also raises concerns about profiling and surveillance. Again, there is no evidence that OpenAI currently plans to engage in such practices.

However, OpenAI's privacy policies have drawn criticism in the past. Tech companies more broadly also have a long history of questionable data practices.

It is not difficult to imagine a scenario in which centralised control over many kinds of data would let OpenAI exert significant influence over people, in both personal and public domains.

Will safety take a back seat?

OpenAI's recent history does little to assuage safety and privacy concerns. In November 2023, Altman was removed as chief executive, reportedly due to internal conflicts over the company's strategic direction.

Altman has been a strong advocate for the rapid commercialisation and deployment of AI technologies. He has reportedly often prioritised growth and market penetration over safety concerns.

Altman's removal from the role was brief, followed by his swift reinstatement and a significant shakeup of OpenAI's board. This suggests the company's leadership now endorses his aggressive approach to AI deployment, despite potential risks.

Against this backdrop, the implications of OpenAI's recent opposition to the California bill extend beyond a single policy disagreement. The anti-regulation stance suggests a troubling trend.


This article was first published in The Conversation. Uri Gal is a Professor in Business Information Systems at the University of Sydney Business School.

Media contact

Harrison Vesey

Media Advisor (Business)
