DeepSeek AI’s privacy dilemma: Are data protection concerns valid?


2025-01-29T21:57:00+05:00 24 News

DeepSeek has witnessed record popularity since two of its cost-efficient AI models, released in quick succession, were touted as exhibiting performance on par with large language models (LLMs) developed by US rivals such as OpenAI and Google.

But DeepSeek’s rise has been accompanied by a range of user concerns regarding data privacy, cybersecurity, disinformation, and more. Some of these concerns stem from the AI research lab’s Chinese origins, while others centre on the open-source nature of its AI technology.

The US Navy has reportedly warned its members not to use DeepSeek’s AI services “for any work-related tasks or personal use,” citing potential security and ethical concerns. However, tech industry figures such as Perplexity CEO Aravind Srinivas have repeatedly sought to allay such worries by pointing out that DeepSeek’s AI can be downloaded and run locally on your laptop or other devices. 

So far, DeepSeek has rolled out several AI models designed for coding, writing tasks, image generation, etc. 

However, average users are more likely to access DeepSeek’s AI by downloading its app on iOS and Android devices or using the web version. In its privacy policy, DeepSeek unequivocally states: “We store the information we collect in secure servers located in the People’s Republic of China.” As per the privacy policy, the user data collected by DeepSeek falls into three broad categories:

- Information provided by the user: text or audio inputs, prompts, uploaded files, feedback, chat history, email address, phone number, date of birth, username, etc.

- Automatically collected information: device model, operating system, IP address, cookies, crash reports, keystroke patterns or rhythms, etc.

- Information from other sources: if a user creates a DeepSeek account using Google or Apple sign-on, DeepSeek “may collect information from the service, such as access token.” It may also collect user data such as mobile identifiers, hashed email addresses and phone numbers, and cookie identifiers shared by advertisers.

As per the privacy policy, DeepSeek may use prompts from users to develop new AI models. The company says it will “review, improve, and develop the service, including by monitoring interactions and usage across your devices, analysing how people are using it, and by training and improving our technology.” 

It further states that user data can be accessed by DeepSeek’s corporate group and will be shared with law enforcement agencies, public authorities, and others in compliance with legal obligations. DeepSeek’s data collection is broadly in line with the practices of other generative AI platforms. For instance, OpenAI’s ChatGPT has also been criticised in the past for collecting vast amounts of user data, and the chatbot was even briefly banned in Italy over privacy concerns.

“Risks for privacy and data protection come from both the way that LLMs are trained and developed and the way they function for end users,” Privacy International, a UK-based non-profit organisation advocating for digital rights, said in a report.

Privacy experts have also pointed out that it is possible for personal data to be extracted from LLMs by feeding them the right prompts. In its lawsuit against OpenAI, The New York Times said it had come across examples of ChatGPT reproducing its articles verbatim. In 2023, Google DeepMind researchers also claimed that they had found ways to trick ChatGPT into divulging potentially sensitive personal data.

“The possibility to use LLMs (in particular ones that have been made available with open source weights) to make deepfakes, to imitate someone’s style and so on shows how uncontrolled its outputs can be,” Privacy International said. Users may also not be aware that the prompts they are feeding into LLMs are being absorbed into datasets to further train AI models, it added. 

Additionally, the US Federal Trade Commission (FTC) has noted that AI tools “are prone to adversarial inputs or attacks that put personal data at risk.” DeepSeek confirmed on Tuesday, January 28, that it was hit by a large-scale cyberattack, forcing it to pause new user sign-ups on its web chatbot interface.

To be sure, DeepSeek users can delete their chat history as well as their accounts via the Settings tab in the mobile app. However, it appears that there is no way for users to opt out of having their interactions used for AI training purposes. 

And while DeepSeek has made the underlying code and weights of its reasoning model (R1) open source, the training datasets and instructions used for training R1 are not publicly available, according to TechCrunch.

The storage of DeepSeek user data on servers located in China is already inviting scrutiny from various countries. US government officials are reportedly looking into the national security implications of the app, and Italy’s privacy watchdog is seeking more information from the company on data protection.

But when it comes to privacy and data protection, perhaps the strongest argument in favour of DeepSeek is that its open-source AI models can be downloaded and installed locally on a computer. 
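In practice, running the model locally typically means using a local LLM runner such as Ollama, which hosts distilled versions of DeepSeek R1. The commands below are an illustrative sketch; the exact model tag, available sizes, and hardware requirements depend on the Ollama release and the machine in use:

```shell
# Illustrative sketch: running a DeepSeek R1 variant locally with Ollama.
# The "deepseek-r1:7b" tag is one of several distilled sizes Ollama lists;
# larger variants need correspondingly more RAM/VRAM.
ollama pull deepseek-r1:7b   # download the model weights once
ollama run deepseek-r1:7b    # chat with the model entirely on-device
```

In this setup, inference happens on the user’s own hardware, so prompts are not transmitted to DeepSeek’s servers at all.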
