ChatGPT and Data Privacy


In April 2023, German artist Boris Eldagsen won the creative open category at the Sony World Photography Awards for his entry titled Pseudomnesia: The Electrician. To the confusion of the judges and the audience, however, he declined the award. The reason was that the image had been generated by an Artificial Intelligence (AI) tool. It was reported that Eldagsen “said he used the picture to test the competition and to create a discussion about the future of photography.” Was it a shortcoming of the judges that they couldn’t discern what was real and what was fake?

Generative AI blurs the line between what is real and what is artificially created. The same holds for text: a chatbot delivers biased or falsified information with the same confidence as accurate information, giving unsuspecting users no hint about the content’s reliability.

The threats posed by AI tools such as DALL-E, Bard, and, most importantly, ChatGPT aren’t limited to biased or falsified information. The technology also raises serious concerns from a privacy perspective.

What Is ChatGPT & Why Do Its Data Collection Practices Raise Eyebrows?

Although there have been many attempts at AI chatbots, ChatGPT is by far the only one to have had such a broad and immediate global impact. The tool gained traction as soon as it launched, and it reportedly reached an estimated 100 million monthly users within about two months, a record pace for a consumer application at the time.

What makes ChatGPT so capable is that it was trained on vast amounts of text taken from online blogs, e-books, community forums and threads, and similar sources. As a result, it can generate highly detailed responses to a user’s prompt within seconds. In fact, OpenAI reports that the newer GPT-4 model has passed simulated versions of a number of fairly difficult exams, such as the Uniform Bar Exam, the Law School Admission Test (LSAT), and the Scholastic Assessment Test (SAT).

Now the question is, why are its data collection practices opposed if every piece of information it is trained on is already public? It is true that much freely available public data falls outside the scope of data privacy regulations such as the General Data Protection Regulation (GDPR). However, that is not a blanket exemption. For starters, if the publicly available information includes personal data, particularly sensitive personal data such as racial or ethnic origin and political or religious views, then data protection regulations may still apply.

Moreover, OpenAI may use submitted prompts to further train ChatGPT, and it is not fully transparent to what extent the tool collects and retains such information. Once a prompt has been used for training, the data provided by the user effectively becomes part of the model and may be difficult or impossible to remove. This sits uneasily with the right to erasure under the GDPR and with comparable deletion rights in other global data privacy laws.

Organizational Confidential Information is at Risk

ChatGPT has been used for a wide range of purposes. People are leveraging its responses for academic research, marketing communications, and even for developing or reviewing programming code. This creates a high risk of sharing confidential or proprietary information. Because the tool is efficient at fixing grammatical or contextual mistakes in text and errors in code, people willingly feed it confidential material. For instance, several Samsung employees reportedly leaked proprietary data to ChatGPT when they used it to check their programming code.

The Risk of Biased Information

Prior to its release, ChatGPT was trained on information available on the internet up until 2021, and it may be further refined using the input it receives from users. The quality of its output depends heavily on the quality and nature of that training data. Since the internet is brimming with all sorts of fake and biased content, AI tools will replicate the biases that exist in the data. The same can be said for content involving discrimination.

How Are World Leaders Responding to ChatGPT and AI?

Italy was the first Western country to temporarily ban ChatGPT over data privacy concerns. The country reportedly received complaints from artists and media agencies that the AI tool used their work without their consent, which ultimately prompted regulators to act under the GDPR. Although the ban was lifted once OpenAI presented some privacy guarantees, the episode shows how a technology designed to benefit people can have unintended consequences.

Conclusion

There’s no doubt that ChatGPT and other AI tools have the potential to transform how we seek information and communicate. However, these tools raise genuine data privacy concerns about the potential risks and implications of AI. Privacy laws are slowly trying to catch up with AI tools and technologies, whose development has accelerated since the immense success of ChatGPT. In April 2023, the Group of Seven (G7) nations showed keen interest in regulating ChatGPT and similar generative AI technologies. However, that interest is tempered by a preference for a risk-based approach rather than a draconian solution. Fortunately, thanks to the efforts of nations and of organizations such as NIST, which has released guidance for AI development, we may be able to build a bridge between privacy and AI.


About the Author:

With a strong background in the SaaS and IaaS industry, Syed Sayem Mustufa has extensive experience in Marketing. Over the years, Sayem has served some of the top data intelligence and cybersecurity brands, including Securiti.ai. He loves nothing more than breaking down and simplifying highly complex product details into easy-to-understand benefits for end users.

Editor’s Note: The opinions expressed in this and other guest author articles are solely those of the contributor, and do not necessarily reflect those of Tripwire.


