How To Privacy-Proof the Coming AI Wave


Everyone has noticed that we have entered the AI era. AI is everywhere: improving customer experience, cutting costs, generating stunning and surreal images. The artificial intelligence market is expected to reach a value of US$184.00bn, with projected year-over-year growth of 28.46%. Meanwhile, startup creation continues to boom, and many of these new companies put AI at the heart of both their goals and their methods; partly because it really is the new frontier and partly, understandably, because it is one of the best selling points when raising money.

Big companies are working to improve their models, training on ever-larger datasets and adding capabilities to win the fight for the “best model”. Many startups, on the other hand, are using those models to build applications, promising killer services that users will wonder how they ever lived without. And yet, in this AI revolution, security and privacy are left somewhat aside. Privacy concerns in particular are overlooked most of the time, while the danger is enormous: it is your personal data that fuels this fight, and you are leaking some of your most unique and private assets. Alarmingly, a recent survey indicates that developers rank AI as the second biggest threat to privacy, just after cybercrime; with increasingly sophisticated malicious tools potentially in the hands of a new generation of cybercriminals, AI may well become the number one menace within a few years.

Are we doomed? Well, there could certainly be solutions, or at least mitigations. First, as a user, you could (or should?) refuse to send your private data to companies. Yes, some data is sent without you really knowing or consenting, which is another subject authorities should regulate; here, we are talking about data you deliberately send to third parties. Sure, having your DNA analyzed to predict future diseases or find your ancestors looks cool, but sending that data leaks your DNA forever. It is a pity, since it has recently been shown that such computations can be replaced by computations over encrypted DNA, with no way for the provider to see your data in the clear. This is where the growing family of Privacy-Enhancing Technologies (PETs) comes in, and in particular Fully Homomorphic Encryption (FHE). With FHE, you could have the same services, the same ML inferences, but performed on encrypted data. There is a cost to pay: more integration work for the service provider (even if FHE companies have put a lot of effort into making FHE toolkits friendlier to non-cryptographers), and slower inference. In many cases this is certainly acceptable in order to keep your data private: instead of a DNA ancestry computation taking one minute, it might take 10 minutes or an hour, but for us as users that is much better, and we would agree to pay for this extra security on our data.
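To make this workflow more tangible, here is a minimal sketch of computing on encrypted data with Zama's open-source concrete-python compiler; the toy scoring function, the inputs and the inputset are illustrative assumptions, not a real genomics pipeline.

```python
# A minimal sketch of the FHE workflow with concrete-python
# (pip install concrete-python). The scoring function and the
# values below are purely illustrative assumptions.
from concrete import fhe

def score(x, y):
    # Toy integer computation standing in for a real analysis
    return (x + y) * 2

# Mark both inputs as encrypted and compile a circuit from an inputset
compiler = fhe.Compiler(score, {"x": "encrypted", "y": "encrypted"})
inputset = [(3, 1), (7, 4), (0, 0), (15, 15)]
circuit = compiler.compile(inputset)

# Client side: generate keys and encrypt the private inputs
circuit.keygen()
enc_x, enc_y = circuit.encrypt(7, 4)

# Server side: compute directly on ciphertexts, never seeing the data
enc_result = circuit.run(enc_x, enc_y)

# Client side: only the key holder can decrypt the result
assert circuit.decrypt(enc_result) == score(7, 4)
```

The important point is the split: encryption and decryption happen on the client, while the provider only ever runs the circuit on ciphertexts.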

Then, what about training? Here as well, we could encourage companies to train on encrypted datasets, using PETs. An effort by authorities and regulators may well be needed here, to push companies away from the easy-but-non-private path. It is a shame that data is used in the clear for training, leaking your information into the future inferences that other people will make. Training on encrypted data has also been shown to be doable, at least to some extent, and regulation and public concern should redouble the efforts to make it even more practical. However, let's not forget that PETs alone do not solve the whole problem: other measures such as Differential Privacy would also be needed, since a trained model can still memorize and expose individual records.
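As a rough illustration of what Differential Privacy adds on top of encryption, here is a minimal sketch of the classic Laplace mechanism for releasing a private average; the dataset, the value bounds and the epsilon below are arbitrary assumptions chosen for the example.

```python
# A minimal sketch of the Laplace mechanism from Differential Privacy:
# noise calibrated to the query's sensitivity hides any single record.
# The dataset, bounds and epsilon are arbitrary assumptions.
import numpy as np

def private_mean(values, lower, upper, epsilon):
    """Release the mean of `values` with epsilon-differential privacy."""
    values = np.clip(values, lower, upper)        # bound each record's influence
    sensitivity = (upper - lower) / len(values)   # max change from altering one record
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return values.mean() + noise

ages = np.array([23, 35, 41, 29, 52, 47, 38, 31])
print(private_mean(ages, lower=0, upper=100, epsilon=1.0))
```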

The key is to reconcile the need for big datasets with a collaborative approach that improves results without compromising privacy. The development and use of open-source resources – research papers, software, tools – should be encouraged by authorities and regulators and valued by companies. For example, our company makes open-source tools that let developers add privacy to their applications for their users. We have worked to make these tools easy to use, without any particular knowledge of cryptography. People shouldn't have to care about privacy: not because it isn't important, but because it should be there by default, everywhere and transparently. In our case, simplicity came from building on the tools developers already use: Python for AI, Solidity for blockchain. We took inspiration from Scikit-Learn and PyTorch for our AI framework, so that it is immediately familiar to experienced ML practitioners. We now believe that using FHE is easier than ever.
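As an illustration of what this scikit-learn style looks like in practice, here is a minimal sketch using Zama's open-source Concrete ML library; the synthetic dataset, the choice of model and the fhe="execute" flag (as in recent versions of the library) are assumptions made for the example.

```python
# A minimal sketch with Concrete ML (pip install concrete-ml),
# whose API mirrors scikit-learn. The synthetic dataset and the
# model choice are assumptions for illustration.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)                    # train in the clear, as with scikit-learn

model.compile(X_train)                         # compile the model into an FHE circuit
y_pred = model.predict(X_test, fhe="execute")  # inference runs on encrypted inputs
print((y_pred == y_test).mean())
```

Training happens in the clear on the model owner's side; what changes is inference, where user inputs stay encrypted end to end, as in the DNA-analysis scenario above.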

All of this highlights the urgent need for robust security measures and transparent practices that safeguard personal information against unintended disclosure, so that public trust in emerging technologies is not eroded.

About the Author

Benoit Chevallier-Mames, VP Privacy-Preserving Cloud and ML, Zama, is a security engineer and researcher who currently leads the Cloud and ML division at Zama, developing an FHE compiler and privacy-preserving ML libraries. He has spent more than 20 years between cryptographic research and secure implementations, in a wide range of domains such as side-channel security, provable security, whitebox cryptography, fully homomorphic encryption and, more recently, machine learning.

Prior to Zama, he spent seven years at Gemplus securely implementing public-key algorithms on smartcards, worked for the French government agency ANSSI, and then spent 12 years at Apple designing and developing whitebox implementations.

Benoit can be reached online on LinkedIn and at our company website https://www.zama.ai/


