Securing AI: Navigating the Complex Landscape of Models, Fine-Tuning, and RAG
Almost overnight, Artificial Intelligence (AI) has become a priority for most organizations. A concerning trend is the increasing use of AI by adversaries to execute malicious activities. Sophisticated actors leverage AI to automate attacks, optimize breach strategies, and even mimic legitimate user behaviors, thereby escalating the complexity and scale of threats. This blog discusses how attackers might manipulate and compromise AI systems, highlighting potential vulnerabilities and the implications of such attacks on AI implementations.
By manipulating input data or the training process itself, adversaries can subtly alter a model’s behavior, leading to outcomes like biased results, misclassifications, or even controlled responses that serve their nefarious purposes. This type of attack compromises the integrity, trust, and reliability of AI-driven systems and creates significant risks to the applications and users relying on them. It underscores the urgent need for robust security measures and proper monitoring in developing, fine-tuning, and deploying AI models. While the need is urgent, we believe there is reason for hope.
The expansive use of AI is early, and the opportunity to consider appropriate security measures at such a foundational state of a transformational technology is exciting. This paradigm shift needs a proactive approach in cybersecurity measures, where understanding and countering AI-driven threats become essential components of our defense strategies.
AI/Machine Learning (ML) is not new. Many organizations, including Cisco, have been implementing AI/ML models for quite some time and have been a subject of research and development for decades. These range from simple decision trees to complex neural networks. However, the emergence of advanced models, like Generative Pre-trained Transformer 4 (GPT-4), marks a new era in the AI landscape. These cutting-edge models, with unprecedented levels of sophistication and capability, are revolutionizing how we interact with technology and process information. Transformer-based models, for instance, demonstrate remarkable abilities in natural language understanding and generation, opening new frontiers in many sectors from networking to medicine, and significantly enhancing the potential of AI-driven applications. These fuel many modern technologies and services, making their security a top priority.
Building an AI model from scratch involves starting with raw algorithms and progressively training the model using a large dataset. This process includes defining the architecture, selecting algorithms, and iteratively training the model to learn from the data provided. In the case of large language models (LLMs) significant computational resources are needed to process large datasets and run complex algorithms. For example, a substantial and diverse dataset is crucial for training the model effectively. It also requires a deep understanding of machine learning algorithms, data science, and the specific problem domain. Building an AI model from scratch is often time-consuming, requiring extensive development and training periods (particularly, LLMs).
Fine-tuned models are pre-trained models adapted to specific tasks or datasets. This fine-tuning process adjusts the model’s parameters to suit the needs of a task better, improving accuracy and efficiency. Fine-tuning leverages the learning acquired by the model on a previous, usually large and general, dataset and adapts it to a more focused task. Computational power could be less than building from scratch, but it is still significant for the training process. Fine-tuning typically requires less data compared to building from scratch, as the model has already learned general features.
Retrieval Augmented Generation (RAG) combines the power of language models with external knowledge retrieval. It allows AI models to pull in information from external sources, enhancing the quality and relevance of their outputs. This implementation enables you to retrieve information from a database or knowledge base (often referred to as vector databases or data stores) to augment its responses, making it particularly effective for tasks requiring up-to-date information or extensive context. Like fine-tuning, RAG relies on pre-trained models.
Fine-tuning and RAG, while powerful, may also introduce unique security challenges.
AI/ML Ops and Security
AI/ML Ops includes the entire lifecycle of a model, from development to deployment, and ongoing maintenance. It’s an iterative process involving designing and training models, integrating models into production environments, continuously assessing model performance and security, addressing issues by updating models, and ensuring models can handle real-world loads.
Deploying AI/ML and fine-tuning models presents unique challenges. Models can degrade over time as input data changes (i.e., model drift). Models must efficiently handle increased loads while ensuring quality, security, and privacy.
Security in AI needs to be a holistic approach, protecting data integrity, ensuring model reliability, and protecting against malicious use. The threats range from data poisoning, AI supply chain security, prompt injection, to model stealing, making robust security measures essential. The Open Worldwide Application Security Project (OWASP) has done a great job describing the top 10 threats against large language model (LLM) applications.
MITRE has also created a knowledge base of adversary tactics and techniques against AI systems called the MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems). MITRE ATLAS is based on real-world attacks and proof-of-concept exploitation from AI red teams and security teams. Techniques refer to the methods used by adversaries to accomplish tactical objectives. They are the actions taken to achieve a specific goal. For instance, an adversary might achieve initial access by performing a prompt injection attack or by targeting the supply chain of AI systems. Additionally, techniques can indicate the outcomes or advantages gained by the adversary through their actions.
What are the best ways to monitor and protect against these threats? What are the tools that the security teams of the future will need to safeguard infrastructure and AI implementations?
The UK and US have developed guidelines for creating secure AI systems that aim to assist all AI system developers in making educated cybersecurity choices throughout the entire development lifecycle. The guidance document underscores the importance of being aware of your organization’s AI-related assets, such as models, data (including user feedback), prompts, related libraries, documentation, logs, and evaluations (including details about potential unsafe features and failure modes), recognizing their value as substantial investments and their potential vulnerability to attackers. It advises treating AI-related logs as confidential, ensuring their protection and managing their confidentiality, integrity, and availability.
The document also highlights the necessity of having effective processes and tools for tracking, authenticating, version-controlling, and securing these assets, along with the ability to restore them to a secure state if compromised.
Distinguishing Between AI Security Vulnerabilities, Exploitation and Bugs
With so many advancements in technology, we must be clear about how we talk about security and AI. It is essential that we distinguish between security vulnerabilities, exploitation of those vulnerabilities, and simply functional bugs in AI implementations.
- Security vulnerabilities are weaknesses that can be exploited to cause harm, such as unauthorized data access or model manipulation.
- Exploitation is the act of using a vulnerability to cause some harm.
- Functional bugs refer to issues in the model that affect its performance or accuracy, but do not necessarily pose a direct security threat. Bugs can range from minor issues, like misspelled words in an AI-generated image, to severe problems, like data loss. However, not all bugs are exploitable vulnerabilities.
- Bias in AI models refers to the systematic and unfair discrimination in the output of the model. This bias often stems from skewed, incomplete, or prejudiced data used during the training process, or from flawed model design.
Understanding the difference is crucial for effective risk management, mitigation strategies, and most importantly, who in an organization should focus on which problems.
Forensics and Remediation of Compromised AI Implementations
Performing forensics on a compromised AI model or related implementations involves a systematic approach to understanding how the compromise occurred and preventing future occurrences. Do organizations have the right tools in place to perform forensics in AI models. The tools required for AI forensics are specialized and need to handle large datasets, complex algorithms, and sometimes opaque decision-making processes. As AI technology advances, there is a growing need for more sophisticated tools and expertise in AI forensics.
Remediation may involve retraining the model from scratch, which can be costly. It requires not just computational resources but also access to quality data. Developing strategies for efficient and effective remediation, including partial retraining or targeted updates to the model, can be crucial in managing these costs and reducing risk.
Addressing a security vulnerability in an AI model can be a complex process, depending on the nature of the vulnerability and how it affects the model. Retraining the model from scratch is one option, but it’s not always necessary or the most efficient approach. The first step is to thoroughly understand the vulnerability. Is it a data poisoning issue, a problem with the model’s architecture, or a vulnerability to adversarial attacks? The remediation strategy will depend heavily on this assessment.
If the issue is related to the data used to train the model (e.g., poisoned data), then cleaning the dataset to remove any malicious or corrupt inputs is essential. This might involve revalidating the data sources and implementing more robust data verification processes.
Sometimes, adjusting the hyperparameters or fine-tuning the model with a more secure or robust dataset can address the vulnerability. This approach is less resource-intensive than full retraining and can be effective for certain types of issues. In some cases, particularly if there are architectural bugs, updating or altering the model’s architecture might be necessary. This could involve adding layers, changing activation functions, etc. Retraining from scratch is often seen as a last resort due to the resources and time required. However, if the model’s fundamental integrity is compromised, or if incremental fixes are ineffective, fully retraining the model might be the only option.
Beyond the model itself, implementing robust security protocols in the environment where the model operates can mitigate risks. This includes securing APIs, vector databases, and adhering to best practices in cybersecurity.
Future Trends
The field of AI security is evolving rapidly. Future trends may include automated security protocols and advanced model manipulation detection systems specifically designed for today’s AI implementations. We will need AI models to monitor AI implementations.
AI models can be trained to detect unusual patterns or behaviors that might indicate a security threat or a compromise in another AI system. AI can be used to continuously monitor and audit the performance and outputs of another AI system, ensuring they adhere to expected patterns and flagging any deviations. By understanding the tactics and strategies used by attackers, AI can develop and implement more effective defense mechanisms against attacks like adversarial examples or data poisoning. AI models can learn from attempted attacks or breaches, adapting their defense strategies over time to become more resilient against future threats.
As developers, researchers, security professionals and regulators focus on AI, it is essential that we evolve our taxonomy for vulnerabilities, exploits and “just” bugs. Being clear about these will help teams understand, and break down this complex, fast-moving space.
Cisco has been on a long-term journey to build security and trust into the future. Learn more on our Trust Center.
We’d love to hear what you think. Ask a Question, Comment Below, and Stay Connected with Cisco Security on social!
Cisco Security Social Channels
Instagram
Facebook
Twitter
LinkedIn
Share: