Someone will try to hack a new technology as soon as it is used extensively. The same is true with artificial intelligence, particularly generative AI. Google created a “red team” in response to this issue around a year and a half ago to investigate potential hacking strategies for targeting AI systems in particular.
The head of Google Red Teams, Daniel Fabian, said there isn’t much threat information accessible for genuine adversaries looking to target machine learning systems in an interview with The Register. The main weaknesses in the existing AI systems have already been identified by his team.
According to Google’s red team leader, adversarial attacks, data poisoning, prompt injection, and backdoor assaults are among of the greatest dangers to machine learning (ML) systems. These machine learning (ML) systems, such as ChatGPT, Google Bard, and Bing AI, are all based on sizable language models.
The phrase “tactics, techniques, and procedures” (TTPs) is a frequent name for these attacks. Google’s AI Red team recently conducted a study that revealed the most prevalent TTPs that attackers employ to target AI systems.
1. Negative AI system assaults
One type of adversarial assault is writing inputs with the specific intent of tricking a machine learning model. In a result, the model generates erroneous results or results that it wouldn’t generate under other circumstances, such in situations when the model may have been specifically instructed to avoid certain outcomes.
According to Google’s AI Red Team report, “The impact of an attacker successfully generating adversarial examples can range from negligible to critical, and depends entirely on the use case of the AI classifier.”
2. AI that poisons data
A common way that attackers may attack machine learning systems, according to Fabian, is through data poisoning, which entails interfering with the model’s training data to thwart its ability to learn.
Fabian says that “Data poisoning has become more and more interesting.” “Anyone, even attackers, is allowed to publish stuff online and propagate their harmful information. Therefore, it is our responsibility as the defense to learn how to recognize information that may have been corrupted in some way.
In order to bias the model’s behavior and outcomes, these “data poisoning” assaults entail intentionally injecting incorrect, misleading, or changed data into the training dataset. This may be demonstrated by purposely misidentifying faces in a facial recognition dataset by incorrectly labeling images from the collection.
Securing the data supply chain is one method to prevent data poisoning in AI systems, according to Google’s AI Red Team report.
3. Quick injection assaults
An AI system may be attacked by a user using rapid injection assaults to change the output of a model. Even if the model has been specifically trained to address these dangers, the output may still come up with unexpected, unfair, erroneous, and insulting responses.
Since the majority of AI organizations seek to create models that provide reliable and impartial information, it is essential to protect the model from users who have negative intents. This can require restricting the input to the model and carefully considering what input consumers can provide.
4. AI model backdoor attacks
Backdoor assaults on AI systems are one of the deadliest kind of assault and might go unnoticed for a very long time. A hacker might use backdoor assaults to steal data, bury code in the model, and disrupt model output.
“On the one hand, the attacks are very ML-specific, and they require a lot of machine learning subject matter expertise to be able to modify the model’s weights to put a backdoor into a model or to do specific fine-tuning to integrate a backdoor,” continued Fabian.
By installing and utilizing a backdoor, a covert access point that skips traditional authentication, the model may be used to carry out various attacks. In contrast, Fabian noted, “the defenses against those are very much traditional security best practices like having controls against bad insiders and locking down access.” Attackers may be able to target AI systems by stealing and transferring training data.
The rise of any new breakthrough in the constantly changing world of technology attracts prospective hackers’ interest. Artificial intelligence, especially generative AI, is subject to this fact. The initiative taken by Google in creating a special “red team” emphasizes the attention needed to foresee and thwart future assaults on AI systems.
Google’s red team, led by Daniel Fabian, has discovered significant flaws in existing AI systems, including as adversarial attacks, data poisoning, quick injection, and backdoor attacks.
In order to protect the integrity of AI technologies, their study highlights the necessity of defending against various strategies, techniques, and processes. This is done by highlighting the need for strong security measures and ongoing monitoring.