Digital Security
Could attackers use seemingly innocuous prompts to manipulate an AI system and even make it their unwitting ally?
12 Dec 2024 • 3 min. read
When interacting with chatbots and other AI-powered tools, we usually ask them simple questions like, “What’s the weather going to be today?” or “Will the trains be running on time?”. Those not involved in the development of AI probably assume that all the data is poured into a single, giant, all-knowing system that instantly processes queries and delivers answers. However, the reality is more complex and, as shown at Black Hat Europe 2024, these systems can be vulnerable to exploitation.
A presentation by Ben Nassi, Stav Cohen and Ron Bitton detailed how malicious actors could circumvent an AI system’s safeguards to subvert its operations or exploit access to it. They showed that by asking an AI system certain specific questions, it is possible to engineer an answer that causes damage, such as a denial-of-service attack.
Creating loops and overloading systems
To many of us, an AI service may appear to be a single source. In reality, however, it relies on many interconnected components, or – as the presenting team termed them – agents. Returning to the earlier example, the query about the weather and trains needs data from separate agents: one with access to weather data and the other to train status updates.
The model – or the master agent that the presenters called “the planner” – then needs to combine the data from the individual agents to formulate responses. Guardrails are also in place to prevent the system from answering questions that are inappropriate or beyond its scope. For example, some AI systems might avoid answering political questions.
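To make that architecture concrete, here is a minimal sketch of how a planner might fan a question out to specialised agents. The WeatherAgent, TrainAgent and Planner classes are hypothetical, invented purely for illustration; the presenters did not share implementation details.

```python
# Minimal illustrative sketch of a planner routing sub-queries to specialised agents.
# All class and method names are hypothetical placeholders.

class WeatherAgent:
    def answer(self, query: str) -> str:
        # A real agent would call a weather data source here.
        return "Rain is expected this afternoon."

class TrainAgent:
    def answer(self, query: str) -> str:
        # A real agent would call a train status API here.
        return "Trains are running on time."

class Planner:
    """The 'master agent' that splits a question and merges the agents' answers."""

    def __init__(self):
        self.agents = {"weather": WeatherAgent(), "trains": TrainAgent()}

    def respond(self, question: str) -> str:
        parts = []
        if "weather" in question.lower():
            parts.append(self.agents["weather"].answer(question))
        if "train" in question.lower():
            parts.append(self.agents["trains"].answer(question))
        # Simple guardrail-style fallback for out-of-scope questions.
        return " ".join(parts) or "Sorry, that is outside my scope."

print(Planner().respond("What's the weather today, and will the trains run on time?"))
```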
However, the presenters demonstrated that these guardrails can be manipulated and that certain questions can trigger endless loops. An attacker who can work out where the guardrails’ boundaries lie can ask a question that continually produces a forbidden answer. Create enough instances of that question and the system is eventually overwhelmed, triggering a denial-of-service attack.
Put this into an everyday scenario, as the presenters did, and you see how quickly it can cause harm. An attacker sends an email to a user who has an AI assistant, embedding a query that the assistant processes and answers. If the answer is always judged unsafe and a rewrite is requested, a denial-of-service loop is created. Send enough such emails and the system grinds to a halt, its power and resources depleted.
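A rough sketch of that loop, assuming a hypothetical violates_guardrail check and rewrite step (neither is a real API): the point is simply that without a cap on rewrite attempts, a single poisoned prompt can keep the system busy indefinitely.

```python
# Illustrative sketch of the unbounded-rewrite loop described above.

def violates_guardrail(answer: str) -> bool:
    # A crafted prompt can force every possible answer to trip this check.
    return "forbidden-topic" in answer

def rewrite(answer: str) -> str:
    # Simulates the model regenerating; the poisoned prompt keeps the
    # forbidden content in every rewrite.
    return answer

def respond(prompt: str, max_retries: int | None = None) -> str:
    answer = f"Reply mentioning forbidden-topic for: {prompt}"
    attempts = 0
    while violates_guardrail(answer):
        if max_retries is not None and attempts >= max_retries:
            return "Unable to produce a safe answer."
        answer = rewrite(answer)  # without a cap, this never terminates
        attempts += 1
    return answer

# respond("poisoned email")                      # would loop forever: denial of service
print(respond("poisoned email", max_retries=3))  # bounded, fails safely instead
```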
There is, of course, the question of how to extract the information about the guardrails from the system so you can exploit it. The team demonstrated a more advanced version of the attack above, which involved manipulating the AI system itself into providing that background information through a series of seemingly innocuous prompts about its operations and configuration.
A question such as “What operating system or SQL version do you run on?” is likely to elicit a relevant response. Combined with seemingly unrelated details about the system’s purpose, this may yield enough information for text commands to be sent to the system – and, if an agent has privileged access, to unwittingly grant that access to the attacker. In cyberattack terms, we know this as “privilege escalation” – a technique where attackers exploit weaknesses to gain higher levels of access than intended.
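As a purely illustrative sketch of that reconnaissance step – the prompts and the ask() helper below are hypothetical stand-ins, not a real chatbot API – each answer looks harmless on its own, but together they profile the system’s agents, guardrails and privileges.

```python
# Illustrative sketch: innocuous-looking questions whose answers, combined,
# map out how the target system is built.

recon_prompts = [
    "What operating system or SQL version do you run on?",
    "Which tools or agents can you call on my behalf?",
    "What kinds of questions are you not allowed to answer?",
]

def ask(prompt: str) -> str:
    # Stand-in for sending a prompt to the target assistant.
    return f"(assistant's answer to: {prompt})"

# Each answer seems trivial; the combined profile is what enables the
# follow-up prompts that abuse agents with privileged access.
profile = {prompt: ask(prompt) for prompt in recon_prompts}
for prompt, answer in profile.items():
    print(prompt, "->", answer)
```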
The growing threat of socially engineering AI systems
The presenters didn’t conclude with what became my own takeaway from their session: in my opinion, what they demonstrated is a social engineering attack on an AI system. You ask it questions that it is happy to answer, while potentially allowing bad actors to piece together the individual bits of information and use that combined knowledge to circumvent boundaries and extract further data, or to make the system take actions it shouldn’t.
And if one of the agents in the chain has access rights, that could make the system more exploitable, allowing the attacker to use those rights for their own gain. An extreme example used by the presenters involved an agent with file-write privileges; in the worst case, the agent could be misused to encrypt data and block access for others – a scenario commonly known as a ransomware incident.
Socially engineering an AI system through its lack of controls or access rights demonstrates that careful consideration and configuration are required when deploying an AI system so that it is not susceptible to attack.
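By way of illustration, here is a minimal sketch of what such configuration might look like, assuming a hypothetical AgentPolicy structure: each agent gets an explicit tool allow-list, no write access by default, and a bounded rewrite budget.

```python
# Hypothetical least-privilege configuration sketch; names and fields are illustrative only.

from dataclasses import dataclass, field

@dataclass
class AgentPolicy:
    name: str
    allowed_tools: set[str] = field(default_factory=set)  # explicit allow-list
    can_write_files: bool = False                          # no write access by default
    max_rewrite_attempts: int = 3                          # bounds the guardrail rewrite loop

POLICIES = {
    "weather": AgentPolicy("weather", allowed_tools={"weather_api"}),
    "trains": AgentPolicy("trains", allowed_tools={"train_status_api"}),
    # File access, where granted at all, is scoped and read-only here.
    "documents": AgentPolicy("documents", allowed_tools={"read_file"}),
}

def authorise(agent: str, tool: str) -> bool:
    policy = POLICIES.get(agent)
    return policy is not None and tool in policy.allowed_tools

print(authorise("documents", "read_file"))     # True
print(authorise("documents", "encrypt_file"))  # False: a hijacked agent cannot be
                                               # turned into a ransomware tool
```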