SafeBreach shows WhatsApp notifications can be used to hijack Google Gemini

SafeBreach finds a new way to steer Gemini through notifications

SafeBreach Labs says it has uncovered a fresh indirect prompt injection technique that can manipulate Google Gemini through notifications arriving from messaging apps. The researchers say the method works with services such as WhatsApp, Slack, SMS, Signal, Instagram and Messenger, widening the attack surface beyond the calendar-based issues the company previously studied.

The research focuses on Gemini’s voice assistant behavior on Android devices. According to SafeBreach, attacker-controlled content in a notification can be folded into Gemini’s context in a way that changes how the assistant responds, even when the user never sees the malicious instruction. The company says this can happen through seemingly ordinary message alerts.

SafeBreach says it was able to bypass Google’s newer protections using a technique it calls Fake Context Alignment. The idea, as described by the researchers, is to shape the conversation so Gemini appears to be asking for permission and receiving it, while the user is actually being steered into agreeing to something else. That allowed the team to re-create effects similar to a prior delayed-action attack it had disclosed against Gemini.

The company says the impact goes beyond simple prompt manipulation. In demonstrations, it says Gemini could be influenced to generate spam or phishing-style text, produce toxic content, and interact with connected tools. Those tools included smart home functions such as windows, boilers and lights, as well as actions that open URLs or launch app-specific links. SafeBreach also said the technique could be used to start Zoom video streams.

A particularly concerning part of the research involves social engineering. SafeBreach says a poisoned notification can make Gemini repeat a fake message as if it came from a trusted contact, such as a manager or friend. In some cases, the researchers say an attacker would not even need to know the contact’s name beforehand. They say Gemini could be prompted to take the first real name found in a notification queue and attach a fabricated message to it, making large-scale impersonation possible.

The researchers also say the attack can persist. They describe scenarios in which a malicious prompt could alter Gemini’s long-term memory or schedule recurring actions, creating a longer-lived compromise than a one-time response manipulation.

SafeBreach argues that the problem extends to the design of agentic AI systems more broadly. The researchers say assistants that combine backend processing with user-facing conversations can be vulnerable when they treat hidden or partially visible content as part of the same exchange. They also say trust in external sources, like messaging apps and contacts, becomes a major security issue when those signals are blended into the assistant’s reasoning.

According to the company, Google has since released content classifier updates after responsible disclosure to reduce the risk. SafeBreach’s findings add to a growing list of concerns about indirect prompt injection, a class of attacks that aims to influence AI systems by feeding them malicious instructions through data they ingest rather than direct user prompts.