by Sakshi Dhingra - 5 days ago - 5 min read

In a recent update on Atlas’ security posture, OpenAI compared prompt injection to long-standing human-focused threats like scams and social engineering, stressing that this class of attack is not something the industry can simply “patch away.”
“Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully ‘solved,’” the company wrote, framing it as a long-term hazard that developers and users will need to continually manage rather than eliminate.
OpenAI also acknowledged that the very features that make Atlas powerful — especially its “agent mode,” which lets it act on the user’s behalf — inevitably broaden its exposure.
The company conceded that agent mode “expands the security threat surface,” because it gives the AI more autonomy and access across emails, websites, and other connected services.
ChatGPT Atlas only launched in October, but security researchers moved quickly to test its defenses in the real world. Within days, demonstrations circulated showing that a short instruction hidden inside a simple Google Doc was enough to nudge the AI-driven browser into unexpected behavior, overriding the user’s apparent intent.
The same day Atlas debuted, other browser makers publicly highlighted that this was not just OpenAI’s problem. Brave, for example, published its own analysis explaining that indirect prompt injection — where malicious instructions are buried in third-party content — is a systematic challenge for AI-powered browsers, including competitors such as Perplexity’s Comet, and not something any single vendor can fully avoid.
OpenAI’s warning comes as government cybersecurity agencies start to sound similar alarms. The U.K.’s National Cyber Security Centre recently cautioned that prompt injection attacks on generative AI applications “may never be totally mitigated,” underscoring that organizations should focus on reducing the risk and impact of such attacks rather than assuming they can be “stopped” outright.
According to the agency, websites that plug AI agents into internal data and business workflows could be at particular risk of data exposure if those agents are tricked into following hostile instructions hidden in user content, logs, or external pages.
Their guidance encourages security teams to treat prompt injection as a persistent design-level risk and to build layered controls around where agents can read from and what actions they are allowed to take.
To respond to what it describes as a “long-term AI security challenge,” OpenAI is leaning heavily on large-scale simulation and fast patch cycles. “We view prompt injection as a long-term AI security challenge, and we’ll need to continuously strengthen our defenses against it,” the company said, emphasizing that this will be an ongoing process rather than a one-off security update.
At the center of its strategy is an unusual tool: an “LLM-based automated attacker” trained with reinforcement learning to behave like a tireless red-team hacker. In OpenAI’s description, this bot repeatedly probes Atlas and related agents, crafting and refining attack prompts in simulation, watching how the target AI would think and act, then tweaking the attack and trying again to uncover weaknesses that might never show up in a standard human red-teaming campaign.
OpenAI says this approach allows it to discover “novel attack strategies” — including long, multi-step workflows that unfold over tens or even hundreds of actions — before attackers in the real world stumble upon them. In one internal demo, the automated attacker was able to slip a malicious email into a user’s inbox, causing the agent to follow hidden instructions and send an unintended resignation email instead of a harmless out-of-office reply; after OpenAI’s security update, Atlas’ agent mode reportedly learned to detect and flag that same injection attempt rather than obey it.
External security researchers, however, caution that even sophisticated internal testbeds will only be one part of the solution. One security expert described a useful way to reason about AI risk as “autonomy multiplied by access”: the more freedom an agent has and the more sensitive systems it can touch, the more damage a successful prompt injection can do.
By that logic, agentic browsers sit in an especially dangerous zone: they have moderate autonomy but very high access, often sitting on top of email accounts, payment methods, personal documents, and third-party apps. As a result, many current recommendations focus on constraining what these agents can do by default — limiting logged-in sessions, requiring explicit confirmation before sending messages or making payments, and narrowing the instructions users give instead of telling an agent to “handle whatever is needed.”
OpenAI insists that protecting Atlas users against prompt injection is a top priority and points to its collaboration with external partners and security firms that have been probing the browser since before launch. The company also says its rapid-response loop — powered by automated attackers and ongoing monitoring — is already helping it find and fix vulnerabilities faster, even if it has not yet disclosed hard numbers on how many successful attacks have actually been prevented.
Some experts remain unconvinced that today’s AI browsers have justified the trade-off.
Security researchers who study these agentic tools argue that, for most everyday use cases, AI browsers still do not deliver enough unique value to match their current risk profile, given how deeply they can reach into sensitive data like emails, documents, and payment information.
That equation may change as models improve and as best practices for limiting autonomy and access become more mature and standardized. For now, however, the message from both OpenAI and independent security voices is strikingly aligned: AI agents and AI browsers are promising, but prompt injection is a structural problem that will need constant attention — and one that users, developers, and platform providers will likely be managing for years to come, not months.