Mozilla says AI helped squash 423 Firefox security bugs

Yet it remains unclear whether Anthropic's uber model deserves the credit, or whether better model middleware is what makes the difference

Mozilla fixed 423 Firefox security bugs in April, a repair rate more than five times higher than the 76 fixes issued in March and almost 20 times higher than its 21.5 monthly average last year.

The browser maker previously said Anthropic’s ballyhooed Mythos Preview model found 271 of these in Firefox 150.

Now, a trio of technical types has come forward to provide a bit more detail about what Mythos (and its less storied sibling Opus 4.6) actually found. But they also highlight something that may matter more than the model: the agentic harness – the middleware mediating between AI and the end user.

Brian Grinstead, Firefox distinguished engineer, Christian Holler, Firefox tech lead, and Frederik Braun, head of the Firefox security team, observe that over the past few months, AI-generated security reports have gone from slop to rather more tasty.

They attribute the transformation to better models and development of better ways of harnessing those models – steering them in a way that increases the ratio of signal to noise. 

But they also appear to be aware that there’s some skepticism in the security community about Mythos. So they’ve decided to publicize selected wins in an effort to encourage others to jump aboard the AI bug remediation train.

“Ordinarily we keep detailed bug reports private for several months after shipping fixes and issuing security advisories, largely as a precaution to protect any users who, for whatever reason, were slow to update to the latest version of Firefox,” they said. 

“Given the extraordinary level of interest in this topic and the urgency of action needed throughout the software ecosystem, we’ve made the calculated decision to unhide a small sample of the reports behind the fixes we recently shipped.”

The post links to a dozen Firefox bugs with varying degrees of severity. The list includes, for example, a 20-year-old heap use-after-free bug (high severity) that a web page could trigger using the XSLTProcessor DOM API without any user interaction.

Many of these bugs are sandbox escapes, they note, which are difficult to find using techniques like fuzzing. AI analysis, they say, helps provide broader security coverage. And they add that it has helped validate prior browser hardening work designed to prevent prototype pollution attacks – audit logs showed AI models making unsuccessful exploitation attempts using this technique.

Following Anthropic’s announcement of Project Glasswing – a program giving select companies early access to Mythos, a model touted as too dangerous for public release – security experts expressed skepticism.

For example, Davi Ottenheimer, president of security consultancy flyingpenguin, wrote in an April 13 blog post, “The supposedly huge Anthropic ‘step change’ appears to be little more than a rounding error. The threat narrative so far appears to be ALL marketing and no real results. The Glasswing consortium is regulatory capture dressed up poorly as restraint.”

He subsequently ran a test in which he strapped Anthropic’s lesser models Sonnet 4.6 and Haiku 4.5 into a harness called Wirken with an auditing skill called Lyrik. The result was eight findings in two minutes at a cost of about $0.75, Ottenheimer claims, noting that two of the eight matched bugs Mythos had identified.

Other security folk have also reported that bug hunting and exploit development can be quite productive with off-the-shelf models like Opus 4.6, which among other virtues costs about a fifth as much as Mythos.

In an email to The Register, Ottenheimer said, “There’s a fundamental philosophical failure in the Mozilla post. A reading and a measurement are not the same thing. I don’t see a measurement, but they seem to want us to believe we’re looking at one. 

“When they give us the ‘behind the scenes math’ it’s circular, a trick. ‘Mythos found 271 bugs’ is what Mythos found, not what other tools could not find against the same code. Why leave it as an assumption if it can be proven?”

Ottenheimer said Mozilla advocates that every project adopt a similar approach without proving the merits of that approach.

“It’s like saying if you don’t drink Coca-Cola, you can’t run a mile under six minutes, because that’s what a guy sponsored by Coca-Cola just did,” he said. “The bar moves on rhetoric, marketing, not proper evidence. That is the capture crew again.”

He notes that the case for Mythos would be more convincing if Mozilla had reported that it couldn't have done this work without it. And since Mozilla isn't saying that, he suggests, it's worth asking why there's no transparent comparison of Mythos against other models.

He points to Mozilla’s admission that Opus 4.6 was already identifying “an impressive amount of previously unknown vulnerabilities.”

“Mozilla never quantifies what Opus 4.6 [did] before saying what Mythos added,” he said. “So 271 attributed to Mythos doesn’t fit the analysis. And there’s a deeper reveal when they say ‘we dramatically improved our techniques for harnessing these models.’ The improvement may be entirely in the harness, not as much in the model. This maps to my own experience. A nail gun has advantages over the hammer, yet without being in the right hands the outputs are as bad or worse.” ®
