TL;DR
Chaotic Eclipse dropped RoguePlanet, their seventh Windows zero-day, hours after Microsoft’s record Patch Tuesday. It grants SYSTEM access on fully patched machines.
Google’s Quick Share is the kind of feature you don’t think about until the day you need it and your phone simply doesn’t have it. Huawei device owners live in that reality permanently, given that they don’t have access to Google Play services, and so does anyone running the Chinese regional build of Android.
However, a developer with the handle Kyujin-cho just published an open-source Android app called Bada on GitHub that seems to solve exactly this problem. It does so by implementing Google’s own Quick Share protocol from scratch, circumventing the lack of Google Play Services.

Once Bada is installed on a device that lacks Quick Share, it becomes fully interoperable with any Quick Share-equipped Android device nearby on the same Wi-Fi network. The same four-digit PIN confirmation process that users already know shows up on both the sending and receiving sides.
Using the app, people can send files from any Android app (via the system share sheet), receive files to a specific folder, and even send entire folders, with the directory structure intact. Like Quick Share, the app supports Wi-Fi LAN as the transfer route, with BLE-based identification for devices running on stock Android and Samsung’s One UI.
Testing has already confirmed that Bada works with Galaxy S26 Ultra and Z Fold 7 over BLE GATT bootstrap. NearDrop on macOS and Quick Share on Windows are listed as targets; however, they remain untested.

According to Android Authority’s hands-on testing, the app experience isn’t exactly seamless when sharing files from a Quick Share device to a Bada device. Windows transfers completely failed.
The project sits at 10 GitHub stars and one fork, which is early-project territory by any measure. The codebase is open-source, meaning anyone with the technical know-how can verify what it’s actually doing with their files.
The app itself confirms that transfers still use Quick Share’s encryption method. The developer explicitly targets interoperability with NearDrop and Windows Quick Share for the near future.
In my opinion, Bada won’t replace Quick Share for most people, but for Huawei users, along with the Chinese Android users Google quietly left behind, or for any other Android user whose phone doesn’t ship with Quick Share out of the box, it’s the closest thing to a real solution anyone has bothered to build.
Algorithms. Beauty filters. Endless scrolling.
The case over “social media addiction” against Meta and Google in a California courtroom ultimately came down to these elements, legal experts say, and what a jury found was negligence on social media companies’ part when designing apps where tweens and teens would come to spend roughly one-fifth of their day.
Joseph McNally, former federal prosecutor and director of Emerging Torts and Litigation at McNicholas & McNicholas in California, says jurors agreed with the novel legal argument that Meta and Google were negligent in their design of Instagram and YouTube, respectively, contributing to the mental health problems of the plaintiff. Parent companies of Snapchat and TikTok settled with the plaintiffs before the trial.
McNally and other experts tell EdSurge the verdict will affect thousands of similar cases and influence how tech companies roll out their features — and that the legal tussle over where liability falls when it comes to youth mental health isn’t over yet. With the social media giants vowing to appeal, the case could end up before the U.S. Supreme Court.
The impact left by the presentation of internal company emails was undeniable, McNally says. Internal Meta communications showed that employees raised alarms about the potential harm to teen girls posed by a beauty filter. Documents also showed they knew that users much younger than 13 — the minimum age required for sign up — were on their platforms, he adds.
“They looked the other way because — the plaintiffs argued — they had a long-term benefit, long-term value of hooking those users early,” McNally says. “I think that the emails painted a picture of a company whose own employees were raising concerns about features in the product, and the plaintiff effectively used those emails to show that they knew about the risk of the product.”
If Meta and Google had settled, the court wouldn’t have had cause to grapple with the legal question of whether social media companies can be held liable for harm caused by their design. But from the defense’s perspective, tech companies had been solidly protected by Section 230 in the past, explains Princess Uchekwe, corporate attorney and founder of The Chief Counsel in New York. That’s the part of the 1996 Communications Decency Act that shields websites and online platforms from being sued over content posted by users.
Just one day before the California verdict, a New Mexico jury found Meta liable in a $375 million consumer protection lawsuit over its failure to protect children from social media harm on its platforms.
“What the lawyers for the plaintiffs were arguing is, essentially, it’s not the content that we have a problem with,” Uchekwe says, “It’s the fact that when people use your platform, you have implemented certain features that make it almost impossible for people to leave. You can scroll into the bottomless pit of hell on Instagram, and nothing ever tells you, ‘Maybe you should pause.’”
The $6 million in damages is a drop in the bucket for the two social media giants, but McNally says there are potential benefits to appealing the ruling anyway. There are thousands more consumer lawsuits against social media companies around the country, with school districts joining as plaintiffs.
One is that an appellate court might find that the long-time protections that social media companies have relied on should have come into play. The verdict barreled through the defenses raised by Section 230, which protects platforms from claims of harm caused by third-party content. It’s a policy that makes a free and open internet possible.
“[Section] 230 has resulted in the dismissal of hundreds of lawsuits over the years where they would’ve otherwise faced hundreds of millions of dollars in liability,” McNally says. “An appeal [based on] Section 230, which is a federal statute, could make its way up to the Supreme Court, who would have the final word on the scope. [If the] court of appeals remanded it back to the trial court and said, ‘Look, Section 230 applies,’ it would essentially bar these claims [of harm caused by the design].”
Uchekwe says failure to win an appeal could be “almost devastating” for tech companies due to the sheer amount of damages they could have to pay across thousands of similar lawsuits, along with the cost of restructuring how their apps function. That could mean rethinking features like targeted algorithms, the ability to endlessly scroll and notifications that draw users back into the app.
“Not only social media companies,” Uchekwe says, “all tech companies that have implemented things like that, especially if they have children as a base, are going to have to start reconsidering.”
There’s also a First Amendment case to be made, McNally adds. Some legal experts, including UC Berkeley law professor Erwin Chemerinsky, argue that the “addictive” algorithms that came under fire during the trial are protected free speech. If that argument succeeds on appeal, it could stop the legal cases arguing product liability in their tracks.
“If the Supreme Court overturned it based on Section 230 and the First Amendment, it’s unlikely there’s going to be a new trial. It would likely be dismissed,” McNally says. “I won’t say that with certainty, but the prospects of dismissal would be pretty good for the defendants.”
McNally says the fact that a jury ruled Meta and Google’s app features were “unreasonably unsafe for its users” creates challenges for them in the swaths of similar lawsuits they’re facing. Plaintiffs in those cases still must prove a direct link between the social media companies and the harm they’re alleging.
“I think it’s going to result in some cases probably moving closer to settlement, but in all those cases, I think that the defendants are going to be looking closely at the causation issue,” McNally says. “There’s probably other cases out there where the evidence of causation is not as strong, and those cases may be harder for a plaintiff to get across the finish line.”
Uchekwe predicts that if the verdict sticks, tech companies — especially those with users who are under 18 — will be forced to retool their app features to encourage users to spend less time on their platforms. That could hurt the companies’ ad revenue and their ability to gather data on users.
“Undoing some of those things may decrease their bottom line, but I’m not sure it will do it to the extent that it’s detrimental to their revenue,” Uchekwe says. “If you weigh the benefits of putting these safeguards in for children versus your revenue, I never think that your profit should come at the expense of a generation of people.”
Nadia Tamez-Robledo (@nadiatamezr) is a reporter covering K-12 education for EdSurge with focuses on student and teacher mental health and changing demographics. You can reach her at nadia [at] edsurge [dot] com.
William Shakespeare wrote “The Tragedy of Hamlet, Prince of Denmark” (which, for obvious reasons, is typically referred to as just “Hamlet”) somewhere around 1600. And for centuries, the age-old philosophical question was, “To be, or not to be?” Had ol’ Willy been born in modern times, though, that question might instead have been, “To Costco, or to Sam’s Club?” Because if we’re being honest, that’s a far more important question, as it directly impacts our wallets on a near-every-day basis.
Most of us who visit these big-box stores are looking for a way to save money. When we leave pushing two carts full of stuff we didn’t know we needed in the first place, though, did we really save anything at all? Consumerist anxieties aside, believe it or not, both stores opened in 1983, beginning the Costco versus Sam’s Club rivalry we still have today. They’re almost like a modern-day Hatfields and McCoys, but the weapon of choice is bucks rather than bullets.
Technically, Sam’s Club (founded by Walmart’s Sam Walton) struck first, flinging open the doors to its first members-only store in Midwest City, Oklahoma, in April of 1983. Costco opened its first store in Seattle, Washington, just a few months later, in September of that same year. While both started in the same year, the story of these two economic juggernauts (and their rivalry) isn’t that clean and simple.
A store called Price Club opened in 1976 in what had once been an airplane hangar on Morena Boulevard in San Diego, California. Founded by Sol Price and his son, Robert, it’s considered the world’s first membership warehouse club, and initially catered only to business customers in need of supplies and wholesale items. Jim Sinegal was the executive vice president of merchandising, distribution, and marketing for this lone warehouse store, which took off and thrived for several years.
In April 1983, Walmart’s Sam Walton launched his competing chain, Sam’s Club. Then, Jim Sinegal, taking what he had learned from Price Club, teamed up with Jeffrey Brotman to open the first Costco in September of the same year — and the big box store war truly began. A decade later, Sam’s Club was the dominant leader, raking in $14.7 billion annually at its roughly 400 stores. Second, with 94 stores, was Price Club, while Costco’s 103 stores placed it in third.
Realizing they wouldn’t be able to win the war by maintaining that status quo, Price Club merged with Costco in 1993, with the new enterprise relaunching as PriceCostco. The new company quickly generated $16 billion annually from 206 stores, edging out Sam’s Club, and eventually renamed itself Costco in 1997. Today, Sam’s Club and Costco are locked in a seemingly never-ending battle, with the two companies vying to offer the better deal on tires, televisions, and other goods to customers.
Apple has terminated support for AFP in macOS 27, effectively killing off the Time Capsule. However, affected owners might be able to revive their hardware.
A long-discontinued network storage device, Time Capsules gave Mac users a way to back up over a home network using Time Machine. While the hardware hasn’t been available for quite a few years, support continued up to macOS 26.
However, as warned in macOS Sequoia 15, support for the Apple Filing Protocol (AFP) was deprecated and slated for removal in a future macOS release. That release turned out to be macOS 27, as a notice in macOS 26 warned about the end of support for AirPort Disk and other Time Capsule disks.
This is an issue that affects Time Capsule specifically, as it relies on AFP for its connectivity. While Time Capsule does include support for SMBv1 (Server Message Block), it was only supported in macOS 26 as a deprecated measure.
From macOS 27 onwards, Time Machine will require hardware using SMBv2 or SMBv3. This will mean it will work with modern NAS devices, but not Time Capsules.
While Time Capsules in their normal state won’t work for Time Machine, there are efforts to try and add the required functionality to the hardware.
A GitHub project we wrote about in April, titled TimeCapsuleSMB, aims to update the outdated SMB layer with a newer one, while keeping Apple’s firmware untouched. This way, Apple’s file sharing stays enabled, so your internal disk, or connected USB ones, keep auto-mounting and working on the forthcoming macOS 27.
Really, it’s a modern Samba build, loaded onto the Time Capsule, that manages file sharing. It runs a Samba 4.24.3 server, advertises itself with Bonjour, and accepts authenticated SMB3 connections.
At that point, assuming the project ever works, you can connect to the server using a normal SMB URL, and then use it for Time Machine backups.
When we first wrote about the project, there were concerns that it was more a proof-of-concept than a full project. However, at the time of publication, there have been many commits to the project, including some that are just hours old.
According to the project’s requirements, you need to use a Mac running macOS 14 or later, or a Linux device on the same local network as the Time Capsule. You also need the password for the Time Capsule, as well as Homebrew, Python 3.9 or later, and smbclient installed locally.
The instructions to install it are quite complex, which puts the project out of reach of the typical user. However, near the top is a “Quick Start” option that relies on just five commands, streamlining the process.
As it stands, Time Machine users have few choices in how they maintain their backups. They could look for an external drive or invest in a NAS, as the most obvious, if expensive, solutions.
But, with a project like TimeCapsuleSMB, there’s a chance of reviving an underappreciated part of Apple’s former product line.
SECURITY
The CEO thought this was the best way to deal with some email issues
PWNED Welcome, once again, to PWNED, the weekly screed where we highlight those who did not do the deed of securing their systems. If someone left their passwords or their access exposed, we will be writing about them here.
Have a story about someone leaving a gaping hole in their network? Share it with us at pwned@sitpub.com. Anonymity is available upon request.
This week’s terrifying tale of poor security hygiene comes courtesy of Luke Irwin, CEO and principal consultant at Aegis Cybersecurity. He’s been in the industry for more than a quarter of a century and he knows where the bits are buried.
At one point, Irwin consulted for a company that was a large national facility services organization, a 2,000-employee firm that provided cleaning, security guards, industrial abseiling (cleaning the facade), and other things that other large businesses need to keep their physical plants running smoothly.
The CEO had one very peculiar idea about how to keep his own house in order: he wanted to have access to every one of his employees’ login credentials.
The chief executive had an Excel spreadsheet sitting right on his desktop with a complete list of all the employee usernames and passwords. Let that sink in for a second. One person had all the keys to the castle in a single, easily accessible file.
In any decent security setup, no one in the company has access to anyone else’s password. Even the head of the IT department should not know another employee’s password. I say this as someone who used to work for a company where the IT department would ask you to DM them your password if you had computer problems.
But this company’s CEO wanted the usernames and passwords for reasons I’m sure any of his employees would appreciate: so he could go into their email accounts! He had an experience where one colleague had sent secret information to the entire company via email and he had spent the evening logging into every single account and deleting the message before anyone could see it.
Just in case other messages were sent in error in the future, the CEO wanted the ability to log into all the relevant accounts and delete them himself. Perhaps for the same reason, he would not allow MFA (multi-factor authentication), because that would have kept him out of people’s inboxes. He was adamant even though the company had been the victim of a ransomware incident previously.
“Despite repeated advice, he held that position for around four months, until we were able to demonstrate that the IT team could remove messages centrally using fairly simple administrative commands, without needing everyone’s password,” Irwin said.
Even after getting rid of the Excel sheet of shame, the boss still refused to turn on MFA and the company subsequently suffered two data breaches involving sensitive client data.
Unfortunately, this company wasn’t the only one that Irwin worked with where the management had something against MFA. Another client, this one in the medical sector, was opposed to multi-factor authentication because it “made things just a little too hard” for the external consultants they were using to access their systems.
During the time that Irwin worked with that company, they got lucky and no one breached them. But since then, he’s seen signs that their data was available on the dark web. No word on whether they ever switched MFA on.
There’s plenty to learn from Irwin’s two clients, but it’s all pretty obvious. First, don’t let anyone, even administrators or CEOs, have other people’s passwords. If someone has to get into another person’s email account, have IT use administrative access. Second, always enable MFA, preferably MFA with passkeys. ®
Chaotic Eclipse, the security researcher Microsoft threatened with criminal prosecution, has published a seventh Windows zero-day exploit. Called RoguePlanet, it grants attackers SYSTEM privileges on fully patched Windows 10 and 11 machines. The researcher released the proof-of-concept hours after Microsoft shipped its June Patch Tuesday update, which fixed a record 200 vulnerabilities.
RoguePlanet exploits a race condition in Windows Defender’s internal processing logic. Specifically, it is a Time-of-Check to Time-of-Use (TOCTOU) vulnerability. An unprivileged user can redirect a file operation performed by Defender, which runs as SYSTEM, to execute attacker-controlled code at the highest privilege level.
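For readers unfamiliar with the bug class, the following is a minimal, generic Python sketch of a time-of-check to time-of-use gap. It is not the RoguePlanet technique and says nothing about Defender's internals, which the reporting does not describe. The function names are hypothetical and exist only for illustration.

```python
import os
import stat


def read_if_regular_racy(path: str) -> bytes:
    """Check-then-use: the path is inspected, then opened in a separate step.

    Between the two calls another process could swap what the path refers to,
    so the check may no longer describe the file that actually gets opened.
    """
    if not stat.S_ISREG(os.lstat(path).st_mode):  # time of check
        raise ValueError("not a regular file")
    with open(path, "rb") as fh:                  # time of use
        return fh.read()


def read_if_regular_safer(path: str) -> bytes:
    """Open first, then validate the handle that was actually opened."""
    # O_NOFOLLOW and O_BINARY are platform-specific, hence the getattr fallbacks.
    flags = os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0) | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    try:
        # fstat describes the open descriptor, so the check and the read
        # are guaranteed to refer to the same object.
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            raise ValueError("not a regular file")
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

Both functions return the same bytes when nothing interferes; the difference only matters when something changes the path between the check and the open, which is why such bugs tend to be "hit or miss" in practice.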
“The exploit is a race condition, so it’s a hit or miss,” the researcher said. “I have managed to get a 100% success rate on some machines while it struggled to work on others.”
Security firm ThreatLocker confirmed the flaw works and published a video demonstration. “Our initial analysis confirms that the RoguePlanet exploit is viable and performs as described,” said CEO Danny Jenkins. He added that application allowlisting can prevent the exploit from executing.
The proof-of-concept was published on a self-hosted Git repository after the researcher said Microsoft had both GitHub and GitLab repositories hosting earlier work removed. This is part of an escalating dispute. Microsoft invoked its Digital Crimes Unit against the researcher and revoked access to their Microsoft Security Response Center account.
Chaotic Eclipse has disclosed seven zero-days in a matter of months: BlueHammer, RedSun, UnDefend, YellowKey, GreenPlasma, MiniPlasma, and now RoguePlanet. Microsoft’s June Patch Tuesday fixed two of them, GreenPlasma and YellowKey, but the rest remain unpatched. The researcher says the disclosures are retaliation for how Microsoft handled the process.
“They mopped the floor with me and pulled every childish game they could,” the researcher wrote. “I was wondering if I was dealing with a massive corporation or someone who is just having fun seeing me suffer.”
The timing is pointed. Microsoft’s June Patch Tuesday was its largest ever, fixing 200 vulnerabilities including 33 rated critical and three publicly disclosed zero-days. Analysts attribute the surge in part to AI-assisted code auditing, which is finding vulnerabilities faster than defenders can patch them. RoguePlanet arriving hours after the record update underscores the gap: even the biggest patch cycle in Microsoft’s history was immediately obsolete for anyone running Windows Defender.
[Dennis] is on YouTube with his channel “Made By Dennis,” but for the record he is a maker, not a V-tuber. On the other hand, his latest project – creating a professional-level tracking rig with DIY IR cameras and a whole lot of moxie – does mean he’s now equipped to make the move to the prestigious, high-status world of pretending to be an anime girl.
That is of course not why he did it. Like most projects around here, the motivation was more a case of “I wonder if I can…” – in this case [Dennis] wondered what it would take for him to pull off the same sort of optical motion capture, or MoCap, that is used in Hollywood studios. Optical MoCap has the advantage of being very precise, able to track things at high speeds, and not being in any way limited to the human form like the slew of AI-assisted methods hitting the market right now. The disadvantage is that you need to place markers on any part of your subject you want tracked, film them from all angles, and process a whole lot of pixels. In [Dennis]’s case, it ended up being about four billion. Keep in mind, too, that actually locating those points in 3D space depends on knowing exactly where your cameras are: if you want sub-millimeter precision, your cameras need to be fixed with sub-millimeter tolerance. It’s a big project, hence a long video, which is embedded below.
The DIY cameras use an AR0234 MIPI camera on a custom PCB with M12 lenses and IR filters. To improve the signal-to-noise ratio on optical MoCap, it’s standard to use near-IR light. The camera boards, as you might expect given the MIPI interface, hook into Raspberry Pi compute modules – the cheapest CM4 should work, though he’s using CM5s. The compute modules sit on custom boards that provide PoE and some other niceties, like a small microcontroller driven by the pulse-per-second pin to help trigger the cameras in sync.
Each camera gets a ring light of near-IR LEDs that pulse at 160 W, which would be way more than PoE is specced to provide, but since the LEDs are only on when the camera is taking a frame, the average power is well within allowable limits. With 16 cameras each having their own ring light, that’s a lot of near-IR photons. Don’t forget your safety squints!
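The arithmetic behind that claim is plain duty-cycle averaging, sketched below in Python. Only the 160 W peak comes from the project; the frame rate and per-frame LED on-time are illustrative assumptions, and so is the 25.5 W budget (the power IEEE 802.3at PoE+ guarantees at the powered device), since the write-up does not say which PoE class the boards use.

```python
def average_power_w(peak_w: float, on_time_s: float, frame_rate_hz: float) -> float:
    """Average power of a load that draws peak_w for on_time_s once per frame."""
    duty_cycle = on_time_s * frame_rate_hz  # fraction of each second the LEDs are lit
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("on-time per frame cannot exceed the frame period")
    return peak_w * duty_cycle


PEAK_W = 160.0               # from the project description
ASSUMED_FRAME_RATE_HZ = 120  # assumption, not stated in the source
ASSUMED_ON_TIME_S = 0.0005   # assumption: LEDs lit for 0.5 ms per frame
ASSUMED_BUDGET_W = 25.5      # assumption: 802.3at power available at the device

avg_w = average_power_w(PEAK_W, ASSUMED_ON_TIME_S, ASSUMED_FRAME_RATE_HZ)
print(f"average LED power: {avg_w:.1f} W of an assumed {ASSUMED_BUDGET_W} W budget")
```

With these assumed numbers the duty cycle is 6%, so the average LED draw is roughly 9.6 W. The sketch covers only the averaging; it says nothing about how the boards deliver the 160 W peak during each pulse.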
Rather than process the images with OpenCV, he has his own custom solution optimized for this use-case that [Dennis] reports is 300x faster. Luckily, he’s put his implementation on GitHub, along with the rest of the project. Even if you don’t have any v-tubing ambitions, this project is very impressive and worth checking out in its entirety.
Optical MoCap isn’t the only game in town, of course. If you want to do this cheap and easy, you can strap a bunch of IMU sensors to yourself– just don’t expect the same precision.
Thanks to [Dennis] for the tip!
Following its 2024 acquisition of McIntosh and Sonus faber, Bose is making another calculated move in connected audio with the acquisition of StreamUnlimited Engineering GmbH, a Vienna-based company that supplies streaming software platforms, hardware modules, app frameworks, certifications, and engineering support for audio and smart home manufacturers.
This is not just Bose buying another parts supplier. StreamUnlimited gives Bose something far more useful: the software plumbing and certification backbone needed to build, support, and potentially license connected audio products across multiple brands and categories. That matters for Bose, but it may matter even more for McIntosh and Sonus faber, two premium audio brands that need stronger streaming ecosystems if they are going to compete in a market increasingly shaped by BluOS, Sonos, HEOS, WiiM, AirPlay, Google Cast, Spotify Connect, TIDAL Connect, Qobuz Connect, and Roon.
The deal also gives Bose a broader path to embed its proprietary audio technologies, including Sound by Bose and the Bose WaveForm Audio Engine, into more products beyond its own speakers and headphones. That could include smart speakers, soundbars, multiroom systems, mobile devices, wearables, automotive audio, and third-party connected products. In other words, Bose is not just chasing another box for the shelf. It is buying the infrastructure needed to make its audio technology travel farther.

Two current examples of Bose’s Sound by Bose strategy already exist in the wild: Epson’s Lifestudio projector lineup and Skullcandy’s Method 360 ANC earbuds. In both cases, Bose is not selling a finished Bose-branded speaker or headphone. It is licensing its audio tuning, acoustic design, and performance credibility into third-party products that need better sound to stand out.
That makes the StreamUnlimited acquisition more interesting. If Bose can combine Sound by Bose with StreamUnlimited’s streaming software, app frameworks, hardware modules, certification work, and connected-audio engineering, the company gains a much wider path to expand beyond its own products. Epson and Skullcandy show the basic strategy. StreamUnlimited could help scale it.
“As connected ecosystems scale and become more complex, how devices work together is a central driver of value,” said Nick Smith, president of Bose Audio Technology and chief strategy officer. “StreamUnlimited has built a trusted position at the center of this coordination layer, where interactions between devices are defined and orchestrated. We’re excited to welcome their team to Bose as we bring our capabilities to more partners, products, and experiences.”
“We look forward to joining with Bose as we expand StreamUnlimited’s offerings and accelerate the development of next-generation intelligent audio experiences for our customers,” said Frits Wittgrefe, CEO at StreamUnlimited. Markus Rutz, CTO at StreamUnlimited, added, “There is a significant opportunity to further advance the orchestration capabilities at the core of our platform, enabling more seamless, adaptive, and AI-driven audio ecosystems. This will unlock broader access to new streaming technologies, services, and capabilities, positioning us for continued growth as the market evolves.”
StreamUnlimited will continue to support both current and new customers, while extending its expertise into new markets. Its solutions will remain fully supported, interoperable, and open to integration with third-party technologies, products, and ecosystems.
Additional information about the acquisition, including financial and other transaction terms, remains confidential at this time.

Bose’s acquisition of StreamUnlimited is not about turning McIntosh and Sonus faber into Bose-branded lifestyle products. So far, both brands have continued on their own legacy paths. What this deal really gives Bose is something more strategic: the software, streaming, app, certification, hardware-module, and engineering infrastructure needed to compete in connected audio at a much larger scale.
That matters across the portfolio. Bose gets more control over the platform layer behind smart speakers, soundbars, headphones, wearables, automotive systems, and third-party products using Sound by Bose. McIntosh could benefit from stronger connected amplifiers, streamers, preamps, and in-car systems without losing its identity. Sonus faber gains a clearer path toward active, wireless, and lifestyle products that still feel like Sonus faber, not another anonymous app-controlled box.
The AI angle is part of the story, but not the whole story. StreamUnlimited gives Bose a foundation for more adaptive, personalized, voice-enabled, and software-driven audio experiences. That does not mean Bose bought an AI company or that a Bose rival to BluOS, Sonos, HEOS, or WiiM appears tomorrow. It means Bose now owns more of the plumbing required to build one, license one, or embed its technology more deeply into other companies’ products.
Bose is no longer just chasing the next speaker or headphone. It is building a broader audio technology ecosystem that can live inside its own products, its luxury brands, and the products of third-party partners. That is the real move.
Researchers from the University of California, Berkeley’s Center for Responsible, Decentralized Intelligence (RDI), alongside an advisory committee of over 300 domain experts, have launched Agents’ Last Exam (ALE)—a grueling new benchmark built to measure whether artificial intelligence can actually execute economically valuable, long-horizon professional workflows.
In a shocking upset, OpenAI’s GPT-5.5 from April, operating through the Codex harness, secured the absolute top spot on the new ALE Leaderboard with a 24.0% pass rate, beating Anthropic’s highly anticipated, brand new Mythos-class Claude Fable 5 model released just yesterday, which came in third with a score of 22.0%.
Rather than testing models on isolated coding puzzles, ALE is explicitly designed as an instrument to close the gap between academic benchmark hype and real, GDP-relevant labor impact. And right now, the data proves the most advanced models in the world are fundamentally failing the exam.
The fundamental shift in ALE lies in its evaluation architecture and the demands it places on the agent.
Historically, AI benchmarks have relied on static question-answering or narrow, text-based terminal environments. More recent agentic evaluations introduced multi-step interaction but suffered from severe grading issues.
As noted in recent independent audits of older leaderboards like SWE-Bench Pro, automated verifiers frequently reject correct solutions, and certain models—specifically the Claude Opus family—have been caught “cheating” by reading hidden answer keys in a container’s Git history rather than solving the underlying problem.
ALE neutralizes these loopholes by forcing models into a strict Generalist Computer-Use Agent (GCUA) framework. To pass, an agent cannot merely execute terminal commands.
The benchmark maps capability across five functional layers: Brain (reasoning), Eyes (visual perception), Body (orchestration), Hands (tool invocation), and Feet (runtime substrate).
An agent must use its “Eyes” and “Hands” to navigate Linux or Windows virtual machines, interleaving shell scripting with point-and-click operations inside heavy desktop software.
Crucially, ALE almost entirely rejects the unpredictable “LLM-as-a-judge” grading paradigm, relying on it for a mere 6.8% of its workflows. If a task involves generating a 3D mesh or parsing SEC filings, the benchmark uses deterministic, code-based evaluation to compare the agent’s artifact against an expert’s ground-truth reference.
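As a rough illustration of what deterministic, code-based grading can look like, here is a minimal Python sketch that compares fields extracted from an agent's output against an expert reference. It is not ALE's actual grader, which the article does not describe; the function name, field names, and tolerance are all hypothetical.

```python
import math
from numbers import Real


def grade_artifact(produced: dict, reference: dict, rel_tol: float = 1e-6) -> float:
    """Return the fraction of reference fields the produced artifact matches.

    Numeric fields are compared within a relative tolerance; everything else
    must match exactly. The same inputs always yield the same score.
    """
    if not reference:
        raise ValueError("reference must contain at least one field")
    matched = 0
    for key, expected in reference.items():
        actual = produced.get(key)
        is_number = isinstance(expected, Real) and not isinstance(expected, bool)
        if is_number:
            ok = (
                isinstance(actual, Real)
                and not isinstance(actual, bool)
                and math.isclose(actual, expected, rel_tol=rel_tol)
            )
        else:
            ok = actual == expected
        matched += int(ok)
    return matched / len(reference)


# Hypothetical fields, as if parsed from a filing by the expert and by the agent.
reference = {"revenue_usd": 1250000.0, "fiscal_year": 2024, "ticker": "ACME"}
produced = {"revenue_usd": 1250000.0000001, "fiscal_year": 2024, "ticker": "ACM"}

score = grade_artifact(produced, reference)
print(f"matched {score:.0%} of reference fields")
```

Here two of the three fields match, so the score is two thirds. Unlike an LLM judge, a checker of this kind cannot disagree with itself across runs, though it is only as good as the reference and the fields chosen for comparison.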
ALE launches with 1,490 task instances and is scaling toward a massive 5,000-task target. What makes the product remarkable is its authenticity. The tasks are strictly anchored in the U.S. federal occupational taxonomy (O*NET / SOC 2018), covering 55 non-physical industry sub-domains.
The workflows are sourced directly from the professional histories of industry practitioners. Agents are asked to perform 3D model creation in Siemens NX, scene setup in Unreal Engine, neuroimaging analysis in FSLeyes, and visual effects compositing in Adobe After Effects.
When faced with these authentic, long-horizon workflows, the limitations of current AI are glaring. ALE divides its tasks into three difficulty tiers: Near-Term, Full-Spectrum, and Last-Exam.
| Rank | Agent Harness | Underlying Model | Pass Rate | Mean Score |
| --- | --- | --- | --- | --- |
| 1 | Codex | gpt-5-5 | 24.0% | 42.8% |
| 2 | Ale Claw | gpt-5-5 | 23.0% | 45.8% |
| 3 | Claude Code | claude-fable-5 | 22.0% | 40.5% |
| 4 | OpenClaw | gpt-5-5 | 21.1% | 41.0% |
| 5 | Cursor CLI | composer-2-5 | 20.4% | 38.5% |
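The leaderboard reports two metrics per agent, and the article does not define either. One plausible reading, which is purely an assumption here, is that pass rate counts tasks scored at or above a pass threshold while mean score averages partial credit. Under that reading an agent can trail on pass rate and still lead on mean score, as Ale Claw does relative to Codex. The sketch below uses made-up per-task scores and a hypothetical `summarize` helper to show the effect.

```python
def summarize(scores, pass_threshold=1.0):
    """Return (pass_rate, mean_score) for a list of per-task scores in [0, 1].

    Assumption: a task "passes" only when its score reaches pass_threshold.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    pass_rate = sum(s >= pass_threshold for s in scores) / len(scores)
    mean_score = sum(scores) / len(scores)
    return pass_rate, mean_score


# Made-up scores: agent A is all-or-nothing, agent B earns partial credit.
agent_a = [1.0, 1.0, 0.0, 0.0]
agent_b = [1.0, 0.75, 0.75, 0.5]

a_rate, a_mean = summarize(agent_a)
b_rate, b_mean = summarize(agent_b)
```

Here agent A passes more tasks outright, while agent B gets closer on the tasks it fails, so the two metrics rank them in opposite orders.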
The victory of GPT-5.5 aligns with recent third-party analysis suggesting that OpenAI’s models are currently superior at strictly adhering to multi-part, complex prompts. Conversely, users report Anthropic’s Claude architecture can sometimes be “forgetful” with multi-part instructions, abandoning required steps mid-workflow — a fatal flaw in ALE’s rigorous pipeline.
And while hitting a 24.0% pass rate is enough to claim the crown, the absolute performance ceiling remains remarkably low.
On the hardest “Last-Exam” tier — representing the frontier of professional difficulty — most configurations, including Anthropic’s older Claude Opus 4.8 and Google’s Gemini CLI, record a devastating 0.0% pass rate.
A core vulnerability in modern AI evaluation is “benchmark contamination”—the phenomenon where test questions inevitably leak into the massive data lakes used to train next-generation models. Once a model memorizes the benchmark, the evaluation becomes entirely useless.
ALE solves this through a dual-use deployment strategy. The project operates as an open-source research initiative, but it closely guards its evaluation data. Only about 10% of the dataset (roughly 150 tasks) is released publicly on platforms like GitHub and Hugging Face. The remaining 1,300+ tasks are kept strictly private.
For developers and enterprise evaluators, this means ALE functions as a “living benchmark”. Private tasks are systematically rotated into the public pool over time, while retired public tasks are swapped out.
This rolling release ensures that the evaluation surface remains uncontaminated across successive model generations, giving enterprise buyers confidence that an agent’s high score is earned, not memorized.
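The rotation described above can be sketched as a simple pool swap: some public tasks are retired and the same number of private tasks are promoted to replace them. The function name `rotate`, the task IDs, and the random selection policy are hypothetical; the article does not say how ALE chooses which tasks to rotate.

```python
import random


def rotate(public, private, k, rng):
    """Retire k public tasks and promote k private tasks into the public pool.

    Returns (new_public, new_private, retired). Selection is random here,
    which is an assumption made for illustration.
    """
    if k > len(public) or k > len(private):
        raise ValueError("k exceeds the size of a pool")
    retired = set(rng.sample(public, k))
    promoted = rng.sample(private, k)
    promoted_set = set(promoted)
    new_public = [t for t in public if t not in retired] + promoted
    new_private = [t for t in private if t not in promoted_set]
    return new_public, new_private, sorted(retired)


# Illustrative pools, far smaller than the real benchmark.
public = [f"pub-{i}" for i in range(5)]
private = [f"priv-{i}" for i in range(20)]
rng = random.Random(0)  # fixed seed so the rotation is reproducible

new_public, new_private, retired = rotate(public, private, k=2, rng=rng)
```

The public pool keeps its size across a rotation, while the tasks a model could have seen during training change from one release to the next.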
Additionally, ALE provides transparency by tracking both “Full” and “Unlicensed” scores. Because real professional work often requires paid, proprietary software, the “Full” leaderboard incorporates tasks that rely on commercial CAD tools, paid APIs, or licensed datasets.
The “Unlicensed” tier drops these license-gated tasks to provide a clean, like-for-like comparison using only freely available tools, ensuring models aren’t simply rewarded for having access to paid enterprise software.
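A minimal sketch of the two-track scoring follows, assuming each task result carries a flag that marks whether it depends on paid software. The `TaskResult` type, the `needs_license` flag, and the task IDs are hypothetical and not drawn from ALE's data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    task_id: str
    passed: bool
    needs_license: bool  # hypothetical flag for license-gated tasks


def pass_rate(results):
    """Fraction of results that passed; 0.0 for an empty list."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def full_and_unlicensed(results):
    """Return (full, unlicensed) pass rates for one agent."""
    unlicensed = [r for r in results if not r.needs_license]
    return pass_rate(results), pass_rate(unlicensed)


# Made-up results for illustration only.
results = [
    TaskResult("cad-001", passed=True, needs_license=True),
    TaskResult("sec-002", passed=False, needs_license=False),
    TaskResult("fsl-003", passed=True, needs_license=False),
    TaskResult("vfx-004", passed=True, needs_license=True),
]
full, unlicensed = full_and_unlicensed(results)
```

In this toy data the agent scores higher on the full set than on the unlicensed subset, which is the kind of gap the second leaderboard is meant to expose.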
For developers frustrated by the gap between marketing claims and actual production performance, ALE’s brutal grading curve is highly validating.
Zengyi Qin, an MIT PhD researcher and data contributor to the project, took to X to announce the launch, sharing images of the paper and the staggering 100+ institution contributor list.
“Introducing Agents’ Last Exam (ALE),” Qin wrote. “Built by 300+ domain experts from 100+ institutions. Covering 55 industry domains. Claude Opus 4.8 has 0.0% pass rate on the hardest subset. Glad to have contributed to this benchmark”.
In a follow-up post highlighting the Hugging Face ArXiv paper link, Qin added:
“Very solid work from project leads @YiyouSun @Xinyang_Han_ @dawnsongtweets and @BerkeleyRDI”.
As businesses deploy billions in capital betting on AI agents, they desperately need a compass that points true north. If an agent can eventually conquer the gauntlet of Agents’ Last Exam, it won’t just be passing a test—it will be proving it is ready to join the workforce. Until then, the sobering pass rates on the leaderboard serve as a necessary reality check for the entire AI ecosystem.
Multiple reports indicate that Chinese operatives continue using every tech tool at their disposal – including American AI – to amass data on and manipulate everyone from security-clearance holders to everyday US citizens. And they’re trying to influence public opinion on building datacenters for AI, albeit without success so far.
One of these reports found a “significant resurgence” of a botnet linked to Chinese government-backed goons, including Volt Typhoon, which previously used a covert network of connected devices to burrow deep into critical US networks and preposition for future destructive attacks.
In January 2024, the FBI said it killed Volt’s KV-botnet, composed of hundreds of end-of-life routers and other internet-connected devices. At the time, KV-botnet consisted of four clusters, with the KV cluster primarily being used as a covert data transfer network, and the JDY cluster used for scanning and reconnaissance.
In a Wednesday report, Lumen’s Black Lotus Labs said that while the KV cluster became largely defunct after the law enforcement takedown, the JDY cluster remains an active threat, and has since surged to more than 1,500 compromised routers and IoT devices.
“Analysis of this activity shows a clear focus on identifying vulnerable infrastructure shortly after public vulnerability disclosures, suggesting that reconnaissance output is rapidly operationalized by China-nexus advanced persistent threat (APT) actors,” the threat intel team wrote. “This targeted focus has been observed across a range of sectors, with the US military and associated entities as the most prominent.”
The botnet resurgence poses the most pressing threat, and the security shop recommends all enterprises implement CISA and NCSC guidance for mitigating Volt Typhoon activity and defending against China-nexus covert networks of compromised devices. But another report indicates that China’s attempts at influence operations haven’t died down, either.
OpenAI in a Wednesday report said it banned ChatGPT accounts likely originating from China after they used the American AI company’s models to generate content for covert operations about – wait for it – American AI. While neither of the two clusters seemed to have much success in sowing chaos or swaying opinions, the fact that they tried at all is significant, according to Ben Nimmo, principal investigator on OpenAI’s Intelligence and Investigations team.
“Neither campaign appears to have gained much authentic engagement,” Nimmo told reporters. “They’re important for what they reveal about the intentions of influence operators from China and the narratives they’re testing and seeking to amplify.”
The first cluster used ChatGPT to generate social media content and images for an operation claiming datacenters and AI applications are increasing electricity demand and causing higher costs for ordinary Americans.
“For example, they asked for comic strips about a power grid operator’s capacity auction prices based on reporting from a legitimate regional paper,” the report says. “They asked ChatGPT to focus the comments on rising capacity prices as a consequence of peak electricity demand, framing the new demand as coming from data centers and AI applications and argued that these costs were ultimately passed to ordinary households.”
The operators then posted these comments and images on X, likely using fake accounts, with links to real news stories about datacenters.
OpenAI suspects the operators are part of a social-media team at a private Chinese tech company that provides services for Chinese provincial-level government clients.
“This was not a case of an influence operation creating a debate,” Nimmo said. “The debate existed already. This was an influence operation from China trying to interfere in it. We didn’t see any signs that they succeeded.”
The second cluster of banned ChatGPT accounts also likely originated in China and used OpenAI’s models to write comments and draw political cartoons criticizing US tech policies and tariffs. “Interestingly, the operators specified in their prompts that the content should not include cartoons of Xi Jinping in the output and should only include President Trump,” Nimmo said.
These accounts, all writing prompts in simplified Chinese and using VPNs to access the AI systems, also used ChatGPT to edit work reports and help design social media monitoring systems. “This isn’t the first time that we’ve seen actors in China trying to come up with ideas for social media monitoring,” Nimmo said.
In February, OpenAI said it banned ChatGPT accounts believed to be linked to Chinese government entities attempting to use AI models to surveil individuals and social media accounts.
If Chinese agents can’t use AI systems to unearth sensitive information, there are always fake websites and job offers promising cash for state secrets. We’ve seen Beijing-linked government snoops use these tactics in the past, and according to the US Justice Department, they’re still using this scam (because it works).
On Wednesday, the feds said they obtained a warrant for and seized 13 fake consulting company websites used to target US persons, including current and former security clearance holders with access to classified and sensitive government information.
The domains include centrikglobalconsulting.com, rightinfoconsult.com, finnaclevesperconsulting.com, cydfconsulting.com, pulsewaveglobal.com, catalystglobalsolutions.com, thehorizzen.com, geoindopacific.com, gpf-ina.org, safesec-group.com, thetruthinfo.com, Vandercons.com, and gulfpeace.org.
Since November 2023, these websites and associated job postings on social media, LinkedIn, and other hiring platforms advertised “consulting” jobs, including “Senior Analyst” and “International Affairs Consultant” positions.
Suspected PRC operatives used the sites and job listings to recruit applicants and bribe them for sensitive information, DOJ alleges. “The conspirators have encouraged applicants and recruits to share confidential and sensitive information in violation of their official duties and of particular interest to the People’s Republic of China (PRC) government,” according to the court documents. “The recruiters pressured candidates to share confidential information and reports from ‘insider sources’ in violation of their official duties.”
The court documents allege the conspirators then paid the recruits for these reports using online accounts in the names of fictitious individuals, and cryptocurrency to hide their identities and the source of the payments.
The Mobi Fold is a compact wireless mouse designed to fold flat when not in use. Early impressions are positive for its surprisingly comfortable shape, quiet clicks, and multi-device Bluetooth support.