TL;DR
Chaotic Eclipse dropped RoguePlanet, their seventh Windows zero-day, hours after Microsoft’s record Patch Tuesday. It grants SYSTEM access on fully patched machines.
As statehouses ramp up for 2026, we’re seeing a familiar and concerning trend of lawmakers rushing to regulate the internet based on shockingly shaky science. From the California State Assembly to the Massachusetts and Minnesota legislatures, a wave of bills is crashing against the digital lives of young people, with proponents of these measures framing social media access as a “public health epidemic,” or a “mental health crisis,” even though we have yet to see any of the settled science that those labels usually invoke.
As a digital rights organization dedicated to the civil liberties of all users, EFF’s expertise lies in reminding lawmakers that young people enjoy largely the same free speech and privacy rights as adults. EFF is not a social science research shop, but we can read the emerging research. What that research shows is much more nuanced than what is claimed by those proposing to ban young people from social media, and it is clear that the research and theories used to justify these sweeping bans are far from settled. The rush to ban access to digital platforms is being fueled by “pop psychology” narratives and a collection of statistically flawed studies that do not meet the rigorous standards required for such a massive infringement on youth autonomy and constitutional rights.
The current legislative push relies heavily on a specific, media-friendly narrative that the “great rewiring” of the adolescent brain is a proven fact. This theory suggests that smartphones and social media are the primary, if not sole, drivers of a global uptick in teen anxiety, depression, eating disorders, self harm, etc. While this narrative makes for a compelling airport-bookstore read, it quickly collapses under the scrutiny of the broader scientific community.
Independent researchers, including developmental psychologists from institutions like the University of California, Irvine, and Brown University, have repeatedly found that the evidence for such claims is mixed, blurry, and often contradictory. Large-scale meta-analyses covering dozens of countries have failed to show a consistent, measurable association between the rollout of social media and a decline in global well-being. In reality, we are seeing a classic case of what many of our middle school science teachers warned us about: “correlation” being sold as “causation.”
Additionally, the studies used to support these measures often fail to account for or exclude significant alternative explanations for rising teen anxiety and depression, such as the lasting impact of pandemic-era isolation, the persistent threat of school gun violence, and mounting economic or climate-related stress. By focusing narrowly on social media, these findings frequently overlook the broader societal factors that also impact youth mental health.
The current push for blanket social media bans relies almost exclusively on the work of Jonathan Haidt, particularly his book The Anxious Generation. While Haidt is an amiable and brilliant storyteller, he is not a clinical psychologist or a specialist in child development. He is a social psychologist who writes about moral psychology at a business school. Nonetheless, the book has made it onto every Best Seller list, and Haidt has been treated as an expert on podcasts with massive reach, like those of Oprah, Joe Rogan, Michelle Obama, and Trevor Noah. As a result, his message has been heard by a large subset of society. That message primarily rests on three prescriptions: no smartphones or social media before age 16, phone-free schools, and more “unsupervised, real-world independence.”
To highlight Haidt’s reach when it comes to legislation banning social media: the California committee analysis for the proposed California social media ban mentions Haidt 20 times; the Governor of Utah promoted the book as a “must-read” months before signing the nation’s first social media ban; Haidt is cited in bill analysis for the bill banning social media in Florida; his work is mentioned in a federal bill aiming to ban phones in schools; and he provided formal testimony before the U.S. Senate Judiciary Committee (Subcommittee on Technology, Privacy, and the Law) in May 2022.
While Haidt’s research has been central to legislation stripping millions of young people of their rights to expression and connection, his conclusions are not without challenge, and many experts in the field argue that the evidence is less than ironclad.
While we can admit that Jonathan Haidt’s “great rewiring” theory makes for a gripping narrative, we cannot ignore that independent researchers and statisticians have identified significant flaws in the data used to justify it. That means we are currently watching policymakers legislate blanket bans based on evidence that would be rejected in almost any other field of public health.
The reality is that research has consistently disproven the oft-assumed link between social media use and poor mental health in youth, and actually indicates that moderate internet use is a net positive for teens’ development, and negative outcomes are usually due to either lack of access or excessive use. In one major study of 100,000 adolescents, a “U-shaped association emerged where moderate social media use was associated with the best well-being outcomes, while both no use and highest use were associated with poorer well-being.” We also know that young people’s relationship with social media is complex, as it provides them essential spaces for civic engagement, identity exploration, and community building—particularly for LGBTQ+ and marginalized youth who may lack support in their physical environments.
But again, the image Haidt presents in his book is increasingly at odds with the broader academic consensus. As mentioned, critics argue that the evidence for the mental health impacts of social media is mixed, blurry, and often misinterpreted. NYU statistics expert Aaron Brown, writing for Reason, notes that many of the studies in Haidt’s exhaustive reference list are statistically unreliable or fail to show a strong causal link. Prof. Candace Odgers, a leading voice in psychological science, explains the “selection effect” that legislators often ignore:
“Hundreds of researchers, myself included, have searched for the kind of large effects suggested by Haidt. Our efforts have produced a mix of no, small and mixed associations. Most data are correlative. When associations over time are found, they suggest not that social-media use predicts or causes depression, but that young people who already have mental-health problems use such platforms more often or in different ways from their healthy peers.”
This raises a fundamental question of legislative responsibility: If the science is not settled, how can legislators confidently declare a “public health crisis” to justify stripping away young people’s First Amendment rights? By bypassing the rigorous, nuanced findings of the scientific community in favor of a more convenient narrative, legislators are choosing emotion over evidence. Before imposing such draconian restrictions on young people’s access to information, policymakers have an obligation to do the heavy lifting: to dig into the actual research and listen to the experts who are sounding the alarm on oversimplified conclusions.
Perhaps the most troubling aspect of Haidt’s crusade is its overlap with ideological rhetoric that pathologizes the identities of marginalized youth, and how that makes its way through efforts to ban social media for youth. A recurring theme in the literature favored by proponents of social media bans is the idea of “social contagion,” specifically regarding the rise in young people identifying as transgender or non-binary. Haidt dedicates an entire chapter of his book to this (ch.6, pt 3, p. 165), talking about “Why Social Media Harms Girls More Than Boys,” stating that:
“The recent growth in diagnoses of gender dysphoria may also be related in part to social media trends, […] the fact that gender dysphoria is now being diagnosed among many adolescents who showed no signs of it as children all indicate the social influence and sociogenic transmission may be at work as well.”
These harmful theories suggesting that social media is “infecting” young people with gender dysphoria are false and not supported by peer-reviewed clinical research. But by legitimizing “experts” who promote these debunked theories, legislators—especially those in states like California who pride themselves on being a sanctuary for LGBTQ+ youth—are inadvertently platforming the same rhetoric used in other states to ban gender affirming care for youth. This “social contagion” narrative is a tool of exclusion, not a scientific reality, and we must be wary of any “public health” argument that treats community-building and self-discovery among marginalized young people as a “purported mental illness” spread via TikTok.
Fortunately, a measured, evidence-based alternative is already emerging. California’s A.B. 2071, for instance, is a student-authored “digital wellness” bill that favors education over prohibition. The bill advocates for a curriculum that teaches students how to manage algorithms, recognize cyberbullying, and regulate their own relationship with technology. Instead of trying to completely shield young people from social media, education-based approaches empower young people and have the benefit of providing skills that stay with a young person long after they leave the classroom.
JustLeadershipUSA, a criminal justice organization, has a slogan that rings true in this instance too: “Those closest to the problem are closest to the solution.” So let’s start listening to what our young people are asking us for—more education—instead of imposing paternalistic, disempowering bans.
Adolescent mental health struggles are a complex, multifaceted crisis. It is not a new crisis, and it has been driven by economic instability, the opioid epidemic, and the threat of school violence, among other issues. To pin all of society’s woes on a smartphone app is not just a scientific error; it is a policy failure that ignores the real, material needs of young people both online and off.
Legislators must stop legislating as “anxious parents” and start acting as measured policymakers, because for some youth, social media platforms are a lifeline. UNICEF and other global human rights organizations have warned that age-related restrictions and blanket bans can backfire in three critical ways: isolating marginalized youth (like LGBTQ+ youth, students in rural areas, foster youth, or those with disabilities) for whom social media is often the only place they can find a supportive community; necessitating invasive mass collection of biometric data or government-issued IDs from all users, including adults; and pushing young people toward less-regulated, “darker” corners of the web where content moderation is non-existent and the risks of actual exploitation are significantly higher.
Legislators have a valid interest in protecting children, but that interest must be pursued through tailored, measured approaches. We cannot allow emotions or a collection of flawed data sets to justify a historic rollback of digital rights.
Reposted from the EFF’s Deeplinks blog.
William Shakespeare wrote “The Tragedy of Hamlet, Prince of Denmark” (which, for obvious reasons, is typically referred to as just “Hamlet”) somewhere around 1600. And for centuries, the age-old philosophical question was, “To be, or not to be?” Had ol’ Willy been born in modern times, though, that question might instead have been, “To Costco, or to Sam’s Club?” Because if we’re being honest, that’s a far more important question, as it directly impacts our wallets on a near-every-day basis.
Most of us who visit these big-box stores are looking for a way to save money. When we leave pushing two carts full of stuff we didn’t know we needed in the first place, though, did we really save anything at all? Consumerist anxieties aside, believe it or not, both stores opened in 1983 and began the Costco versus Sam’s Club rivalry we still have today. They’re almost like a modern-day Hatfields and McCoys, but the weapon of choice is bucks over bullets.
Technically, Sam’s Club (founded by Walmart’s Sam Walton) struck first, flinging open the doors to its first members-only store in Midwest City, Oklahoma, in April of 1983. Costco opened its first store in Seattle, Washington, just a few months later, in September of that same year. While both started in the same year, the story of these two economic juggernauts (and their rivalry) isn’t that clean and simple.
A store called Price Club opened in 1976 in what had once been an airplane hangar on Morena Boulevard in San Diego, California. Founded by Sol Price and his son, Robert, it’s considered the world’s first membership warehouse club, and initially catered only to business customers in need of supplies and wholesale items. Jim Sinegal was the executive vice president of merchandising, distribution, and marketing for this lone warehouse store, which took off and thrived for several years.
In April 1983, Walmart’s Sam Walton launched his competing chain, Sam’s Club. Then, Jim Sinegal, taking what he had learned from Price Club, teamed up with Jeffrey Brotman to open the first Costco in September of the same year — and the big box store war truly began. A decade later, Sam’s Club was the dominant leader, raking in $14.7 billion annually at its roughly 400 stores. Second, with 94 stores, was Price Club, while Costco’s 103 stores placed it in third.
Realizing they wouldn’t be able to win the war by maintaining that status quo, Price Club merged with Costco in 1993, with the new enterprise relaunching as PriceCostco. The new company quickly generated $16 billion annually from 206 stores, edging out Sam’s Club, and eventually renamed itself Costco in 1997. Today, Sam’s Club and Costco are locked in a seemingly never-ending battle, with the two companies vying to offer the better deal on tires, televisions, and other goods to customers.
Apple has terminated support for AFP in macOS 27, effectively killing off the Time Capsule. However, affected owners might be able to revive their hardware.
A long-discontinued network storage device, Time Capsules gave Mac users a way to back up over a home network using Time Machine. While the hardware hasn’t been available for quite a few years, support continued up to macOS 26.
However, as warned in macOS Sequoia 15, support for the Apple Filing Protocol, AFP, was being deprecated and removed in a future macOS release. That turned out to be macOS 27, thanks to a notice in macOS 26 warning about the end of support for AirPort Disk and other Time Capsule disks.
This is an issue that affects Time Capsule specifically, as it relies on AFP for its connectivity. While Time Capsule does include support for SMBv1 (Server Message Block), it was only supported in macOS 26 as a deprecated measure.
From macOS 27 onwards, Time Machine will require hardware using SMBv2 or SMBv3. This means it will work with modern NAS devices, but not Time Capsules.
While Time Capsules in their normal state won’t work for Time Machine, there are efforts to try and add the required functionality to the hardware.
A GitHub project we wrote about in April, titled TimeCapsuleSMB, aims to update the outdated SMB layer with a newer one, while keeping Apple’s firmware untouched. This way, Apple’s file sharing stays enabled, so your internal disk, or connected USB ones, keep auto-mounting and working on the forthcoming macOS 27.
In practice, it’s a modern Samba build, loaded onto the Time Capsule, that manages file sharing. It runs a Samba 4.24.3 server, advertises itself with Bonjour, and accepts authenticated SMB3 connections.
At that point, assuming the project ever works, you can connect to the server using a normal SMB URL, and then use it for Time Machine backups.
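As a rough illustration of what a “normal SMB URL” looks like, the sketch below builds one in Python. The host name, share name, and user name are hypothetical placeholders, not values taken from the TimeCapsuleSMB project; the real ones depend on how the Time Capsule advertises itself on your network.

```python
from urllib.parse import quote


def smb_url(host: str, share: str, user: str = "") -> str:
    """Build an smb:// URL, percent-encoding the user and share names."""
    auth = f"{quote(user, safe='')}@" if user else ""
    return f"smb://{auth}{host}/{quote(share, safe='')}"


# Hypothetical values for illustration only.
print(smb_url("timecapsule.local", "Data", user="admin"))
```

A URL of this shape is what you would paste into Finder’s “Connect to Server” dialog before choosing the mounted share as a Time Machine destination.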
When we first wrote about the project, there were concerns that it was more a proof-of-concept than a full project. However, at the time of publication, there have been many commits to the project, including some that are just hours old.
According to the project’s requirements, you need to use a Mac running macOS 14 or later, or a Linux device on the same local network as the Time Capsule. You also need the password for the Time Capsule, as well as Homebrew, Python 3.9 or later, and smbclient installed locally.
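The stated requirements can be checked before starting. The following is a minimal sketch that is not part of the project itself: it only tests what the paragraph above lists (Python 3.9 or later, smbclient, Homebrew, and macOS 14 or later when run on a Mac), and the function name and result keys are made up for illustration.

```python
import platform
import shutil
import sys


def check_prerequisites() -> dict:
    """Report which of the listed requirements look satisfied on this machine."""
    results = {
        "python>=3.9": sys.version_info >= (3, 9),
        "smbclient on PATH": shutil.which("smbclient") is not None,
        "Homebrew on PATH": shutil.which("brew") is not None,
    }
    if platform.system() == "Darwin":
        # mac_ver() returns a version string such as "14.5" as its first item.
        major = int(platform.mac_ver()[0].split(".")[0] or 0)
        results["macOS>=14"] = major >= 14
    return results


for name, ok in check_prerequisites().items():
    print(f"{'ok     ' if ok else 'missing'}  {name}")
```

The Time Capsule password still has to be supplied by hand; nothing here can verify it.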
The instructions to install it are quite complex, which puts the project out of reach of the typical user. However, near the top is a “Quick Start” option that relies on just five commands, streamlining the process.
As it stands, Time Machine users have few choices in how they maintain their backups. They could look for an external drive or invest in a NAS, as the most obvious, if expensive, solutions.
But, with a project like TimeCapsuleSMB, there’s a chance of reviving an underappreciated part of Apple’s former product line.
SECURITY
The CEO thought this was the best way to deal with some email issues
Welcome, once again, to PWNED, the weekly screed where we highlight those who did not do the deed of securing their systems. If someone left their passwords or their access exposed, we will be writing about them here.
Have a story about someone leaving a gaping hole in their network? Share it with us at pwned@sitpub.com. Anonymity is available upon request.
This week’s terrifying tale of poor security hygiene comes courtesy of Luke Irwin, CEO and principal consultant at Aegis Cybersecurity. He’s been in the industry for more than a quarter of a century and he knows where the bits are buried.
At one point, Irwin consulted for a company that was a large national facility services organization, a 2,000-employee firm that provided cleaning, security guards, industrial abseiling (cleaning the facade), and other things that other large businesses need to keep their physical plants running smoothly.
The CEO had one very peculiar idea about how to keep his own house in order: he wanted to have access to every one of his employees’ login credentials.
The chief executive had an Excel spreadsheet sitting right on his desktop with a complete list of all the employee usernames and passwords. Let that sink in for a second. One person had all the keys to the castle in a single, easily accessible file.
In any decent security setup, no one in the company has access to anyone else’s password. Even the head of the IT department should not know another employee’s password. I say this as someone who used to work for a company where the IT department would ask you to DM them your password if you had computer problems.
But this company’s CEO wanted the usernames and passwords for reasons I’m sure any of his employees would appreciate: so he could go into their email accounts! He had an experience where one colleague had sent secret information to the entire company via email and he had spent the evening logging into every single account and deleting the message before anyone could see it.
Just in case other messages were sent in error in the future, the CEO wanted the ability to log into all the relevant accounts and delete them himself. Perhaps for the same reason, he would not allow MFA (multi-factor authentication), because that would have kept him out of people’s inboxes. He was adamant even though the company had been the victim of a ransomware incident previously.
“Despite repeated advice, he held that position for around four months, until we were able to demonstrate that the IT team could remove messages centrally using fairly simple administrative commands, without needing everyone’s password,” Irwin said.
Even after getting rid of the Excel sheet of shame, the boss still refused to turn on MFA and the company subsequently suffered two data breaches involving sensitive client data.
Unfortunately, this company wasn’t the only one that Irwin worked with where the management had something against MFA. Another client, this one in the medical sector, was opposed to multi-factor authentication because it “made things just a little too hard” for the external consultants they were using to access their systems.
During the time that Irwin worked with that company, they got lucky and no one breached them. But since then, he’s seen signs that their data was available on the dark web. No word on whether they ever switched MFA on.
There’s plenty to learn from Irwin’s two clients, but it’s all pretty obvious. First, don’t let anyone, even administrators or CEOs, have other people’s passwords. If someone has to get into another person’s email account, have IT use administrative access. Second, always enable MFA, preferably MFA with passkeys. ®
Chaotic Eclipse, the security researcher Microsoft threatened with criminal prosecution, has published a seventh Windows zero-day exploit. Called RoguePlanet, it grants attackers SYSTEM privileges on fully patched Windows 10 and 11 machines. The researcher released the proof-of-concept hours after Microsoft shipped its June Patch Tuesday update, which fixed a record 200 vulnerabilities.
RoguePlanet exploits a race condition in Windows Defender’s internal processing logic. Specifically, it is a Time-of-Check to Time-of-Use (TOCTOU) vulnerability. An unprivileged user can redirect a file operation performed by Defender, which runs as SYSTEM, to execute attacker-controlled code at the highest privilege level.
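For readers unfamiliar with the bug class, here is a generic and deliberately harmless sketch of a TOCTOU pattern in Python. It is not the RoguePlanet exploit and says nothing about Defender’s internals; the function names are invented for illustration. The first function checks a path and then opens it, leaving a window in which the path can be swapped. The second opens first and then validates the object it actually holds.

```python
import os
import stat


def racy_read(path: str) -> bytes:
    """Check-then-use: the path can change between the check and the open."""
    if os.path.islink(path):              # time of check
        raise PermissionError("refusing to follow a symlink")
    # Window: another process could replace `path` with a link right here.
    with open(path, "rb") as fh:          # time of use
        return fh.read()


def safer_read(path: str) -> bytes:
    """Open first, then validate the file descriptor that was opened."""
    # O_NOFOLLOW is POSIX-only; fall back to 0 where it is unavailable.
    flags = os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(path, flags)
    try:
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            raise PermissionError("not a regular file")
        with os.fdopen(fd, "rb", closefd=False) as fh:
            return fh.read()
    finally:
        os.close(fd)
```

The design point is that checks made on a name are only as good as the name stays stable, while checks made on an open handle describe the thing that will actually be read. When the code doing the reading runs with higher privileges than the user who controls the path, that gap is what turns a race into privilege escalation.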
“The exploit is a race condition, so it’s a hit or miss,” the researcher said. “I have managed to get a 100% success rate on some machines while it struggled to work on others.”
Security firm ThreatLocker confirmed the flaw works and published a video demonstration. “Our initial analysis confirms that the RoguePlanet exploit is viable and performs as described,” said CEO Danny Jenkins. He added that application allowlisting can prevent the exploit from executing.
The proof-of-concept was published on a self-hosted Git repository after the researcher said Microsoft had both GitHub and GitLab repositories hosting earlier work removed. This is part of an escalating dispute. Microsoft invoked its Digital Crimes Unit against the researcher and revoked access to their Microsoft Security Response Center account.
Chaotic Eclipse has disclosed seven zero-days in a matter of months: BlueHammer, RedSun, UnDefend, YellowKey, GreenPlasma, MiniPlasma, and now RoguePlanet. Microsoft’s June Patch Tuesday fixed two of them, GreenPlasma and YellowKey, but the rest remain unpatched. The researcher says the disclosures are retaliation for how Microsoft handled the process.
“They mopped the floor with me and pulled every childish game they could,” the researcher wrote. “I was wondering if I was dealing with a massive corporation or someone who is just having fun seeing me suffer.”
The timing is pointed. Microsoft’s June Patch Tuesday was its largest ever, fixing 200 vulnerabilities including 33 rated critical and three publicly disclosed zero-days. Analysts attribute the surge in part to AI-assisted code auditing, which is finding vulnerabilities faster than defenders can patch them. RoguePlanet arriving hours after the record update underscores the gap: even the biggest patch cycle in Microsoft’s history was immediately obsolete for anyone running Windows Defender.
[Dennis] is on YouTube with his channel “Made By Dennis,” but for the record he is a maker, not a V-tuber. On the other hand, his latest project (creating a professional-level tracking rig with DIY IR cameras and a whole lot of moxie) does mean he’s now equipped to make the move to the prestigious, high-status world of pretending to be an anime girl.
That is, of course, not why he did it. Like most projects around here, the motivation was more a case of “I wonder if I can…” In this case, [Dennis] wondered what it would take for him to pull off the same sort of optical motion capture, or MoCap, that is used in Hollywood studios. Optical mocap has the advantage of being very precise, able to track things at high speeds, and not being in any way limited to the human form like the slew of AI-assisted methods hitting the market right now. The disadvantage is that you need to place markers on any part of your subject you want tracked, film them from all angles, and process a whole lot of pixels. In [Dennis]’s case, it ended up being about four billion. Keep in mind that actually locating those points in 3D space depends on knowing exactly where your cameras are: if you want sub-millimeter precision, your cameras need to be fixed with sub-millimeter tolerance. It’s a big project, hence a long video, which is embedded below.
The DIY cameras use an AR0234 MIPI camera on a custom PCB with M12 lenses and IR filters. To improve the signal-to-noise ratio on optical MoCap, it’s standard to use near-IR light. The camera boards, as you might expect given the MIPI interface, hook into Raspberry Pi compute modules. The cheapest CM4 should work, though he’s using CM5s. The compute modules sit on custom boards that provide PoE and some other niceties, like a small microcontroller driven by the pulse-per-second pin to help trigger the cameras in sync.
Each camera gets a ring light of near-IR LEDs that pulse at 160 W, which would be way more than PoE is specced to provide, but since the LEDs are only on when the camera is taking a frame, the average power is well within allowable limits. With 16 cameras each having their own ring light, that’s a lot of near-IR photons. Don’t forget your safety squints!
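The arithmetic behind that claim is just peak power times duty cycle. Here is a small sketch: the 160 W peak comes from the write-up, but the 0.5 ms pulse length and 100 frames per second are made-up values for illustration, since the actual exposure and frame rate are not stated here. The PoE figures are the power-sourcing-side limits of IEEE 802.3af and 802.3at.

```python
def average_power_w(peak_w: float, pulse_s: float, frames_per_s: float) -> float:
    """Average power of a light that is on for pulse_s seconds once per frame."""
    duty_cycle = pulse_s * frames_per_s
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("pulse is longer than the frame period")
    return peak_w * duty_cycle


# Power available at the sourcing end of the link, in watts.
POE_LIMITS_W = {"802.3af": 15.4, "802.3at": 30.0}

# 160 W is from the article; pulse length and frame rate are assumptions.
avg = average_power_w(160.0, 0.0005, 100.0)
for name, limit in POE_LIMITS_W.items():
    print(f"{name}: average {avg:.1f} W, fits: {avg <= limit}")
```

With those assumed numbers the ring light averages about 8 W. A real budget would also have to cover the compute module and camera, and something on the board (typically bulk capacitance) has to supply the short 160 W peaks.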
Rather than process the images with OpenCV, [Dennis] wrote his own solution optimized for this use case, which he reports is 300x faster. Luckily, he’s put his implementation on GitHub, along with the rest of the project. Even if you don’t have any v-tubing ambitions, this project is very impressive and worth checking out in its entirety.
Optical MoCap isn’t the only game in town, of course. If you want to do this cheap and easy, you can strap a bunch of IMU sensors to yourself– just don’t expect the same precision.
Thanks to [Dennis] for the tip!
Following its 2024 acquisition of McIntosh and Sonus faber, Bose is making another calculated move in connected audio with the acquisition of StreamUnlimited Engineering GmbH, a Vienna-based company that supplies streaming software platforms, hardware modules, app frameworks, certifications, and engineering support for audio and smart home manufacturers.
This is not just Bose buying another parts supplier. StreamUnlimited gives Bose something far more useful: the software plumbing and certification backbone needed to build, support, and potentially license connected audio products across multiple brands and categories. That matters for Bose, but it may matter even more for McIntosh and Sonus faber, two premium audio brands that need stronger streaming ecosystems if they are going to compete in a market increasingly shaped by BluOS, Sonos, HEOS, WiiM, AirPlay, Google Cast, Spotify Connect, TIDAL Connect, Qobuz Connect, and Roon.
The deal also gives Bose a broader path to embed its proprietary audio technologies, including Sound by Bose and the Bose WaveForm Audio Engine, into more products beyond its own speakers and headphones. That could include smart speakers, soundbars, multiroom systems, mobile devices, wearables, automotive audio, and third-party connected products. In other words, Bose is not just chasing another box for the shelf. It is buying the infrastructure needed to make its audio technology travel farther.

Two current examples of Bose’s Sound by Bose strategy already exist in the wild: Epson’s Lifestudio projector lineup and Skullcandy’s Method 360 ANC earbuds. In both cases, Bose is not selling a finished Bose-branded speaker or headphone. It is licensing its audio tuning, acoustic design, and performance credibility into third-party products that need better sound to stand out.
That makes the StreamUnlimited acquisition more interesting. If Bose can combine Sound by Bose with StreamUnlimited’s streaming software, app frameworks, hardware modules, certification work, and connected-audio engineering, the company gains a much wider path to expand beyond its own products. Epson and Skullcandy show the basic strategy. StreamUnlimited could help scale it.
“As connected ecosystems scale and become more complex, how devices work together is a central driver of value,” said Nick Smith, president of Bose Audio Technology and chief strategy officer. “StreamUnlimited has built a trusted position at the center of this coordination layer, where interactions between devices are defined and orchestrated. We’re excited to welcome their team to Bose as we bring our capabilities to more partners, products, and experiences.”
“We look forward to joining with Bose as we expand StreamUnlimited’s offerings and accelerate the development of next-generation intelligent audio experiences for our customers,” said Frits Wittgrefe, CEO at StreamUnlimited. Markus Rutz, CTO at StreamUnlimited, added, “There is a significant opportunity to further advance the orchestration capabilities at the core of our platform, enabling more seamless, adaptive, and AI-driven audio ecosystems. This will unlock broader access to new streaming technologies, services, and capabilities, positioning us for continued growth as the market evolves.”
StreamUnlimited will continue to support both current and new customers, while extending its expertise into new markets. Its solutions will remain fully supported, interoperable, and open to integration with third-party technologies, products, and ecosystems.
Additional information about the acquisition, including financial and other transaction terms, remains confidential at this time.

Bose’s acquisition of StreamUnlimited is not about turning McIntosh and Sonus faber into Bose-branded lifestyle products. So far, both brands have continued on their own legacy paths. What this deal really gives Bose is something more strategic: the software, streaming, app, certification, hardware-module, and engineering infrastructure needed to compete in connected audio at a much larger scale.
That matters across the portfolio. Bose gets more control over the platform layer behind smart speakers, soundbars, headphones, wearables, automotive systems, and third-party products using Sound by Bose. McIntosh could benefit from stronger connected amplifiers, streamers, preamps, and in-car systems without losing its identity. Sonus faber gains a clearer path toward active, wireless, and lifestyle products that still feel like Sonus faber, not another anonymous app-controlled box.
The AI angle is part of the story, but not the whole story. StreamUnlimited gives Bose a foundation for more adaptive, personalized, voice-enabled, and software-driven audio experiences. That does not mean Bose bought an AI company or that a Bose rival to BluOS, Sonos, HEOS, or WiiM appears tomorrow. It means Bose now owns more of the plumbing required to build one, license one, or embed its technology more deeply into other companies’ products.
Bose is no longer just chasing the next speaker or headphone. It is building a broader audio technology ecosystem that can live inside its own products, its luxury brands, and the products of third-party partners. That is the real move.
Researchers from the University of California, Berkeley’s Center for Responsible, Decentralized Intelligence (RDI), alongside an advisory committee of over 300 domain experts, have launched Agents’ Last Exam (ALE)—a grueling new benchmark built to measure whether artificial intelligence can actually execute economically valuable, long-horizon professional workflows.
In an upset, OpenAI’s GPT-5.5, released in April and running through the Codex harness, took the top spot on the new ALE Leaderboard with a 24.0% pass rate. It beat Anthropic’s highly anticipated Mythos-class Claude Fable 5, released just yesterday, which placed third at 22.0%.
Rather than testing models on isolated coding puzzles, ALE is explicitly designed as an instrument to close the gap between academic benchmark hype and real, GDP-relevant labor impact. And right now, the data shows that even the most advanced models in the world fail more than three-quarters of the exam.
The fundamental shift in ALE lies in its evaluation architecture and the demands it places on the agent.
Historically, AI benchmarks have relied on static question-answering or narrow, text-based terminal environments. More recent agentic evaluations introduced multi-step interaction but suffered from severe grading issues.
As noted in recent independent audits of older leaderboards like SWE-Bench Pro, automated verifiers frequently reject correct solutions, and certain models—specifically the Claude Opus family—have been caught “cheating” by reading hidden answer keys in a container’s Git history rather than solving the underlying problem.
ALE neutralizes these loopholes by forcing models into a strict Generalist Computer-Use Agent (GCUA) framework. To pass, an agent cannot merely execute terminal commands.
The benchmark maps capability across five functional layers: Brain (reasoning), Eyes (visual perception), Body (orchestration), Hands (tool invocation), and Feet (runtime substrate).
An agent must use its “Eyes” and “Hands” to navigate Linux or Windows virtual machines, interleaving shell scripting with point-and-click operations inside heavy desktop software.
Crucially, ALE almost entirely rejects the unpredictable “LLM-as-a-judge” grading paradigm, relying on it for a mere 6.8% of its workflows. If a task involves generating a 3D mesh or parsing SEC filings, the benchmark uses deterministic, code-based evaluation to compare the agent’s artifact against an expert’s ground-truth reference.
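To make the idea of deterministic, code-based grading concrete, here is a minimal Python sketch. It is not ALE’s actual grader, which this article does not describe in detail: the function name `score_artifact`, the field names, and the tolerance are all hypothetical, and it assumes the agent’s artifact and the expert reference have each already been parsed into a flat dictionary of numeric fields.

```python
import math


def score_artifact(
    produced: dict[str, float],
    reference: dict[str, float],
    rel_tol: float = 1e-6,
) -> float:
    """Return the fraction of reference fields the agent's artifact reproduces.

    Hypothetical grader: a field counts only if it is present and numerically
    equal to the expert value within `rel_tol`.
    """
    if not reference:
        raise ValueError("reference must contain at least one field")
    matched = sum(
        1
        for key, expected in reference.items()
        if key in produced and math.isclose(produced[key], expected, rel_tol=rel_tol)
    )
    return matched / len(reference)


# Made-up values standing in for figures parsed from a filing.
reference = {"revenue": 1250.0, "net_income": 310.5, "total_assets": 9800.0}
produced = {"revenue": 1250.0, "net_income": 310.5, "total_assets": 9750.0}

score = score_artifact(produced, reference)  # two of three fields match
```

Because the check is plain arithmetic, the same artifact always receives the same score, which is the property an LLM judge cannot guarantee.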
ALE launches with 1,490 task instances and is scaling toward a massive 5,000-task target. What makes the product remarkable is its authenticity. The tasks are strictly anchored in the U.S. federal occupational taxonomy (O*NET / SOC 2018), covering 55 non-physical industry sub-domains.
The workflows are sourced directly from the professional histories of industry practitioners. Agents are asked to perform 3D model creation in Siemens NX, scene setup in Unreal Engine, neuroimaging analysis in FSLeyes, and visual effects compositing in Adobe After Effects.
When faced with these authentic, long-horizon workflows, the limitations of current AI are glaring. ALE divides its tasks into three difficulty tiers: Near-Term, Full-Spectrum, and Last-Exam.
| Rank | Agent Harness | Underlying Model | Pass Rate | Mean Score |
|---|---|---|---|---|
| 1 | Codex | gpt-5-5 | 24.0% | 42.8% |
| 2 | Ale Claw | gpt-5-5 | 23.0% | 45.8% |
| 3 | Claude Code | claude-fable-5 | 22.0% | 40.5% |
| 4 | OpenClaw | gpt-5-5 | 21.1% | 41.0% |
| 5 | Cursor CLI | composer-2-5 | 20.4% | 38.5% |
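The leaderboard reports both a pass rate and a mean score, and the two can disagree: Ale Claw trails Codex on pass rate while posting a higher mean. The short Python sketch below shows one way that can happen. It rests on an assumption the article does not confirm, namely that each task gets a score between 0 and 1 and counts as passed only at a full score; `PASS_THRESHOLD`, `summarize`, and the per-task scores are invented for illustration.

```python
from statistics import mean

# Assumption: a task passes only when it earns a full score.
PASS_THRESHOLD = 1.0


def summarize(scores: list[float], threshold: float = PASS_THRESHOLD) -> tuple[float, float]:
    """Return (pass_rate, mean_score) for a list of per-task scores in [0, 1]."""
    if not scores:
        raise ValueError("scores must not be empty")
    pass_rate = sum(s >= threshold for s in scores) / len(scores)
    return pass_rate, mean(scores)


# Invented per-task scores for two hypothetical harnesses.
all_or_nothing = [1.0, 1.0, 0.0, 0.0, 0.0]   # finishes two tasks, fails the rest outright
partial_credit = [1.0, 0.8, 0.7, 0.5, 0.0]   # finishes one task, gets far on three others

a_pass, a_mean = summarize(all_or_nothing)
b_pass, b_mean = summarize(partial_credit)
```

Under these assumptions the first harness wins on pass rate and the second on mean score, so a ranking by pass rate rewards finishing work over getting most of the way there.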
The victory of GPT-5.5 aligns with recent third-party analysis suggesting that OpenAI’s models are currently superior at strictly adhering to multi-part, complex prompts. Conversely, users report Anthropic’s Claude architecture can sometimes be “forgetful” with multi-part instructions, abandoning required steps mid-workflow — a fatal flaw in ALE’s rigorous pipeline.
And while hitting a 24.0% pass rate is enough to claim the crown, the absolute performance ceiling remains remarkably low.
On the hardest “Last-Exam” tier — representing the frontier of professional difficulty — most configurations, including Anthropic’s older Claude Opus 4.8 and Google’s Gemini CLI, record a devastating 0.0% pass rate.
A core vulnerability in modern AI evaluation is “benchmark contamination”—the phenomenon where test questions inevitably leak into the massive data lakes used to train next-generation models. Once a model memorizes the benchmark, the evaluation becomes entirely useless.
ALE solves this through a dual-use deployment strategy. The project operates as an open-source research initiative, but it closely guards its evaluation data. Only about 10% of the dataset (roughly 150 tasks) is released publicly on platforms like GitHub and Hugging Face. The remaining 1,300+ tasks are kept strictly private.
For developers and enterprise evaluators, this means ALE functions as a “living benchmark”. Private tasks are systematically rotated into the public pool over time, while retired public tasks are swapped out.
This rolling release ensures that the evaluation surface remains uncontaminated across successive model generations, giving enterprise buyers confidence that an agent’s high score is earned, not memorized.
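A rolling release of this kind can be sketched in a few lines of Python. This is an illustration only, not the project’s tooling: the task IDs, the `rotate` helper, the batch size of 20, and the random selection are all assumptions. The only figures taken from the article are the 1,490 launch tasks and the roughly 10% public share.

```python
import random


def rotate(
    public: list[str],
    private: list[str],
    k: int,
    rng: random.Random,
) -> tuple[list[str], list[str], list[str]]:
    """Retire k public tasks and promote k private tasks in their place.

    Returns (new_public, new_private, retired). Retired tasks leave both pools.
    """
    if k > len(public) or k > len(private):
        raise ValueError("k exceeds the size of a pool")
    retired = set(rng.sample(public, k))
    promoted = rng.sample(private, k)
    promoted_set = set(promoted)
    new_public = [t for t in public if t not in retired] + promoted
    new_private = [t for t in private if t not in promoted_set]
    return new_public, new_private, sorted(retired)


# Hypothetical task IDs; 1,490 tasks with about 10% released publicly.
task_ids = [f"task-{i:04d}" for i in range(1490)]
rng = random.Random(0)  # fixed seed so the split is reproducible
rng.shuffle(task_ids)
public, private = task_ids[:149], task_ids[149:]

new_public, new_private, retired = rotate(public, private, k=20, rng=rng)
```

The public pool stays the same size after each rotation, while every task that has been exposed long enough to leak into training data drops out of scoring.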
Additionally, ALE provides transparency by tracking both “Full” and “Unlicensed” scores. Because real professional work often requires paid, proprietary software, the “Full” leaderboard incorporates tasks that rely on commercial CAD tools, paid APIs, or licensed datasets.
The “Unlicensed” tier drops these license-gated tasks to provide a clean, like-for-like comparison using only freely available tools, ensuring models aren’t simply rewarded for having access to paid enterprise software.
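The difference between the two leaderboards comes down to which tasks go into the denominator. The Python sketch below illustrates that with invented data; the `TaskResult` fields, the task names, and the outcomes are hypothetical and do not come from ALE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    task_id: str
    passed: bool
    needs_license: bool  # True if the task relies on paid software, APIs, or data


def pass_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks passed."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(r.passed for r in results) / len(results)


# Invented results for one hypothetical agent.
results = [
    TaskResult("cad-model", passed=True, needs_license=True),
    TaskResult("paid-api-report", passed=False, needs_license=True),
    TaskResult("shell-etl", passed=True, needs_license=False),
    TaskResult("open-data-plot", passed=False, needs_license=False),
    TaskResult("spreadsheet-audit", passed=False, needs_license=False),
]

full_rate = pass_rate(results)  # every task counts
unlicensed_rate = pass_rate([r for r in results if not r.needs_license])  # license-gated tasks dropped
```

Here the same agent scores 2 of 5 on the full set and 1 of 3 on the unlicensed subset, which is why the two numbers are tracked separately.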
For developers frustrated by the gap between marketing claims and actual production performance, ALE’s brutal grading curve is highly validating.
Zengyi Qin, an MIT PhD researcher and data contributor to the project, took to X to announce the launch, sharing images of the paper and the staggering 100+ institution contributor list.
“Introducing Agents’ Last Exam (ALE),” Qin wrote. “Built by 300+ domain experts from 100+ institutions. Covering 55 industry domains. Claude Opus 4.8 has 0.0% pass rate on the hardest subset. Glad to have contributed to this benchmark”.
In a follow-up post highlighting the Hugging Face ArXiv paper link, Qin added:
“Very solid work from project leads @YiyouSun @Xinyang_Han_ @dawnsongtweets and @BerkeleyRDI”.
As businesses deploy billions in capital betting on AI agents, they desperately need a compass that points true north. If an agent can eventually conquer the gauntlet of Agents’ Last Exam, it won’t just be passing a test—it will be proving it is ready to join the workforce. Until then, the sobering pass rates on the leaderboard serve as a necessary reality check for the entire AI ecosystem.
Multiple reports indicate that Chinese operatives continue using every tech tool at their disposal – including American AI – to amass data on and manipulate everyone from security-clearance holders to everyday US citizens. And they’re trying to influence public opinion on building datacenters for AI, albeit without success so far.
One of these reports found a “significant resurgence” of a botnet linked to Chinese government-backed goons, including Volt Typhoon, which previously used a covert network of connected devices to burrow deep into critical US networks and preposition for future destructive attacks.
In January 2024, the FBI said it killed Volt’s KV-botnet, which was made up of hundreds of end-of-life routers and other internet-connected devices. At the time, KV-botnet consisted of four clusters, with the KV cluster primarily being used as a covert data transfer network, and the JDY cluster used for scanning and reconnaissance.
In a Wednesday report, Lumen’s Black Lotus Labs said that while the KV cluster became largely defunct after the law enforcement takedown, the JDY cluster remains an active threat, and has since surged to more than 1,500 compromised routers and IoT devices.
“Analysis of this activity shows a clear focus on identifying vulnerable infrastructure shortly after public vulnerability disclosures, suggesting that reconnaissance output is rapidly operationalized by China-nexus advanced persistent threat (APT) actors,” the threat intel team wrote. “This targeted focus has been observed across a range of sectors, with the US military and associated entities as the most prominent.”
The botnet resurgence poses the most pressing threat, and the security shop recommends all enterprises implement CISA and NCSC guidance for mitigating Volt Typhoon activity and defending against China-nexus covert networks of compromised devices. Another report, however, indicates that China’s attempts at influence operations haven’t died down, either.
OpenAI in a Wednesday report said it banned ChatGPT accounts likely originating from China after they used the American AI company’s models to generate content for covert operations about – wait for it – American AI. While neither of the two clusters seemed to have much success in sowing chaos or swaying opinions, the fact that they tried at all is significant, according to Ben Nimmo, principal investigator on OpenAI’s Intelligence and Investigations team.
“Neither campaign appears to have gained much authentic engagement,” Nimmo told reporters. “They’re important for what they reveal about the intentions of influence operators from China and the narratives they’re testing and seeking to amplify.”
The first cluster used ChatGPT to generate social media content and images for an operation claiming datacenters and AI applications are increasing electricity demand and causing higher costs for ordinary Americans.
“For example, they asked for comic strips about a power grid operator’s capacity auction prices based on reporting from a legitimate regional paper,” the report says. “They asked ChatGPT to focus the comments on rising capacity prices as a consequence of peak electricity demand, framing the new demand as coming from data centers and AI applications and argued that these costs were ultimately passed to ordinary households.”
The operators then posted these comments and images on X, likely using fake accounts, with links to real news stories about datacenters.
OpenAI suspects the operators are part of a social-media team at a private Chinese tech company that provides services for Chinese provincial-level government clients.
“This was not a case of an influence operation creating a debate,” Nimmo said. “The debate existed already. This was an influence operation from China trying to interfere in it. We didn’t see any signs that they succeeded.”
The second cluster of banned ChatGPT accounts also likely originated in China and used OpenAI’s models to write comments and draw political cartoons criticizing US tech policies and tariffs. “Interestingly, the operators specified in their prompts that the content should not include cartoons of Xi Jinping in the output and should only include President Trump,” Nimmo said.
These accounts, all writing prompts in simplified Chinese and using VPNs to access the AI systems, also used ChatGPT to edit work reports and help design social media monitoring systems. “This isn’t the first time that we’ve seen actors in China trying to come up with ideas for social media monitoring,” Nimmo said.
In February, OpenAI said it banned ChatGPT accounts believed to be linked to Chinese government entities attempting to use AI models to surveil individuals and social media accounts.
If Chinese agents can’t use AI systems to unearth sensitive information, there are always fake websites and job offers promising cash for state secrets. We’ve seen Beijing-linked government snoops use these tactics in the past, and according to the US Justice Department, they’re still using this scam (because it works).
On Wednesday, the feds said they obtained a warrant for and seized 13 fake consulting company websites used to target US persons, including current and former security clearance holders with access to classified and sensitive government information.
The domains include centrikglobalconsulting.com, rightinfoconsult.com, finnaclevesperconsulting.com, cydfconsulting.com, pulsewaveglobal.com, catalystglobalsolutions.com, thehorizzen.com, geoindopacific.com, gpf-ina.org, safesec-group.com, thetruthinfo.com, Vandercons.com, and gulfpeace.org.
Since November 2023, these websites and associated job postings on social media, LinkedIn, and other hiring platforms advertised “consulting” jobs, including “Senior Analyst” and “International Affairs Consultant” positions.
Suspected PRC operatives used the sites and job listings to recruit applicants and bribe them for sensitive information, DOJ alleges. “The conspirators have encouraged applicants and recruits to share confidential and sensitive information in violation of their official duties and of particular interest to the People’s Republic of China (PRC) government,” according to the court documents. “The recruiters pressured candidates to share confidential information and reports from ‘insider sources’ in violation of their official duties.”
The court documents allege the conspirators then paid the recruits for these reports using online accounts in the names of fictitious individuals, and cryptocurrency to hide their identities and the source of the payments.
The Mobi Fold is a compact wireless mouse designed to fold flat when not in use. Early impressions are positive for its surprisingly comfortable shape, quiet clicks, and multi-device Bluetooth support.
Palantir’s Karp predicts full AI nationalization in two years. He says Sanders’ 50% proposal will look moderate. Trump, Sanders, and Karp agree the shift is coming.
Palantir CEO Alex Karp says full nationalization of AI companies is coming, and that Senator Bernie Sanders’ proposal for 50% public ownership will soon look moderate. “In two years, they’re not going to think Bernie Sanders is progressive,” Karp told CNBC on Wednesday. “They’re going to be like, ‘Bernie Sanders, you only want 50%? What is this 50%?’”
Karp said he has spent six months privately warning top AI executives about the threat. “The momentum is on the side of people who want to nationalise them,” he said. He described himself as a “card-carrying progressive” and argued that the most important political decisions in the country will be driven by whether politicians understand AI.
The prediction lands in an increasingly crowded political space. Sanders has outlined his American AI Sovereign Wealth Fund Act, which would impose a one-time 50% tax on stock, not profits, from companies like OpenAI, Anthropic, and xAI. Trump has said he plans to meet AI company leaders to discuss some form of public ownership, calling it a “partnership with the American public.” The two sides disagree on nearly everything else.
“The question is not whether AI will change the world, it will,” Sanders said in a video this month. “The question is who will own and control that future.” Trump said at the White House: “If we do that, the public will become very rich, the people in our country.”
Not everyone in Trump’s orbit agrees. David Sacks, the former White House AI and crypto czar, warned that Republicans who adopt the Sanders position will regret it. “Conservatives are right to fear where this is all headed but ought to think more carefully about how regulations they are flirting with now will be used against them the next time a Democrat administration is in power,” Sacks wrote.
Karp framed the debate differently. He said Americans are asking what will happen to them as AI eliminates jobs, “and the answers aren’t all good or bad.” He predicted the US would need to “retrain and retool” and said it is better positioned to do so than Europe. He did not address how Palantir, which sells AI to governments and militaries, would be affected by nationalization.
The bipartisan convergence on public ownership of AI is remarkable. A year ago, the idea of the US government taking equity stakes in AI companies would have been dismissed as fringe. Now a socialist senator, a Republican president, and a defence contractor CEO all agree it is likely. The disagreement is only about how much and how fast.
Whether any of it happens depends on legislation, which has not been introduced yet, and on whether AI companies voluntarily offer equity, as OpenAI has proposed through its Public Wealth Fund concept. But Karp’s prediction is the most extreme version yet from a sitting CEO: not 10%, not 50%, but full nationalization, and within two years.