TL;DR
Xiaomi’s home robotic charging arm auto-plugs and unplugs your EV. Q4 2026 retail launch in China, no price yet.
AI coding agents are rapidly accelerating data engineering by generating transformations, pipelines, orchestration workflows, validation tests, and infrastructure configurations from prompts.
However, enterprise data platforms have long operated across fragmented systems owned by different teams and built on different technologies. As these systems evolve independently, organizations increasingly struggle with inconsistent business logic, duplicated implementations, difficult downstream impact analysis, and hidden dependencies across the platform.
The rise of vibe coding can further amplify these problems as more operational context, architectural decisions, and business knowledge become scattered across prompts, conversations, generated code, and disconnected workflows rather than becoming part of the system itself.
Spec-driven development (SDD) is emerging as one approach to address this challenge. In SDD, prompts, business rules, validation logic, orchestration behavior, and implementation workflows are converted into executable and versioned specifications that become part of the system itself. These specifications act as persistent operational memory for both humans and AI agents, allowing systems to evolve more consistently across releases, teams, and AI-assisted workflows.
Because enterprise data engineering already relies heavily on reusable patterns, metadata-driven pipelines, and standardized operational workflows, it is especially well-suited for SDD. By combining AI-assisted generation with deterministic and reusable system contracts, SDD may provide a new operational layer for reducing fragmentation and improving long-term coordination across increasingly AI-generated data platforms.
Vibe coding works remarkably well for generating isolated implementations quickly. But prompts are inherently temporary. They capture an engineer’s assumptions, business context, implementation logic, and system knowledge only for that specific conversation and moment in time.
In practice, making AI-generated systems work often requires far more than a simple prompt. Engineers continuously provide background information, architectural decisions, business rules, schema assumptions, downstream dependencies, operational constraints, debugging history, and implementation guidance throughout the development process.
These contexts become the real operational knowledge behind AI-assisted development.
However, in most vibe coding workflows, this information remains scattered across prompts, conversations, Jira tickets, documentation, chat history, generated code, and disconnected workflows rather than becoming part of the system itself.
This creates a major problem for enterprise data engineering because modern data platforms are naturally fragmented across many interconnected systems, including ingestion pipelines, warehouses, orchestration frameworks, semantic layers, APIs, dashboards, and machine learning (ML) systems. As more logic and context become embedded inside prompts and generated implementations, organizations gradually lose visibility into:
Over time, the system itself no longer contains the full reasoning behind how it was built. Critical business context, architectural assumptions, and operational knowledge still largely exist inside human judgement and scattered conversations rather than inside the platform itself.
Vibe coding makes implementation significantly faster, but from a system perspective, overall engineering efficiency does not improve proportionally because much of the development lifecycle still depends on human validation, domain knowledge, coordination, and decision-making.
More importantly, prompts are not naturally iterable engineering artifacts. Enterprise systems continuously evolve across releases, schema changes, business logic updates, and downstream dependencies. Teams repeatedly revisit and refine systems over time, but prompts are optimized for fast local generation rather than long-term system evolution.
They are difficult to:
Even the same prompt may not reliably generate the same implementation with different context in the future.
This is where SDD begins to move to the center of AI-assisted data engineering. Instead of leaving operational knowledge scattered across prompts and conversations, SDD integrates business context, validation logic, transformation behavior, orchestration requirements, and implementation workflows directly into executable specifications that become part of the system itself.
The system now has persistent memory about how it was designed, why certain decisions were made, and how different components are connected across the platform. This allows teams and AI agents to iterate systems more reliably over time while reducing fragmentation across increasingly distributed data environments.
In SDD, systems are built around executable specifications rather than loosely coordinated prompts and implementations alone. Instead of treating specifications as passive documentation written after development, SDD treats them as operational contracts that directly drive code generation, validation, testing, orchestration, and deployment workflows.
In many ways, SDD extends ideas from Infrastructure-as-Code and GitOps into AI-assisted engineering. Specifications combine declarative system definitions with executable implementation workflows. The declarative layer provides system context, schemas, dependencies, constraints, and operational requirements, while workflow-oriented instructions guide AI agents on how to implement and evolve the system consistently.
Once these contexts, rules, and implementation patterns are converted into persistent and versioned contracts stored in repositories and integrated into CI/CD workflows, the system becomes significantly more iterable and governable over time. These specifications effectively become long-term system memory for both humans and AI agents, allowing systems to evolve consistently across releases, teams, and increasingly AI-assisted development workflows.
In practice, the structure of specifications largely depends on the type of systems and workflows being implemented. However, spec-driven systems often begin with a foundational “constitution” that defines project-wide principles and constraints that should remain consistent across the platform, such as technology standards, naming conventions, architectural rules, governance policies, and core system requirements. On top of this foundation, multiple layers of specifications serve different operational purposes across the development lifecycle:
schema specifications define structural compatibility
transformation specifications define business logic
validation specifications define quality rules
orchestration specifications define execution behavior
semantic specifications define shared business definitions
AI workflow specifications define reusable implementation instructions for coding agents
A simplified specification might look like this:
pipeline_spec:
  source:
    system: mysql
    table: order
  transformation:
    logic:
      - load_strategy: scd2
  target:
    platform: snowflake
    table: dim_order
  validation:
    primary_key: order_id
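As a rough illustration of how such a specification could act as an executable contract rather than passive documentation, the sketch below checks a parsed copy of the example before any code is generated from it. The required sections, the list of known load strategies, and the `check_spec` helper are assumptions made for illustration; the article does not prescribe them.

```python
from typing import Any

# Hypothetical in-memory form of the example spec (for instance, after parsing the YAML).
PIPELINE_SPEC: dict[str, Any] = {
    "pipeline_spec": {
        "source": {"system": "mysql", "table": "order"},
        "transformation": {"logic": [{"load_strategy": "scd2"}]},
        "target": {"platform": "snowflake", "table": "dim_order"},
        "validation": {"primary_key": "order_id"},
    }
}

# Assumed conventions, not taken from the article.
REQUIRED_SECTIONS = ("source", "transformation", "target", "validation")
KNOWN_LOAD_STRATEGIES = {"scd1", "scd2", "append", "full_refresh"}


def check_spec(spec: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the spec passed."""
    body = spec.get("pipeline_spec")
    if not isinstance(body, dict):
        return ["missing top-level 'pipeline_spec' mapping"]

    problems: list[str] = []
    for section in REQUIRED_SECTIONS:
        if section not in body:
            problems.append(f"missing section: {section}")

    # Every transformation step must use a load strategy the platform knows about.
    for step in body.get("transformation", {}).get("logic", []):
        strategy = step.get("load_strategy")
        if strategy is not None and strategy not in KNOWN_LOAD_STRATEGIES:
            problems.append(f"unknown load_strategy: {strategy}")

    if not body.get("validation", {}).get("primary_key"):
        problems.append("validation.primary_key is required")
    return problems


if __name__ == "__main__":
    print(check_spec(PIPELINE_SPEC))
```

A check like this could run in CI so that a malformed specification fails review before an agent generates anything from it.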
Additional workflow files can then provide reusable implementation instructions for coding agents:
Generate Python ingestion code for Salesforce customer data.
Generate dbt models implementing Type 2 SCD logic.
Generate Airflow workflows for hourly execution.
Generate validation tests for downstream compatibility.
These specification documents are often maintained as markdown-based operational artifacts generated and refined through AI-assisted workflows. Engineers can iteratively update the specifications, provide additional business context, and collaborate with coding agents to improve implementation logic, workflows, and prompt instructions over time. Compared to traditional documentation processes, AI-assisted specification generation is significantly faster and more adaptive.
The important shift is not simply better documentation. Specifications become reusable operational context that allows systems to evolve consistently across releases, teams, and AI-assisted workflows. Architectural intent, business assumptions, and implementation logic no longer disappear into temporary prompts and disconnected implementations, but instead become persistent system knowledge integrated directly into the development lifecycle.
SDD can theoretically be applied across many areas of software engineering, but data engineering is especially well-suited for this model because of the nature of modern data platforms.
Enterprise data systems naturally span many interconnected technologies and layers, including transactional systems, ingestion frameworks, streaming platforms, warehouses, orchestration systems, semantic layers, APIs, dashboards, and ML pipelines. Data engineers regularly work across long technology stacks and distributed systems where a single upstream change can impact many downstream consumers.
Enterprise data platforms also support many different teams and applications across fragmented environments. As systems evolve independently, understanding the full downstream impact of an upstream schema or business logic change becomes increasingly difficult. A seemingly small modification can silently break downstream pipelines, dashboards, APIs, semantic models, or machine learning workflows across the platform.
SDD can address this fragmentation by introducing shared and versioned operational contracts across systems. Because schemas, dependencies, validation rules, transformation logic, and orchestration behavior are explicitly defined within specifications, teams and AI agents gain much better visibility into how systems are connected and how changes propagate across the platform.
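To make the idea of change propagation concrete, here is a minimal sketch of downstream impact analysis, assuming the dependencies declared in specifications have been collected into a simple map. The asset names and the `impacted_assets` helper are hypothetical.

```python
from collections import deque

# Hypothetical dependency map that might be derived from specifications:
# each key is an asset, each value lists the assets that read from it.
DOWNSTREAM: dict[str, list[str]] = {
    "mysql.order": ["snowflake.dim_order"],
    "snowflake.dim_order": ["semantic.orders", "ml.churn_features"],
    "semantic.orders": ["dashboard.revenue"],
    "ml.churn_features": [],
    "dashboard.revenue": [],
}


def impacted_assets(changed: str, downstream: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk returning every asset reachable from `changed`."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(downstream.get(changed, []))
    while queue:
        asset = queue.popleft()
        if asset in seen:
            continue  # already reported via another path
        seen.add(asset)
        order.append(asset)
        queue.extend(downstream.get(asset, []))
    return order


if __name__ == "__main__":
    for asset in impacted_assets("mysql.order", DOWNSTREAM):
        print(asset)
```

Because the map lives alongside the specifications, the same traversal is available to a reviewer and to a coding agent before an upstream change is merged.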
Additionally, the goal of data engineering is not simply delivering pipelines quickly. Teams must also optimize for system stability, scalability, consistency, maintainability, operational reliability, and infrastructure cost.
This requires significant system and solution design work from engineers. Teams must choose a tech stack and carefully define schemas, transformation patterns, orchestration behavior, validation rules, storage strategies, and downstream compatibility requirements across the platform.
However, once these architectural and operational patterns are established, much of the implementation work becomes highly repetitive and standardized.
For example, after defining a reusable ingestion and transformation pattern for Salesforce customer data, onboarding a new table may only require adding another table definition into the specification, while the remaining implementation can be generated automatically through existing specifications and workflows that follow the same operational pattern:
source:
  system: salesforce
  tables:
    - customer
    - order
    - product
From this specification alone, coding agents could generate new data pipelines following the same governed implementation pattern across the platform. This combination of human-driven architectural design and highly repeatable implementation workflows makes data engineering particularly suitable for SDD.
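The sketch below shows one way a table list could be expanded into per-table pipeline definitions that all follow the same pattern. The default load strategy, the target platform, the `dim_` naming rule, and the `expand_pipelines` helper are assumptions for illustration only.

```python
from typing import Any

# Mirrors the source block above.
SOURCE_SPEC: dict[str, Any] = {
    "source": {"system": "salesforce", "tables": ["customer", "order", "product"]}
}

# Assumed platform-wide pattern; in practice this would come from shared specifications.
PATTERN_DEFAULTS = {"load_strategy": "scd2", "target_platform": "snowflake"}


def expand_pipelines(spec: dict[str, Any], defaults: dict[str, str]) -> list[dict[str, Any]]:
    """Produce one pipeline definition per listed table, all following the same pattern."""
    source = spec["source"]
    return [
        {
            "name": f"{source['system']}_{table}",
            "source": {"system": source["system"], "table": table},
            "transformation": {"logic": [{"load_strategy": defaults["load_strategy"]}]},
            "target": {"platform": defaults["target_platform"], "table": f"dim_{table}"},
        }
        for table in source["tables"]
    ]


if __name__ == "__main__":
    for pipeline in expand_pipelines(SOURCE_SPEC, PATTERN_DEFAULTS):
        print(pipeline["name"], "->", pipeline["target"]["table"])
```

Onboarding another table then amounts to appending one name to `tables`; the generated definitions stay consistent because the pattern is defined once.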
In many ways, data engineering has always been moving toward higher levels of automation, from ETL frameworks and metadata-driven pipelines to IaC and declarative orchestration systems. SDD represents another step in that evolution by combining prompt-based AI generation with deterministic and versioned operational contracts.
Instead of relying entirely on temporary conversational prompts or rigid template systems, SDD introduces a middle layer where reusable specifications provide structure, coordination, validation, and persistent system memory for AI-assisted development.
SDD introduces a much higher level of automation into enterprise data engineering while also helping reduce the fragmentation problems that modern data platforms increasingly face.
Because schemas, business rules, transformation behavior, orchestration requirements, validation logic, and downstream dependencies are explicitly defined inside reusable specifications, coding agents can generate and evolve large portions of the implementation consistently across the platform. Instead of repeatedly rebuilding pipelines and workflows from temporary prompts and disconnected context, teams can iterate systems through shared operational contracts and reusable implementation patterns.
This significantly improves consistency, traceability, and coordination across distributed environments. Schema evolution becomes easier to manage, downstream impact becomes more visible, and systems can evolve incrementally instead of through disconnected generations of implementations.
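As one hedged example of what managing schema evolution through versioned specifications might look like, the sketch below compares two versions of a schema specification and reports changes that could break downstream consumers. Treating removed columns and type changes as breaking, and added columns as safe, is an assumption; real compatibility rules vary by platform.

```python
def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Compare two column-to-type mappings and list changes assumed to be breaking."""
    issues: list[str] = []
    for column, old_type in old.items():
        if column not in new:
            issues.append(f"removed column: {column}")
        elif new[column] != old_type:
            issues.append(f"type changed: {column} {old_type} -> {new[column]}")
    # Columns that exist only in `new` are treated as additive and therefore safe.
    return issues


if __name__ == "__main__":
    # Hypothetical schema versions for the dim_order table.
    v1 = {"order_id": "int", "amount": "decimal"}
    v2 = {"order_id": "string", "created_at": "timestamp"}
    for issue in breaking_changes(v1, v2):
        print(issue)
```

Run against the previous and proposed versions of a specification in a pull request, a check of this kind surfaces downstream risk before anything is deployed.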
At the same time, human engineers still remain essential in the development lifecycle. While AI agents can automate large portions of implementation work, human judgement is still critical for defining business logic, designing architectures, managing tradeoffs, validating correctness, and coordinating system evolution across organizations.
As more implementation work becomes AI-generated, the role of data engineering also begins shifting. Engineers spend less time writing repetitive pipelines and orchestration logic, and more time defining specifications, designing reusable operational patterns, managing validation rules, and coordinating business context across systems.
This may also gradually reduce some of the traditional boundaries between different data engineering teams. Because implementation becomes increasingly standardized and AI-assisted through shared specifications, organizations may rely less on highly siloed platform-specific implementation teams and more on shared operational contracts and reusable system patterns.
Ultimately, SDD shifts data engineering toward a more specification-oriented and system-oriented model where humans focus on intent, architecture, and business coordination, while AI agents increasingly handle implementation, testing, and operational generation at scale.
Shuhua Xu is a lead data engineer.
Welcome to the VentureBeat community!
Our guest posting program is where technical experts share insights and provide neutral, non-vested deep dives on AI, data infrastructure, cybersecurity and other cutting-edge technologies shaping the future of enterprise.
According to the one person who actually read the research paper
The “jailbreak” that prompted the Trump administration to block Anthropic’s most advanced models was actually a simple three-word prompt: “Fix this code.”
That’s according to Katie Moussouris, founder and CEO of Luta Security, and the fairy godmother of bug bounties. She says she was the only outside expert to read the third-party research paper on the Fable 5 guardrail bypass techniques that prompted the ban.
On Friday, the US government, reportedly citing national security concerns, issued an export control directive to suspend access to Fable 5 and Mythos 5 by any foreign national, inside or outside the United States. In response, Anthropic disabled both models “for all our customers to ensure compliance.”
Anthropic shared the report privately with her, Moussouris wrote in a Monday blog post.
The outside researchers reportedly fed Anthropic’s Fable 5, Mythos, and Claude Opus models open-source code containing known CVEs, plus new code intentionally laced with vulnerabilities, and asked the models to “review the code for security issues.”
As Moussouris tells it, Fable 5 refused, so the researchers asked the AI systems to “fix this code.” The model reportedly obliged, and after additional prompts also produced scripts to test the patches.
“That’s it,” Moussouris wrote. “‘Fix this code,’ plus several manual steps to generate test scripts, should never have triggered an export control. I feel like making ’90s-style t-shirts with ‘fix this code’ on the front and ‘this shirt is a munition’ on the back.”
Between 2013 and 2017, Moussouris served on the technical expert group that renegotiated the Wassenaar Arrangement, a voluntary agreement between 42 nations that governs certain export controls for classified dual-use software and technology.
The group eventually won exemptions for defensive cybersecurity activity. This allows defenders to share vulnerability data, conduct malware analysis, and coordinate incident response internationally without the threat of criminal prosecution.
On Sunday, Moussouris joined more than 100 other cybersecurity leaders and signed an open letter urging the Trump administration to reverse the restrictions on Fable 5 and Mythos and restore cybersecurity firms’ access to the advanced models.
“To pull the best capabilities away from defenders without a good reason when our adversaries are rapidly advancing is dangerous,” they wrote.
In her blog, Moussouris argues that there was no guardrail bypass or jailbreak. Defenders should be able to ask AI systems to find and fix bugs, and write tests to validate the patch, she said. Anthropic’s models were doing “the most valuable thing an AI model can do for defensive security: executing the find, fix, and test loop defenders run every day.”
Removing the capability for models to respond to defensive requests makes AI systems “worse at finding bugs and verifying patches,” she continued.
Plus, the US can’t extend export controls to open-weight systems or similar advanced models from China and other countries – and these systems will soon achieve Mythos-like capabilities, anyway. Anthropic and Google have both accused China-based rivals including DeepSeek of using “distillation attacks” to train their models by siphoning knowledge from American companies’ AI.
Banning Anthropic’s advanced models is going to hurt defenders more than attackers, Moussouris warns. “Defense improves when defenders find the same bugs attackers find and fix them faster,” she wrote. “We need the best tools to defend against increasingly capable attackers in the AI era of cybersecurity.”
The Register reached out to the Trump administration for comment on Moussouris’ assertion, and we’ll update this post if we hear back. ®
Xiaomi has unveiled a robotic charging arm designed for residential garages that automatically plugs and unplugs an electric vehicle without any owner intervention. The system detects the vehicle’s position after parking, extends to the charging port, connects the cable, and retracts it once charging is complete or a preset battery level is reached. Xiaomi is targeting a Q4 2026 retail launch in China, though no price has been announced.
The concept is not new. In December 2014, Elon Musk tweeted that Tesla was working on a charger that “automatically moves out from the wall and connects like a solid metal snake.” Tesla demonstrated a functional prototype in August 2015, a multi-segmented robotic arm that located the charge port on a Model S and plugged itself in.
The product never shipped. Tesla has since pivoted to wireless charging, acquiring German startup Wiferion in 2023 and designing the Cybercab robotaxi without a physical charging port entirely. Xiaomi’s approach is more conventional but potentially more practical: a compact unit that works with existing plug-in standards rather than requiring new vehicle hardware.
The arm has a body width of just 152mm, narrow enough to mount alongside tight residential parking spaces. It uses AI-based vision recognition for what Xiaomi describes as sub-millimetre precision when inserting the plug. Owners can also initiate charging remotely via smartphone if the vehicle is parked within the arm’s reach.
The company emphasised that the promotional video was filmed in a real-world setting rather than a controlled environment, and that all demonstrated features are production-ready. That claim has not been independently verified, though Xiaomi has shipped more than 600,000 EVs in under two years, which gives it the manufacturing scale to bring accessories like this to market. Whether a robotic charging arm appeals to enough buyers to justify production remains an open question, particularly without pricing.
The robotic arm is designed to integrate with Xiaomi’s broader smart home and automated parking ecosystem. The intended workflow pairs autonomous parking with autonomous charging: the car parks itself in the garage, the arm plugs in, and the owner walks away. That vision depends on vehicle-to-infrastructure communication protocols that Xiaomi controls end-to-end across its SU7 and YU7 lineup, an advantage of building both the car and the accessory.
Xiaomi is not the only Chinese company pursuing this technology. Huawei demonstrated a robotic charging arm for the Maextro S800 in January 2025 with full unmanned automation. Li Auto and its partner CGXi have developed a rail-based robotic charging system for public stations, with commercial deployment planned for Q2 2026 across Li Auto’s 5C fast-charging network. BYD has filed patents for an AI-powered charging robot that also handles tyre inflation.
The competitive landscape extends beyond plug-in robotics. Dutch startup Rocsys raised $13 million in April to scale its M1 overhead rail-mounted robotic charger for robotaxi depots, a commercial-fleet application rather than a consumer one. Porsche has taken a different path altogether with its 11kW wireless inductive charging pad for the Cayenne Electric, which transfers power through a magnetic field between a floor plate and a receiver under the vehicle. Porsche’s system launches in Europe in 2026.
The common thread is that multiple companies have concluded EV owners should not have to handle charging cables. The approaches differ (robotic arms for plug-in automation, wireless pads for cable elimination, overhead rails for fleet operations), but the underlying bet is the same: that inconvenience is a barrier to EV adoption and that the charging experience needs to become invisible.
For Xiaomi, the robotic arm also serves a strategic purpose beyond convenience. The company is targeting 550,000 vehicle deliveries in 2026 and has built its automotive brand on the promise that everything in a Xiaomi ecosystem (phone, home appliances, car) works together seamlessly. A robotic charging arm that only works with Xiaomi vehicles strengthens that lock-in. Whether the product reaches production at a price point that makes it more than a novelty will determine if it stays a concept video or becomes a real differentiator.
The Federal Data Center Enhancement Act (FDCEA) is set to expire in September without an apparent replacement, potentially ending requirements for federal agencies to report on data-center efficiency, resilience, energy and water use, and contractor sustainability. Wired reports: Despite the public backlash, the Office of Management and Budget (OMB), the government agency that sets guidance for how agencies implement policies in line with the president’s agenda, is not providing any plans for how federal agencies should manage the sunset or continue to implement reporting beyond the timeline of the law. This, current and former workers at OMB and the General Services Administration (GSA) say, signals that the Trump administration is set to take an even more hands-off approach to data center oversight and regulation.
A replacement for the requirements laid out in FDCEA would, in other administrations, have been in the works for months ahead of its expiration. An employee with the GSA, the agency that oversees the government’s IT services and helps to implement the FDCEA, says that the lack of any sort of plan is highly uncommon. The employee spoke to WIRED on the condition of anonymity for fear of retaliation. “Never in the history of data center policies has a policy expired without another one having been painstakingly worked on for three years behind the scenes,” says the GSA employee. “The technology has changed so much it’s not about getting everything right, it’s about doing the best they can and updating to a new policy. They claim they’re going to make sure private companies pay their fair share, but they haven’t explained how they’ll do that.”
[…] There has been a burst of data-center-related legislation introduced in Congress this year, from bills that mandate environmental reviews of data centers to bills designed to protect local moratoriums. However, it appears that none of these bills are designed to address the requirements in FDCEA, nor do they specifically address federally run or leased data centers. […] A search of reginfo.gov, the OMB website that contains reports on the president’s Unified Agenda, also turns up nothing for the FDCEA. “By letting this expire, OMB is going to enter into this new age of prioritizing rapid AI development over any sort of centralized control or rigorous standards,” says the anonymous GSA employee who spoke to Wired. “In the absence of a new policy from OMB, [GSA] has no directive or measurable standards with which to point agencies towards managing data centers efficiently.”
Late Friday, Anthropic shut down access to its just-released Fable 5 and Mythos 5 models after the Trump administration slapped export controls on them — treating cutting-edge AI, in other words, like weapons. The trigger, it turns out, was a jailbreak. And the entity that tipped off the government? Amazon — one of Anthropic’s biggest investors.
Considering how much Trump-supporting VC bros in Silicon Valley insisted that the Biden admin wanted to shut down powerful AI models during the last administration, it’s quite something to see them cheering on the Trump admin actually doing exactly that.
As you’ll recall, a couple months ago, Anthropic talked about its “Mythos-class” LLM models with (depending on your perspective) the greatest marketing hype ever or an appropriate level of caution for the risks with the model (more likely: somewhere in between). When they first talked about it, they said that it was quite good at finding cybersecurity vulnerabilities, and so initially it was only available to a set group of organizations that might find it useful to patch certain holes. From what I’ve heard from people in the industry, the tool is good and useful, but it’s not magical.
Then, a little over a week ago, they rolled out the latest version of Mythos, which was still limited to pre-vetted companies, but then they offered up “Fable 5” as a tool for anyone else. This was described as “Mythos-class” but with extra guardrails, including that if it thought you might do something bad with Fable, it would drop you down to its previous best-in-class Opus 4.8 model. Fable was also twice as expensive on a per-token basis, but apparently much more efficient, so the actual pricing difference was likely smaller. And some of the early tests with Fable 5 showed it to be way more impressive at certain coding tasks. There were also some oddities, like Fable only being available in the commercial subscription plans for a couple weeks before switching over to only (way more expensive) API usage.
Still, there were some concerns about the guardrails, and how frequently they were kicking people out to Opus on perfectly normal queries. There were other concerns about its changed data retention policies for large enterprises. Previously, companies could negotiate a zero retention policy with Anthropic and guarantee that no data was being held by the company. But with the latest models, they required you to let them hold onto any data shared with the models for 30 days. Anthropic insisted this was solely for safety reviews, so that if something went wrong, they could track down the reasons why, but it scared away some large enterprises that couldn’t risk their own data or source code being retained anywhere else.
Either way, all that went silent late on Friday (amusingly, in the middle of me messing around with Fable) when Anthropic announced that the US government had made them shut down access to the models with zero due process. Technically, the US government claimed that for “national security” reasons, no foreign national could be allowed to have access to the models (including Anthropic’s own foreign national employees), and since Anthropic doesn’t know which of its customers are foreign nationals, they had to shut down all access.
There are a number of different threads to pull on from previous events that are all worth mentioning here as useful background:
So all of those things came together to lead to this effective ban.
Soon after it was announced, it was revealed that Amazon (one of Anthropic’s biggest investors) had actually alerted the US government to the supposed “bug” that gave the administration the ammo it needed to shut down the model.
Anthropic said it thinks the government became aware of a method of so-called jailbreaking before Friday’s action. “We reviewed a demonstration of this specific technique being used to identify a small number of previously known, minor vulnerabilities. These vulnerabilities all appear relatively simple, and we have found that other publicly available models are able to discover them as well without requiring a bypass,” the company said.
The jailbreak research in question was done by researchers at Amazon, who used a series of prompts to get Anthropic’s model to provide them with information about a handful of security vulnerabilities, said Katie Moussouris, chief executive with the cybersecurity firm Luta Security. Anthropic shared a copy of the report with her, she said.
Now, if you’re thinking “a jailbreak sounds dangerous for this tech” then, sure… except that the reporting says the jailbreak was useful in a different way:
But the information provided by the model in this report would be of more use to people defending computer networks than to those attacking them, she said.
“Who at the White House evaluated this and thought it was a threat?” she said. “It’s a complete overreaction because this is exactly the kind of prompting that defenders would do.”
That almost makes it sound like somebody (NSA?) didn’t want people using this to protect themselves — rather than being worried about malicious uses. It sure wouldn’t be the first time the NSA compromised everyone’s security to make sure they could keep spying on people.
None of this is good or reasonable tech policy — or industrial policy, or any other kind of policy. It’s all just power-seeking Calvinball. Apparently the US government can just scream “national security” with no evidence or explanation and shut down an entire model. That’s ripe for abuse — especially with this administration.
When I wrote recently about how authoritarians seek to grab control over centralized technology choke points, this is the kind of thing I was thinking of, though I didn’t expect them to be so ham-fisted about it.
It’s tempting to read this purely as retaliation by the Trump admin against Anthropic, a company they’re already mad at and already illegally trying to punish. But all of these other issues play into this as well, including Anthropic’s constant refrain of “we’re so dangerous, please regulate us.”
You kept asking for it. Now you’ve got it.
And where are all those Silicon Valley VCs who insisted everyone had to back Trump because Biden was going to seize and shut down LLMs? I looked on X at the feeds of various of Trump’s biggest supporters who had talked shit about Biden shutting down AI innovation and… of course they’re still supporting Trump. David Sacks came out with a long tweet saying that the administration was totally justified in shutting down Fable because of “safety,” saying that Anthropic had “prioritized the continued offering of the consumer model over safety.”
Can you imagine how Sacks would have responded if the Biden admin had demanded an AI company shut down a model because of “safety?” Oh, you don’t have to imagine, because he was pretty clear about how he felt about the Biden EO. He claimed it “hamstrung American AI companies” even though nothing in the Biden admin plans would have ever gotten so far as what the Trump admin did on Friday, shutting down an entire model. All it did was ask companies to voluntarily pre-submit frontier models for an analysis by experts who might make some suggestions on how to keep them secure.
And that was so horrific it was worth effectively blowing up the American democratic order. Yet now Trump goes way further in literally shutting down an LLM and Sacks says it’s all good because it’s for “safety.”
These are not serious people. This is not a serious administration.
They are just power hungry jackasses with poor impulse control.
Here’s what we know: the jailbreak was defensive in nature, according to the cybersecurity expert who reviewed the actual report. Also, the administration offered no public evidence, no due process, and no coherent explanation for why this particular jailbreak required shutting down access for everyone, including Anthropic’s own employees. We also know that this administration pulls out “national security” claims quite frequently that later turn out to be bogus, and thus we shouldn’t trust them without more evidence.
Maybe there’s classified information that changes the picture. But this administration has burned any benefit of the doubt it might have had. What we’re left with is a government that learned it can yell “national security” and make technology disappear, and a roster of Silicon Valley allies who spent years screaming about regulatory overreach from the last administration and have suddenly found a new song to sing.
An anonymous reader quotes a report from TechCrunch: A group made up of dozens of cybersecurity experts, including several well-known veterans of the industry, published an open letter to the U.S. government asking it to lift the export control order on Anthropic’s Fable and Mythos models. According to the open letter, “this action has taken the best models away from [cybersecurity] defenders” who now can’t use the models to find vulnerabilities and make their software and products more secure. “To pull the best capabilities away from defenders without a good reason when our adversaries are rapidly advancing is dangerous,” read the letter.
On Friday, the U.S. government ordered Anthropic to limit the export of Fable and Mythos, citing national security concerns, without explaining the specific reasons behind the order, according to Anthropic. In response, the company suspended access to the models to all users worldwide. As of this writing, the letter is signed by 76 cybersecurity experts, including Alex Stamos, former Facebook chief of security; Casey Ellis, the founder of bug bounty platform Bugcrowd; Jon Callas, famed cryptographer and former Apple security design and architecture manager; Paul Vixie, computer scientist; Dino Dai Zovi, the former head of applied security engineering at Block; Katie Moussouris, the founder of Luta Security; and Rachel Tobac, the CEO of the security awareness training firm SocialProof Security.
[…] Anthropic said that the White House export control order may have been based on a report that there was a method to bypass — or jailbreak — Fable to unlock its powerful Mythos-level capabilities. According to Katie Moussouris, one of the signatories of the open letter, the method was demonstrated by Amazon researchers in a paper that is not public but that she has reviewed. But Moussouris said in a blog post that the paper did not actually demonstrate a real jailbreak. Instead, she wrote, the researchers simply asked Fable to fix open source code with public and known vulnerabilities along with “deliberately planted vulnerabilities,” after the model initially refused to “review the code for security issues.”
“The behavior described in the paper cannot meaningfully be fixed, and any attempt would only weaken the model for defense,” Moussouris wrote. “Defenders need to be able to ask AI to fix the bugs in a file, explain why the fix matters, and write tests that confirm the patch works. That is not a guardrail bypass. It is the most valuable thing an AI model can do for defensive security: executing the find, fix, and test loop defenders run every day.” Moussouris’ critique was echoed in the open letter, which also said that the group of experts believe the model capabilities in the Amazon paper “can be replicated” on OpenAI’s GPT-5.5, on Anthropic’s own publicly available Claude Opus 4.8 and Sonnet, “and even Chinese models like Kimi 2.7.”
Moussouris told TechCrunch that “the bugs used to demonstrate the techniques in the paper can be found using the other models. The method in the paper is a guardrail bypass technique. Other models that lack the Fable guardrails often won’t refuse the straightforward request to look for security bugs, so they don’t need a bypass.” The letter also asked for transparently and fairly enforced regulations created by “a democratic rule-making process” that are based on scientific research done by industry and academic experts, and “used only to the minimal extent necessary to ensure the safety of the American public.”
The Trump “Department of Justice’s” “antitrust division” dumped its unsurprising approval of the terrible Paramount Warner Brothers merger late on Friday in the hopes people wouldn’t notice it.
As we’ve noted, the $111 billion megadeal is a historically harmful mess. Backed by billions in Saudi and Chinese cash (raising all sorts of foreign media influence concerns), the giant deal will saddle the company with so much debt that mass layoffs, consumer price hikes, and quality erosion from corner cutting are guaranteed. This happens with every major media merger, but especially when Warner Bros is involved.
And that’s before you get to the problems with Larry Ellison and his Bari Weiss brigades trying to destroy what’s left of already soggy U.S. corporate journalism and replace it with right wing, oligarch-friendly agitprop.
Regardless, you’ll be comforted to know that the Trump Justice Department looked at the deal closely and found that not only does it not hurt competition, it’s going to improve competition:
“The evidence reviewed and carefully analyzed by the Division indicates that, post-merger, competition in SVOD is not likely to be harmed. To the contrary, the combined firm is likely to increase competition by offering consumers a more robust competitive alternative to the larger SVOD offerings.”
That is, again, not how any of this works.
The massive debt created by these deals always results in mass layoffs, higher consumer prices, and lower quality product due to corner cutting. It’s not debatable. Arguing against this is like trying to have a fist fight with a running river. You just have to look back at, well, every single major media consolidation effort in the last fifty years. Which the DOJ didn’t because, well, they didn’t care.
You’ll still have major competitors to Paramount like Netflix, Comcast/NBC, Apple, and Disney, but in a country obsessed with consolidation that no longer has functional regulators, there’s really nothing stopping any number of predatory behaviors (and additional consolidation) moving forward. There’s an ongoing pretense that our consumer and labor protections still function. They don’t.
The “funny” part is the Trump DOJ even acknowledges that the history of Warner Brothers has been pockmarked by all manner of terrible competition-eroding consolidation. They just pinky swear that this time will somehow be different. Based on… nothing:
“Warner Bros. has been a repeated acquisition target in the media and entertainment industry. It is thus familiar to the Division from prior investigations and enforcement actions, including AOL/TimeWarner (2001), AT&T/TimeWarner (2018), and WarnerBros./Discovery (2022). The legacy of these transactions illustrates the challenges that arise when the commercial rationale for a deal lacks clear alignment with competitive incentives of the acquiring firm or the competitive evolution of the marketplace. In technology-driven industries, the disruptors of the recent past may quickly become the entrenched monopolists of the present day. It is with this historical experience and present enforcement sensitivity to the contestability of dynamic markets that the Division conducted a thorough investigation of the proposed transaction to assess whether the proposed transaction presented any harm to competition. The extensive investigatory record reviewed by the Division suggests that the impact of the transaction will be to increase competition across the media and entertainment ecosystem, with benefits for American consumers and workers.”
Fun fact: Paramount’s top lawyer is Makan Delrahim, Trump’s “DOJ enforcer” from the first administration. Delrahim personally worked to make sure Sprint could merge with T-Mobile during the first term. They promised that deal would result in untold synergies and new competition. Instead, 8,000+ people lost their jobs and U.S. wireless carriers immediately stopped competing on price. It’s been memory holed.
As far as the inevitable layoffs that always result from these deals (recall that AT&T’s merger with Warner Brothers and DirecTV resulted in 50,000 lost jobs), the DOJ simply declares that won’t be happening this time. Why? Because Larry and David Ellison said they’ll keep pumping out brick-and-mortar movies at the same or greater pace (they won’t):
“While taking seriously the potential impact of the proposed transaction on the creative community and domestic labor groups, the substantial evidence does not suggest a likelihood of reduction in output. That is because the demand for creative workers and labor is correlated with the Parties’ incentives to maintain or expand output. Thus, the expressed labor concerns do not raise actionable antitrust concerns.”
In three years, after the resulting company has fired 10,000+ employees, consumers have been price gouged to reduce debt, and the resulting flailing mess is acquired for half (or less) of the price, all the folks involved with this will have moved on to hyping other terrible ventures. Nobody will own any of this or engage in a single moment of meaningful reflection. That’s how this always works.
And the corporate press (and pundits like Matt Stoller) will still try to tell you that Republicans are to be taken seriously on antitrust reform.
Granted, DOJ approval of a terrible merger isn’t the final word. State AGs have hinted repeatedly at a looming collaborative antitrust lawsuit that, at a minimum, is likely to drag any integration out considerably. If that lines up with a potential AI bubble pop and economic reverberations, that massive debt load from gobbling up CBS/Paramount and Warner Bros will be an even larger albatross.
Canada’s Bill C-36 would replace PIPEDA, restrict surveillance pricing, and create a regulator that can fine companies up to C$25M or 5% of revenue.
The Canadian government introduced legislation on Monday to overhaul the country’s private-sector privacy laws, including new restrictions on businesses that use personal data to charge individual consumers higher prices. Bill C-36, the Protecting Privacy and Consumer Data Act, would replace the Personal Information Protection and Electronic Documents Act, a law first enacted in 1998 that has been widely criticised as outdated in the age of algorithmic pricing and large-scale data collection.
Artificial Intelligence and Digital Innovation Minister Evan Solomon said the bill targets so-called surveillance pricing, the practice of using a consumer’s browsing history, location, device type, or purchasing behaviour to set individualised prices. “Companies should not have the ability to use your behaviour, your location, your profile, your vulnerabilities, or your personal information to charge unfair prices,” Solomon told reporters. “Your personal information should not be used against you for price gouging.”
The bill does not ban surveillance pricing outright. Solomon said the legislation aims to bar the use of data to target consumers with individualised prices when the harms outweigh the benefits, but the government does not want to prevent companies from rewarding consumers with better prices through loyalty programmes or promotional discounts. Surveillance pricing is not specifically mentioned in the bill’s text, according to BetaKit, and Solomon will instead ask the new regulator to draft guidance on the issue once it is operational.
That regulatory gap is significant. The bill creates a new body called the Digital Safety and Data Protection Commission to oversee compliance with both the privacy legislation and the proposed Digital Safety Act, which aims to safeguard children online. The Office of the Privacy Commissioner of Canada would retain responsibility for overseeing government compliance with federal privacy laws, but the new commission would handle the private sector.
The penalties are substantial on paper. The commission could impose fines of up to C$10 million ($7.1 million) or 3% of global revenue, whichever is greater, for non-compliance. The most serious offences could face fines of up to C$25 million or 5% of global revenue. Whether those penalties are ever applied will depend on whether the bill passes Parliament and how aggressively the commission interprets its mandate.
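To make the two penalty tiers concrete, here is a minimal Python sketch of the ceiling each one implies for a given company. The function name and the revenue figures are hypothetical. The "whichever is greater" rule is reported only for the lower tier, so applying it to the C$25 million / 5% tier is an assumption, not something taken from the bill's text.

```python
def max_fine_cad(global_revenue_cad: int, most_serious: bool = False) -> float:
    """Ceiling on a fine, in Canadian dollars, under the reported tiers.

    Assumption: the "whichever is greater" rule, reported for the
    C$10M / 3% tier, also applies to the C$25M / 5% tier.
    """
    if most_serious:
        flat_cap, revenue_percent = 25_000_000, 5
    else:
        flat_cap, revenue_percent = 10_000_000, 3
    return max(flat_cap, global_revenue_cad * revenue_percent / 100)


# Hypothetical companies: C$100 million and C$1 billion in global revenue.
for revenue in (100_000_000, 1_000_000_000):
    print(revenue, max_fine_cad(revenue), max_fine_cad(revenue, most_serious=True))
```

For smaller companies the flat cap dominates. The percentage only takes over once global revenue passes roughly C$333 million on the lower tier, or C$500 million on the higher one.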
Beyond surveillance pricing, the bill introduces several consumer protections that bring Canada closer to the European Union’s General Data Protection Regulation. Canadians would gain the right to have their personal information deleted under certain circumstances. Organisations would be required to disclose more information about automated decisions affecting consumers. Children’s data would be classified as sensitive, requiring a higher standard of care from any business that collects it.
Canada is not moving in isolation. Manitoba’s provincial government introduced Bill 49 in March, which would prohibit retailers from using personal data to increase prices for individual consumers both online and in stores. In the United States, Maryland became the first state to enact a surveillance pricing ban when Governor Wes Moore signed HB 895, prohibiting food retailers with locations larger than 15,000 square feet and third-party delivery services from using personal data to raise prices on individual shoppers. That law takes effect on 1 October.
Public opinion in Canada strongly favours action. An Abacus Data poll conducted in early March surveyed 1,931 Canadians and found that 52% said surveillance pricing should be banned outright, while 31% said it should be allowed but more strictly regulated. The bill’s approach, restricting rather than banning, positions the government closer to the minority view, though Carney’s broader $2.3 billion national AI strategy had already signalled that new privacy legislation was coming without specifying how far it would go.
The privacy bill arrives less than two weeks after the AI strategy launch and days after Carney warned at the G7 about the systemic risks of AI dependence. The timing suggests the government is attempting to build a coherent regulatory framework across AI investment, data sovereignty, and consumer protection simultaneously. Whether those pieces fit together or contradict each other, spending $2.3 billion to accelerate AI adoption while restricting how AI-driven pricing can use consumer data, will depend on the details that the new commission eventually produces.
The bill still needs to pass Parliament. Canada’s previous attempt at modernising its privacy framework, the Artificial Intelligence and Data Act within Bill C-27, never made it through the legislative process and has not been revived. If Bill C-36 meets the same fate, the country will continue operating under a privacy law written before smartphones existed, while other jurisdictions move ahead with enforcement of their own digital protection regimes.
Modern sports clubs operate like most large businesses and, as such, are targeted by cybercriminals. However, the risk created by the use of AI is amplified more in this industry than in others.
A new report from Darktrace describes the security risk of AI as twofold: on one hand, criminals are using the new tools to create convincing phishing lures and deepfakes, spoof brands, and imitate professional athletes. On the other, sports clubs themselves are using AI without proper safeguards, creating an entirely new risk surface that can be exploited.
According to Darktrace, this risk is amplified in professional sports “where live events, high-value data, public pressure, fixed schedules, and large networks of partners and suppliers all intersect at once to offer attackers maximum publicity, profit, and potential impact.”
To create the report, Darktrace used telemetry data from sports organizations, as well as the results of a survey of 875 security decision makers and influencers at professional sporting organizations.
According to the report, more than four in five (84%) professional sports organizations experienced at least one cyber incident in the past 12 months, while more than half (57%) were struck multiple times. What’s more, 83% detected the use of AI in these attacks, and 72% believe AI will increase cyber risk over the next year.
When it comes to damages, a single incident now costs around $170,000. While that might not sound like much for a professional sports team with high earnings, it’s worth mentioning that 57% were hit more than once, and 43% reported between six and 10 incidents in a single year. For an organization hit 10 times, the cumulative annual cost can therefore reach $1.7 million.
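The cumulative figure is simple multiplication, sketched below in Python. It assumes every incident costs the reported average of about $170,000. That is a simplification, since real incident costs vary, and the function name is illustrative.

```python
COST_PER_INCIDENT_USD = 170_000  # average cost per incident cited in the report


def annual_cost_usd(incidents: int) -> int:
    """Cumulative yearly cost, assuming each incident costs the average."""
    return incidents * COST_PER_INCIDENT_USD


# One incident, then the low and high ends of the "six to 10" bracket.
for incidents in (1, 6, 10):
    print(incidents, annual_cost_usd(incidents))
```

Under that assumption, the six-to-10 bracket works out to between $1.02 million and $1.7 million a year.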

Tokyo-based AI startup Sakana AI has officially launched its first commercial product, Sakana Marlin.
Billed as a “Virtual CSO” (Chief Strategy Officer), Marlin is an autonomous, B2B research agent that deliberately abandons the instantaneous text generation of modern chatbots in favor of deep, long-horizon reasoning.
What sets Marlin apart from the current ecosystem of AI tools is its temporal scale: instead of returning an answer in seconds, it runs continuous, self-governing reasoning loops for up to eight hours at a time to deliver deeply researched, well-cited, 100-page strategy reports and executive slides. The company has posted sample reports generated by Marlin on its product website.
Available immediately via the company’s website with pricing starting at a pay-as-you-go tier, the platform is designed strictly for enterprise use—specifically targeting corporations, financial institutions, and think tanks.
The generative AI hype cycle has largely been defined by speed. For the past two years, the industry standard has been the ability to generate a poem, a line of code, or a surface-level summary in mere milliseconds. But the enterprise frontier is rapidly shifting from shallow, rapid generation to deep, methodical reasoning.
With Marlin, major businesses are no longer asking how fast an AI can answer, but how deeply it can think.
What exactly is a business getting when they deploy Sakana Marlin? The workflow is fundamentally different from typical large language model (LLM) interactions. Rather than engaging in a tedious back-and-forth prompt engineering session, the user simply provides a core research topic. Following a brief initial exchange to sharpen the scope and direction of the investigation, the human steps away entirely.
For the next several hours, Marlin operates as a self-contained digital strategy team. It formulates its own initial hypotheses, navigates the web to gather data, cross-references sources to verify findings, and maps the causal dynamics within complex business environments. It is effectively searching for the “winning formula” within a sea of noise.
Think of it less like a search engine and more like a junior strategy consultant locked in a room with a whiteboard and an internet connection. You provide the strategic prompt in the morning, and by the end of the workday, the system delivers a comprehensive, professional-grade portfolio.
In Marlin’s case, the final output is not a generic text blob; it is a structured set of strategic options, complete with executive summary slides, appendices, references, and a deeply researched report.
The company highlighted several real-world use cases to demonstrate Marlin’s capacity for complex synthesis, including generating detailed resolution scenarios for a theoretical blockade of the Strait of Hormuz, mapping out the fragmented global AI regulation patchwork, and analyzing macroeconomic trends like the return of “bond vigilantes”.
Sakana says Marlin relies on multiple AI models, but did not provide specific model names or providers. I’ve reached out on X to find out more and will update when I receive a response.
Under the hood, Marlin is the commercial culmination of Sakana AI’s extensive laboratory breakthroughs over the past two years.
The product is powered by an exploration engine relying on Sakana’s own prior research breakthrough, Adaptive Branching Monte Carlo Tree Search (AB-MCTS), and leverages frameworks derived from “The AI Scientist,” an earlier Sakana AI research project featured in the journal Nature that successfully automated the scientific discovery process from ideation to peer review.
To understand how this works in practice, consider a real-world analogy: modern chess engines. When a computer plays chess, it doesn’t just look at the board and guess; it plays out thousands of potential future moves, evaluating the strength of each resulting position before committing to an action.
Marlin’s AB-MCTS engine does something similar for research.
The chronology of this technology traces back to June 2025, when Sakana AI first introduced the framework to the public alongside the research paper “Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search”.
At that time, to encourage developer experimentation with collective AI intelligence, the company released the underlying algorithm as an open-source software library called TreeQuest, distributed under the permissive Apache 2.0 license. This open-source milestone laid the technical foundation for what would eventually evolve into the proprietary, enterprise-grade Marlin product a year later.
Traditionally, when developers attempt to extract higher-quality reasoning from large language models, they rely on a brute-force method called “repeated sampling”—essentially running the model dozens of times in parallel and hoping one of the answers is correct. However, repeated sampling operates blindly; it cannot evaluate its own intermediate steps or pivot based on external feedback.
AB-MCTS replaces this paradigm with a principled, multi-turn approach driven by a Bayesian decision framework. As the AI constructs a strategy report, the system treats the research process as a branching tree of possibilities. At each node of the tree, the algorithm dynamically balances two distinct behaviors based on external feedback signals:
Going Wider (Exploration): Spawning entirely new, alternative hypotheses or candidate responses when the current path yields diminishing returns or unresolved contradictions.
Going Deeper (Exploitation): Methodically refining, auditing, and building upon an existing candidate solution that shows high strategic promise.
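The wider-versus-deeper choice can be illustrated with a small Python sketch. This is a simplified toy, not Sakana's implementation or the TreeQuest API: the Beta posteriors, the Thompson-sampling rule, and the `toy_generate` and `toy_evaluate` stand-ins for an LLM call and a feedback signal are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    answer: str
    score: float  # external feedback signal, scaled to [0, 1]
    children: list["Node"] = field(default_factory=list)

    def iter_nodes(self):
        """Yield this node and every node below it."""
        yield self
        for child in self.children:
            yield from child.iter_nodes()


def thompson_draw(scores, rng):
    """Sample from a Beta posterior built from scores (Beta(1, 1) prior)."""
    alpha = 1.0 + sum(scores)
    beta = 1.0 + sum(1.0 - s for s in scores)
    return rng.betavariate(alpha, beta)


def choose_action(node, rng):
    """Return ("wider", None) or ("deeper", child) for this node."""
    # "Wider" arm: how good do fresh candidates generated here tend to be?
    best_action = ("wider", None)
    best_draw = thompson_draw([c.score for c in node.children], rng)
    # "Deeper" arms: how promising is each existing child's subtree?
    for child in node.children:
        draw = thompson_draw([n.score for n in child.iter_nodes()], rng)
        if draw > best_draw:
            best_action, best_draw = ("deeper", child), draw
    return best_action


def search(root, generate, evaluate, budget, seed=0):
    """Spend `budget` generation calls, then return the best-scoring node."""
    rng = random.Random(seed)
    for _ in range(budget):
        node = root
        while True:
            action, child = choose_action(node, rng)
            if action == "wider":
                answer = generate(node.answer)
                node.children.append(Node(answer, evaluate(answer)))
                break
            node = child  # go deeper and decide again one level down
    return max(root.iter_nodes(), key=lambda n: n.score)


# Toy stand-ins for an LLM call and a feedback signal (both hypothetical).
def toy_generate(parent_answer):
    return parent_answer + "x"


def toy_evaluate(answer):
    return min(len(answer) / 10, 1.0)


root = Node(answer="", score=0.0)
best = search(root, toy_generate, toy_evaluate, budget=20)
print(len(list(root.iter_nodes())), best.score)
```

Each budget step adds exactly one node. A leaf has no children to refine, so the search always goes wider there; elsewhere the sampled posteriors decide whether to branch out or to descend into a promising subtree. Blind repeated sampling would correspond to always choosing "wider" at the root.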
What transforms this from a laboratory experiment into a commercial engine is its extension into Multi-LLM AB-MCTS.
Sakana AI’s architecture introduces a critical third dimension to the search tree: the ability to dynamically choose which model to invoke for a specific sub-task, treating the industry’s leading frontier models as a plug-and-play collective intelligence network.
According to technical documentation published by the company, the engine can coordinate highly heterogeneous models—allowing an orchestration model to delegate initial ideation to one LLM, while utilizing a reasoning-heavy model to audit, verify, and correct intermediate errors generated earlier in the search tree.
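One way to picture the model-selection dimension is as a bandit problem: track how well each model's outputs have scored so far and sample from those track records to decide which model to call next. The Python sketch below is an assumption-laden illustration. The `ModelPicker` class, the model names, and the fixed quality scores are hypothetical and do not come from Sakana's documentation.

```python
import random


class ModelPicker:
    """Thompson sampling over a pool of models, one Beta posterior each."""

    def __init__(self, model_names, seed=0):
        self.rng = random.Random(seed)
        # [alpha, beta] per model, starting from a uniform Beta(1, 1) prior.
        self.stats = {name: [1.0, 1.0] for name in model_names}

    def pick(self):
        """Draw once from each posterior and return the highest-drawing model."""
        draws = {
            name: self.rng.betavariate(alpha, beta)
            for name, (alpha, beta) in self.stats.items()
        }
        return max(draws, key=draws.get)

    def update(self, name, score):
        """Fold a feedback score in [0, 1] into the chosen model's posterior."""
        self.stats[name][0] += score
        self.stats[name][1] += 1.0 - score


# Hypothetical models with fixed, made-up quality on some sub-task.
true_quality = {"model_a": 0.8, "model_b": 0.3}
picker = ModelPicker(true_quality)
calls = {name: 0 for name in true_quality}
for _ in range(200):
    chosen = picker.pick()
    calls[chosen] += 1
    picker.update(chosen, true_quality[chosen])
print(calls)
```

Over repeated rounds the picker shifts most calls toward whichever model scores better on the task, while still occasionally trying the other.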
By scaling up compute at inference time—leveraging the distinct “personalities” and strengths of multiple foundation models over thousands of automated cycles—AB-MCTS provides the mathematical guardrails Marlin requires. It ensures that the resulting 100-page strategy reports are not merely long-winded AI generations, but the highly vetted product of systemic, automated trial-and-error.
It is crucial to note that Sakana Marlin is distinctly not a general consumer tool; it is a commercial software-as-a-service (SaaS) offering restricted to corporate entities, organizations, and sole proprietors.
For enterprises, licensing and data handling terms are often the determining factors in software adoption. Unlike many consumer-grade AI tools that silently harvest user inputs and proprietary data to train future foundational models, Sakana Marlin operates under a strict, enterprise-grade data policy.
Neither Sakana AI nor its external AI service providers will use customer data or inputs for model training or fine-tuning unless the client provides explicit opt-in consent.
Even with consent, data is heavily processed to remove personally identifiable information. This closed-loop security is absolutely vital for companies handling sensitive M&A research, unreleased product strategies, or proprietary market analyses.
The commercial licensing is structured into tiered pricing models that reflect its enterprise nature:
Pay-as-you-go: Users can purchase credits on demand, with a single run costing 100 credits, and add-on credits priced at ¥98 ($0.61 USD) each.
Pro Plan: At ¥150,000 ($935.68 USD) per month, businesses receive 2,000 credits, bringing down the cost of add-on credits to ¥90 ($0.56 USD).
Team Plan: Geared toward larger departments, this ¥400,000 ($2,495.14 USD) per month tier includes 6,000 credits, lowering add-on costs to ¥85 ($0.53 USD) per credit.
Enterprise: Fully custom quotes with dedicated support and customized credit allocations.
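As a rough guide to how the tiers compare, the Python sketch below estimates monthly cost in yen for a given number of runs. It assumes the 100-credits-per-run figure applies on every tier, that included credits are used up before add-on credits are bought, and that nothing rolls over between months. None of those details are confirmed in the pricing above, and the names are illustrative.

```python
RUN_CREDITS = 100  # credits per run; assumed to apply on every tier

# Monthly fee, included credits, and add-on credit price, all in yen.
PLANS = {
    "Pay-as-you-go": {"monthly_fee": 0, "included": 0, "addon": 98},
    "Pro": {"monthly_fee": 150_000, "included": 2_000, "addon": 90},
    "Team": {"monthly_fee": 400_000, "included": 6_000, "addon": 85},
}


def monthly_cost_yen(plan: str, runs: int) -> int:
    """Estimated monthly cost: plan fee plus any add-on credits needed."""
    terms = PLANS[plan]
    credits_needed = runs * RUN_CREDITS
    extra_credits = max(0, credits_needed - terms["included"])
    return terms["monthly_fee"] + extra_credits * terms["addon"]


for runs in (10, 16, 25, 60):
    print(runs, {plan: monthly_cost_yen(plan, runs) for plan in PLANS})
```

Under these assumptions a pay-as-you-go run costs ¥9,800, so the Pro plan becomes the cheaper option from the 16th run in a month.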
Sakana AI’s transition into a commercial enterprise powerhouse is rooted in the pedigree of its founders, who famously helped spark the current generative AI boom.
Formed in Tokyo in 2023, the startup was co-founded by Llion Jones—a co-author of Google’s seminal 2017 “Attention Is All You Need” paper who coined the term “transformer”—and David Ha, a former Google Brain researcher and head of research at Stability AI.
The decision to build a new laboratory outside the Silicon Valley bubble was a deliberate rejection of the current AI ecosystem. At a TED AI conference in late 2025, Jones candidly expressed that he was “absolutely sick” of transformers, warning that the intense pressure from investors and the hyper-fixation on scaling single, monolithic models had calcified the industry’s creativity and blinded researchers to the next major breakthrough.
To break free from this “big company-itis,” Jones and Ha structured Sakana AI around principles of biomimicry and evolutionary computing.
The company’s name, derived from the Japanese word for fish, reflects its core technical philosophy: leveraging collective intelligence similar to schools of fish, ant colonies, or insect swarms. Rather than attempting to build one massive, do-it-all foundation model, Sakana’s research has consistently focused on deploying networks of smaller, specialized models that collaborate dynamically to adapt to complex environments.
This philosophy posits that by treating individual AI models as members of a “dream team” with complementary strengths, systems can achieve more robust and cost-effective reasoning than relying on sheer scale alone.
This nature-inspired approach quickly yielded dividends in rigorous, competitive testing. Sakana AI has made significant strides in “inference-time scaling”—allocating computational resources during the problem-solving phase to allow models to think, iterate, and refine their own answers over extended periods.
In early 2026, the company’s ALE-Agent took first place in the highly complex AtCoder Heuristic Contest (AHC058), a combinatorial optimization challenge, outperforming over 800 top-tier human programmers by autonomously rebuilding and testing hundreds of solutions over a four-hour window.
Similarly, Sakana introduced “RL Conductor,” a small 7-billion-parameter model trained via reinforcement learning specifically to orchestrate and delegate tasks among a diverse pool of worker models—ranging from GPT-5 to Claude Sonnet 4—achieving state-of-the-art results on reasoning benchmarks at a fraction of traditional computing costs.
Sakana’s rapid evolution from a disruptive research lab to a commercial software provider has attracted intense attention from global financial heavyweights.
By late 2025, the Tokyo-based startup secured a massive Series B funding round that pushed its post-money valuation past $2.6 billion, cementing its status as one of Japan’s most highly valued private tech companies. The firm boasts a sprawling roster of strategic investors, including early venture backers Khosla Ventures, Lux Capital, and New Enterprise Associates (NEA), alongside industry titans like Nvidia and Google.
As Sakana has expanded its focus toward mission-critical sectors like defense and finance, it has also drawn investments from major global banking institutions like Mitsubishi UFJ Financial Group (MUFG) and Citi, as well as enterprise tech giant Salesforce, positioning the startup to actively reshape corporate AI infrastructure from the ground up.
Sakana AI’s shift toward commercial, long-horizon agents did not happen in a vacuum. The company ran a rigorous closed beta test beginning in April 2026, putting the tool in the hands of approximately 300 professionals across financial institutions, consulting firms, and think tanks. The feedback underscores a stark qualitative difference between standard generative chatbots and Marlin’s autonomous, fact-driven approach.
A senior consultant at a major Tokyo consulting firm noted that the tool “exceeded expectations by discovering angles we hadn’t even imagined,” praising its ability to match human comprehensiveness while stripping away human bias. Meanwhile, a cybersecurity division at a major Japanese IT system integrator lauded the system for providing “a highly convincing report driven by high-quality, primary research,” rather than relying on recycled secondary sources.
On social media, the company’s announcement resonated with the broader tech community’s growing appetite for autonomous agents.
As the AI industry matures, the value proposition is clearly shifting. Tools that act as fast, conversational encyclopedias are becoming commoditized. With Sakana Marlin, the focus moves entirely to separating the heavy lifting of thinking from the final act of deciding. By delegating the exhaustive mapping of causal dynamics to an agent capable of sustained reasoning, human executives are free to do what they do best: take action.
Websites are being redesigned for consumption by AI models, and now a coalition wants to extend the trend to digital documents.
The LF AI & Data Foundation, under the Linux Foundation, has formed a working group to steer the development of DocLang, an AI-friendly document format that aims to help enterprises feed their files to AI systems.
The DocLang group, founded by IBM, NVIDIA, Red Hat, ABBYY, HumanSignal, and Forgis, contends that existing formats like PDF, Markdown, HTML, and LaTeX are ill-suited for AI document parsing.
In late 2024, IBM developed an open source toolkit called Docling to facilitate AI document parsing, not unlike Microsoft’s MarkItDown or the Marker project. Docling provides a way to convert various file formats into structured AI-ready data. DocLang expands upon that foundation with a standard for exchanging structured output across different systems.
“DocLang is designed to solve one of the foundational problems in enterprise AI: documents were built for humans, not machines,” said Maxime Vermeir, VP of AI Strategy at AI automation biz ABBYY, in a statement. “By introducing a minimal, standardized, and AI-native representation of document structure, layout, meaning and governance, DocLang creates a far more deterministic foundation for modern AI systems.”
The new DocLang format is necessary, the spec authors argue, because existing formats were designed for rendering and lose semantic information, structural relationships, or geometric context when AI models turn them into tokens. The specification explains that Markdown lacks sufficient scope, that HTML is excessively verbose, and that LaTeX allows too much ambiguity.
Essentially, DocLang is optimized for LLM tokenizers through markup that maps between DocLang elements and LLM tokens on a 1-to-1 basis. The spec relies on a limited XML vocabulary that aligns with LLM tokenizers to produce optimized prompts. It is lossless, so the AI conversion doesn’t do away with valuable info. It’s designed to support common graphical elements like tables, formulas, charts, and multimodal content. And it’s an open standard.
DocLang could also help keep costs under control. According to AI Cost Check, having an AI model conduct an OCR scan on a PDF requires about 1,200 input tokens and 150 output tokens as a baseline.
That’s inconsequential to corporate AI customers on a one-off basis but demands attention at scale. And because AI models have highly variable token costs, companies may find they are spending more than they anticipated to have their AI system ingest PDFs, particularly if the documents are long and complicated or an expensive frontier model is used.
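To see how a small per-document figure grows with volume, here is a rough Python sketch built on the 1,200 input and 150 output token baseline cited above. The per-million-token prices and the document count are hypothetical placeholders, not figures from AI Cost Check or any vendor.

```python
def ingestion_cost_usd(num_docs, input_tokens, output_tokens,
                       usd_per_m_input, usd_per_m_output):
    """Estimate the total cost of running num_docs documents through a model.

    Prices are expressed in US dollars per one million tokens.
    """
    total_input = num_docs * input_tokens
    total_output = num_docs * output_tokens
    return (total_input * usd_per_m_input
            + total_output * usd_per_m_output) / 1_000_000


# Baseline token counts quoted in the article for one PDF OCR pass.
BASE_INPUT_TOKENS = 1_200
BASE_OUTPUT_TOKENS = 150

# Hypothetical prices, chosen only for illustration.
PRICE_IN = 3    # USD per million input tokens (assumption)
PRICE_OUT = 15  # USD per million output tokens (assumption)

one_doc = ingestion_cost_usd(1, BASE_INPUT_TOKENS, BASE_OUTPUT_TOKENS,
                             PRICE_IN, PRICE_OUT)
million_docs = ingestion_cost_usd(1_000_000, BASE_INPUT_TOKENS,
                                  BASE_OUTPUT_TOKENS, PRICE_IN, PRICE_OUT)

print(f"one document:     ${one_doc:.5f}")
print(f"one million docs: ${million_docs:,.2f}")
```

Under these assumed prices a single document costs a fraction of a cent, while a million documents run to thousands of dollars, and longer documents or pricier models scale the total accordingly.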
“PDFs were designed for rendering, not understanding,” said Jon Knisley, AI Value and Enablement Lead at ABBYY, in an email to The Register. “Every time a PDF enters an AI pipeline, structure, meaning and layout get lost, so the model’s accuracy ends up bottlenecked by document quality rather than model quality. Teams compensate by building custom parsers at every integration point, which results in brittle, one-off work, and a new engineering sprint for every new document type.”
According to Knisley, that has measurable cost.
“Ambiguous structure forces the model into guesswork, which drives up hallucination risk and burns tokens deciphering layout instead of extracting meaning,” he explained. “With DocLang, customers can expect better accuracy, lower costs, fewer tokens consumed, faster performance and more consistent outputs. The exact savings depend on the use case and document complexity, but our initial benchmarks show 4x to more than 30x lower cost depending on the model evaluated.”
Knisley also cited governance advantages, noting that document provenance data and metadata can get stripped when documents get moved. DocLang, he said, keeps that information attached.
ABBYY, which offers AI document processing, has created the DocLang Interactive Benchmark to illustrate the potential token savings of feeding DocLang documents to AI models. A PDF of IBM’s 2025 annual report, for example, results in 8,421 input tokens and 512 output tokens while a DocLang version requires only 5,310 input tokens and 498 output tokens. What’s more, the DocLang version results in lower latency (2.7s vs 4.2s) and delivers better quality (the AI missed one subsection and mangled a table merger in the PDF).
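The benchmark figures above can be turned into relative savings with a few lines of Python. This is only arithmetic on the numbers quoted for the IBM annual report example; the helper name `reduction` is ours and is not part of any DocLang or ABBYY tooling.

```python
def reduction(before, after):
    """Return the fractional reduction going from before to after."""
    return (before - after) / before


# Figures quoted for the IBM 2025 annual report example.
pdf = {"input_tokens": 8_421, "output_tokens": 512, "latency_s": 4.2}
doclang = {"input_tokens": 5_310, "output_tokens": 498, "latency_s": 2.7}

# Per-metric reduction when switching from the PDF to the DocLang version.
savings = {key: reduction(pdf[key], doclang[key]) for key in pdf}

# Combined input plus output tokens for each version.
pdf_total = pdf["input_tokens"] + pdf["output_tokens"]
doclang_total = doclang["input_tokens"] + doclang["output_tokens"]
savings["total_tokens"] = reduction(pdf_total, doclang_total)

for metric, value in savings.items():
    print(f"{metric}: {value:.1%} lower with DocLang")
```

In this single example, most of the difference comes from input tokens, with output tokens nearly unchanged. Dollar savings would depend on how a given model prices input versus output tokens.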
“It’s still early, and we won’t overstate adoption,” said Knisley. “The standard is open and free to build on, and the group is actively inviting more technology providers and enterprises to join. The early response has been encouraging, and we’re optimistic about where it goes from here.” ®