
Is Anthropic ‘nerfing’ Claude? Users increasingly report performance degradation as leaders push back


A growing number of developers and AI power users are taking to social media to accuse Anthropic of degrading the performance of Claude Opus 4.6 and Claude Code — intentionally or as an outcome of compute limits — arguing that the company’s flagship coding model feels less capable, less reliable and more wasteful with tokens than it did just weeks ago.

The complaints have spread quickly on GitHub, X and Reddit over the past several weeks, with a number of high-reach posts alleging that Claude has become worse at sustained reasoning, more likely to abandon tasks midway through, and more prone to hallucinations or contradictions.

Some users have framed the issue as “AI shrinkflation” — the idea that customers are paying the same price for a weaker product.

Others have gone further, suggesting Anthropic may be throttling or otherwise tuning Claude downward during periods of heavy demand.


Those claims remain unproven, and Anthropic employees have publicly denied that the company degrades models to manage capacity. At the same time, Anthropic has acknowledged real changes to usage limits and reasoning defaults in recent weeks, which has made the broader debate more combustible.

VentureBeat has reached out to Anthropic for further clarification on the recent accusations, including whether any recent changes to reasoning defaults, context handling, throttling behavior, inference parameters or benchmark methodology could help explain the spike in complaints.

We have also asked how Anthropic explains the recent benchmark-related claims and whether it plans to publish additional data that could reassure customers. An Anthropic spokesperson did not address the questions individually, instead referring us to X posts by Claude Code creator Boris Cherny and Claude Code team member Thariq Shihipar regarding Opus 4.6 performance and usage limits, respectively. Both X posts are also referenced and linked below.

Viral user complaints, including from an AMD Senior Director, argue Claude has become less capable

One of the most detailed public complaints originated as a GitHub issue filed on April 2, 2026, by Stella Laurenzo, whose LinkedIn profile identifies her as a Senior Director in AMD’s AI group.


In that post, Laurenzo wrote that Claude Code had regressed to the point that it could not be trusted for complex engineering work, then backed that claim with a sprawling analysis of 6,852 Claude Code session files, 17,871 thinking blocks and 234,760 tool calls.

The complaint argued that, starting in February, Claude’s estimated reasoning depth fell sharply while signs of poorer performance rose alongside it, including more premature stopping, more “simplest fix” behavior, more reasoning loops, and a measurable shift from research-first behavior to edit-first behavior.
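
To give a concrete (and purely illustrative) sense of what such an audit involves, here is a minimal sketch of how one might tally thinking blocks and tool calls from local session logs. The field names (“content”, “thinking”, “tool_use”) are assumptions for illustration, not a documented schema, and this is not Laurenzo’s actual methodology:

```python
import json
from pathlib import Path

def session_stats(log_path: Path) -> dict:
    """Tally thinking blocks and tool calls in one JSONL session log."""
    thinking_blocks = 0
    thinking_chars = 0
    tool_calls = 0
    for line in log_path.read_text().splitlines():
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than aborting the scan
        for block in event.get("content", []):
            if block.get("type") == "thinking":
                thinking_blocks += 1
                thinking_chars += len(block.get("thinking", ""))
            elif block.get("type") == "tool_use":
                tool_calls += 1
    return {
        "thinking_blocks": thinking_blocks,
        "avg_thinking_chars": thinking_chars / max(thinking_blocks, 1),
        "tool_calls": tool_calls,
    }
```

Aggregated over thousands of sessions, trend lines in metrics like these are the kind of evidence the complaint cites for shrinking reasoning depth.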

The post’s broader point was that for advanced engineering workflows, extended reasoning is not a luxury but part of what makes the model usable in the first place.

The GitHub thread then spilled into the broader social media conversation: X users including @Hesamation posted screenshots of Laurenzo’s GitHub post on April 11, turning it into an even more viral talking point.


That amplification mattered because it gave the wider “Claude is getting worse” narrative something more concrete than anecdotal frustration: a long, data-heavy post from a senior AI leader at a major chip company arguing that the regression was visible in logs, tool-use patterns and user corrections, not just gut feeling.

Anthropic’s public response focused on separating perceived changes from actual model degradation. In a pinned follow-up posted to the same GitHub issue a week ago, Claude Code lead Boris Cherny thanked Laurenzo for the care and depth of the analysis but disputed its main conclusion.

Cherny said the “redact-thinking-2026-02-12” header cited in the complaint is a UI-only change that hides thinking from the interface and reduces latency, but “does not impact thinking itself,” “thinking budgets,” or how extended reasoning works under the hood.

He also said two other product changes likely affected what users were seeing: Opus 4.6’s move to adaptive thinking by default on Feb. 9, and a March 3 shift to medium effort, or effort level 85, as the default for Opus 4.6, which he said Anthropic viewed as the best balance across intelligence, latency and cost for most users.


Cherny added that users who want more extended reasoning can manually switch effort higher by typing /effort high in Claude Code terminal sessions.
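
For developers calling the model through the API rather than through Claude Code, the closest documented control is the extended-thinking token budget on Anthropic’s Messages API. A minimal sketch follows; the model ID and budget value are illustrative examples, not recommendations:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6",  # illustrative model ID
    max_tokens=16000,
    # Pin an explicit reasoning budget rather than relying on shifting defaults.
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Walk through this refactor step by step."}],
)
print(response.content[-1].text)  # the final content block is the text answer
```

Pinning parameters explicitly in this way is one route power users have for insulating themselves from default changes like the ones Cherny described.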

That exchange gets at the core of the controversy. Critics like Laurenzo argue that Claude’s behavior in demanding coding workflows has plainly worsened and point to logs and usage patterns as evidence.

Anthropic, by contrast, is not saying nothing changed. It is saying the biggest recent changes were product and interface choices that affect what users see and how much effort the system expends by default, not a secret downgrade of the underlying model. That distinction may be technically important, but for power users who feel the product is delivering worse results, it is not necessarily a satisfying one.

External coverage from TechRadar and PC Gamer further amplified Laurenzo’s post and the larger wave of agreement from some power users.


Another viral post on X from developer Om Patel on April 7 made the same argument in even more direct terms, claiming that someone had “actually measured” how much “dumber” Claude had gotten and summarizing the result as a 67% drop.

That post helped popularize the “AI shrinkflation” label and pushed the controversy beyond hard-core Claude Code users into the broader AI discourse on X.

These claims have resonated because they map closely onto what many frustrated users say they are seeing in practice: more unfinished tasks, more backtracking, more token burn and a stronger sense that Claude is less willing to reason deeply through complicated coding jobs than it was earlier this year.

Benchmark posts turned anecdotal frustration into a public controversy

The loudest benchmark-based claim came from BridgeMind, which runs the BridgeBench hallucination benchmark. On April 12, the account posted that Claude Opus 4.6 had fallen from 83.3% accuracy and a No. 2 ranking in an earlier result to 68.3% accuracy and No. 10 in a new retest, calling that proof that “Claude Opus 4.6 is nerfed.”


That post spread widely and became one of the main anchors for the broader public case that Anthropic had degraded the model.

Other users also circulated benchmark-related or test-based posts suggesting that Opus 4.6 was underperforming versus Opus 4.5 in practical coding tasks.

Still other posts pointed to TerminalBench-related results as supposed evidence that the model’s behavior had changed in certain harnesses or product contexts.

The effect was cumulative: benchmark screenshots, side-by-side tests and anecdotal frustration all began reinforcing one another in public.


That matters because benchmark claims tend to travel farther than more subjective complaints. A developer saying a model “feels worse” is one thing. A screenshot showing a ranking drop from No. 2 to No. 10, or a dramatic percentage swing in accuracy, gives the appearance of hard proof, even when the underlying comparison may be more complicated.

Critics of the benchmark claims say the evidence is weaker than it looks

The most important rebuttal to the BridgeBench claim did not come from Anthropic. It came from Paul Calcraft, an outside software and AI researcher on X, who argued that the viral comparison was misleading because the earlier Opus 4.6 result was based on only six tasks while the later one was based on 30.

In his words, it was a “DIFFERENT BENCHMARK.” He also said that on the six tasks the two runs shared in common, Claude’s score moved only modestly, from 87.6% previously to 85.4% in the later run, and that the bigger swing appeared to come mostly from a single fabrication result without repeats. He characterized that as something that could easily fall within ordinary statistical noise.
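
The small-sample point is easy to quantify. As a back-of-the-envelope sketch (not part of Calcraft’s analysis, and treating each score as a simple proportion, which is itself a simplification), a 95% confidence interval on six tasks is enormous:

```python
from math import sqrt

def wilson_interval(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for proportion p over n trials."""
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

print(wilson_interval(0.876, 6))   # ~(0.48, 0.98): six tasks say very little
print(wilson_interval(0.854, 30))  # ~(0.69, 0.94): still wide at 30 tasks
```

The two intervals overlap almost entirely, which is consistent with the claim that the movement on the shared tasks falls within ordinary noise.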

That outside rebuttal matters because it undercuts one of the cleanest and most viral claims in circulation. It does not prove users are wrong to think something has changed. But it does suggest that at least some of the benchmark evidence now driving the story may be overstated, poorly normalized or not directly comparable.


Even the BridgeBench post itself drew a community note to similar effect. The note said the two benchmark runs covered different scopes — six tasks in one case and 30 in the other — and that the common-task subset showed only a minor change. That does not make the later result meaningless, but it weakens the strongest version of the “BridgeBench proved it” argument.

This is now a key feature of the controversy: the claims are not all equally strong. Some are grounded in first-hand user experience. Some point to real product changes. Some rely on benchmark comparisons that may not be apples-to-apples. And some depend on inferences about hidden system behavior that users outside Anthropic cannot directly verify.

Earlier capacity limits gave users a reason to suspect more changes under the hood

The current backlash also lands in the shadow of a real, confirmed Anthropic policy change from late March. On March 26, Anthropic technical staffer Thariq Shihipar posted that, “To manage growing demand for Claude,” the company was adjusting how 5-hour session limits work for Free, Pro and Max subscribers during peak hours, while keeping weekly limits unchanged.

He added that during weekdays from 5 a.m. to 11 a.m. Pacific time, users would move through their 5-hour session limits faster than before. In follow-up posts, he said Anthropic had landed efficiency wins to offset some of the impact, but that roughly 7% of users would hit session limits they would not have hit before, particularly on Pro tiers.


In an email on March 27, 2026, Anthropic told VentureBeat that Team and Enterprise customers were not affected by those changes, and that the shift was not dynamically optimized per user but instead applied to the peak-hour window the company had publicly described. Anthropic also said it was continuing to invest in scaling capacity.

Those comments were about session limits, not model downgrades. But they are important context, because they establish two things that users now keep connecting in public: first, Anthropic has been dealing with surging demand; second, it has already changed how usage is rationed during busy periods. That does not prove Anthropic reduced model quality. It does help explain why so many users are primed to believe something else may also have changed.

Prompt caching and TTL become a new flashpoint

A separate, more recent GitHub issue broadens the dispute beyond model quality and into pricing and quota behavior. In issue #46829, user seanGSISG argued that Claude Code’s prompt-cache time-to-live, or TTL, appeared to shift from a one-hour setting back to a five-minute setting in early March, based on analysis of nearly 120,000 API calls drawn from Claude Code session logs across two machines.

The complaint argues that this change drove meaningful increases in cache-creation costs and quota burn, especially for long-running coding sessions where cached context expires quickly and must be rebuilt. The author claims that this helps explain why some subscription users began hitting usage limits they had not previously encountered.


What makes this issue notable is that Anthropic did not flatly deny that something changed. In a reply on the thread, Jarred Sumner said the March 6 change was real and intentional, but rejected the framing that it was a regression. He said Claude Code uses different cache durations for different request types, and that one-hour cache is not always cheaper because one-hour writes cost more up front and only save money when the same cached context is reused enough times to justify it.
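
Sumner’s tradeoff is easy to illustrate with Anthropic’s published pricing multipliers: cache reads cost roughly 0.1x the base input price, five-minute cache writes roughly 1.25x, and one-hour writes roughly 2x. Treat those figures as assumptions that may change; the sketch below only shows the shape of the break-even:

```python
def relative_cost(reuses: int, write_multiplier: float, rebuilds: int = 0) -> float:
    """Cost of one cached prefix, in units of its base input price."""
    writes = 1 + rebuilds  # initial write plus any re-writes after expiry
    return writes * write_multiplier + reuses * 0.1

# Long session: a prefix reused 10 times over an hour. A 5-minute TTL expires
# between slow turns and forces rebuilds; a 1-hour TTL does not.
print(relative_cost(reuses=10, write_multiplier=1.25, rebuilds=4))  # 7.25
print(relative_cost(reuses=10, write_multiplier=2.0))               # 3.0

# Short one-off query: the cheaper 5-minute write wins.
print(relative_cost(reuses=1, write_multiplier=1.25))  # 1.35
print(relative_cost(reuses=1, write_multiplier=2.0))   # 2.1
```

Which TTL is cheaper depends on session shape, which is Sumner’s point: neither setting is uniformly a downgrade.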

In his telling, the change was part of ongoing cache optimization work, not a silent downgrade, and the pre–March 6 behavior described in the issue “wasn’t the intended steady state.”

The thread later drew a more detailed response from Anthropic’s Cherny, who described one-hour caching as “nuanced” and said the company has been testing heuristics to improve cache hit rates, token usage and latency for subscribers. Cherny said Anthropic keeps five-minute cache for many queries, including subagents that are rarely resumed, and said turning off telemetry also disables experiment gates, which can cause Claude Code to fall back to a five-minute default in some cases.

He added that Anthropic plans to expose environment variables that let users force one-hour or five-minute cache behavior directly. Together, those replies do not validate the issue author’s claim that Anthropic silently made Claude Code more expensive overall, but they do confirm that Anthropic has been actively experimenting with cache behavior behind the scenes during the same period users began complaining more loudly about quota burn and changing product behavior.


Anthropic says user-facing changes, not secret degradation, explain much of the uproar

Anthropic-affiliated employees have publicly pushed back on the broadest accusations. In one widely circulated reply on X, Cherny responded to claims that Anthropic had secretly nerfed Claude Code by writing, “This is false.”

He said Claude Code had been defaulted to medium effort in response to user feedback that Claude was consuming too many tokens, and that the change had been disclosed both in the changelog and in a dialog shown to users when they opened Claude Code.

That response is notable because it concedes a meaningful product change while rejecting the more conspiratorial interpretation of it. Anthropic is not saying nothing changed. It is saying that what changed was disclosed and was aimed at balancing token use, not secretly reducing model quality.

Public documentation also supports the fact that effort defaults have been in motion. Claude Code’s changelog says that on April 7, Anthropic changed the default effort level from medium to high for API-key users as well as Bedrock, Vertex, Foundry, Team and Enterprise users.


That suggests Anthropic has actively been tuning these settings across different segments, which could plausibly affect user perceptions even if the core model weights are unchanged.

Shihipar has also directly denied the broader demand-management accusation. In a reply on X posted April 11, he said Anthropic does not “degrade” its models to better serve demand. He also said that changes to thinking summaries affected how some users were measuring Claude’s “thinking,” and that the company had not found evidence backing the strongest qualitative claims now spreading online.

The real issue may be trust as much as model quality

What is clear is that a trust gap has opened between Anthropic and some of its most demanding users.

For developers who rely on Claude Code all day, subtle shifts in visible thinking output, effort defaults, token burn, latency tradeoffs or usage caps can feel indistinguishable from a weaker model.


That is true whether the root cause is a product setting, a UI change, an inference-policy tweak, capacity pressure or a genuine quality regression.

It also means both sides of the fight may be talking past each other. Users are describing what they experience: more friction, more failures and less confidence. Anthropic is responding in product terms: effort defaults, hidden thinking summaries, changelog disclosures, and denials that demand pressure is causing secret model degradation.

Those are not necessarily incompatible descriptions. A model can feel worse to users even if the company believes it has not “nerfed” the underlying model in the way critics allege. But the timing is awkward: Anthropic’s chief rival OpenAI has recently pivoted and put more resources behind Codex, its competing enterprise- and vibe-coding-focused product, even offering a new mid-range ChatGPT subscription tier in an effort to boost usage of the tool. That is not the kind of publicity that helps Anthropic retain customers.

At the same time, the public evidence remains mixed. Some of the most viral claims have come from developers with detailed logs and strong opinions based on repeated use. Some of the benchmark evidence has been challenged by outside observers on methodological grounds. And Anthropic’s own recent changes to limits and settings ensure that this debate is happening against a backdrop of real adjustments, not pure rumor.



Corsair Vengeance RGB Pro 32GB deal is $240 with this promo code


RAM prices are still high, but they are no longer rising the way they were, and if you shop around, there are bargains to be had.

Case in point, I’ve found a great deal on a Corsair Vengeance RGB Pro 32GB DDR4 memory kit, which is now $239.99 (was $279.99) at Newegg when you use promo code SSF5764 at checkout. That’s a solid saving on a capacity that’s ideal for modern systems.


Techdirt Podcast Episode 450: Infrastructure For The New Private Internet


from the next-steps dept

As we work our way towards a better future for the internet, the most encouraging and exciting part is the people out there building towards that future. Kickstarter founder Yancey Strickler is one such person, and his new company Metalabel has some extremely interesting projects in the works, including the Dark Forest Operating System. This week, Yancey joins the podcast to talk all about his projects and their role in building a better internet.

You can also download this episode directly in MP3 format.

Follow the Techdirt Podcast on Soundcloud, subscribe via Apple Podcasts or Spotify, or grab the RSS feed. You can also keep up with all the latest episodes right here on Techdirt.


Filed Under: decentralization, dfos, podcast, resonant computing, yancey strickler


Data breach at edtech giant McGraw Hill affects 13.5 million accounts



The ShinyHunters extortion group has leaked data from 13.5 million McGraw Hill user accounts, stolen after breaching the company’s Salesforce environment earlier this month.

Founded in 1909, McGraw Hill is a leading global educational publisher with annual revenue of $2.2 billion, providing education content and solutions for PreK–12, higher education, and professional learning.

The company confirmed ShinyHunters’ breach claims in a statement shared with BleepingComputer on Tuesday, saying the threat actors exploited a misconfiguration in the compromised Salesforce environment and that the incident didn’t affect its Salesforce accounts, courseware, customer databases, or internal systems.


“McGraw-Hill recently identified unauthorized access to a limited set of data from a webpage hosted by Salesforce on its platform. This activity appears to be part of a broader issue involving a misconfiguration within Salesforce’s environment that has impacted multiple organizations that work with Salesforce,” a McGraw-Hill spokesperson told BleepingComputer.

This came after ShinyHunters added the company to the gang’s dark web leak site, claiming to have stolen 45 million Salesforce records containing personally identifiable information (PII) and threatening to leak the allegedly stolen documents online unless a ransom is paid.

McGraw Hill entry on ShinyHunters’ data leak site (BleepingComputer)

While McGraw Hill has yet to share how many individuals were affected by the resulting data breach, data breach notification service Have I Been Pwned says ShinyHunters has now leaked over 100GB of files containing data linked to 13.5 million accounts.

The exposed information includes names, physical addresses, phone numbers, and email addresses, which threat actors could use to target McGraw Hill customers in spear-phishing attacks.

“In April 2026, education company McGraw Hill confirmed a data breach following an extortion attempt. Attributed to a Salesforce misconfiguration, the company stated the incident exposed ‘a limited set of data from a webpage hosted by Salesforce on its platform’,” Have I Been Pwned said today.

“More than 100GB of data was later publicly distributed, containing 13.5M unique email addresses across multiple files, with additional fields such as name, physical address and phone number appearing inconsistently across some records.”

This week, ShinyHunters has also started leaking data stolen after breaching the Snowflake environment of American video game publisher Rockstar Games. The stolen data includes internal analytics used to monitor Rockstar’s online services and support tickets, as well as in-game revenue and purchase metrics, player behavior tracking, and game economy data for Red Dead Online and Grand Theft Auto Online.


In recent months, the extortion gang was also behind security breaches affecting the European Commission, Infinite Campus, Hims & Hers, Telus Digital, Wynn Resorts, CarGurus, Panera Bread, SoundCloud, and dating giant Match Group.


DeepL, known for text translation, now wants to translate your voice


DeepL, a translation company best known for its text tools, released a voice-to-voice translation suite today that covers use cases like meetings, mobile and web conversations, and group conversations for frontline workers through custom apps. The company is also releasing an API that lets outside developers and businesses build on top of DeepL’s tech for customized use cases, such as call centers.

“After spending so many years in text translation, voice was a natural step for us,” DeepL CEO Jarek Kutylowski told TechCrunch in an interview. “We have come a long way when it comes to text translation and document translation. But we thought there wasn’t a great product for real-time voice translation.”

Kutylowski said that the challenges in creating a real-time translation product center on striking a balance between reducing latency — the delay between someone speaking and the translated audio playing back — and maintaining accurate results.

DeepL is releasing add-ons for platforms like Zoom and Microsoft Teams, where listeners can either hear real-time translation while others are speaking in their native languages or follow real-time translated text on screen. The program is currently in early access, and the company is inviting organizations to join a waitlist. DeepL also has a product for mobile and web-based conversations that can take place in person or remotely.


DeepL also lets users participate in group conversations in settings like training sessions or workshops, with participants joining through a QR code.

DeepL said that its voice-to-voice tech can also learn and adapt to custom vocabulary, such as industry-specific terms and company and personal names.

Kutylowski said that AI is reimagining what customer service will look like in the coming years. He noted that a translation layer helps companies provide support in languages where qualified staff are scarce and expensive to hire.


The company said that it controls the entire voice-to-voice stack. However, the current system converts the speech to text, applies translation, then converts that back to speech. DeepL believes that since it has worked on text translation for years, it has an edge in translation quality. Going forward, the company wants to develop an end-to-end voice translation model that skips the text step entirely.
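
As a rough sketch of that cascaded architecture (all three stage functions are stubs standing in for real ASR, machine-translation and TTS models, and nothing here reflects DeepL’s actual implementation):

```python
def transcribe(audio_chunk: bytes) -> str:
    """ASR stage (stub): speech audio in, source-language text out."""
    return "hallo zusammen"

def translate(text: str, target_lang: str = "en") -> str:
    """MT stage (stub): source text in, target-language text out."""
    return {"hallo zusammen": "hello everyone"}.get(text, text)

def synthesize(text: str) -> bytes:
    """TTS stage (stub): target text in, speech audio out."""
    return text.encode("utf-8")  # placeholder bytes, not real audio

def speech_to_speech(audio_chunk: bytes) -> bytes:
    # Each hop adds latency, which is why real systems stream partial chunks
    # through the stages, and why an end-to-end model that skips the text
    # step entirely is the stated long-term goal.
    return synthesize(translate(transcribe(audio_chunk)))

print(speech_to_speech(b"\x00\x01"))  # -> b'hello everyone'
```

Collapsing the three hops into one model would trade the inspectability of the intermediate text for lower latency.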


DeepL faces competition from several well-funded startups working in adjacent corners of the space. Sanas, which last year raised $65 million from Quadrille Capital and Teleperformance, uses AI to modify a speaker’s accent in real time — a tool aimed primarily at call center agents.

Dubai-based Camb.AI focuses on speech synthesis and translation for media and entertainment companies as well as Amazon Web Services, helping them dub and localize video content at scale.

Palabra, backed by Reddit co-founder Alexis Ohanian’s firm Seven Seven Six, is building a real-time speech translation engine designed to preserve both the meaning and the speaker’s original voice, putting it in more direct competition with what DeepL is now building.


Microsoft counters MacBook Neo with free Game Pass and Office bundle on Windows laptops



College students who purchase eligible Windows laptops before July 31 can receive a free year of Microsoft 365 Premium, Xbox Game Pass Ultimate, and a customized Xbox controller. Retailers including Best Buy, Amazon, Walmart, and Dell are leaning into the promotion, surfacing models designed to rival the MacBook Neo on…

Administration Apparently Planning To Blow Off FISA Court’s Ordered Fixes For Section 702


from the domestic-surveillance-is-fine-if-we-do-it dept

It wasn’t all that long ago that GOP legislators were collectively stonewalling a clean reauthorization of Section 702. Three years ago, these legislators were seeking to end the FBI (and other IC components’) access to Americans’ communications via “backdoor” searches of the NSA’s supposedly “foreign facing” collections.

It wasn’t that the Republicans cared that Joe Public was being subjected to warrantless domestic surveillance. It was that they were being subjected to warrantless searches of their communications — something that came to light as the result of multiple investigations pertaining to Trump’s first administration.

Now that the GOP has control of the White House again, Republicans are back to not caring about the warrantless searches of US persons’ communications enabled by FISA loopholes very few congressional reps seriously want to see closed.

Another Section 702 reauthorization attempt is only weeks away. Reps who want more of the same thing we’ve been subjected to for decades have until the end of April to push a clean reauthorization through. Unfortunately for them, the FISA Court — while allowing the program to continue whether or not Congress can pass an extension — has made it clear the program needs to be overhauled because it’s still being routinely abused to perform warrantless searches targeting Americans’ communications.


The annual recertification, issued last month in a classified ruling, means that the program can continue to collect phone calls and emails through March 2027 — even if Congress fails later this month to renew the statute that underlies it.

But the judge who issued the March 17 ruling also objected to tools that agencies with access to the raw data — like the C.I.A., F.B.I. and National Security Agency — have created to allow analysts to process messages, according to unclassified talking points the administration sent to lawmakers in recent days.

The main issue is the filtering tool utilized by agencies with access to the NSA’s collections. The filter allows analysts to drill down into the data to return only results pertaining to specific people who have communicated with a foreign person. It would appear agencies like the FBI are using this filter to search for US persons, something that is supposed to be subject to additional limitations.

From the talking points detailed by the New York Times, it seems that isn’t the case, which is why the FISA Court is ordering the government to “re-engineer the filter” to force analysts to comply with restrictions pertaining to access of US persons’ communications.

The Trump administration is allegedly “weighing” whether or not to comply with this FISA court order. The only thing that could make it comply would be to codify the order during the reauthorization process. This administration simply isn’t willing to do that.


The Trump administration wants Congress to extend the statute without changes. 

And that’s why Senator Ron Wyden is, again, letting the American public know the current administration is actively arguing against the privacy interests of millions of American citizens:

“The compliance problems are bad enough, but, incredibly, rather than fix them, the Trump Administration is considering appealing the court ruling so that they never have to. This is a highly aggressive and unusual move indicative of an administration that would exploit every angle to expand its surveillance at the expense of Americans’ rights.

“Instead of addressing these problems, opponents of reform are going to try to jam a straight reauthorization of section 702 through Congress next week, while the American people are still in the dark. That’s unacceptable. This court ruling needs to be declassified so that Americans can understand what the Trump administration is actually up to. And Congress must vote for real reforms to protect Americans’ rights.”

I won’t even factor in Trump’s opinion here, because it doesn’t really matter. He doesn’t know enough about anything to be considered qualified to engage in this discussion. Further, this isn’t even necessarily a Trump thing. Pretty much every presidential administration has been unwilling to upset this particular apple cart, even when plenty of evidence of extensive rot has been made public.

But this one’s particularly problematic for the GOP, which spent most of the Biden years claiming Section 702 abuse was evidence of a “deep state” conspiracy against Trump and his congressional supporters. Now, they’re arguing the opposite: that the “deep state” it so recently opposed should be allowed to do what it wants for as long as it wants to… so long as it’s not sweeping up their communications.


Status quo seems likely to prevail yet again, especially with the Trump Administration clearly interested in increasing the amount of domestic surveillance perpetrated by Intelligence Community components. After all, without it, the “worst of worst” day laborers and factory workers can’t be kidnapped by federal officers and members of the fearsome, centrally organized terrorist group known as “antifa” can’t get caught in dragnets that are supposed to be targeting foreign adversaries. It’s going to be more abuse for the stupidest imaginable reasons because that’s just how things are going to go as long as this iteration of the GOP remains in power.

Filed Under: backdoor searches, fbi, fisa, fisc, nsa, ron wyden, section 702, trump administration, warrantless searches


Today’s NYT Mini Crossword Answers for April 16


Looking for the most recent Mini Crossword answer? Click here for today’s Mini Crossword hints, as well as our daily answers and hints for The New York Times Wordle, Strands, Connections and Connections: Sports Edition puzzles.


Need some help with today’s Mini Crossword? It’s pretty simple, but 1-Across is a bit tricky. Read on for all the answers. And if you could use some hints and guidance for daily solving, check out our Mini Crossword tips.

If you’re looking for today’s Wordle, Connections, Connections: Sports Edition and Strands answers, you can visit CNET’s NYT puzzle hints page.


Read more: Tips and Tricks for Solving The New York Times Mini Crossword

Let’s get to those Mini Crossword clues and answers.

The completed NYT Mini Crossword puzzle for April 16, 2026. (NYT/Screenshot by CNET)

Mini across clues and answers

1A clue: Bow ties and ribbons that you can’t wear?
Answer: PASTA

6A clue: Opposite of lower
Answer: UPPER

7A clue: Flappable origami creation
Answer: CRANE


8A clue: Where the Hangul alphabet is used
Answer: KOREA

9A clue: Apparatus under a trapeze
Answer: NET

Mini down clues and answers

1D clue: Disc dropped on center ice
Answer: PUCK

2D clue: One might read “Kiss the Chef”
Answer: APRON


3D clue: Unlikely outcome after a 7-10 split
Answer: SPARE

4D clue: Fundamental belief
Answer: TENET

5D clue: Bay ___ (part of California)
Answer: AREA


Meta researchers introduce ‘hyperagents’ to unlock self-improving AI for non-coding tasks


Creating self-improving AI systems is an important step toward deploying agents in dynamic environments, especially enterprise production environments, where tasks are not always predictable or consistent.

Current self-improving AI systems face severe limitations because they rely on fixed, handcrafted improvement mechanisms that only work in narrow domains such as software engineering.

To overcome this practical challenge, researchers at Meta and several universities introduced “hyperagents,” a class of self-improving AI agents that continuously rewrite and optimize their own problem-solving logic and underlying code.

In practice, this allows the AI to self-improve across non-coding domains, such as robotics and document review. The agent independently invents general-purpose capabilities like persistent memory and automated performance tracking.


More broadly, hyperagents don’t just get better at solving tasks; they learn to improve the self-improvement cycle itself, accelerating progress.

This framework can help develop highly adaptable agents that autonomously build structured, reusable decision machinery. This approach compounds capabilities over time with less need for constant, manual prompt engineering and domain-specific human customization.

Current self-improving AI and its architectural bottlenecks

The core goal of self-improving AI systems is to continually enhance their own learning and problem-solving capabilities. However, most existing self-improvement models rely on a fixed “meta agent.” This static, high-level supervisory system is designed to modify a base system.

“The core limitation of handcrafted meta-agents is that they can only improve as fast as humans can design and maintain them,” Jenny Zhang, co-author of the paper, told VentureBeat. “Every time something changes or breaks, a person has to step in and update the rules or logic.”


Instead of an abstract theoretical limit, this creates a practical “maintenance wall.” 

The current paradigm ties system improvement directly to human iteration speed, slowing down progress because it relies heavily on manual engineering effort rather than scaling with agent-collected experience.

To overcome this limitation, the researchers argue that the AI system must be “fully self-referential.” These systems must be able to analyze, evaluate, and rewrite any part of themselves without the constraints of their initial setup. This allows the AI system to break free from structural limits and become self-accelerating.

Darwin Gödel Machine (source: Sakana AI)


One example of a self-referential AI system is Sakana AI’s Darwin Gödel Machine (DGM), an AI system that improves itself by rewriting its own code.

In DGM, an agent iteratively generates, evaluates, and modifies its own code, saving successful variants in an archive to act as stepping stones for future improvements. DGM proved open-ended, recursive self-improvement is practically achievable in coding.

However, DGM falls short when applied to real-world applications outside of software engineering because of a critical skill gap. In DGM, the system improves because both evaluation and self-modification are coding tasks. Improving the agent’s coding ability naturally improves its ability to rewrite its own code. But if you deploy DGM for a non-coding enterprise task, this alignment breaks down.

“For tasks like math, poetry, or paper review, improving task performance does not necessarily improve the agent’s ability to modify its own behavior,” Zhang said.


The skills needed to analyze subjective text or business data are entirely different from the skills required to analyze failures and write new Python code to fix them. 

DGM also relies on a fixed, human-engineered mechanism to generate its self-improvement instructions. In practice, if enterprise developers want to use DGM for anything other than coding, they must heavily engineer and manually customize the instruction prompts for every new domain.

The hyperagent framework

To overcome the limitations of previous architectures, the researchers introduce hyperagents. The framework proposes “self-referential agents that can in principle self-improve for any computable task.”

In this framework, an agent is any computable program that can invoke LLMs, external tools, or learned components. Traditionally, these systems are split into two distinct roles: a “task agent” that works on the specific problem at hand, and a “meta agent” that analyzes and modifies the task agent. A hyperagent fuses both roles into a single, self-referential, editable program.


Because the entire program can be rewritten, the system can modify the self-improvement mechanism, a process the researchers call metacognitive self-modification.

DGM with hyperagents (source: arXiv)

“Hyperagents are not just learning how to solve the given tasks better, but also learning how to improve,” Zhang said. “Over time, this leads to accumulation. Hyperagents do not need to rediscover how to improve in each new domain. Instead, they retain and build on improvements to the self-improvement process itself, allowing progress to compound across tasks.”

The researchers extended the Darwin Gödel Machine to create DGM-Hyperagents (DGM-H). DGM-H retains the powerful open-ended exploration structure of the original DGM, which prevents the AI from converging too early or getting stuck in dead ends by maintaining a growing archive of successful hyperagents.


The system continuously branches from selected candidates in this archive, allows them to self-modify, evaluates the new variants on given tasks, and adds the successful ones back into the pool as stepping stones for future iterations.
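
A minimal sketch of that archive-and-branch loop appears below. The `self_modify` and `evaluate` functions are stubs: in DGM-H, the agent rewrites its own source via LLM calls and is scored on real benchmark tasks.

```python
import random

def self_modify(source: str) -> str:
    """Stub: in DGM-H the agent itself rewrites its own source code."""
    return source + f"\n# variant {random.randint(0, 10**6)}"

def evaluate(source: str) -> float:
    """Stub: run the candidate agent on benchmark tasks and return a score."""
    return random.random()

# Growing archive of successful hyperagents; each entry is a stepping stone.
archive = [{"source": "# seed agent", "score": 0.0}]

for _ in range(100):
    parent = random.choice(archive)        # branch from any archived agent,
    child = self_modify(parent["source"])  # not just the current best, to
    score = evaluate(child)                # keep the search open-ended
    if score >= parent["score"]:           # keep variants that hold their own
        archive.append({"source": child, "score": score})

print(f"archive size: {len(archive)}, best: {max(a['score'] for a in archive):.3f}")
```

Because the mutated source includes the improvement logic itself, a successful variant can change not only how tasks are solved but how future variants are generated.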

By combining this open-ended evolutionary search with metacognitive self-modification, DGM-H eliminates the fixed, human-engineered instruction step of the original DGM. This enables the agent to self-improve across any computable task.

Hyperagents in action

The researchers used the Polyglot coding benchmark to compare the hyperagent framework against previous coding-only AI. They also evaluated hyperagents across non-coding domains that involve subjective reasoning, external tool use, and complex logic.

These included paper review to simulate a peer reviewer outputting accept or reject decisions, reward model design for training a quadruped robot, and Olympiad-level math grading. Math grading served as a held-out test to see if an AI that learned how to self-improve while reviewing papers and designing robots could transfer those meta-skills to an entirely unseen domain.


The researchers compared hyperagents against several baselines, including domain-specific models like AI-Scientist-v2 for paper reviews and the ProofAutoGrader for math. They also tested against the classic DGM and a manually customized DGM for new domains.

On the coding benchmark, hyperagents matched the performance of DGM despite not being designed specifically for coding. In paper review and robotics, hyperagents outperformed the open-source baselines and human-engineered reward functions. 

When the researchers took a hyperagent optimized for paper review and robotics and deployed it on the unseen math grading task, it achieved an improvement metric of 0.630 in 50 iterations. Baselines relying on classic DGM architectures remained at a flat 0.0. The hyperagent even beat the domain-specific ProofAutoGrader.

The experiments also highlighted interesting autonomous behaviors from hyperagents. In paper evaluation, the agent first used standard prompt-engineering tricks like adopting a rigorous persona. When this proved unreliable, it rewrote its own code to build a multi-stage evaluation pipeline with explicit checklists and rigid decision rules, leading to much higher consistency.


Hyperagents also autonomously developed a memory tool to avoid repeating past mistakes. Furthermore, the system wrote a performance tracker to log and monitor the result of architectural changes across generations. The model even developed a compute-budget aware behavior, where it tracked remaining iterations to adjust its planning. Early generations executed ambitious architectural changes, while later generations focused on conservative, incremental refinements.

For enterprise data teams wondering where to start, Zhang recommends focusing on tasks where success is unambiguous. “Workflows that are clearly specified and easy to evaluate, often referred to as verifiable tasks, are the best starting point,” she said. “This generally opens new opportunities for more exploratory prototyping, more exhaustive data analysis, more exhaustive A/B testing, [and] faster feature engineering.” For harder, unverified tasks, teams can use hyperagents to first develop learned judges that better reflect human preferences, creating a bridge to more complex domains.

The researchers have shared the code for hyperagents, though it has been released under a non-commercial license.

Caveats and future threats

The benefits of hyperagents come with clear tradeoffs. The researchers highlight several safety considerations regarding systems that can modify themselves in increasingly open-ended ways.


These AI systems pose the risk of evolving far more rapidly than humans can audit or interpret. While the researchers contained DGM-H within safety boundaries such as sandboxed environments designed to prevent unintended side effects, those initial safeguards also double as practical deployment blueprints.

Zhang advises developers to enforce resource limits and restrict access to external systems during the self-modification phase. “The key principle is to separate experimentation from deployment: allow the agent to explore and improve within a controlled sandbox, while ensuring that any changes that affect real systems are carefully validated before being applied,” she said. Only after the newly modified code passes developer-defined correctness checks should it be promoted to a production setting.
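
A minimal sketch of that promotion gate, assuming candidates live as Python source and a project test suite stands in for the developer-defined checks (the paths and test command are placeholders, not part of any released hyperagent tooling):

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

def promote_if_valid(candidate_source: str, production_path: Path) -> bool:
    """Stage a self-modified agent in a sandbox; promote it only if checks pass."""
    with tempfile.TemporaryDirectory() as sandbox:
        staged = Path(sandbox) / "agent.py"
        staged.write_text(candidate_source)
        # Developer-defined correctness checks, run with a hard timeout so a
        # runaway candidate cannot consume unbounded resources.
        result = subprocess.run(
            ["python", "-m", "pytest", "--quiet", sandbox],
            capture_output=True,
            timeout=300,
        )
        if result.returncode != 0:
            return False  # reject the variant; production stays unchanged
        shutil.copy(staged, production_path)  # promote the validated variant
        return True
```

The key property is the one Zhang describes: exploration happens in the sandbox, and nothing touches real systems until it has passed validation.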

Another significant danger is evaluation gaming, where the AI improves its metrics without making actual progress toward the intended real-world goal. Because hyperagents are driven by empirical evaluation signals, they can autonomously discover strategies that exploit blind spots or weaknesses in the evaluation procedure itself to artificially inflate their scores. Preventing this behavior requires developers to implement diverse, robust, and periodically refreshed evaluation protocols alongside continuous human oversight.

Ultimately, these systems will shift the day-to-day responsibilities of human engineers. Just as we do not recompute every operation a calculator performs, future AI orchestration engineers will not write the improvement logic directly, Zhang believes.


Instead, they will design the mechanisms for auditing and stress-testing the system. “As self-improving systems become more capable, the question is no longer just how to improve performance, but what objectives are worth pursuing,” Zhang said. “In that sense, the role evolves from building systems to shaping their direction.”


2026 is the year payroll stacks break, and AI must grow up


For years, payroll has mostly lived out of sight. Many organizations still treat it as a background task, something that only reaches senior leaders when a crisis appears. In 2026, that approach is under real pressure.

New HMRC rules and wider Employment Rights Act changes in the UK are bringing pay accuracy and timeliness into sharper regulatory focus.

Callum Pennington

CEO & Co-Founder of HealthboxHR.


More California 4-Year-Olds Are in Publicly Funded Preschool Than Ever


When it comes to universal pre-kindergarten, California has made significant progress: 62 percent of 4-year-olds were enrolled in publicly funded early childhood programs in 2024–25, up from 42 percent in 2019–20, according to a new Learning Policy Institute report.

Transitional kindergarten (TK) alone enrolled 55 percent of 4-year-olds, or about 177,000 children. But access remains uneven: nearly 4 in 10 4-year-olds still aren’t enrolled, and the share of eligible children actually signing up has declined. Families may be unaware that transitional kindergarten is an option for their children, or they face other barriers to enrolling. This school year marks the first time every 4-year-old in California was guaranteed a transitional kindergarten spot.

The number of California 4-year-olds enrolled in transitional kindergarten and other publicly funded early childhood education programs rose from about 208,300 in 2019-20 to more than 264,000 in 2024-25, a 27 percent increase.

Transitional kindergarten had the largest number of participants, with 177,570 4-year-olds enrolled in 2024-25.
