Tech

Databricks built a RAG agent it says can handle every kind of enterprise search

Most enterprise RAG pipelines are optimized for one search behavior. They fail silently on the others. A model trained to synthesize cross-document reports handles constraint-driven entity search poorly. A model tuned for simple lookup tasks falls apart on multi-step reasoning over internal notes. Most teams find out when something breaks.

Databricks set out to fix that with KARL, short for Knowledge Agents via Reinforcement Learning. The company trained an agent across six distinct enterprise search behaviors simultaneously using a new reinforcement learning algorithm. The result, the company claims, is a model that matches Claude Opus 4.6 on a purpose-built benchmark at 33% lower cost per query and 47% lower latency, trained entirely on synthetic data the agent generated itself with no human labeling required. That comparison is based on KARLBench, which Databricks built to evaluate enterprise search behaviors.

“A lot of the big reinforcement learning wins that we’ve seen in the community in the past year have been on verifiable tasks where there is a right and a wrong answer,” Jonathan Frankle, Chief AI Scientist at Databricks, told VentureBeat in an exclusive interview. “The tasks that we’re working on for KARL, and that are just normal for most enterprises, are not strictly verifiable in that same way.”

Those tasks include synthesizing intelligence across product manager meeting notes, reconstructing competitive deal outcomes from fragmented customer records, answering questions about account history where no single document has the full answer and generating battle cards from unstructured internal data. None of those has a single correct answer that a system can check automatically.

“Doing reinforcement learning in a world where you don’t have a strict right and wrong answer, and figuring out how to guide the process and make sure reward hacking doesn’t happen — that’s really non-trivial,” Frankle said. “Very little of what companies do day to day on knowledge tasks are verifiable.”

The generalization trap in enterprise RAG

Standard RAG breaks down on ambiguous, multi-step queries drawing on fragmented internal data that was never designed to be queried.

To evaluate KARL, Databricks built the KARLBench benchmark to measure performance across six enterprise search behaviors: constraint-driven entity search, cross-document report synthesis, long-document traversal with tabular numerical reasoning, exhaustive entity retrieval, procedural reasoning over technical documentation and fact aggregation over internal company notes. That last task is PMBench, built from Databricks’ own product manager meeting notes — fragmented, ambiguous and unstructured in ways that frontier models handle poorly.

Training on any single task and testing on the others produces poor results. The KARL paper shows that multi-task RL generalizes in ways single-task training does not. The team trained KARL on synthetic data for two of the six tasks and found it performed well on all four it had never seen.

To build a competitive battle card for a financial services customer, for example, the agent has to identify relevant accounts, filter for recency, reconstruct past competitive deals and infer outcomes — none of which is labeled anywhere in the data.

Frankle calls what KARL does “grounded reasoning”: running a difficult reasoning chain while anchoring every step in retrieved facts. “You can think of this as RAG,” he said, “but like RAG plus plus plus plus plus plus, all the way up to 200 vector database calls.”

The RL engine: why OAPL matters

KARL’s training is powered by OAPL, short for Optimal Advantage-based Policy Optimization with Lagged Inference policy. It’s a new approach, developed jointly by researchers from Cornell, Databricks and Harvard and published in a separate paper the week before KARL.

Standard LLM reinforcement learning uses on-policy algorithms like GRPO (Group Relative Policy Optimization), which assume the model generating training data and the model being updated are in sync. In distributed training, they never are. Prior approaches corrected for this with importance sampling, introducing variance and instability. OAPL embraces the off-policy nature of distributed training instead, using a regression objective that stays stable with policy lags of more than 400 gradient steps, 100 times more off-policy than prior approaches handled. In code generation experiments, it matched a GRPO-trained model using roughly three times fewer training samples.
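To make the on-policy baseline concrete, here is a minimal Python sketch of two ingredients the paragraph above mentions: GRPO-style group-relative advantages and the importance ratio used to correct for a stale sampling policy. It is an illustration under stated assumptions, not Databricks' code and not OAPL's regression objective, which the article does not spell out. The function names and the toy log-probabilities are hypothetical.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards):
    """GRPO-style advantages: score each sampled response against its own group.

    Each reward is centered on the group mean and scaled by the group's
    population standard deviation. A group with identical rewards carries
    no learning signal, so every advantage is zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def importance_ratio(logp_current, logp_behavior):
    """Ratio pi_current(y|x) / pi_behavior(y|x), computed from sequence log-probs.

    logp_behavior comes from the (possibly lagged) policy that generated
    the sample; logp_current comes from the policy being updated.
    """
    return math.exp(logp_current - logp_behavior)


# Four responses to one prompt, two rewarded and two not (toy values).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Hypothetical log-probs: when the two policies agree the ratio is 1;
# a gap of only 4 nats already yields a weight above 50.
in_sync = importance_ratio(-10.0, -10.0)
lagged = importance_ratio(-8.0, -12.0)
```

Because the ratio is exponential in the log-probability gap, a small drift between the sampling policy and the trained policy can produce very large weights. That is the variance problem the article attributes to importance-sampling corrections.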

OAPL’s sample efficiency is what keeps the training budget accessible. Reusing previously collected rollouts rather than requiring fresh on-policy data for every update meant the full KARL training run stayed within a few thousand GPU hours. That is the difference between a research project and something an enterprise team can realistically attempt.

Agents, memory and the context stack

There has been a lot of discussion in the industry in recent months about how RAG can be replaced with contextual memory, also sometimes referred to as agentic memory.

For Frankle, it’s not an either/or discussion; he sees it as a layered stack. A vector database with millions of entries, far too large to fit in context, sits at the base. The LLM context window sits at the top. Between them, compression and caching layers are emerging that determine how much of what an agent has already learned it can carry forward.

For KARL, this is not abstract. Some KARLBench tasks required 200 sequential vector database queries, with the agent refining searches, verifying details and cross-referencing documents before committing to an answer, exhausting the context window many times over. Rather than training a separate summarization model, the team let KARL learn compression end-to-end through RL: when context grows too large, the agent compresses it and continues, with the only training signal being the reward at the end of the task. Removing that learned compression dropped accuracy on one benchmark from 57% to 39%.
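The search-then-compress loop described above can be sketched in Python as follows. This is a hedged illustration only: `run_episode`, `next_query`, `search` and `compress` are hypothetical names, the context budget is counted in characters as a stand-in for tokens, and the toy compressor is a plain truncation function. In KARL the compression behavior is learned through RL from the end-of-task reward, which this sketch does not attempt to reproduce.

```python
from typing import Callable, List, Optional


def run_episode(
    question: str,
    next_query: Callable[[List[str]], Optional[str]],
    search: Callable[[str], List[str]],
    compress: Callable[[List[str]], List[str]],
    max_calls: int = 200,
    context_budget: int = 500,
) -> dict:
    """Issue sequential searches, compressing the context when it overflows."""
    context = [question]
    calls = 0
    compressions = 0
    while calls < max_calls:
        query = next_query(context)
        if query is None:  # the policy decides to stop searching
            break
        context.extend(search(query))
        calls += 1
        # Budget measured in characters here; a real system would count tokens.
        if sum(len(chunk) for chunk in context) > context_budget:
            context = compress(context)
            compressions += 1
    return {"context": context, "calls": calls, "compressions": compressions}


def make_toy_policy(n_queries: int):
    """Hypothetical policy that issues a fixed number of queries, then stops."""
    issued: List[str] = []

    def next_query(context: List[str]) -> Optional[str]:
        if len(issued) >= n_queries:
            return None
        issued.append(f"q{len(issued)}")
        return issued[-1]

    return next_query


def toy_search(query: str) -> List[str]:
    """Stand-in for a vector database call: returns one long fake passage."""
    return [f"{query}: " + "x" * 100]


def toy_compress(context: List[str]) -> List[str]:
    """Stand-in compressor: keep the question, truncate every retrieved chunk."""
    return [context[0]] + [chunk[:10] for chunk in context[1:]]


result = run_episode(
    "Which accounts churned?", make_toy_policy(12), toy_search, toy_compress
)
```

In this toy run the agent issues 12 searches and compresses whenever the accumulated text passes the budget, so the loop can continue well beyond what a fixed window would hold. The `max_calls` default of 200 mirrors the figure quoted in the article.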

“We just let the model figure out how to compress its own context,” Frankle said. “And this worked phenomenally well.”

Where KARL falls short

Frankle was candid about the failure modes. KARL struggles most on questions with significant ambiguity, where multiple valid answers exist and the model can’t determine whether the question is genuinely open-ended or just hard to answer. That judgment call is still an unsolved problem.

The model also exhibits what Frankle described as giving up early on some queries — stopping before producing a final answer. He pushed back on framing this as a failure, noting that the most expensive queries are typically the ones the model gets wrong anyway. Stopping is often the right call.

KARL was also trained and evaluated exclusively on vector search. Tasks requiring SQL queries, file search, or Python-based calculation are not yet in scope. Frankle said those capabilities are next on the roadmap, but they are not in the current system.

What this means for enterprise data teams

KARL surfaces three decisions worth revisiting for teams evaluating their retrieval infrastructure.

The first is pipeline architecture. If your RAG agent is optimized for one search behavior, the KARL results suggest it is failing on others. Multi-task training across diverse retrieval behaviors produces models that generalize. Narrow pipelines do not.

The second is why RL matters here — and it’s not just a training detail. Databricks tested the alternative: distilling from expert models via supervised fine-tuning. That approach improved in-distribution performance but produced negligible gains on tasks the model had never seen. RL developed general search behaviors that transferred. For enterprise teams facing heterogeneous data and unpredictable query types, that distinction is the whole game.

The third is what RL efficiency actually means in practice. A model trained to search better completes tasks in fewer steps, stops earlier on queries it cannot answer, diversifies its search rather than repeating failed queries, and compresses its own context rather than running out of room. The argument for training purpose-built search agents rather than routing everything through general-purpose frontier APIs is not primarily about cost. It is about building a model that knows how to do the job.

Valve doesn’t sound confident the Steam Machine will ship in 2026

As part of a Year in Review blog detailing changes Valve made to Steam in 2025, the company shared a minor update on its hardware plans that doesn’t sound good for anyone hoping to buy a Steam Machine, Steam Controller or Steam Frame in 2026. Specifically, the company is now opening up the possibility its new hardware won’t ship this year at all.

In February, when Valve acknowledged the ongoing memory and storage shortage had delayed the launch of its hardware and could lead to higher prices, the company was still committing to a (fairly wide) window of when its hardware would ship:

“Our goal of shipping all three products in the first half of the year has not changed. But we have work to do to land on concrete pricing and launch dates that we can confidently announce, being mindful of how quickly the circumstances around both of those things can change.”

As of the company’s latest post, however, things somehow sound even less certain. “We hope to ship in 2026, but as we shared recently, memory and storage shortages have created challenges for us,” Valve wrote in its Year in Review post. “We’ll share updates publicly when we finalize our plans!”

While Valve’s air of secrecy can make it easy to read too much into the limited information the company does share, moving from “the first half of the year” to “[hoping] to ship in 2026” certainly gives it wiggle room to not release new hardware this year. And considering the difficulties other companies are facing sourcing memory and storage, it wouldn’t be all that surprising.

HP said in February that RAM accounts for a third of its PC costs, and industry analysts expect the RAM shortage could radically alter the PC landscape as companies are forced to raise prices. Valve’s already struggling to keep the Steam Deck in stock due to its issues securing RAM, so it stands to reason that sourcing components for even more devices wouldn’t make that process any easier. Then again, the company hasn’t updated its launch timing FAQ, so there’s still reason to hope the Steam Machine ships in 2026.

One Sailing Pulley To Rule Them All

When thinking of humanity’s ability to harness wind energy, many people will conjure images of windmills from places like The Netherlands or Persia. But people have been using wind energy for far longer than that in the form of sailing ships. Using the wind for transportation goes back another four thousand years or so, but despite our vast experience navigating the seas with wind alone there is still some room for improvement. Many modern sailboats use a number of different pulleys to manage all of the rigging, but this new, open-source pulley can replace many of them.

The pulley, or “block” as it is sometimes called, is built with a polymer roller made out of a type of nylon, which has the benefit of being extremely durable and self-lubricating but is a bit expensive. Durability and lack of squeakiness are important in sailing applications, though. The body is made from CNC-machined aluminum and is composed of two parts, which pivot around the pulley’s axis to allow various ropes (or “lines”) to be inserted without freeing one end of the rope. In testing, this design outperformed some proprietary stainless steel pulleys of similar size.

Another perk of this design is that it can be set up to work in many different applications on a sailboat, whether that’s hoisting a mainsail, pulling in a jib or any other task a pulley could be used for. It can also be stacked with others in many different configurations to build custom pulleys of almost any type, and can support up to 14 mm lines. For a sailor this could be extremely valuable, because as it stands each pulley on a ship tends to be used in only certain applications, and might also be proprietary to a specific company. This pulley is being released into the open-source world, so anyone who wants one can make it.

Thanks to [Keith] for the tip!

Seagate is now shipping HAMR disk drives holding up to 44TB of data

Seagate introduced the Mozaic 3+ platform in 2024, turning the heat-assisted magnetic recording (HAMR) dream into a real product for customers in need of massive storage capacities. The HDD maker is now introducing the next-generation Mozaic 4+ drives, which offer capacities up to 44TB.
Apple thinks it can lure in the 'Apple curious' for $599

Apple has made it pretty clear that it wants to siphon off Android and Windows users, and it’s doing it by adopting an aggressive, “budget-friendlier” model across nearly its entire ecosystem.

Apple is using $599 devices to grow its ecosystem

When I first entered the Apple ecosystem, it was when I bought an iPhone 4 in 2011 — I got it right after the 4s made its debut. I don’t remember exactly what I paid, but I know it was less than the initial $199 price tag.
And back then, I thought that was a completely asinine amount of money to pay for a phone. Fortunately, or maybe unfortunately, I had more money in my pocket than brains in my head, so I bought it just the same.
Anthropic will fight US ‘supply chain risk’ designation in court

Anthropic confirmed it has been designated a ‘supply chain risk’ by the US administration, and said it has no choice but to challenge in the courts.

Despite ongoing talks between Anthropic and the US Department of Defense, Anthropic confirmed last night it had received a letter from defense secretary Pete Hegseth confirming the ‘supply chain risk’ designation that had been threatened.

“Yesterday (March 4) Anthropic received a letter from the Department of [Defense] confirming that we have been designated as a supply chain risk to America’s national security,” wrote co-founder and CEO Dario Amodei last night in an official statement. “We do not believe this action is legally sound, and we see no choice but to challenge it in court.”

Amodei was quick to point out that “even supposing it was legally sound”, the limited application of the designation means the “vast majority” of its customers will be unaffected by the move. He said the restriction clearly only applied to the use of Claude by customers as a direct part of contracts with the US defense department, “not all use of Claude by customers who have such contracts”.

“The Department’s letter has a narrow scope, and this is because the relevant statute is narrow, too,” wrote Amodei. “It exists to protect the government rather than to punish a supplier.”

As with previous statements, Amodei strikes a conciliatory tone, saying Anthropic is committed to US national security and will offer continuing support from its engineers to ensure a smooth transition from Claude “for as long as we are permitted to do so”.

Anthropic drew the ire of the US administration after a standoff with the Pentagon, where Anthropic refused to change its safeguards related to using its AI for fully autonomous weapons, or for mass surveillance of US citizens.

With many in Silicon Valley supporting its relatively principled stand, and general users sending it to the top of the US Apple charts in recent days for free downloads – beating OpenAI’s ChatGPT for the first time – its flagship Claude.ai and Claude Code apps went down for around three hours on 2 March due to “unprecedented demand”.

Claude Cowork in particular was already becoming the darling of AI enthusiasts in the professional world, and Bloomberg reported on Tuesday that Anthropic was on track to generate annual revenue of almost $20bn, more than double its run rate from late 2025, signalling the rapid growth at the AI company which is today valued at around $380bn.

Tinder settles age discrimination lawsuit for $60 million, see if you qualify for a payout

According to the plaintiff, Tinder charged users aged 29 and older more for premium subscriptions such as Tinder Plus and Tinder Gold, while offering cheaper rates for the same services to users in their teens and 20s. The lawsuit claimed the tiered pricing model violated multiple California laws, including the…
Cognizant TriZetto breach exposes health data of 3.4 million patients

TriZetto Provider Solutions, a healthcare IT company that develops software and services used by health insurers and healthcare providers, has suffered a data breach that exposed the sensitive information of over 3.4 million people.

The firm, which has been operating under the Cognizant umbrella since 2014, disclosed that it detected suspicious activity on a web portal on October 2, 2025, and launched an investigation with the help of external cybersecurity experts.

The investigation revealed that unauthorized access began nearly a year before, on November 19, 2024.

During the exposure period, the threat actors accessed records relating to insurance eligibility verification transactions, which are part of the process providers use to confirm a patient’s insurance coverage before treatment.

The types of data that have been exposed vary per individual, and may include one or more of the following:

  • Full names
  • Physical address
  • Date of birth
  • Social Security number
  • Health insurance member number
  • Medicare beneficiary identifier
  • Provider name
  • Health insurer name
  • Demographic, health, and insurance information

Affected providers were alerted on December 9, 2025, but customer notification started in early February 2026. According to a filing Maine’s Attorney General submitted today, the number of exposed individuals is 3,433,965.

TriZetto says that payment card, bank account, or other financial information was not exposed in this incident.

Also, the company is not aware of any cases where cybercriminals have attempted to misuse this information.

TriZetto says it has taken steps to strengthen cybersecurity on its systems and informed law enforcement authorities of the incident.

Notification recipients are offered free 12-month coverage of credit monitoring and identity protection services from Kroll to help mitigate risks arising from compromised data.

BleepingComputer has contacted TriZetto to learn more about the nature of the security breach and why the firm delayed notifications to consumers for several months, but we have not received a response by publication time.

No ransomware groups have taken responsibility for the attack yet, and no data leaks linked to TriZetto have appeared on underground forums.

Cognizant itself was rumored to have suffered a Maze ransomware breach in 2020. In June 2025, Clorox sued the IT firm for gross negligence after it allegedly let Scattered Spider operatives into its network following a social engineering attack in September 2023.

The remake of one of the best Assassin’s Creed games is actually happening

Ubisoft has finally confirmed what Assassin’s Creed fans have suspected for years: a remake of Assassin’s Creed IV: Black Flag is officially in the works.

The company revealed the project, titled Assassin’s Creed: Black Flag Resynced, in a new blog post outlining the future of the long-running series.

We don’t know much about the game yet, but initial reports suggest that Resynced will be a full remake rather than a simple remaster, with upgraded visuals and gameplay improvements, bringing one of the best AC games into the modern age.

It’s also suggested that new story content will be added to flesh out the world around Edward Kenway’s life – at the expense of the modern day gameplay, which has apparently been removed from the remake altogether. It’ll be interesting to see how this all works, given how the original game weaved parts of both storylines into the ending.

We’ve known for quite some time that Ubisoft has been thinking about breathing life into the 2013 game, but this was more or less confirmed when the name surfaced on a European ratings board listing late last year.

We don’t yet have a release date for the game, but we know that an unannounced game was due to arrive before the end of the current financial year. Of course, Ubisoft delayed seven games earlier this year – and Black Flag is expected to be one of them.

Whether or not we see the game before the end of 2026 remains to be seen, but for now we’ll keep our “spyglass on the horizon”.

Fully charged: Meet the local leader energizing the Pacific Northwest battery boom

Grayson Shor, far right, at a recent Pacific Northwest Battery Collaborative meet up at a Seattle brewery on Capitol Hill. Shor launched the organization to help the sector build connections. (PNWBC Photo)

Grayson Shor, founder and executive director of the Pacific Northwest Battery Collaborative, is the driving force that’s uniting and energizing the region’s battery community.

The collaborative’s launch in October 2024 was so popular it ran out of chairs and the group now caps RSVPs because venues keep maxing out. The nonprofit has hosted 1,400 attendees at 17 different events in Washington, Oregon and online. Shor’s latest project is helping create a battery-focused mini-series he describes as a hybrid between Anthony Bourdain’s “Parts Unknown” and “Cosmos.”

Who knew that energy storage devices could generate so much enthusiasm?

“Batteries are sexy right now,” Shor said.

Batteries are making electric vehicle adoption more attractive as they’ve become increasingly powerful and quicker to recharge. They’re ubiquitous given the pervasive use of phones and consumer electronics. And as electricity demand is spiking thanks to data centers and other energy users, they’re a relatively quick, affordable way to add more power to the grid.

“We are installing more grid batteries in 2025 than the total amount that existed globally just two years ago,” Shor said. “This isn’t just growth, it’s a total reimagining of how our economy is powered.”

A battery ecosystem emerges

Part of the crowd at the Pacific Northwest Battery Collaborative launch party, with founder Grayson Shor in the front row in a tie. (PNWBC Photo)

Shor has spent nearly a decade working on sustainability, circular economy and battery-related issues for organizations ranging from the U.S. Department of State to Amazon to startups. When the former diplomat landed in Seattle from the other Washington more than two years ago, he was impressed by the region’s battery sector.

That included startups in electric aviation, alternative chemistries such as sodium batteries, and next-generation silicon battery materials, plus R&D resources and support at the University of Washington’s Clean Energy Institute.

But he realized the industry lacked the connections to bring together companies, academics, entrepreneurs and investors, and set out to address it. The sector welcomes his efforts.

“I’ve paid attention to folks trying to knit together community, and for the Northwest battery innovation and application ecosystem, Grayson Shor has been an unrelenting force seeking to build and amplify our unique strengths,” said Dan Schwartz, founding director of the Clean Energy Institute.

Tom Gurski, founder of the plug-in hybrid vehicle startup Blue Dot Motorworks, has attended the group’s functions. “In a region famous for introverted personalities their events and happy hours are invaluable for breaking down silos and getting people to connect,” Gurski said.

Beyond building community, Shor is lobbying for support for local and state policies that promote the industry and get more batteries deployed in the state. The energy storage devices have important societal benefits, he said, including better electrical grid performance and helping meet power needs during peak demand.

‘The Battery Life’

Shor speaking at a Pacific Northwest Battery Collaborative event in Seattle during 2025 PNW Climate Week. (PNWBC Photo)

Shor is also the co-founder and chief product officer for Buckstop, an “urban mining” startup helping recover critical minerals from waste electronics. He also volunteers as the policy and government affairs director for the Volta Foundation, the world’s largest battery industry association.

And there’s the TV series, called “The Battery Life.” Crews recently spent three days in the Seattle area filming the first episode, visiting the battery materials company Group14 Technologies and interviewing startups at the UW’s Clean Energy Test Beds.

“We’re doing walks through factories. We’re meeting with the CEOs and the inventors, diving deep into their technology,” Shor said. But the series also has “the ‘Carl Sagan vibe,’” he added, explaining “how does this technology actually impact humanity, and why does it matter to the average person?”

Additional episodes will be shot in Portland and Vancouver, B.C. The plan is to air the series later this year at energy events in Oregon and Las Vegas, plus other area venues.

Future Pacific Northwest Battery Collaborative plans include a job fair and fundraising gala. Shor also envisions a convention where the entrepreneurs and innovators could set up booths to show off their technologies. The ideas keep coming.

“This is playing my little role in trying to tackle climate change, to try to advance the energy transition,” he said. “It helps with equity, it helps with economic opportunity …. It makes me happy.”
 

The World’s Smallest Marble Clock With Pick And Place Arm

Clocks come in many styles and sizes, with perhaps the most visually pleasing ones involving marbles. Watching these little spheres obey gravity and form clearly readable numbers on a clock has strong mesmerizing qualities. If you’re not into really big marble clocks, or cannot quite find the space for a desk-sized clock, then the tiny marble clock by [Jens] may be an option.

While he totally loved the massive marble clock that [Ivan Miranda] built, it is a massive contraption that’s hard to justify as a permanent installation. His take on the concept thus makes it as small as possible, by using a pick-and-place style arm to place the marbles instead. Although the marbles don’t do a lot of rolling this way, it’s decidedly quieter, replacing the rumbling and click-clacking of marbles with the smooth motion of a robotic arm.

Another benefit of this clock is that it’s cheap to make, with a price tag of less than $23. A big part of this is the use of cheap SG90 micro servos, and a permanent magnet along with a mechanism that pushes the marble off said magnet. Perhaps the biggest issue with this clock is that the arm somewhat obscures the time while it’s moving around, but it’s definitely another interesting addition to the gallery of marble clocks.

We have previously seen such clocks built out of wood and brass as well as 3D-printed using pendulum mechanisms, which can be made pretty compact as well, albeit with a more analog vibe.

Thanks to [Hari] for the tip.


Copyright © 2025