Ad patterns alone reveal identity traits without accessing personal data directly
AI profiling from ads is faster, cheaper, and more scalable
Short browsing sessions provide enough data for accurate personal inference
The ads that appear on your screen are not chosen at random, and researchers have now proven AI can turn those ads into a detailed picture of your private life.
A team from UNSW Sydney and QUT examined more than 435,000 Facebook ads collected from 891 Australians through a citizen science project.
Using widely available large language models, the researchers found they could predict users’ personal attributes without ever seeing their browsing history or other personal data.
How your ad stream becomes a mirror of your life
Advertising platforms build profiles on you and then choose which ads to send your way – those choices create a unique pattern of ads that reveals information about you to anyone who can see that pattern.
The study showed AI tools could infer gender, age, education, employment, political preference, and economic standing from ad exposure alone.
The process was more than 200 times cheaper and 50 times faster than human analysis of the same ad patterns.
Even short browsing sessions gave the AI enough data to build an accurate profile, meaning attackers do not need to watch you for weeks on end.
The most likely attack vector is browser extensions, because most of these tools already require permission to read web page content.
Popular extensions like ad blockers, coupon finders, and page translators need that access to function normally – however, those same permissions could be repurposed to quietly collect the ads you see and send them to an attacker.
This scenario is especially dangerous because of its stealth: the extension continues to perform its normal job, so you would never suspect anything is wrong.
No hacking is required, and the advertising platform never knows that its delivery system is being used as a surveillance tool.
What this means for your online privacy
You can lower your risk by being careful with browser extensions and adjusting your privacy settings.
Unfortunately, a VPN offers no protection here, because the ads reach your device no matter how you connect to the internet.
Individuals cannot fully solve this problem on their own because they cannot easily opt out of the ad economy entirely.
The researchers argue that privacy laws must evolve to address not just what data is collected, but what can be inferred from the content you passively consume.
Your ad stream is a fingerprint that AI can now read, and the researchers argue that laws should protect that fingerprint.
Both the Nissan Frontier and Toyota Tacoma start around $33,000 (including destination fee) for the 2026 model year, but they won’t be worth that much for long. In fact, as soon as drivers leave the dealership, their trucks will lose some of their original value; they’ll continue to do so as the months and years tick by. Yet based on the latest data, one of these two trucks is likely to lose its value notably faster than the other.
The estimated difference in depreciation between the two trucks varies between data sources, but the overall picture remains consistent. The Tacoma is predicted to lose less of its value over a five-year period than the Frontier, although both models hold their value well compared to best-selling full-size pickups like the Ford F-150 and Chevrolet Silverado 1500.
According to the latest iSeeCars study, the Frontier will lose an average of 35.5% of its value after 5 years on the road, while the Tacoma will lose just 19.9% of its value across the same period of time. That makes the Tacoma the least-depreciating pickup truck on the market according to the study, just ahead of the larger Toyota Tundra. Meanwhile, CarEdge predicts that a new Frontier will lose 37% of its value after five years, while a new Tacoma will only drop 22% in value. KBB isn’t so optimistic about either truck’s depreciation rates, predicting that the Frontier and Tacoma will lose 52.2% and 44.3% of their value respectively over the same period.
Value retention estimates are only a rough guide, but the Tacoma remains a winner
The difference in predicted values between sources can be attributed to a variety of factors, from differences in calculation methodology to assumptions about the average new price each buyer will be paying. The latter factor is particularly important when comparing the Frontier and Tacoma, since the Tacoma has a far bigger price difference between its base and top trims.
Although both trucks start around the same MSRP for a bare-bones, base-spec model, many buyers will be looking further up the trim range to add as much extra capability and comfort as their budget allows. The costliest trim of the 2026 Frontier is the Long Bed Pro-4X, which starts from $44,115 (including a $1,745 destination fee). That price is dwarfed by the top end of the Tacoma’s trim range, where the TRD Pro starts from $66,195 (also including a $1,745 destination fee).
The currently available study data doesn’t confirm whether buyers who pick a top-spec Tacoma, which retails for roughly double the price of a base variant, can expect to hold onto as much of their original investment as those who buy a base-spec truck. Nonetheless, average value retention across the model as a whole remains very high, and given that the Tacoma was crowned the most dependable midsize truck on the market by JD Power in 2026, that class-leading value retention is unlikely to change anytime soon.
Amazon is adding a short-form video feed to the Prime Video app called “Clips,” the company announced on Friday.
Rolling out first in the U.S., Clips will include… well, clips of shows on Prime Video that are designed to hook a viewer and get them to give the full show a try. From that clip, users can add a title to their watchlist, share it with a friend, or navigate to rent, buy, or access the title through their subscription.
“Clips gives customers a whole new way to browse with short, personalized snippets tailored to their interests,” said Prime Video’s director of Global Application Experiences, Brian Griffin, in a press release. “Whether they have a few minutes to scroll or are looking for something to watch when they have more time, entertainment is just a tap away.”
Amazon first tested this short-form feed during the NBA season, showing highlights that users can scroll through as though they’re watching TikToks.
It’s not a surprise to see Prime Video make this change — Netflix, Peacock, Tubi, Disney, and others have recently rolled out similar experiences, which are designed to promote discovery. Netflix’s short-form feed even shares the Clips name.
Clips is first rolling out to select U.S. customers on iOS, Android, and Fire tablets, but it will be available more broadly this summer. Users can navigate to Clips by scrolling down to the Clips carousel on the Prime Video mobile home page, which will surface a full-screen vertical feed.
“Plant seeds can sense the vibrations generated by falling raindrops,” reports ScienceAlert, “and respond by waking from their state of dormancy to welcome the water, new research shows…. to germinate in ‘anticipation’ of the coming deluge.” The finding, discovered by MIT mechanical engineers Nicholas Makris and Cadine Navarro, offers the first direct evidence that seeds and seedlings can sense and respond to sounds in nature… “The energy of the rain sound is enough to accelerate a seed’s growth,” [explains Makris].
Plants don’t have the same aural equipment we do to actually hear sounds, of course. But the study suggests that seeds respond to the same vibrations that can produce a sound experience in our human ears. Across a series of experiments, the researchers submerged nearly 8,000 rice seeds in shallow tubs of water, at a depth of around 3 centimeters (1 inch), and exposed some of them to falling water drops over periods of six days… A hydrophone recorded the acoustic vibrations produced by the drops, confirming that the experiment mimicked the vibrations produced by actual raindrops falling in nature — such as the driving downpours that can sometimes pelt Massachusetts’ puddles, ponds, and wetlands… In their study, the researchers observed that seeds exposed to the falling drops germinated up to around 37% faster, compared with seeds that did not receive the simulated rainstorm treatment but were housed in otherwise identical conditions.
Nvidia continues to be a major investor in the AI ecosystem, committing more than $40 billion to equity investments in AI companies — and that’s just in these early months of 2026, according to CNBC.
Much of that total comes from a single bet, a $30 billion investment in OpenAI. But CNBC reports that the chipmaker has also announced seven multi-billion dollar investments in publicly traded companies, most recently deals to invest up to $3.2 billion in glassmaker Corning and up to $2.1 billion in data center operator IREN.
We’ve previously rounded up Nvidia’s investments in AI startups, including 67 venture deals in 2025. And according to FactSet data, it’s already participated in around two dozen investment rounds in private startups in 2026.
The fact that Nvidia has been investing in some of its own customers has led to the recurring criticism that these are circular deals moving money back and forth between the same companies.
Wedbush Securities analyst Matthew Bryson said Nvidia’s investments fall “squarely into the circular investment theme,” but suggested that if successful, they could help the company build a “competitive moat.”
General Motors has reached a privacy-related settlement with a group of law enforcement agencies led by California Attorney General Rob Bonta.
Back in 2024, The New York Times reported that automakers including GM were sharing information about their customers’ driving behavior with insurance companies, and that some customers were concerned that their insurance rates had gone up as a result.
The settlement announcement from Bonta’s office similarly alleges that GM sold “the names, contact information, geolocation data, and driving behavior data of hundreds of thousands of Californians” to Verisk Analytics and LexisNexis Risk Solutions, which are both data brokers. Bonta’s office further alleges that this data was collected through GM’s OnStar program, and that the company made roughly $20 million from data sales.
However, Bonta’s office also said the data did not lead to increased insurance prices in California, “likely because under California’s insurance laws, insurers are prohibited from using driving data to set insurance rates.”
As part of the settlement, GM has agreed to pay $12.75 million in civil penalties and to stop selling driving data to any consumer reporting agencies for five years, Bonta’s office said. GM has also agreed to delete any driver data that it still retains within 180 days (unless it obtains consent from customers), and to request that Lexis and Verisk delete that data.
“General Motors sold the data of California drivers without their knowledge or consent and despite numerous statements reassuring drivers that it would not do so,” Bonta said in a statement, adding that the settlement “requires General Motors to abandon these illegal practices and underscores the importance of data minimization in California’s privacy law — companies can’t just hold on to data and use it later for another purpose.”
GM told Reuters that the settlement “addresses Smart Driver, a product we discontinued in 2024, and reinforces steps we’ve taken to strengthen our privacy practices.”
Deletion of a longstanding privacy assurance sparks concerns
Google has changed Chrome’s disclosure language about how its on-device AI works, but that doesn’t mean the company intends to capture on-device AI interactions.
The Chrome menu modification, which isn’t universally rolled out yet even in Chrome 148, was noted this week on Reddit.
The “On-device AI” message in Chrome’s System settings previously read, “To power features like scam detection, Chrome can use AI models that run directly on your device without sending your data to Google servers. When this is off, these features might not work.”
But the message changed recently – it lost the phrase “without sending your data to Google servers.”
That prompted privacy advocate Alexander Hanff to question whether the edit signaled an architectural change that would see local AI interactions processed by Google servers instead of remaining on-device.
“Why was the sentence ‘without sending your data to Google servers’ removed from the on-device AI description in Chrome’s Settings UI?” Hanff asked. “Was the previous text inaccurate? Has the architecture changed? Was the wording withdrawn on legal advice because Google was unwilling to defend it as a representation?”
Asked about this, a Google spokesperson said, “This doesn’t reflect a change to how we handle on-device AI for Chrome. The data that is passed to the model is processed solely on device.”
It appears this situation deserves a more genteel rendering of Hanlon’s Razor – “Never attribute to malice that which is adequately explained by stupidity.”
In this case, it’s “Never attribute to malice that which is adequately explained by bad timing.”
Word of the menu modification surfaced as Chrome was rolling out the Prompt API, which is designed to provide web pages with a programmatic way to interact with a browser-resident AI model. The API’s arrival and public discussion of it drew attention to the fact that Chrome has been silently downloading Google’s 4GB Nano model onto users’ devices. The coincidence of these events made it seem that Google was preparing to capture on-device prompts and responses, which would be a significant privacy retreat.
In fact, Chrome has been letting Nano sleep on the couch for early adopters dating back two years when local AI was implemented in Chrome 126 as a preview program. While Google hasn’t yet made model downloading and storage opt-in, the biz did earlier this year implement a way to deactivate and remove the space-hogging model.
“We’ve offered Gemini Nano for Chrome since 2024 as a lightweight, on-device model,” a Google spokesperson explained, pointing to relevant help documentation.
“It powers important security capabilities like scam detection and developer APIs without sending your data to the cloud. While this requires some local space on the desktop to run, the model will automatically uninstall if the device is low on resources. In February, we began rolling out the ability for users to easily turn off and remove the model directly in Chrome settings. Once disabled, the model will no longer download or update.”
The edit to the “On-device AI” message occurred in early April. According to Google, Gemini Nano in Chrome processes all data on-device.
But when websites interact with Gemini Nano in Chrome – via the Prompt API, for example – they can see the inputs and outputs of the model. In such cases, the data handling would fall under the privacy policy of the website interacting with the user’s Nano instance.
Google decided to change its “On-device AI” message to avoid confusion – and perhaps to preclude legal claims alleging policy violations – when the user is interacting with a Google site that calls out to the Nano model on-device, in support of some service it provides.
In that scenario, the Google site would have access to the prompts it sends and responses it gets from the user’s on-device model. That interaction would still happen “without sending your data to Google servers,” at least in the narrow sense that no model running in Google Cloud is being queried.
But since the user’s on-device Chrome-resident Nano model would send data to the Google site in response to that site’s API calls, that data transmission might be interpreted as a violation of the local AI commitment language. Hence the edit.
Google’s decision to have Gemini Nano become a Chrome squatter is a novel way of doing things, given that co-opting people’s computing resources has largely been the province of covert crypto-mining scripts. But perhaps after years of offering Gmail and Search at no monetary cost, Google feels entitled to a few gigabytes of Chrome users’ local storage and occasional bursts of their on-device compute. ®
Akamai disclosed a 1.8 billion dollar, seven-year cloud deal with Anthropic, its largest contract ever. The stock rose 27 per cent in a day as the CDN company’s AI cloud pivot received its most significant validation.
Akamai Technologies disclosed a 1.8 billion dollar, seven-year cloud infrastructure deal with a customer it described only as “a leading frontier model provider.” Bloomberg identified the customer as Anthropic. The stock rose 27 per cent in a single day, the largest rally in the company’s 28-year history. A company that built its business delivering web pages faster than anyone else just became an AI infrastructure provider on the strength of one contract.
The deal is the centrepiece of a quarter in which Akamai’s cloud infrastructure services revenue grew 40 per cent year over year to 95 million dollars, while its legacy content delivery business declined 7 per cent. The company is being repriced by investors not for what it has been for two decades but for what one contract suggests it could become. The question is whether a single customer commitment, however large, constitutes a transformation or a concentration risk.
The 1.8 billion dollar contract is the largest in Akamai’s history. Revenue from the commitment is expected to begin in the fourth quarter of 2026, contributing approximately 20 to 25 million dollars in that period. The seven-year term provides visibility that Akamai’s legacy CDN business, which operates on shorter cycles and faces persistent price compression, has never offered.
The deal follows a 200 million dollar, four-year cloud services agreement that Akamai signed in February with another unnamed US technology company, under which the customer will use a multi-thousand NVIDIA Blackwell GPU cluster. Together, the two contracts represent two billion dollars in committed cloud revenue from customers that Akamai did not have two years ago.
Anthropic signed to take all of SpaceX’s Colossus 1 data centre capacity, adding more than 300 megawatts and over 220,000 NVIDIA GPUs to its compute footprint. The Akamai deal extends the same logic: Anthropic is buying compute capacity from every available provider as demand for Claude outpaces supply. Dario Amodei, Anthropic’s chief executive, said the company experienced “80x growth” in annualised revenue and usage in the first quarter of 2026 and is “working as quickly as possible” to secure more computing resources.
The pivot
Akamai was founded in 1998 at MIT to solve the problem of delivering web content without congestion. For two decades, it operated the world’s largest content delivery network, caching and distributing web pages, video streams, and software downloads across more than 4,000 locations in 130 countries. The CDN business made Akamai indispensable to the internet. It also became a commodity.
Under chief executive Tom Leighton, who moved from chief scientist to the top role in 2013, the company spent a decade diversifying. The first pivot was into cybersecurity, which now accounts for 55 per cent of revenue at 590 million dollars per quarter, growing 11 per cent year over year. The second pivot, into cloud computing, began with the 900 million dollar acquisition of Linode in 2022 and is now producing the growth that investors had been waiting to see.
Leighton told CNBC that the deal represents validation of the company’s “different approach” and that Akamai has “a very strong pipeline of major enterprise customers, including some that have very large cloud needs.” The company announced at NVIDIA’s GTC event in March that it would deploy thousands of NVIDIA RTX PRO 6000 GPUs and build what it described as the “industry’s first global-scale implementation of NVIDIA’s AI Grid,” pushing AI inference closer to end users to reduce latency and cost.
The customer
Anthropic’s decision to sign a 1.8 billion dollar contract with Akamai reflects the constraint that defines the current AI infrastructure market: demand for compute exceeds the capacity of any single provider. Anthropic already runs Claude across Google tensor processing units, Amazon’s custom chips, and NVIDIA hardware. It has signed with SpaceX for data centre capacity. It is exploring building its own chips.
Anthropic is exploring building its own AI chips as its run-rate revenue surpasses 30 billion dollars, but custom silicon takes years to design and validate. In the interim, Anthropic is buying capacity wherever it can find it. Akamai’s distributed network of edge locations, originally built for CDN traffic, offers something that centralised hyperscale data centres do not: the ability to run inference workloads close to end users, which reduces latency for the real-time applications that enterprises are beginning to deploy.
Nebius acquired Eigen AI for 643 million dollars to optimise inference performance, a bet that the most valuable layer in AI infrastructure is not raw compute but the efficiency with which that compute is used. Akamai’s pitch to Anthropic rests on a similar premise: that distributed inference at the edge is more efficient for certain workloads than centralised processing in a hyperscale facility.
The numbers
Akamai reported first-quarter revenue of 1.074 billion dollars, up 6 per cent year over year. Adjusted earnings per share were 1.61 dollars. Cloud infrastructure services revenue was 95 million dollars, up 40 per cent. Security revenue was 590 million dollars, up 11 per cent. Delivery and other revenue was 389 million dollars, down 7 per cent.
The cloud segment represents less than 9 per cent of total revenue. The 1.8 billion dollar deal, at approximately 257 million dollars per year, would more than double the segment’s current annual run rate. The contract transforms cloud from a promising but small division into the company’s primary growth engine, at least on a committed-revenue basis.
For the full year, Akamai is forecasting revenue of 4.45 to 4.55 billion dollars and adjusted earnings of 6.40 to 7.15 dollars per share. The guidance does not yet reflect the full impact of the Anthropic contract, which begins contributing in the fourth quarter. Analysts will spend the next two quarters trying to determine whether the deal is a one-off or the first in a series.
But a 1.8 billion dollar contract with one customer concentrates risk as much as it concentrates revenue. Anthropic’s annualised revenue has grown from approximately 900 million dollars in late 2025 to a reported 30 billion dollar run rate. Growth at that pace creates demand for infrastructure. It also creates the conditions for a correction if the demand curve flattens. Akamai’s stock gained 27 per cent on the announcement. The company’s ability to sustain that valuation depends on whether Anthropic’s growth trajectory holds for seven years.
Leighton said there is more coming. The company’s history suggests patience. Akamai survived the dot-com crash, navigated the commoditisation of its original business, and spent a decade building a cybersecurity franchise before the market rewarded it. The AI cloud deal is the latest reinvention of a company that has been reinventing itself since 1998. The difference is that this time, the reinvention depends on one customer’s continued appetite for compute, and on the assumption that the demand for AI inference at the edge will grow as fast as the demand for AI itself.
For the last 24 months, one narrative justified every over-provisioned data center and bloated IT budget: the GPU scramble. Silicon was the new oil, and H100s traded like contraband. Reserve capacity now or your enterprise would be left behind.
The bill is now due, and the CFO is paying attention. Gartner estimates AI infrastructure is adding $401 billion in new spending this year. Real-world audits tell a darker story: average GPU utilization in the enterprise is stuck at 5%.
That utilization floor is driven by a self-reinforcing procurement loop that makes idle GPUs nearly impossible to release. What makes this shift more urgent is the CapEx reality now hitting enterprise balance sheets. Many organizations locked in GPU capacity under traditional three- to five-year depreciation cycles, with the hyperscalers at five years. That means the infrastructure purchased during the peak of the “GPU scramble” is now a fixed cost, regardless of how much it is actually used.
As those assets age, the question is no longer whether the investment was justified. It’s whether it can be made productive. Underutilized GPUs are not just idle resources, they are depreciating assets that must now generate measurable return. This is forcing a shift in mindset: from acquiring capacity to maximizing the economic output of what is already deployed.
The scramble was a sideshow
For the “Tier 1” enterprise — the Intuits, Mastercards, and Pfizers of the world — access was rarely the true bottleneck. Leveraging deep-pocketed relationships with AWS, Azure, and GCP, these organizations secured capacity reservations that sat idle while internal teams struggled with data gravity, governance, and architectural immaturity.
The industry narrative of “scarcity” served as a convenient smokescreen for this inefficiency. While the headlines focused on supply chain delays, the internal reality was a massive productivity gap. Organizations were activity-rich (buying chips) but output-poor (generating near-zero useful tokens).
At 5% utilization, the math simply doesn’t work. For every dollar spent on silicon, 95 cents is essentially a donation to a cloud provider’s bottom line. In any other department, a 95% waste metric would be a firing offense; in AI infrastructure, it was just called “preparedness.”
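To make that arithmetic concrete, here is a minimal Python sketch of the utilization math. The hourly GPU cost and per-GPU throughput below are illustrative assumptions, not figures from the article or the survey; the point is how quickly cost per useful token falls as idle time is squeezed out.

```python
# Minimal sketch of utilization economics. HOURLY_GPU_COST and TOKENS_PER_SEC_BUSY
# are assumed, illustrative values -- not figures reported in this article.

HOURLY_GPU_COST = 3.00        # assumed blended $/GPU-hour for reserved capacity
TOKENS_PER_SEC_BUSY = 1_500   # assumed output tokens/sec while the GPU is serving

def cost_per_million_useful_tokens(utilization: float) -> float:
    """Effective $ per million useful tokens at a given utilization (0.0-1.0)."""
    useful_tokens_per_hour = TOKENS_PER_SEC_BUSY * 3_600 * utilization
    return HOURLY_GPU_COST / useful_tokens_per_hour * 1_000_000

for u in (0.05, 0.25, 0.60):
    print(f"utilization {u:4.0%}: ${cost_per_million_useful_tokens(u):6.2f} per 1M useful tokens")
```

Under these assumptions, a GPU-hour at the 5% floor buys roughly twelve times fewer useful tokens per dollar than the same hour at 60% utilization.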
The Q1 tracker: A market in pivot
VentureBeat’s Q1 2026 AI Infrastructure & Compute Market Tracker confirms that the panic phase has officially broken. The tracker is directional rather than statistically definitive — the January wave surveyed 53 qualified respondents and the February wave 39 — but the pattern across both waves is consistent. When we asked IT decision-makers what actually drives their provider choices today, the results show a market in rapid pivot:
The access collapse: The “Access to GPUs/availability” factor dropped from 20.8% to 15.4% in a single quarter — from primary concern to secondary in 90 days.
The pragmatic pivot: “Integration with existing cloud and data stacks” held steady as the top priority at roughly 43% across both waves, while security and compliance requirements surged from 41.5% to 48.7% — nearly closing the gap with integration.
The TCO mandate: “Cost per inference/TCO (total cost of ownership)” as a top priority jumped from 34% to 41% in a single quarter, overtaking performance as the dominant procurement lens.
The era of the blank check is dead.
Inference is where AI becomes a line item
Training and even fine-tuning were tactical projects; inference is a strategic business model. For most enterprises, the unit economics of that model are currently unsustainable. During the initial pilot phase, flat-fee licenses and bundled token deals allowed for architectural waste. Teams built long-context agents and complex retrieval pipelines because tokens were effectively a sunk cost.
As the industry moves toward usage-based pricing in 2026, those same architectures have become liabilities. When metered billing is applied to an infrastructure stack that sits idle 95% of the time, the cost per useful token becomes a line-item emergency the moment a project moves into production.
From activity to productivity
The shift highlighted in our Q1 data represents more than just a budget correction; it is a fundamental change in how the success of an AI leader is measured.
For the last two years, success was about “securing” the stack. In the efficiency era, success is “squeezing” the stack. This is why cost optimization platforms saw the largest planned budget increase in our survey, becoming a top-tier priority as organizations realize that buying more GPUs is often the wrong answer.
Increasingly, IT users are asking how to stop paying for GPUs they aren’t using. They are moving away from measuring GPU activity (how many chips are powered on) and toward GPU productivity (how many useful tokens are generated per dollar spent).
The luxury of underutilization is now a liability. The next act of the enterprise AI play is about making the silicon you already have pay for itself.
Owning the mint: The choice between token consumer and producer
As organizations move from proof-of-concept to production, the focus is shifting away from the latest GPU and toward the architecture of token generation. In this new economic reality, every enterprise must decide its role in the token economy: will you be a token consumer, paying a permanent tax to a model provider, or a token producer, owning the infrastructure and the unit economics that come with it?
This choice is not just about cost; it is about how an organization decides to handle complexity. Owning inference infrastructure means overcoming KV cache persistence, understanding the storage architecture, knowing what latency guarantees are tolerable, and addressing power constraints. It also introduces real-world enterprise limitations (power availability, data center footprint, and operational complexity) that directly impact how far and how fast AI can scale.
At the core of this challenge is KV cache economics. Storing context in GPU memory delivers performance but comes at a premium, limiting concurrency and driving up cost per token. Offloading KV cache to shared NVMe-based storage can improve reuse and reduce prefill overhead, but introduces tradeoffs in latency and system design. As NVMe costs rise and GPU memory remains scarce, organizations are forced to balance performance against efficiency.
For a token producer, managing these tradeoffs across memory, storage, power, and operations is simply the cost of doing business at scale. For others, the overhead remains too high, requiring a different path.
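A back-of-the-envelope sizing sketch shows why the KV cache tradeoff described above bites as concurrency rises. The model shape below (a Llama-style 32-layer configuration with grouped-query attention and an fp16 cache) is an assumption chosen for illustration, not a reference to any specific product or deployment.

```python
# Rough KV cache sizing for one sequence: keys + values, across every layer.
# The model shape is an assumed, Llama-style example (not a specific product).

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache held for a single sequence of seq_len tokens."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

per_seq = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=128_000)
print(f"~{per_seq / 2**30:.1f} GiB of KV cache per 128k-token sequence")
print(f"8 concurrent sessions: ~{8 * per_seq / 2**30:.0f} GiB")
```

Under these assumptions, roughly 16 GiB per long-context session means a handful of concurrent users can exhaust an accelerator's memory, which is exactly the pressure that pushes the cache out to shared NVMe-backed tiers despite the latency tradeoff.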
The specialized cloud pivot
VentureBeat’s Q1 tracker shows that the market is already voting on this strategy. The top strategic direction for enterprises is now to move more workloads to specialized AI clouds, a category that grew from 30.2% to 35.9% in our latest survey.
These providers — including Coreweave, Lambda, and Crusoe — are evolving. While they initially gained ground by serving model builders and training-heavy workloads, their revenue mix is changing rapidly. Today, training represents roughly 70% of their business volume, while inference customers make up the other 30%. We expect that ratio to flip by the end of 2026 as the long tail of enterprise inference begins to scale.
These specialized providers are gaining strategic attention because they are not just selling GPU access. They are selling the removal of infrastructure friction. They optimize the full stack — storage, networking, and scheduling — around inference-first economics rather than general-purpose cloud operations. For an organization aiming to be a token producer, these environments offer a more efficient factory floor than traditional hyperscalers.
The rise of managed inference
For organizations that realize they cannot efficiently build or manage their own inference factories, a different trend is emerging. Our survey found that the intention to evaluate inference outsourcing and managed LLM providers jumped from 13.2% to 23.1% in a single quarter.
This nearly 10-percentage-point increase represents a realization that building inference infrastructure internally often creates hidden costs. Providers like Baseten, Anyscale, FireworksAI, and Together AI offer predictable pricing and service-level agreements without requiring the customer to become experts in vLLM tuning or distributed GPU scheduling.
In this model, the enterprise remains a token consumer, but one that is actively looking to price away the complexity of the stack. They are learning that managing inference internally is only viable if they have the volume to justify the operational burden.
Simplifying the hybrid stack
The choice to be a producer is also being made easier by a new layer of hybrid-cloud AI platforms. Solutions from Red Hat, Nutanix, and Broadcom are designed to operationalize open-source inference infrastructure without forcing every company to become a systems integrator.
The challenge is that modern inference depends on a rapidly evolving open-source stack: vLLM for high-throughput serving, Triton for model orchestration, Ray for distributed execution, and Kubernetes to tie it all together. Each component is powerful on its own but complex to integrate, tune, and operate at scale. For most enterprises, the challenge isn’t access to these tools; it’s stitching them together into a reliable, production-grade inference pipeline. The promise of these newer platforms is portability: the ability to build an inference stack once and deploy it anywhere, whether in a hyperscaler, a specialized cloud, or an on-premises data center.
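To illustrate the gap between the tooling and the operational burden, the sketch below uses vLLM's offline Python API in its simplest form; the checkpoint name is a placeholder. Getting one process to serve a model is the easy part, while the integration work described above (distributed scheduling, autoscaling, observability) is where most of the effort goes.

```python
# Minimal vLLM offline-inference sketch (pip install vllm). The checkpoint name is a
# placeholder; substitute whatever model your cluster is actually licensed to serve.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")       # assumed example checkpoint
params = SamplingParams(temperature=0.2, max_tokens=256)   # short, mostly deterministic output

prompts = ["Summarize our Q1 GPU utilization report in three bullet points."]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```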
Our Q1 2026 AI Infrastructure & Compute Market Tracker confirms that interest in these DIY-but-managed stacks is growing, jumping from 11.3% in January to 17.9% in February, alongside rising provider adoption and a steady increase in organizations leaning into open source. This flexibility matters because enterprise AI will not be centralized in one place. Inference workloads will be distributed based on where data lives, how sensitive it is, and where the cost of running it is lowest.
The winner in the next phase of the token economy will not be the platform that forces standardization through restriction. It will be the one that delivers standardization through portability, allowing enterprises to switch between being consumers and producers as their needs evolve.
The architecture of efficiency: The technical levers of productivity
Fixing the 5% utilization wall requires more than just better software; it requires a structural overhaul of the efficiency stack. Many organizations are discovering that high activity is not the same as high productivity. A cluster can run at full tilt but remain economically inefficient if time-to-first-token is too high or if inference requests spend too much time in prefill.
Inference economics are determined by how much useful output a cluster generates per unit of cost. This requires a shift from measuring GPU activity — simply having the chips powered on — to measuring GPU productivity. Achieving that productivity depends on three technical levers: the network, the memory, and the storage stack.
Networking: The cost of waiting
The network is the often-ignored backbone of inference economics. In a distributed environment, the speed at which data moves between compute nodes and storage determines whether a GPU is actually working or merely waiting.
RDMA (Remote Direct Memory Access) has become the non-negotiable standard for this move. By allowing data to bypass the CPU and move directly between memory and the GPU, RDMA eliminates the latency spikes that traditional network architectures introduce. In practical terms, an RDMA-enabled architecture can increase the output per GPU by a factor of ten for concurrent workloads.
Without this level of networking, an enterprise is effectively paying a “waiting tax” on every chip in the rack. As model context windows expand and multi-node orchestration becomes the norm, the network determines whether a cluster is a high-speed factory or a bottlenecked warehouse.
Solving the memory tax: Shared KV cache
As models become larger and context windows expand toward millions of tokens, the cost of repeatedly rebuilding the prompt state becomes unsustainable. Large language models rely on key-value (KV) caches to maintain context during a session. Traditionally, these are stored in local GPU memory, which is both expensive and limited.
This creates a “memory tax” that crushes unit economics as concurrency rises. To solve this, the industry is moving toward persistent shared KV cache architectures. By storing the cache centrally on high-performance storage rather than redundantly across multiple GPU nodes, organizations can reduce prefill overhead and improve context reuse.
Newer architectures are already proving this out. The VAST Data AI Operating System, running on VAST C-nodes using Nvidia BlueField-4 DPUs, allows for pod-scale shared KV cache that collapses legacy storage tiers. Similarly, the HPE Alletra Storage MP X10000 — the first object-based platform to achieve Nvidia-Certified Storage validation — is designed specifically to feed data to inference resources without the coordination tax that causes bottlenecks at scale. WEKA is another provider in this space.
The compression edge
Beyond the physical hardware, new algorithmic contributions are redefining what is possible in inference memory. Google’s recent presentation of TurboQuant at ICLR 2026 demonstrates the scale of this shift. TurboQuant provides up to a 6x compression level for the KV cache with zero accuracy loss.
Techniques like these allow for building large vector indices with minimal memory footprints and near-zero preprocessing time. For the enterprise, this means more concurrent users on the same hardware estate without the “rebuild storms” that typically cause latency spikes. The caveat: compression standards remain contested — no open-source consensus has emerged, and the space is shaping up as a proprietary stack war between Google and Nvidia.
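The sketch below is not TurboQuant; it only illustrates the general idea behind KV cache quantization with the simplest possible scheme, per-tensor symmetric int8, which halves memory versus fp16 at the cost of a small reconstruction error. The 6x, zero-loss figure cited above requires much more sophisticated machinery, and the shapes and values here are toys chosen for illustration.

```python
# Illustrative only (not TurboQuant): per-tensor symmetric int8 quantization of a
# toy KV block, trading a small reconstruction error for half the fp16 footprint.
import numpy as np

kv_block = np.random.randn(32, 128).astype(np.float16)  # one layer's keys, toy shape

scale = float(np.abs(kv_block).max()) / 127.0            # symmetric quantization scale
quantized = np.clip(np.round(kv_block / scale), -127, 127).astype(np.int8)
restored = quantized.astype(np.float16) * scale

err = float(np.abs(restored.astype(np.float32) - kv_block.astype(np.float32)).mean())
print(f"fp16 bytes: {kv_block.nbytes}, int8 bytes: {quantized.nbytes}, "
      f"mean abs error: {err:.4f}")
```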
Storage as a financial decision
Storage is no longer just a backend decision; it is a financial one. Platforms like Dell PowerScale are now delivering up to 19x faster time-to-first-token compared to traditional approaches, according to Dell. By separating high-performance shared storage and memory-intensive data access from scarce GPU resources, these platforms allow inference to scale more efficiently.
When a storage layer can keep GPU-intensive workloads continuously fed with data, it prevents expensive resources from sitting idle. In the efficiency era, the goal is to drive the 5% utilization wall upward by ensuring that every cycle is spent on token generation, not on data movement.
But as the stack becomes more efficient, the perimeter becomes more porous. High-productivity tokens are worthless if the data powering them cannot be trusted.
Sovereignty and the agentic future: Building the trust foundation
The final barrier to achieving return on AI is not a technical bottleneck, but a trust bottleneck. As enterprise AI shifts from simple chatbots to autonomous agents, the risk profile changes. Agents require deep access to internal systems and intellectual property to be useful. Without a sovereign architecture, that access creates a liability that most organizations are not equipped to manage.
VentureBeat research into the state of AI governance reveals a stark disconnect. While many organizations believe they have secured their AI environments, 72% of enterprises do not actually have the level of control and security they think they do. This governance mirage is particularly dangerous as agentic systems move into production. In the last 12 months, 88% of executives reported security incidents related to AI agents.
Sovereignty as an architecture principle
Data sovereignty is often treated as a geographic or regulatory checkbox. For the strategic enterprise, it must be treated as a core architecture principle. It is about maintaining control, lineage, and explainability over the data that powers an agentic workflow.
This requires a new approach to data maturity, modeled on the traditional medallion architecture. In this framework, data moves through layers of usability and trust — from raw ingestion at the bronze level to refined gold and, eventually, platinum-quality operational data. AI inference must follow this same discipline.
Agentic systems do not just need available context; they need trusted context. Providing the wrong data to an agent, or exposing sensitive intellectual property to a non-sovereign endpoint, creates both business and regulatory risk. Compartmentalization must be designed into the stack from the start. Organizations need to know which models and agents can access specific data layers, under what conditions, and with what lineage attached.
Bringing the AI to the data
The fundamental question for the agentic future is whether to bring the data to the AI or the AI to the data. For highly sensitive workloads, moving data to a centralized model endpoint is often the wrong answer.
The move toward private AI — where inference happens closer to where trusted data resides — is gaining momentum. This architecture uses sovereign clouds, private environments, or governed enterprise platforms to keep the data perimeter intact.
This is where the choice to be a token producer becomes a security advantage. By owning the inference stack, an enterprise can enforce governance and lineage at the infrastructure layer. It ensures that the intellectual property used to ground an agent never leaves the organization’s control.
The next platform war
The battle for AI dominance will not be decided by who owns the largest GPU clusters. It will be won by the companies with the best inference economics and the most trusted data foundation.
The organizations that win the efficiency era will be those that deliver the lowest cost per useful token and the fastest path to production. They will be the ones that have moved past the hoarding hangover to focus on productive output.
Achieving return on AI requires a shift in mindset. It means moving from a culture of securing the stack to a culture of squeezing the stack. It requires architectural rigor, a focus on token-level ROI and a commitment to sovereignty. When an organization can generate its own tokens efficiently and securely, AI moves from a science project to an economically repeatable business advantage.
That is how ROI becomes real. That is where the next generation of enterprise advantage will be built.
Rob Strechay is a Contributing VentureBeat analyst and principal at Smuget Consulting, a research and advisory firm focused on data infrastructure and AI systems.
Disclosure: Smuget Consulting engages or has engaged in research, consulting, and advisory services with many technology companies, which can include those mentioned in this article. Analysis and opinions expressed herein are specific to the analyst individually, based on data and other information that might have been provided for validation, and are not those of VentureBeat as a whole.
All 20 of America’s state-run healthcare marketplace sites “include advertising trackers that share information with Big Tech companies,” reports Gizmodo, citing a report from Bloomberg:
Per the report, seven million Americans bought their health insurance through state exchanges in 2026, and many of them may have had personal information shared with companies, including Meta, TikTok, Snap, Google, Nextdoor, and LinkedIn, among others. Some of the data collected and shared with those companies included ZIP codes, a person’s sex and citizenship status, and race.
In addition to potentially sensitive biographical details about a person, the trackers also may reveal additional details about their life based on the sites they visit. For instance, Bloomberg found trackers on Medicaid-related web pages in Rhode Island, which could reveal information about a person’s financial status and need for assistance. In Maryland, a Spanish-language page titled “Good News for Noncitizen Pregnant Marylanders” and a page designed to help DACA recipients navigate their healthcare options were found to be transmitting data to Big Tech firms…
Per Bloomberg, several states have already removed some trackers from their exchange websites following the report.
Thanks to Slashdot reader JoeyRox for sharing the news.
Honeywell-backed Quantinuum filed for a US IPO targeting a valuation above 20 billion dollars. The quantum computing company reported 30.9 million dollars in annual revenue and 192.6 million in losses, pricing itself on a fault-tolerant machine planned for 2029.
Quantinuum filed for a US initial public offering on Thursday that could value the company at more than 20 billion dollars. In the year ended 31 December 2025, Quantinuum reported revenue of 30.9 million dollars and a net loss of 192.6 million dollars. The company is asking public market investors to pay a premium of more than 600 times revenue for a quantum computer that does not yet exist in its final form. The computer it is building, a universal fault-tolerant machine called Apollo, is scheduled for 2029.
The filing is significant not because of Quantinuum’s current financials, which are modest by any standard, but because of what the IPO market’s appetite for it will reveal about how investors price a technology that has been five to ten years away from commercial utility for the past twenty years. Quantinuum is backed by Honeywell, which owns 54 per cent of the company. JPMorgan and Morgan Stanley are leading the offering. The ticker will be QNT on the Nasdaq Global Select Market.
The company
Quantinuum was formed in 2021 from the merger of Honeywell Quantum Solutions and Cambridge Quantum Computing. It builds quantum computers based on trapped-ion architecture, a technology in which individual atoms are suspended in electromagnetic fields and manipulated with lasers to perform calculations. The company claims the highest average two-qubit gate fidelity in the industry as of December 2025, a measure of how accurately the machine performs the basic operations of quantum computation.
Its customers include BMW, Airbus, JPMorgan Chase, HSBC, Mitsui, and Thales. BMW expanded its multi-year partnership with Quantinuum in May 2026 to apply quantum computing to catalyst chemistry research for fuel cells. Airbus is exploring quantum simulation for hydrogen-powered aircraft. JPMorgan has been working with Quantinuum since 2020 and is one of the most active corporate users of its software development kit.
These are research partnerships, not production deployments. No company is running quantum computing in production at a scale that affects its bottom line. The partnerships exist because the companies believe quantum computing will eventually transform their industries and want to be ready when it does. The word “eventually” carries all the risk.
The numbers
Quantinuum’s 2025 revenue of 30.9 million dollars represented 34 per cent growth over the prior year’s 23 million dollars. The net loss of 192.6 million dollars represented 34 per cent growth over the prior year’s 144.1 million dollars. Revenue and losses grew at exactly the same rate.
The first quarter of 2026 was worse. Revenue fell to 5.2 million dollars from 19.1 million dollars in the same quarter a year earlier. The net loss expanded to 136.6 million dollars from 30.5 million dollars. The quarterly numbers suggest that revenue is lumpy and dependent on the timing of contract milestones, a pattern common in pre-commercial deep technology companies.
The target valuation of more than 20 billion dollars would represent a doubling from the 10 billion dollar pre-money valuation at which Quantinuum raised 600 million dollars in September 2025. Before that, it raised 300 million dollars in January 2024 at a 5 billion dollar valuation. The valuation has quadrupled in two years while the company’s revenue has grown from 23 million to 31 million dollars.
The roadmap
Quantinuum’s hardware roadmap has four generations. The current system, Helios, is commercially available. Sol is planned for 2027. Apollo, the system that the company describes as universal and fully fault-tolerant, is planned for 2029. A fault-tolerant quantum computer is one that can perform complex calculations with enough error correction to produce reliable results, the threshold at which quantum computing transitions from a research tool to a commercial platform.
Riverlane raised 75 million dollars to build chips that solve quantum error correction, targeting one million error-free operations by 2026. Error correction is the central engineering challenge of the field. Without it, quantum computers produce results that are too noisy to be useful for the complex simulations that justify the technology’s theoretical advantages. Quantinuum’s Apollo is designed to solve this problem at the system level. Whether it will, and whether 2029 is achievable, are the questions on which the IPO valuation rests.
Quantinuum would join a small cohort of publicly traded quantum computing companies. IonQ, which uses the same trapped-ion technology, went public via SPAC in 2021 and is the only pure-play quantum stock with positive returns in 2026, up 16 per cent year to date after posting more than 100 million dollars in annual revenue. Rigetti Computing, which uses superconducting qubits, is down 10 per cent. D-Wave Quantum is down 9 per cent.
IQM has built 30 full-stack quantum computers from its facility in Finland and announced a 1.8 billion dollar SPAC merger to list on the NYSE. The quantum computing sector is pre-profit and largely sentiment-driven, with stock prices moving on milestone announcements, government contracts, and capital raises rather than fundamentals. Quantinuum’s IPO would be the largest quantum computing listing to date and would set a valuation benchmark for the entire sector.
The risk is that the benchmark reflects the market’s enthusiasm for a technology whose commercial timeline remains uncertain. Industry experts surveyed in 2025 said quantum utility is at most ten years away, a timeline that has not changed meaningfully in a decade. Google’s chief executive said five to ten years. NVIDIA’s chief executive said at least fifteen.
The bet
Honeywell’s decision to take Quantinuum public is part of a broader restructuring that includes the spin-off of its aerospace division and the separation of its advanced materials business. The IPO gives Quantinuum access to public capital markets and gives Honeywell a path to gradually reduce its 54 per cent stake. The 600 million dollar raise in September 2025 was led by investors including JPMorgan, which is now also leading the IPO underwriting, a dual role that reflects the degree to which the investment banking community’s interests are aligned with the offering’s success.
Quantinuum’s filing is a bet that public market investors will value a quantum computing company the way private markets have: on the promise of a technology that does not yet work at scale, priced against a future in which it does. The 30.9 million dollars in revenue is not the product. The product is Apollo, a machine that is three years and several fundamental engineering breakthroughs away. The IPO is a wager that the market will pay 20 billion dollars for the right to wait.