Connect with us
DAPA Banner
DAPA Coin
DAPA
COIN PAYMENT ASSET
PRIVACY · BLOCKDAG · HOMOMORPHIC ENCRYPTION · RUST
ElGamal Encrypted MINE DAPA
🚫 GENESIS SOLD OUT
DAPAPAY COMING

Tech

Why Thermodynamics Rules Future Orbital Data Centers

Published

on

“Space computing, the final frontier, has arrived,” Nvidia CEO Jensen Huang declared at the Nvidia GTC conference in March.

Indeed, the idea of data centers in orbit has gone from science fiction to a serious spending category. Elon Musk’s SpaceX has acquired xAI (also Musk’s) and is planning a constellation of space-based data centers. Google, not to be outdone, announced Project Suncatcher in partnership with Planet, planning to launch two satellites equipped with Google Tensor Processing Unit (TPU) AI chips by early 2027. Startup Starcloud has already filed a proposal with the Federal Communications Commission for an 88,000-satellite constellation for orbital data centers. As Starcloud’s filing suggests, these companies are all proposing fleets of satellites numbering in the thousands, each housing a rack or multiple racks of AI-grade GPUs, interconnected with each other through free-space optical links and communicating back to Earth via microwave links, either directly or through other satellites.

Proponents tout the many wonders of computing in space: abundant solar energy, free cooling, and freedom from Earth-based disturbances like earthquakes, floods, and protesters. But a sober look at the physics of space-based computing paints a much more nuanced picture.

Free cooling is perhaps the biggest misconception. Space is cold, but it also has no atmosphere. That means the best heat-removal mechanisms, conduction and convection, are off the table. The only option is radiation. To prevent a chip from overheating in space, a large, costly surface area is required to dissipate the energy and then radiate it.

Advertisement

Solar energy is abundant, but collecting it with functional solar panels that maintain perfect alignment toward the sun is a complex task requiring extensive attitude control systems. On top of that, ionizing radiation in space from cosmic rays and other sources poses a unique challenge, degrading the solar panels, the radiative coolers, and the chips themselves. Because regular maintenance in space is difficult, redundancy has to be built in at launch, and cost estimates have to account for efficiency degradation over time.

At ABI Research, where I work as an aerospace analyst, we did a rough total-cost-of-ownership comparison between a data center on Earth and one in space. It showed that the cost to launch and run a GPU in space for a year is at least an order of magnitude higher than the same feat in a terrestrial data center. Our model was simple, assuming an Nvidia H100 server rack launched with the requisite-size solar panel and radiator on a spacecraft akin to Starcloud’s pilot launch. We assumed SpaceX’s Starship was used at a highly optimistic launch cost per kilogram of US $44, and a terrestrial energy cost of $0.20 per kilowatt hour. This is a simple back-of-the-envelope calculation, but it does signal something real.

From our perspective, the cost of delivery and space hardening of the payload makes general-purpose space-based data centers difficult to justify economically today, despite the fact that data-center builders in many regions are scrambling for electric power. However, there are niche applications where the much higher costs of computing in space could be justified. Examples include preprocessing data from Earth-observation satellites, real-time detection and tracking of hypersonic missiles, and active collision avoidance in the increasingly crowded low Earth orbit. Even for these, though, contending with fundamental physics will still be a demanding challenge. And a technologically compelling one, too.

The Cooling Challenge in Space

Cooling is where physics separates the science from the fiction. The governing equation for radiative cooling, the only type of cooling available in space, is known as the Stefan-Boltzmann Law. It states that the amount of power you can radiate is proportional to the area of the radiator times its temperature to the fourth power. For a space systems architect, the implications of this law are brutal. In orbit, the only variable we can control is area. This restriction creates a geometric penalty, or a “physics tax,” for cooling in space: The more power you need to reject, the bigger the area of the radiator you need to bring along from Earth.

Advertisement
chart visualization

The only cooling method available in space is radiation, and the radiator area required is derived using the Stephan-Boltzmann law. For a single chip drawing 700 watts, like Nvidia’s popular H100 GPU, the area required to keep it at 20 °C is just under 3 square meters, and it goes down to 1 square meter for an operating temperature of 85 °C. However, as the radiator surface is exposed to ionizing radiation, its emissivity decreases, and after 5 years in space the required area increases by about 40 percent.

To understand how big this baseline area is in practice, I used the Stefan-Boltzmann law to model the heat-rejection area needed to keep a single chip that draws 700 watts of power—such as the H100 GPU chip, an AI stalwart—at a constant 60 °C, usually considered the sweet spot for GPU longevity and stability. I further assumed that the radiator is perfectly facing deep space, at a chilly background temperature of 3 kelvins. By this calculation, a single chip would require 1.4 square meters of radiator surface.

To put this into perspective, consider that a common AI rack can hold approximately 32 GPUs (four H100 server boards). With CPUs, memory, and networking equipment, this rack would draw around 40 kilowatts of power. This single rack includes 2.5 terabytes of memory—enough capacity to serve over 20,000 concurrent users or run 16 simultaneous instances of Llama 3, an open-source AI model. But to cool this thermal load in a vacuum, that single rack would require an 80-square-meter radiator, roughly the size of a pickleball court. For an aggregate 100-megawatt data center, you’d need at least 2,500 of those radiators.

And that’s the best-case scenario. Additional problems are hidden in the low Earth orbit environment itself. Space exposes radiators and their coatings to a chemically hostile brew of ultraviolet light and atomic oxygen, quite the opposite of a clean-room environment. Over a LEO satellite’s typical 5-year lifespan, these elements degrade the radiator’s surface properties and lower its ability to shed heat.

Including this degradation in the model reveals that as the radiator degrades from a “fresh” state to an “end-of-life” state, the physics demands a further penalty. To maintain that same 60 °C operating temperature for the GPU chips, the required surface area jumps from about 1.4 square meters per chip to nearly 2.0 square meters. In other words, the physics tax rises by 40 percent. Therefore, you must launch at least 40 percent more radiator mass, endure higher atmospheric drag, and sacrifice valuable launch volume just to survive the degradation of the thermal coating. This increase adds significantly to the launch cost and further erodes the economics of a space-based data center.

Advertisement

The Silicon Challenge in Space

Solving the heat problem is only part of the battle. The other significant challenge in low Earth orbit is ionizing radiation, which affects the computing hardware itself. Today’s satellites typically use radiation-hardened processors, which are very reliable but also much more expensive, and they perform poorly compared to commercial off-the-shelf processors.

A standard rad-hard chip doesn’t have the processing power to run a modern large language model (LLM). As a result, satellite operators aspiring to launch a data center have no choice but to make a risky compromise: to use hardware meant for terrestrial use. In order to achieve the necessary compute density, orbital data centers must use the same Nvidia H100s or Google TPUs found in terrestrial server farms. The problem is that these chips are “soft” targets in space. High-energy particles can flip bits in memory or cause “latch-ups” in logic that fry the circuit.

One possible option is to shield the computers from radiation with thick, absorbent panels. However, the shielding would add significantly to the already heavy satellites. The other option is to compensate for the radiation damage with redundancy. Indeed, edge computing architects are moving toward software-defined resilience, where instead of one perfectly hardened computer, operators fly a cluster of imperfect, commercial ones whose total cost could be as low as one-tenth to one-hundredth that of the rad-hard model.

This redundant approach is used in many spacecraft, including Artemis II, which recently carried astronauts around the moon, as well as SpaceX’s flight computers and the Hewlett Packard Enterprise edge servers for the International Space Station. By running three (or more) instances of the same calculation on three different nodes and comparing the answers, the system can detect a corrupted processor. If a node fails, the “orchestrator” reboots it while the others continue the mission. While this ensures resiliency, it also means that some fraction of the compute capacity is dedicated to redundancy, further increasing the costs.

Advertisement

The Energy Challenge in Space

An often-touted advantage of space-based data centers is the seemingly unlimited supply of free, clean energy from the sun. Solar energy in orbit is indeed abundant, at 1,361 watts per square meter. Of course, capturing that free energy is made possible only by the very costly launching of large solar panels into orbit. And those solar panels also degrade over time due to radiation exposure, typically losing 1 to 3 percent efficiency per year.

Let’s say a solar array collects 1 MW of power to run an AI cluster. The laws of physics demand that the satellite must eventually radiate 1 MW of waste heat. Because the square area needed to generate the solar power—around 400 W/m2—and to reject the heat—around 450 W/m2—are nearly equivalent, every square meter of power generation now demands approximately another square meter of cooling. The radiator needs to be a structural equal, not merely a passive coating on a surface used for something else.

As Elon Musk recently noted in Davos, the most efficient radiator is one that never sees the sun. By orienting the spacecraft so the solar panels face the sun and the radiators face the deep vacuum of space, efficiency skyrockets for both. But there’s a catch: Maintaining this perfect three-way alignment—panels to sun, radiator to the void, antennas to Earth—requires complex, high-torque attitude control systems. So this configuration means more payload and more computing power. Plus, these control systems are complex components with many failure modes, which is not optimal in a situation where maintenance is difficult.

The Killer Apps for Computing in Space

Given all these challenges of deploying massive radiators for satellites in the hostile environment of space, why build data centers in space at all?

Advertisement

While training or inference on LLMs in space doesn’t seem economical today, there are other, very compelling applications for computing in space. Here are two: solving the downlink bottleneck from Earth-observation satellites and enabling collision-preventing maneuvers in the increasingly crowded low Earth orbit.

The latest Earth-observation satellites, equipped with hyperspectral and synthetic aperture radar sensors, are used for a range of important reconnaissance missions, such as battlefield intelligence, tracking the global shadow fleet of ships carrying contraband, and assessing earthquakes or infrastructure failures down to the millimeter. These systems can generate hundreds of terabytes of raw data per day that must be transmitted to Earth. However, the radio-frequency “pipes” used to downlink the data are congested, and the ground infrastructure cannot absorb the sheer volume of raw data.

Another immediate, mission-critical application for in-space computation is protecting the orbital environment. With over 17,000 satellites in orbit, the overwhelming majority of which are in low Earth orbit, avoiding collisions between these satellites is crucial. As NASA astrophysicist Donald Kessler pointed out back in 1978, a single space collision could cause a cascading effect that renders the entirety of LEO unusable.

Advertisement

According to SpaceX’s recent annual report, the Starlink constellation executes a collision avoidance maneuver every 2 minutes on average. Each maneuver already relies on onboard AI systems but still requires most of the processing to happen on the ground.

A rendering of the Starlink satellite system depicted as bright dots surrounding the Earth.

SpaceX’s Starlink system currently has over 10,000 satellites in low Earth orbit, each depicted here as a colored dot.

Satellitemap.space

Advertisement

As low Earth orbit gets increasingly populated, collision avoidance will have to break the traditional ground-loop model. In the megaconstellation era of space, the OODA (observe, orient, decide, act) loop must happen onboard, thereby reducing the analysis turnaround from minutes to milliseconds.

The problem is that the flight computers standard on satellites are not built for this level of processing. The complex probability models required for maneuvering cannot currently be implemented by onboard computers in conjunction with their navigation systems. Clearly, more powerful computers are needed.

This is the true economic justification for moving compute to space: to move insight generation there. By placing high-performance computing adjacent to the sensors, we can process terabytes of data in orbit and downlink only the relevant data in real time, and we can do the computations necessary to avoid satellite collisions in real time.

The Future of Computing in Space

So, assuming that some form of computing is inevitable in low Earth orbit in the foreseeable future, how will the heat be handled? The industry is currently experimenting with two main classes of solutions to cope with the Stefan-Boltzmann law.

Advertisement

One creative option is to use origami-inspired radiators, the kind used for the James Webb telescope. Companies are developing flexible, high-conductivity composite radiators that fold into a tight cube for launch and unfurl into enormous yet lightweight thermal wings in orbit.

Another possibility is to use liquid-droplet radiators. This concept proposes removing the rigid radiator structure completely and instead spraying a stream of coolant oil directly into the vacuum of space. The fluid travels through an open loop, exposed to the near-absolute zero of the void, maximizing radiative surface area before being caught by a collector and pumped back into the ship. It sounds like science fiction, but as the heat loads climb into the megawatts, liquid-droplet cooling may be the only way to cheat the mass limits of this exponential reality.

Our rough total-cost-of-ownership model uses optimistic versions of current numbers, such as launch cost, chip cost, and power use. A critic might point out that future technology will improve, both in efficiency, purpose-built designs, and costs.

Sure, the technology is bound to improve. But the critical factor isn’t just launch cost; it’s the computing power per unit mass and electric-power economics. Radiators and solar arrays can consume 65 to 70 percent of total satellite mass, and space-grade photovoltaics run orders of magnitude more expensive than terrestrial equivalents.

Advertisement

Spiral polygonal grid resembling a twisted spiderweb on a light background Chris Philpot

Even as launch costs fall, the mass and cost burden of power generation and thermal management will remain a fundamental problem.

Current space-grade solar panels rely on germanium substrates, whose supply is concentrated in China. It will be extremely difficult to scale up availability of these substrates. A transition to radiation-tolerant perovskite solar panels or a similar alternative could change the economics significantly, but that possibility is five years away or more. The technology will get cheaper, but the bottlenecks of power and thermal architecture will remain.

Recognizing the thermal reality of cooling in space forces us to shift how we view satellite operations. We are moving away from the “launch and forget” era toward an era of “autonomous logistics.” As our thermal model demonstrated, the harsh environment of space steadily attacks the hardware. UV radiation degrades thermal coatings; cosmic rays degrade silicon. In a traditional satellite model, when the radiator degrades or the memory fails, the satellite becomes space junk. For a multimillion-dollar data center, that disposal model is potentially ruinous.

To make the economics of orbital computation work, the infrastructure must be serviceable and the rockets to launch them reusable. The orbital domain will require automated servicing vehicles capable of swapping out degraded radiator panels and upgrading fried servers. In these ways, the future of the orbital data centers is dependent on the innovations of an emergent in-space economy.

Advertisement

There’s a good argument to be made that the need for space-based computation is less of a hype cycle and more of an enabler for the new space economy. Look no further than SpaceX’s recent regulatory filings proposing a constellation of up to a million satellites in low Earth orbit. At such a scale, routing all raw data back to Earth is physically impossible; the network itself must become the data center.

However, the winners in this sector will be determined by the systems architects who most cleverly accommodate the thermodynamics and the companies with sufficient vertical integration to take on the massive costs of operating data centers in orbit. Ultimately, the physics tax is universal. Whether managing heat rejection in the vacuum of low Earth orbit or managing power density in a hyperscale facility in Northern Virginia, the constraint is never the silicon. It’s the thermodynamics.

From Your Site Articles

Related Articles Around the Web

Advertisement

Source link

Continue Reading
Click to comment

You must be logged in to post a comment Login

Leave a Reply

Tech

Daily Deal: The Complete Photoshop Master Class Bundle

Published

on

from the good-deals-on-cool-stuff dept

It’s no secret that Photoshop can be a bit dense when you’re first getting your feet wet with it. That’s why it pays to have a expert instructors show you the ropes. Led by a Photoshop pro, the Complete Photoshop Master Class Bundle will help you master Photoshop CC and become an expert—no prior experience is required! From layers and filters to levels and curves, you’ll come to grips with essential Photoshop concepts and refine your skills with the included working files. It’s on sale for $30.

Note: The Techdirt Deals Store is powered and curated by StackCommerce. A portion of all sales from Techdirt Deals helps support Techdirt. The products featured do not reflect endorsements by our editorial team.

Filed Under: daily deal

Source link

Advertisement
Continue Reading

Tech

I hope these 4 Galaxy S26 Ultra software features make their way to the Galaxy A57 and more affordable Samsung phones soon

Published

on

When I was doing all the testing for our Samsung Galaxy A57 review, I enjoyed how streamlined its software was compared to that of the best Samsung phones. But since publishing that review, I’ve been jumping back and forth between the A57 and another Samsung flagship, and I’ve got a more nuanced view.

Before the A57 (and, for a little while, after it), I was using the Samsung Galaxy S26 Ultra, which is pretty much the best Android phone money can buy. It has similar hardware specs to the Galaxy S25 Ultra, with its biggest advancements instead coming in the form of new software tools and features.

Source link

Continue Reading

Tech

Nightmare Eclipse drops claimed BitLocker bypass for Microsoft Windows

Published

on

Security

Another day, another Windows exploit code

Nightmare Eclipse, the prolific zero-day vulnerability hunter with an axe to grind against Microsoft, released yet another exploit late Wednesday that the researcher claims will spawn a command prompt that provides total access to the BitLocker volume.

This bug, called GreatXML, was “an accidental discovery,” according to the researcher, who said it only took four hours to find. They claim this exploit (published on GitHub and Git-based code-hosting platforms) can bypass BitLocker on any system that has ever run a Microsoft Defender Offline scan at any point in the past.

Advertisement

GreatXML comes just a day after Nightmare released exploit code for RoguePlanet, which allows local privilege escalation and leads to SYSTEM-level control over an affected machine. This brings the researcher’s zero-day count to eight. The earlier six – RedSun, UnDefend, BlueHammer, YellowKey, GreenPlasma, and MiniPlasma – all have patches as of this week’s Patch Tuesday event. 

Redmond on Wednesday told The Register that it is aware of RoguePlanet, and “actively investigating the validity and potential applicability of these claims.” The Windows giant didn’t immediately respond to our inquiries about GreatXML, including when it planned to issue a patch.

Microsoft has said none of the vulnerabilities were reported via its official channels prior to being made public. The company also banned Nightmare’s earlier GitHub account, and seemingly threatened legal action before dialing back its rhetoric after steep backlash from the security community.

Nightmare Eclipse, who some researchers suggest is an ex-Microsoft employee, harbors a very personal grudge against the Windows giant and its communications with bug hunters. They have promised to keep the zero-days coming, but waffle on the timing. 

Advertisement

Last month, the researcher pledged a big July 14 drop: “I will make sure your bones are shattered that day,” and then added, “nothing will be released this June (or maybe I will release smtg, depending on circumstances).”

On Tuesday, they changed course. “I will be unable to mass disclose zerodays in July 14th, RoguePlanet took way more time than expected and truly drained me. I might take a break but I can’t say for sure what I will be doing for next month, maybe it’s nothing, maybe it’s smtg.”

A day later, Nightmare released the “accidental” GreatXML BitLocker bypass. 

According to the researcher, the BitLocker bypass first requires copying “unattend.xml” and the “Recovery” directory to the root of the recovery partition. The next step is rebooting into WinRE by Shift-clicking Restart. “If everything was done correctly, a shell with unrestricted access to the bitlocker volume will spawn,” Nightmare wrote.

Advertisement

Also, if the scan hasn’t even been initiated on the Windows system, first you’d need to either log in and initiate it, or “figure out a way to boot into WinRE in offline scan state.”

Security sleuth Will Dormann followed Nightmare’s steps to reproduce GreatXML, and said the writeup seems “flawed.” In his testing, Dormann said the command prompt appeared the next time a Defender Offline scan ran.

“And in order to trigger a Microsoft Defender Offline scan, you both need to be logged in to Windows, and also have admin credentials,” he wrote on social media. “And if you’ve already got that level of access, you can just turn off bitlocker.”

“The writeup for GreatXML suggests that the prerequisite is that Windows Defender Offline has been executed at some point in the past,” Dormann added. “And that after planting two files in WinRE, all you need to do is [Shift]-reboot into WinRE, and Windows will automatically go into Microsoft Defender Offline scan mode. But this is not the case in any of the 3 lineages of Win11 that I have handy.” ®

Advertisement

Source link

Continue Reading

Tech

Why Google’s New AI-Saturated Search Page Will Be A Disaster

Published

on

from the the-end-of-ten-blue-links dept

Google didn’t invent full-text search of the Internet – that honor belongs to early pioneers such as WebCrawlerLycos and AltaVista. But for the last 25 years or so, Google has been synonymous with online searching, providing the quickest and most effective way to find things online (although its results may be getting worse.) More recently, it has been adding to its search engine more features based on generative AI, first with its AI Overviews in 2024, and then a year later with its AI Mode in Search. Now it has announced the latest stage in that evolution with what it calls “A new era for AI Search”:

It’s more intuitive than ever, dynamically expanding to give you space to describe exactly what you need. Designed to anticipate your intent, it also helps you formulate your question with AI-powered suggestions that go beyond autocomplete. And you can search across modalities, using text, images, files, videos or Chrome tabs as inputs.

This new incarnation effectively turns search into a chatbot:

You can easily ask a follow-up question right from an AI Overview, and flow into a conversational back and forth with AI Mode. Your context stays with you, and as you explore more deeply, the links and supporting articles get even more relevant. This seamless experience is live today across desktop and mobile, worldwide.

As the the screenshot of the new interface above shows, the traditional search result links that are currently placed under the AI Overview have now been confined to a small panel on the right-hand side of the screen, which shows a cut-down version of today’s list. Users are encouraged to ask follow-up questions from the AI search chatbot, rather than exploring the links themselves.

What this is likely to mean in practice is that even fewer people will follow links to sites, something that was already happening last year; instead, they will engage with Google’s chatbot to gather information indirectly. This is terrible news for access to knowledge because it frames the Google AI search engine as the fount of all knowledge – one that will do all the hard work of finding information and combining it into an easily digested answer that can be interrogated further. It can do that because it has already ingested billions of Web pages and other information sources as part of the Large Language Model (LLM) training process. But search engine users will no longer know what some of those sources are unless they painstakingly click on the links in the new panel.

Most people will not bother, because the AI-generated results will be good enough – or at least will appear to be good enough. Unless visitors to the site take the trouble to follow the links to the sources they won’t really know how reliable those results are. For example, it is possible that the sources are wrong, or misleading; moreover, Google’s LLM may itself introduce new errors and distortions. There is also the question of how Google will insert ads into this AI-generated information, and to what extent advertisers will be able to buy preferential treatment in results.

Advertisement

This new mediated approach is clearly terrible news for Wikipedia – an issue already discussed on Walled Culture earlier this year – and for creators. Google will use the information found in their works, but will not actively encourage people to visit the originals. For many people, summaries will be good enough, and they will never discover the greater riches of the sites and creations that Google’s LLM is based on. Worse still, the original creators such as Wikipedia may not even be mentioned in answers that involve aggregating information from a large number of sources.

Similarly, the new Google search is the publishing industry’s worst nightmare. Not only is Google drawing on material they have published, but it is pushing links to those sources into the background. It seems inevitable that the Web traffic to publishers will fall yet further, making already struggling business models based on advertising even more precarious. That will have knock-on consequences for the funding of many sites – particularly newspapers and magazines – and for the commissioning of work from journalists and other creative professionals. Users won’t even need to visit Google Search much in order to keep up-to-date with topics of interest thanks to Google Search’s new agentic capabilities that will do the work for them in advance:

With information agents, you can stay updated on whatever matters most to you. Your agent will intelligently look across everything on the web, like blogs, news sites and social posts, plus our freshest data, such as real-time info on finance, shopping and sports, to monitor for changes related to your specific question.

In this case, not only will people not visit sites, but the latter will be constantly bombarded by various AI bots seeking information on behalf of users – increasing site running costs, and making sites less usable by humans. Another key announcement from Google will lead to a further flood of agentic activities that will pose new challenges to businesses:

We’re also expanding agentic booking capabilities in Search to a wide range of new tasks, including local experiences and services. Just share your specific criteria — like finding a private karaoke room for six on a Friday night that serves food late — and Search brings together the latest pricing and availability with direct links to finish booking through the provider of your choice. And for select categories like home repair, beauty or pet care, you can ask Google to call businesses on your behalf.

What emerges from Google’s latest announcements is less of a search engine, and more of an immersive virtual environment that is designed to keep people engaging with Google’s services, asking them for information, advice and even delegating actions to them. There is no doubt that many users will find these new features attractive, not least because they can use “conversational voice features” in Gmail, Docs and elsewhere. These are the digital assistants that have been promised for many years, able to understand spoken commands, provide information verbally, and carry out complex operations on behalf of users without the need for any complex training. For many people, that will be a boon, and they will doubtless migrate from the traditional search page, which will still be the default – at least for now – to the latest AI-infused version.

Advertisement

But these impressive technical features come at a high price, even leaving aside issues such as the environmental impact of the huge server farms they require. With the latest incarnation of its search engine, Google is making the World Wide Web as we have known it for over 30 years invisible, and therefore increasingly irrelevant to most people, who will be happy to let Google become their universal user interface to everything. And yet Google still depends on the Internet to supply all the information it is analyzing and repackaging. It risks killing the very thing that sustains it.

There’s another, more subtle issue. The new Google search features make finding information and carrying out actions very easy in many ways. Leaving aside the problem that this will require people to trust what is in effect a huge black box, where the internal workings cannot be examined, with all the loss of control this implies, there is another danger. People who use Google’s powerful new AI search services to offload many of their day-to-day actions may gradually lose the ability to understand the world and to act within it without that constant help. Such a dependence may be great for Google and its advertisers, but it surely cannot be a good thing for the future of society.

Follow me @glynmoody on Mastodon and on Bluesky. Originally published to WalledCulture.

Filed Under: ai, links, open web, search

Companies: google

Advertisement

Source link

Continue Reading

Tech

Your robot can’t be smart, fast, and free. Evolution solved that already.

Published

on

Here is a constraint that almost no one building physical AI says out loud, even though every one of them is quietly fighting it.

A robot’s intelligence wants three things at once. It wants to be smart, meaning it can reason at the level of a frontier model about an unfamiliar scene. It wants to be fast, meaning it responds inside the tight, deterministic timing a physical control loop demands. And it wants to be free, meaning it keeps working when the network drops, the warehouse Wi-Fi dies, or the machine goes somewhere no signal reaches.

You cannot have all three on one piece of compute. Pick any two.

To be precise, bounded autonomy already works. Industrial arms, drones, and constrained autonomy stacks can be fast and offline because their tasks are narrow. The trilemma bites at the frontier: you cannot put frontier-scale general reasoning, deterministic real-time response, and full offline autonomy into the same power-limited substrate, not for the same control loop.

Advertisement

The 💜 of EU tech

The latest rumblings from the EU tech scene, a story from our wise ol’ founder Boris, and some questionable AI art. It’s free, every week, in your inbox. Sign up now!

A frontier-scale model is smart, and if you stream its sensors to a datacenter it can even be fast, but now it is tethered to a network and no longer free. Shrink that model until it fits on a 15-watt embedded module and it becomes fast and free, but it is no longer frontier-smart. Run the big model in the cloud and query it only occasionally, and you get smart and free, but never fast. Three corners, two available at a time. I have come to think of this as the embodied trilemma, and it is the real reason the edge/cloud question is the hardest architecture decision in robotics. Most teams treat it as a deployment detail. It is closer to a law.

Why you can’t cheat the triangle

The trilemma is not a fashion or a temporary hardware limitation you can wait out. It falls directly out of physics and power budgets.

Advertisement

Frontier reasoning quality currently lives in models that want tens of gigabytes of memory and datacenter-class accelerators. That hardware does not run on a battery a mobile robot can carry. So “smart” forces a choice: either bring the datacenter to the robot through a network link, which sacrifices freedom, or accept a smaller onboard model, which sacrifices smartness.

Real-time control is even less negotiable. A wide-area network round trip adds 30 to 100 milliseconds of latency, and the variance matters more than the average. A control loop that is usually fast but occasionally stalls is worse than one that is reliably mediocre, because controllers are tuned for deterministic timing. The moment “fast” depends on a network, you have surrendered “free,” because the network is now inside your control loop whether you meant it to be or not.

So the triangle holds. Quantization, distillation, and better accelerators move the corners, but they do not collapse them. Anyone claiming otherwise is usually hiding which corner they gave up.

Putting numbers on the triangle

It helps to make the constraint quantitative, because the moment you write the timing down, the corners stop being abstract.

Advertisement

Start with latency. The end-to-end delay of a perception-to-action decision made in the cloud is a sum of terms:

Lcloud = tcapture + tencode + tuplink + tinference + tdownlink + tdecode

Run the same decision onboard and most of that sum disappears:

Ledge = tcapture + tinference,local

Advertisement

The difference between the two is not the inference time, which can actually be lower in the cloud on better hardware. The difference is the network, tuplink + tdownlink, and more importantly its variance. A measured cloud-robotics setup over a fast wired link saw round trips of roughly 30 milliseconds [7], while real-world deployments commonly sit in the 100 to 300 millisecond range, and wireless links swing far higher. Edge processing, by contrast, pulls round trips down toward 1 to 5 milliseconds because nothing leaves the machine [8].

Now state the rule that decides where a loop can live. A control loop with timing budget Lbudget can run on a given compute path only if

Lpath + k·σjitter ≤ Lbudget

where σjitter is the standard deviation of the path’s latency and k is the safety factor you need for determinism. That k·σjitter term is the quiet killer. Teleoperation studies are blunt about it: a link that holds a steady 100 milliseconds is workable, but one oscillating between 30 and 200 milliseconds produces jerky, unpredictable motion, because the controller cannot plan around delay it cannot predict [9]. The reflex loop’s budget is 1 to 10 milliseconds. No wide-area path satisfies the inequality. The math, not the architect, forbids it.

Advertisement
Control loop Timing budget Onboard path (~1-5 ms) Wide-area path (~30-300 ms)
Reflex (motor control, e-stop) 1-10 ms Feasible Impossible
Perception (detection, tracking, SLAM) 30-100 ms Feasible Marginal, fails on jitter
Deliberation (planning, language) 1-10 s Feasible Feasible (async)

The table is the argument in one view. Reflex never clears a network round trip. Perception clears it only on unusually good links. Deliberation has budget to spare, which is why it can live in the cloud asynchronously.

Bandwidth closes the case for perception. A single 1080p camera at 30 frames per second produces raw video at 1920 × 1080 × 3 bytes × 30, which is about 1.5 gigabits per second. A modest four-camera plus depth rig clears 6 gigabits per second of raw sensor data. You can compress it, but compression costs latency and the link still has to carry it reliably, everywhere the robot goes. Edge perception is the robotic version of that move. Compress to a semantic representation on the spot; never ship the raw stream.

Finally, the economics, which is just the trilemma with a dollar sign. Onboard compute is a one-time capital cost. Cloud reasoning is an operating cost that accrues with every query:

Ccloud(t) = r·ctoken·t

Advertisement

where r is the query rate and ctoken the per-token price, against a flat Cedge = Ccapex. The two lines cross at t* = Ccapex / (r·ctoken). Push thirty frames a second to a cloud model and t* arrives almost immediately, so cloud cost dominates the lifetime of the fleet. Route only a few deliberation-class queries per minute upstream and t* recedes over the horizon.

Strategy What goes upstream Cost shape Break-even t*
Stream everything ~30 frames/sec to a cloud model Steep linear opex Almost immediate
Route deliberation only A few queries/min Shallow linear opex Past fleet service life
Fully onboard Nothing One-time capex, flat Never crossed

Same hardware, same models, opposite economics, decided entirely by which loop you placed in which corner. The gap is not subtle: a single camera streamed to a cloud vision model at 30 frames per second is on the order of a million inference calls a day per robot, while routing only deliberation-class queries upstream might be a few hundred. Across a fleet, that is the difference between cloud inference being a rounding error and being the largest line on the operating budget.

The escape nobody designed, because biology did it first

Here is the part I find beautiful, and the heart of what I want to argue: the way out of the embodied trilemma is not to solve it. It is to refuse to answer it at a single point.

Your own body is built this way, and it has been for roughly half a billion years.

Advertisement

When you touch a hot stove, your hand pulls back before your brain knows anything happened. That is the spinal reflex arc, a loop that runs through the spinal cord and never waits for the cortex. It is fast and free (it works even if you are barely conscious), and it is emphatically not smart. It does not reason about the stove. It does not need to.

Your retina does something just as telling. It has over a hundred million photoreceptors, but the optic nerve carrying signal to the brain has only about a million fibers [10]. The eye does roughly a hundredfold compression on the spot, locally, before transmitting anything. It does not ship raw pixels up the cable. It ships a processed, compact representation. Fast and free at the edge, by necessity.

And then there is the cortex, which is where the actual reasoning happens. It is slow, it is powerful, and crucially, the body has arranged things so that when the cortex is slow or offline, the reflexes still fire and you still pull your hand back. Evolution put the survival-critical functions where they never depend on the smart, slow part.

That is the whole trick. Biology never built a single neuron that was smart, fast, and free all at once. It built a hierarchy in which different loops each sit at a different corner of the triangle, and it made sure the corner each loop sacrifices is one that loop can afford to lose. Reflexes give up intelligence, which is fine, because a reflex that stops to think is a reflex that gets you killed. The cortex gives up speed, which is fine, because it has been kept off the survival path entirely.

Advertisement

A robot escapes the embodied trilemma the same way, or it does not escape at all.

Mapping the triangle onto a machine

Translate the nervous system into engineering and a practical architecture emerges. A robot has three loops, and each one belongs at a different corner.

The reflex loop (1 to 10 ms): motor control, stabilization, emergency stops. This is the spinal cord. It must be fast and free and is allowed to be dumb. It lives onboard, always, and never touches a network.

The perception loop (30 to 100 ms): detection, tracking, obstacle avoidance, visual odometry, SLAM. This is the retina. It must keep working when the link drops, and the bandwidth math forbids shipping raw sensor data anyway, since even a single camera produces well over a gigabit per second of raw video before compression. So perception compresses at the edge, exactly as the eye does, and emits a compact semantic representation rather than pixels. Fast and free, intelligence traded away on purpose.

Advertisement

The deliberation loop (1 to 10 seconds): task planning, language understanding, deciding what to do when the plan breaks. This is the cortex. It is allowed to be slow, and slowness is exactly the corner it trades away, reaching a frontier model in the cloud asynchronously rather than in the control path. It stays free in the only sense that matters, never holding the robot hostage to a live link. If connectivity vanishes, the robot gets less clever, not less safe.

The interface between these layers is the optic nerve of the system: a deliberately narrow channel carrying detections, tracks, and state summaries, never raw signal. Get that channel right and you have not just an inference boundary. You have defined your logging schema, your training-data pipeline, and your behavior when the link drops, all at once.

The industry is rediscovering the nervous system

What convinces me this is structural, not stylistic, is that the most advanced robotics programs keep reinventing the same hierarchy without necessarily naming it.

Figure AI’s Helix, the system running its humanoid robots through full eight-hour factory shifts, is explicitly two systems: a roughly 7-billion-parameter vision-language model at 7 to 9 Hz for scene understanding and language, coupled to a compact 80-million-parameter visuomotor policy that turns intent into continuous action at 200 Hz [1]. That is cortex and reflex on one robot, a 25-to-1 ratio in update rate between the loop that thinks and the loop that acts, each running at the timescale its job demands. Surveys of edge-cloud collaboration now describe the same division as standard practice, with small onboard models handling real-time perception and privacy-sensitive preprocessing while heavier reasoning is offloaded upstream [4].

Advertisement

Comparisons on real robot data quantify the trade directly: deploying an 11-billion-parameter vision-language model at the network edge held accuracy close to its cloud baseline while shaving only modest latency, whereas a compact 2-billion-parameter model more than halved latency into sub-second territory, paying for the speed with accuracy [5]. Reviews of foundation-model robotics keep flagging the same wall: LLM planners take seconds per decision, fine for the cortex, hopeless for the spinal cord [6]. NVIDIA’s own Jetson deployment guidance reflects it too, with optimized onboard inference for perception and policy and larger models living upstream [2].

Different teams, different machines, the same triangle, the same corners. When that many independent efforts converge, you are looking at structure, not style.

Lessons from the ultimate airgap

The starkest place to watch the trilemma bite is underwater robotics. An ROV below the surface has effectively no real-time link to the cloud. The ocean is the ultimate airgap, the freedom corner taken to its absolute extreme. In hands-on underwater robotics builds, perception (detection and tracking, optimized with TensorRT) runs entirely on an onboard module, while language-level mission interaction and fleet reasoning reach a frontier model in the cloud only asynchronously, on surfaced or relayed data, and never inside a control loop. The architecture is not a preference there. The water enforces it.

Three principles follow, and they generalize far beyond the sea.

Advertisement

Design for the disconnected case first. If the robot is safe and useful with zero connectivity, the cloud becomes pure upside: better reasoning, fleet learning, human oversight. If the robot needs the cloud to stay safe, you have built a cortex with no spinal cord, a liability on wheels.

Treat the narrow channel as a contract, not a cable. The compressed representation crossing the edge/cloud boundary is the single most important interface in the system. Teams that treat it as an afterthought re-architect twice.

Remember the trilemma is also an economics statement. Onboard compute is paid once, at purchase. Cloud reasoning is paid forever, per token. Routing only deliberation-class queries upstream, a few per minute instead of thirty frames per second, changes fleet unit economics by orders of magnitude. Cloud-inference cost can quietly become the largest operating line on a robotics program that put the wrong loop in the wrong corner.

The corners will move. The triangle won’t.

Onboard modules get more capable every generation, and distillation keeps narrowing the gap between edge models and their cloud teachers. Early-exit inference, where confident predictions resolve locally and only hard cases escalate, is maturing fast [3][5]. The deliberation loop will migrate partly onboard over the next few years, especially for safety-relevant replanning. The corners of the triangle will keep sliding.

Advertisement

But the triangle itself does not go away, because it is anchored in physics and energy, not in any model generation. Smart, fast, and free will never coexist on a single substrate as long as frontier intelligence costs more power than a robot can carry and the speed of light caps how fast a remote answer can return. The teams that internalize this, and that consciously assign each loop the corner it can afford to lose, will ship robots that work when the network does not. The rest will keep learning, in the field and at the worst possible moment, that they accidentally wired their spinal cord through a datacenter.

Evolution settled this argument before there were spines. We are just catching up.

References

1. Figure AI. “Helix: A Vision-Language-Action Model for Generalist Humanoid Control.” figure.ai/news/helix. 2025.

2. NVIDIA Developer Blog. “Getting Started with Edge AI on NVIDIA Jetson: LLMs, VLMs, and Foundation Models for Robotics.” developer.nvidia.com. 2025.

Advertisement

3. Qu, G., Chen, Q., Wei, W., Lin, Z., Chen, X., and Huang, K. “Mobile Edge Intelligence for Large Language Models: A Contemporary Survey.” IEEE Communications Surveys and Tutorials, 2025 (arXiv:2407.18921).

4. Li, S., Wang, H., Xu, W., Zhang, R., Guo, S., Yuan, J., Zhong, X., Zhang, T., and Li, R. “Collaborative Inference and Learning between Edge SLMs and Cloud LLMs: A Survey of Algorithms, Execution, and Open Challenges.” arXiv:2507.16731, 2025.

5. Ahmad, S., Hafeez, M., and Zaidi, S.A.R. “Vision-Language Models on the Edge for Real-Time Robotic Perception.” University of Leeds, arXiv:2601.14921, 2026.

6. Khan, M.T., and Waheed, A. “Foundation Model Driven Robotics: A Comprehensive Review.” arXiv:2507.10087, 2025.

Advertisement

7. Kapoor, A., et al. “A Predictive Application Offloading Algorithm Using Small Datasets for Cloud Robotics.” arXiv:2108.12616, 2021.

8. Coutinho, R.W.L., and Boukerche, A. “Design of Edge Computing for 5G-Enabled Tactile Internet-Based Industrial Applications.” IEEE Communications Magazine, 60(1), 2022.

9. Urbaniak, D., et al. “5G for Robotics: Ultra-Low Latency Control of Distributed Robotic Systems.” IEEE.

10. Kandel, E.R., Schwartz, J.H., and Jessell, T.M. “Principles of Neural Science.” McGraw-Hill.

Advertisement

Source link

Continue Reading

Tech

Context compression finally works in production: new research cuts LLM input 16x without the accuracy hit

Published

on

Context windows are becoming a computational bottleneck. The longer an agent runs, the more tokens accumulate from retrieved documents, reasoning traces and conversation history, and the more memory and compute that growing context demands. Most existing solutions either degrade model accuracy, require the full context to load before compression begins, or produce memory savings that don’t translate into real speedups in standard serving infrastructure.

A research team from NYU, Columbia, Princeton, University of Maryland, Harvard and Lawrence Livermore National Laboratory published a paper this week that proposes a novel fix. The researchers introduce the concept of  Latent Context Language Models, or LCLMs, a family of encoder-decoder compression models that compress input context before it reaches the decoder. The models are open-sourced on HuggingFace.

Unlike KV cache compression methods — the dominant approach in the field, which still materialize the full KV cache before evicting entries — LCLMs compress the input token sequence before decoder prefill, so higher compression ratios directly reduce decoder-side compute and memory. The paper reports LCLMs at 16x compression produced output 8.8 times faster than KV cache baselines on the RULER long-context benchmark.

“These ballooning contexts take up memory and compute, and they are becoming a computational bottleneck for LLMs,” Micah Goldblum, co-lead advisor on the project and a researcher at Columbia University, told VentureBeat. “Our goal was to train language models end-to-end that can handle very long contexts efficiently and accurately. If you can make such a language model, everything becomes cheaper and faster.”

Advertisement

What LCLMs can do

LCLMs let models process much longer contexts than would otherwise be practical, at a fraction of the memory and compute cost, without the accuracy degradation that makes most compression methods a poor tradeoff in production.

At 4x compression, the paper reports accuracy of 91.76% on the RULER benchmark, compared to 94.41% with no compression at all. That is less than a 3 point drop for cutting context to a quarter of its original size. At 16x compression, where 93.75% of input tokens are removed, accuracy fell to 75.06%. Every KV cache method tested at the same compression ratio scored lower.

The gains hold on shorter inputs too. On GSM8K math word problems, where the full prompt is compressed rather than just retrieved documents, LCLMs outscored every other method tested regardless of compression ratio.

 Latent Context Language Models achieve high quality compression while being fast and memory efficient

Credit: End-to-End Context Compression at Scale research paper https://arxiv.org/pdf/2606.09659

Advertisement

How it was built

The architecture pairs a 0.6B encoder with a 4B decoder. The encoder compresses blocks of input tokens into shorter sequences of latent embeddings. The decoder processes those in place of the original tokens. Training ran across more than 350 billion tokens.

The training recipe mixes three data types:

  • Continual pre-training data with compressed and uncompressed spans interleaved throughout

  • Supervised fine-tuning data covering reasoning and long-context tasks

  • An auxiliary reconstruction task that pushes the encoder to retain fine-grained detail

The combination addresses a tradeoff that limited earlier compression work, where preserving reconstruction accuracy came at the cost of general task performance.

An architecture search identified the optimal configuration. The paper found that scaling the decoder matters more than scaling the encoder.

Advertisement

Where it fits in an agentic stack

An LCLM is not an abstract research concept. It is designed to work with an existing stack. “You can simply swap out LCLMs for any existing LLM,” Goldblum said. “Whenever you retrieve data such as documents and want to dump it into your model’s context, simply run those documents through the LCLM’s compressor first.”

He noted that in the research paper, the researchers demonstrated how to build agents that selectively decompress useful text. 

“Think about this like a human skimming content before zooming in on relevant details,” Goldblum said.

Goldblum also cautioned that teams integrating the approach into existing agentic pipelines will need to tune their RAG systems accordingly.

Advertisement

“We also haven’t worked on online compression of reasoning traces,” he said. “The naive approach of just occasionally compressing the trace while generating it might work, but that remains to be determined.”

What this means for enterprises

Context windows are growing faster than inference infrastructure can keep up, and enterprises are already spending to fix it. VB Pulse Q1 2026 survey data from 100-plus employee organizations shows hybrid retrieval adoption intent tripling from 10.3% in January to 33.3% in March. Retrieval optimization overtook evaluation as the top investment priority by March, reaching 28.9% of qualified respondents.

Three things stand out for teams evaluating production fit:

  1. Inference cost scales with context length. At 1 million tokens, uncompressed inference with standard KV cache methods runs out of memory on a single H200 GPU. The paper reports LCLMs at 16x compression remain within memory bounds at that context length.

  2. RAG pipeline integration requires tuning. Teams with existing RAG pipelines will need to validate compression behavior against their retrieval quality metrics before deploying at scale.

  3. Reasoning trace compression is unsolved. For agents running long reasoning chains, context growth from the trace is a separate problem from document retrieval. Goldblum acknowledged the gap directly: the naive approach of periodic trace compression might work but has not been tested.

The models are available at huggingface.co/latent-context and the code at github.com/LeonLixyz/LCLM.

Advertisement

“The biggest things our architectures do is give your model access to much larger contexts, but they also unlock multiscale approaches where your model can skim vast amounts of text or code super fast and then only zooms in and fully reads a small portion of the most useful text,” Goldblum said.

Source link

Continue Reading

Tech

Meta’s Edits app is getting an AI assistant and a desktop version

Published

on

Meta on Wednesday previewed upcoming additions to its video-editing app Edits at an invite-only creator event in L.A., showing off features like a new AI assistant and a desktop version of the previously mobile-only app.

The company also announced other new tools will launch in the app today, such as a “Beta” tab for experiments and expanded audience insights.

Edits first arrived last year as a direct competitor to ByteDance’s CapCut. With the addition of the new and upcoming tools, Meta is looking to both retain and attract new users.

The upcoming AI assistant will help creators analyze their insights and brainstorm ideas for their content. The assistant will use their Instagram data, like their views and video-retention insights, to help them see what’s working and why. It will suggest video ideas based on performance and suggest making content with trending audio.

Advertisement

By integrating an AI assistant directly into Edits, Meta is aiming to keep creators engaged on Instagram as it continues to compete with TikTok and YouTube for creators’ attention. Additionally, by offering creators content ideas, Meta is encouraging more frequent posting, which could, in turn, boost user engagement. Direct access to an AI assistant also gets rid of the need for creators to turn to outside tools like ChatGPT when brainstorming content ideas and understanding performance.

Meta launched a similar AI assistant tool for creators on Facebook last week. It’s worth noting that YouTube and TikTok also offer tools to creators to help them brainstorm ideas. For instance, YouTube Studio features an Inspiration tab that uses AI to help creators generate video ideas, while TikTok offers creators an AI assistant that can brainstorm ideas and uncover trends.

The desktop version of Edits will give creators more precise control over the editing process as well as the ability to work on a larger screen, which can be helpful during more advanced editing workflows. The company says creators will be able to sync their workflows seamlessly between mobile and desktop devices.

The upcoming desktop version will also allow Edits to better compete with CapCut, which already offers a desktop version.

Advertisement
Image Credits:Instagram

Among the new features launching today is a Beta tab, which will provide creators with early access to experimental features that are still in development and allow them to provide Meta with feedback. The rollout of the Beta tab indicates that Meta wants to better compete with CapCut and accelerate feature development based on what creators actually want and will use.

Creators will also now be able to see more detailed metrics like their audience demographic breakdown and the time of day their audience is the most engaged. The new metrics join the app’s existing analytics, which include data such as how long viewers watch a video, how many followers were gained from a specific video, where users stop watching a certain video, and more.

Additionally, creators can search specific topics within the app’s Inspiration feed to discover reels and templates other creators are making around a given trend or idea. They’ll also be able to create multiple versions of a single piece of content to test what performs best before publishing.

Although Instagram didn’t share specific numbers about how many users Edits has, the company says that content made with the app sees a 10% higher save rate and 2% higher reshare rate compared to content not made on Edits, and that more than half of people watching reels on Instagram are seeing Edits-created content every day. 

Edits is free to download on iOS and Android.

Advertisement

The AI assistant announced today is currently in testing with attendees of Thursday’s creator event, while the desktop version of Edits is “coming soon,” Meta says. The rest of the features are launching to everyone today.

When you purchase through links in our articles, we may earn a small commission. This doesn’t affect our editorial independence.

Source link

Advertisement
Continue Reading

Tech

AI is bankrupting your cloud budget. Here’s what savvy bizs are doing instead.

Published

on

[This is a sponsored article with Synology.]

The cloud was supposed to eliminate infrastructure headaches. Instead, some businesses are discovering a new one: invoices they no longer understand.

Storage fees, data retrieval charges, and backup costs are quietly pushing cloud spending higher than many organisations anticipated—and artificial intelligence (AI) is about to make it significantly worse. 

It’s a challenge Taiwanese storage company Synology has been watching closely. 

Advertisement

At Computex 2026 in Taipei last week, the company argued that the economics of cloud-first infrastructure are beginning to shift.

The cloud bill that keeps growing

Businesses that moved enthusiastically to cloud-first infrastructure in the early 2020s are now sitting with bills that look nothing like the ones they signed up for. As costs continue to climb, some are reassessing whether keeping more data on-premise could make better financial sense.

The main catalyst is artificial intelligence. 

AI doesn’t just store data—it constantly accesses, moves, and processes it. Every inference call, every model training run, every search query is pulling data in and out of storage.

Advertisement

Gartner forecasts that more than 80% of enterprises will have deployed AI-enabled applications by 2026. And in a cloud environment, every one of those activities comes with a price tag.

Image Credit: Summit Art Creations via Shutterstock

From the start, cloud storage has been largely marketed on a simple figure: storage cost per gigabyte. Amazon Web Services S3 Standard, for example, runs at roughly US$23 per terabyte per month, with Google Cloud and Microsoft Azure in a similar range. 

For many businesses, the math looked straightforward. But what’s less visible is everything layered on top of that base rate.

Every time data is retrieved or moved—something AI workloads do constantly—cloud providers charge additional fees. 

On AWS, data egress starts from around US$0.09 per gigabyte transferred out. At scale, even restoring a 10TB dataset can quietly add about US$700 to the bill, before factoring in anything else.

Advertisement

Add API requests, cross-region replication, and backup-related charges, the total cost can be pushed up to four times the advertised storage rate. 

A Backblaze survey of more than 400 IT leaders found that 95% encountered unexpected cloud storage costs. And according to the Wasabi Global Cloud Storage Index, which surveyed 1,600 IT decision-makers globally, including 525 across APAC, 63% of organisations in the region exceeded their cloud storage budget in 2024. 

Businesses are bringing data back in-house

Amid rising costs, cloud repatriation—bringing data and workloads back from public cloud providers onto private or on-premise infrastructure—has moved from a niche IT discussion to a mainstream business decision. 

Synology’s on-premise storage solutions, PAS7700 and FlashStation Series./ Image Credit: Synology

A 2025 Barclays CIO survey found that 86% of enterprise CIOs planned to shift at least some workloads back to private or on-premise systems, the highest level recorded. The Flexera 2025 State of the Cloud report similarly shows that 21% of workloads have already been repatriated, even as overall cloud spending continues to grow.

However, it’s important to note that most organisations aren’t abandoning cloud wholesale

Advertisement

Instead, they are taking a hybrid approach by keeping cloud platforms for global accessibility and collaboration, while bringing back workloads that involve heavy storage, protection, and processing. Backup and production storage are among the most commonly repatriated, as these are the areas where costs scale most quickly.

For Singapore businesses, there’s another factor at play: regulation. Under MAS Technology Risk Management guidelines and PDPA obligations, companies are expected to know where their data is stored and how it is being accessed.

That becomes harder in large public cloud setups, where data can be spread across multiple regions and servers. On-premise systems, by contrast, make it easier to keep track of exactly where information sits and who has access to it, since everything is managed within a company’s own infrastructure.

What Synology is bringing to the table

Image Credit: Synology

Synology is one of the companies building for this shift. 

The firm is best known for its Network Attached Storage (NAS) hardware—physical devices that store data locally while still functioning as a private cloud. 

Advertisement

At Computex 2026, it outlined how it is expanding its NAS ecosystem beyond storage into AI-enabled data management and backup infrastructure.

At the centre of this push is Synology’s next-generation DiskStation Manager (DSM), the operating system that powers every Synology NAS device.

The Taiwanese firm has spent more than two decades building NAS hardware and software. Today, it has shipped over 14 million systems worldwide, managing more than 400 exabytes of data. 

AI that stays in-house

Synology Product Marketing Manager Katherine Chiang unveils DSM Agent 2.0 at Computex 2026./ Image Credit: Synology

At Computex 2026, the company announced the roadmap for the next generation of DiskStation Manager, DSM Agent 2.0, expanding it from a storage operating system into an intelligent data platform for governed, on-premises AI workflows. The goal is to turn DSM from a storage system into a smarter data platform that can support AI tools running on a company’s own infrastructure.

Instead of sending data to external cloud services, businesses can use their own data, such as files, system logs, and usage data, to power AI tools internally, while keeping everything under their control.

Advertisement

“The next generation of DSM leverages over two decades of expertise to create an AI-ready platform that keeps organisations firmly in control of their data,” said Philip Wong, Chairman and CEO of Synology.

Some AI features available include a conversational assistant for troubleshooting and system management. More advanced AI agents are also in development, designed to handle tasks such as email drafting, formula searches, meeting transcription, and real-time translation, although no release date has been announced yet.

As these capabilities expand, privacy becomes even more important in the age of AI. The system already includes a feature that masks sensitive data such as names, ID numbers, email addresses, and financial information locally before anything is sent to external AI providers like OpenAI or Azure AI. 

Future updates will go further, with support for fully on-premise large language models, where no data needs to leave the organisation’s infrastructure.

Advertisement

Synology’s infrastructure is already at work in Singapore

The value of on-premise data infrastructure is already clear for Singapore businesses using Synology. 

Image Credit: I Love Taimei/ Lasalle College of the Arts

Food chain I Love Taimei, which has 17 outlets in Singapore, uses Synology’s DSM system to manage surveillance footage across all locations. This cuts management time by 65% and also allows the company to run AI-powered customer analysis without sending footage to the cloud.

LASALLE College of the Arts also uses Synology NAS for file storage and 4K video collaboration, allowing students and staff to access large project files easily across Mac computers without compatibility issues or rising costs.

Together, these examples show why some organisations are rethinking the assumption that everything belongs in the cloud.

Cutting backup costs without the cloud

Synology Product Manager Cody Hall unveils ActiveProtect Manager 2.0 at Computex 2026./ Image Credit: Synology

The same push toward more controlled, on-premise infrastructure also extends to backup. At Computex, Synology introduced ActiveProtect Manager 2.0, a centralised backup system that will launch in Q3 2026.

The key issue it addresses is cost. Most backup services charge per server, virtual machine, or device. ActiveProtect instead charges for the hardware, with no extra per-workload fees.

Advertisement

In some cases, customers have seen a lower total cost of ownership. For example, Taiwanese media company Info Times reduced setup costs by 65% and cut storage needs by 75%. Toyota also reduced its backup data by 75% through better storage efficiency.

ActiveProtect 2.0 works with existing systems, so companies don’t need to replace their current setup. It also uses machine learning to detect unusual backup activity and help prevent ransomware infections from being restored. 

And because everything is stored locally, recovery is faster—taking hours instead of days—and there are no cloud data transfer fees.

The bigger picture

Cloud still has an important role to play, whether for global access, extra computing capacity, or supporting teams across different regions.

Advertisement

What’s changing is that businesses are becoming more selective about what they keep in the cloud. Rather than moving everything to a single platform, many are deciding where data should live based on cost, performance, and compliance requirements.

For Singapore businesses that have quietly accepted rising cloud bills as part of the cost of doing business, it may be time to take a closer look at the numbers.

Explore Synology’s solutions here. 

Featured Image Credit: Synology

Advertisement

Source link

Advertisement
Continue Reading

Tech

Anthropic launches powerful Fable 5 model publicly, while keeping Mythos restricted over cybersecurity concerns

Published

on

In context: Anthropic’s latest release is really a story about control, not just capability. The company is offering two versions of the same underlying model: Claude Mythos 5 for a small circle of trusted partners, and Claude Fable 5 for everyone else. The split reflects a core challenge Anthropic is still trying to solve – how to deploy an extremely capable system into the wild without simultaneously handing attackers a new class of offensive tools.

Mythos has already shown what it can do when it is not heavily restricted. Since April, when an earlier preview was sent to about 150 organizations under the banner of Project Glasswing, users have reported more than 10,000 critical security flaws in their own systems. Those same capabilities could also be used by attackers looking to break in, rather than to patch security holes.

For that reason, Mythos 5 is staying behind the glass for now. Anthropic is keeping it in the hands of a “small group of cyberdefenders and infrastructure providers,” along with select biology researchers, and is coordinating with US government agencies as part of the rollout. Access is effectively on a need-to-know basis, with the company signaling that a broader “trusted access program” will come later.

Fable 5 is where Anthropic is testing what a general-purpose release of Mythos-class technology looks like under constraint. Technically, it runs on the same underlying model as Mythos 5, but with hard limits built in. The system is designed to refuse or redirect a long list of requests related to cybersecurity, biology, and chemistry. When those guardrails trigger, the query is silently routed to an older model, Claude Opus 4.8, instead.

Advertisement

Anthropic has also wired Fable 5 to watch for distillation, where a user tries to harvest large volumes of answers to train a smaller model of their own. If the system thinks that is happening, those requests are also redirected to Opus 4.8. In other words, the company is not only trying to control what the model will talk about, but also what others can learn from it.

Anthropic has been wrestling with these decisions for months. Diane Penn, the company’s head of product management, told Wired that testing and feedback since the April preview have helped shape the current strategy, even though it is still far from perfect.

“We’re trying to make improvements in a way that’s beneficial, even if we don’t have the perfect [solution] for every use case to start,” she says. “Out of all the different approaches, this emerged as the most viable and the best one. We just ended up feeling like this was the best product choice for users to get the maximum value out of Fable 5.”

For now, the filters are tuned to err on the side of over-blocking. Penn has acknowledged that some harmless queries will be routed to the older model. Anthropic says it wants to refine its classifiers over time but argues that this level of caution is the only way to justify a wider release at this stage.

Advertisement

The stakes are higher because Fable and Mythos are not just chatbots that respond to prompts and stop. Anthropic says both can run “unattended” for longer stretches than previous Claude models, carrying out sequences of instructions without constant supervision.

That shift toward more agent-like behavior could substantially boost software engineering and other technical work, especially given Fable 5’s stronger code generation and visual capabilities. But it also raises obvious questions about what happens if those capabilities are misused.

Anthropic’s pricing reflects how powerful it believes these systems are compared with its other models. Fable 5 and Mythos 5 cost $10 per million input tokens and $50 per million output tokens, roughly double the company’s other public models but still cheaper than the earlier Mythos Preview. The higher price reflects both the performance gains and the sense that these models are still positioned as specialized systems, not yet just another SKU in a growing catalog.

Around Anthropic, competitors are moving in a similar direction. OpenAI has rolled out its own advanced cybersecurity model to a small circle of partners and convened a working group that echoes Project Glasswing. Both companies are preparing for potential IPOs and are under pressure to show investors they can ship cutting-edge technology without triggering backlash over safety concerns.

Advertisement

Even some of the people watching from the outside say the unease is justified. Canadian finance minister François-Philippe Champagne told the BBC that public concern around Mythos stemmed from “it’s the unknown, unknown.”

Anthropic co-founder Jack Clark has made a similar point from the inside, arguing that the industry has not yet figured out how to slow itself down. “You want the option to be able to take your foot off the gas and put your foot on the brake,” he said. “Right now, it’s like the AI industry has a gas pedal, but it doesn’t have a brake pedal.”

Source link

Advertisement
Continue Reading

Tech

Microsoft’s open-source SkillOpt automatically upgrades AI agent skills without touching model weights

Published

on

Agent skills have become an important part of real-world AI applications, providing a mechanism — a set of instructions saved in a folder of text-based markdown (.md) files, usually — for models to adapt to specific enterprise use cases and complex workflows.

However, optimizing these skills is a slow process and faulty process, as they cannot be trained in the same way as the parameters of the underlying AI model. Instead, users typically must update them manually by retyping the instructions in each file, playing a “guessing game” as to what changes might improve agentic AI performance and reduce errors.

SkillOpt, a new, open source (MIT Licensed) framework developed by Microsoft, does one better: it introduces an optimizer designed for agent skills, turning the agent’s skill .md document as a trainable object that evolves based on performance feedback.

It uses deep-learning-style optimization to make it possible for the AI to systematically explore modifications to the document and find the best combination of instructions. Most importantly, it accomplishes this procedural adaptation without making changes to the underlying model’s weights.

Advertisement

On various industry benchmarks, SkillOpt outperforms existing baselines, significantly boosting accuracy for models like GPT-5.5 and Qwen. The result is a set of compact, transferable skill artifacts that allow AI agents to adapt to new domains effortlessly.

The challenge of optimizing agent skills

Agent skills package procedural knowledge into natural-language specifications, including domain heuristics, tool-use policies, output constraints, and known failure modes. These skills provide an external interface for agents to adapt to complex enterprise workflows. In practice, agent skills are stored as text documents and inserted into the agent’s context before execution.

One of the key benefits of skills is that they customize the behavior of the underlying model without changing its weights. However, the skill document itself needs to be tweaked and optimized to get the best performance out of the agent.

While deep learning relies on strict mathematical controls for stability, human prompt engineering often relies on trial and error. When attempting to automatically update a skill document based on feedback, the lack of mathematical discipline makes text highly volatile.

Advertisement

Yifan Yang, Senior Research SDE at Microsoft Research Asia, told VentureBeat that the problem is not making changes, but ensuring those changes are mathematically sound.

“The breaking point isn’t whether a team can change a skill, it’s that they can’t guarantee the change is an improvement,” Yang said. “Three failure modes recur: no step-size control, so skills drift; no validation, so a fix that reads as reasonable gets written in and can quietly regress performance; and no negative memory, so the same failed edit keeps coming back.”

SkillOpt framework

To illustrate how easily performance can drop when edits aren’t mathematically validated, Yang noted that “an ungated rewrite pushed GPT-5.5 on SpreadsheetBench from 41.8 down to 41.1.”

According to Yang, these failure modes are amplified in multi-step workflows “because that’s where frontier models are weakest zero-shot. Not on reasoning, but on procedural discipline: format, self-verification, tool policy.”

Before SkillOpt, agent skills were primarily hand-crafted, generated in a single shot, or evolved through loosely controlled self-revision pipelines that could not reliably improve under feedback.

Advertisement

Prompt optimization methods like TextGrad and GEPA treat language artifacts as optimizable objects and use trajectory feedback to evolve prompts, but they focus on single-prompt configurations rather than generating persistent, reusable skill artifacts.

Meanwhile, skill evolution and discovery methods like EvoSkill and Trace2Skill convert agent execution experiences into trajectory lessons to refine skill folders, build domain-specific libraries, or perform evolutionary search.

None of them apply deep-learning-style controls, such as learning rates, validation gates, and momentum, which are necessary to continuously train a single, compact skill document.

Importing mathematical discipline to text

SkillOpt optimizes a text document through an iterative propose-and-test loop that separates the model executing the tasks from the model optimizing the skill. The process unfolds in several steps:

Advertisement
  • SkillOpt starts with an initial skill document and a frozen target model (or harness), where the target model runs a batch of tasks to generate execution trajectories that act as the evidence for the current step.

  • An offline optimizer model analyzes these trajectories, separating successes from failures into minibatches. Looking at a minibatch helps the model identify systematic procedural errors rather than one-off anomalies. Based on these patterns, the optimizer proposes structural add, delete, or replace edits to the skill document.

  • The proposed edits are reviewed to filter out duplicates or contradictions, and the optimizer then ranks these candidate edits by their expected utility.

  • Rather than applying all proposed changes, SkillOpt clips the list to a maximum edit budget for that step, generating a candidate skill.

  • The candidate skill is evaluated on a held-out validation set using the target model. If the candidate improves the validation score, it is accepted and becomes the new current skill. If it fails, the edits are rejected and sent to a rejected-edit buffer, providing negative feedback so the optimizer knows not to repeat that mistake.

SkillOpt directly addresses the problem of treating text as a trainable object by importing mathematical concepts from deep learning. The creators note that “the deep-learning analogy is operational rather than decorative,” helping the framework avoid the instability issues associated with other optimization techniques.

SkillOpt pipeline

SkillOpt framework (source: arXiv)

The edit budget acts as a learning rate. By limiting how many edits can be applied at once, the skill version is prevented from moving too far from its previous state, preserving continuity while allowing new procedures to be acquired. 

Just like checking validation loss in deep learning, the strict held-out examples ensure that plausible-sounding text edits are only kept if they mathematically improve the agent’s actual performance on the validation split.

Advertisement

At the end of an epoch, SkillOpt performs a slow update by comparing tasks under the previous and current epoch’s skills. This acts like a momentum term, carrying durable, long-horizon procedural lessons forward while isolating them from the fast, step-level edits.

SkillOpt in action

To evaluate the technique in practice, researchers tested SkillOpt across different models, ranging from large-scale frontier models like GPT-5.5 to smaller closed and open models including GPT-5.4-mini and Qwen3.5-4B. They also deployed the skills within different execution harnesses, using plain chat as well as complex coding harnesses like the Codex CLI and Claude Code.

The evaluation spanned diverse industry benchmarks including single-round question-answering, multi-round code generation involving tool use, and multimodal document reasoning. SkillOpt was measured against multiple baselines ranging from a default no-skill setting to human-written skills and one-shot LLM-generated skills. It was also compared against advanced prompt-optimization and skill-evolution methods, specifically Trace2Skill, TextGrad, GEPA, and EvoSkill.

SkillOpt dominated across the board, proving highly effective on all 52 evaluated combinations of model, benchmark, and harness. It was particularly effective with frontier models, delivering an average absolute improvement of +23.5 points against the no-skill baseline on GPT-5.5. Furthermore, SkillOpt outperformed a hypothetical oracle baseline that cherry-picks the best competing method for every problem.

Advertisement

Small target models saw immense relative gains, proving that a compact text file can supply procedural knowledge that small models lack in their weights. For example, GPT-5.4-nano nearly doubled its score on multimodal document QA and tripled its score on embodied interaction and sequential decision-making.

These academic benchmarks map to critical enterprise pain points. Zero-shot models often hallucinate formatting or fail to use tools properly in multi-step scenarios. Yang explained that the biggest performance leaps occurred in operations that enterprises historically struggle to automate reliably.

“Document data extraction… exact figures out of contracts, invoices, and forms — AP automation, claims, compliance,” Yang said. “What improves is reliability: precise formatting, self-verification, auditable outputs. And the gains come from learning procedure, not memorizing answers.”

For enterprise practitioners, the true value of SkillOpt lies in its portability, efficiency, and compatibility with existing infrastructure. Experiments confirm that the framework is harness-agnostic. In addition to basic chat, the same optimization loop was successfully integrated into tool-backed execution environments like the Codex CLI and Claude Code with significant gains on industry benchmarks.

Advertisement

Developers can train a skill using one execution loop and deploy it in another. For example, a spreadsheet skill trained entirely inside the Codex loop was moved directly into Claude Code and drove a +59.7 point gain over Claude Code’s native baseline without any further changes.

SkillOpt artifacts also transfer cleanly across model scales. A skill optimized for GPT-5.4 was deployed onto the smaller GPT-5.4-mini and GPT-5.4-nano models with positive gains, proving that the learned procedures encode reusable workflows rather than just exploiting quirks of a specific model’s architecture.

Finally, the framework is highly efficient regarding token usage and context window real estate. Across all benchmarks, the final deployed skills never exceeded 2,000 tokens, with a median length of roughly 920 tokens. This results in highly readable, auditable artifacts that a human practitioner can review and manage in minutes.

Implementation strategies and the enterprise ‘catch’

For enterprise tech leaders, adopting a new framework requires understanding the overhead and limitations. While the research paper notes that training tokens can reach up to 210 million for academic benchmarks, the reality for day-to-day enterprise use cases is much lighter. The high token counts in testing were largely due to re-scoring massive held-out test sets.

Advertisement

“The real upfront work is the verifier and a representative held-out split. The optimizer is light; the evaluation harness is where the engineering goes,” Yang said. He added that for everyday use, “in community frameworks like GBrain, where SkillOpt updates run on Claude Sonnet, training a skill for a single task averages just $1–5.” This optimization cost is a one-time fee that amortizes completely at deployment.

However, the framework requires specific conditions to work effectively, namely a few dozen representative examples and a scorable feedback signal. Teams should avoid applying SkillOpt to open-ended or subjective tasks. “With no clean automatic scorer you have to design a human- or model-based evaluator and watch its stability,” Yang said.

SkillOpt also integrates smoothly with existing orchestration stacks, removing a major adoption hurdle. For instance, developers already using pipeline compilers can run both systems harmoniously. “DSPy is a different, complementary layer,” Yang said. “It compiles declarative LM pipelines and optimizes program structure; SkillOpt optimizes the external skill state a frozen agent loads. You can run them together.”

Looking ahead, open-source developers are already scheduling SkillOpt to run periodically over their agents’ past trajectories, creating a small ecosystem of self-optimizing code-agent plugins. This continuous feedback loop represents a significant shift in how AI systems adapt.

Advertisement

“The valuable version of self-improvement is an agent autonomously discovering knowledge to improve its own behavior and the user experience, under verification and audit,” Yang said. “Skills are the fastest, cheapest, most reversible first step, and the same mindset points toward agents eventually optimizing themselves, all the way down to their own weights.”

Source link

Continue Reading

Trending

Copyright © 2025