For most Singaporeans, Raffles Medical is a familiar name. The healthcare group has built a reputation as one of Singapore’s most established private medical providers.
But behind the scenes, the company has spent the last decade chasing a much bigger ambition.
Rather than remaining a Singapore-focused healthcare operator, Raffles Medical wanted to become a regional healthcare brand—one with hospitals and clinics stretching across Asia.
Today, however, that investment still hasn’t translated into equally impressive financial returns. While its Singapore operations are well-established and consistently profitable, its sizeable investment in China continues to lag behind.
Why China looked like an obvious market
Image Credit: Raffles Medical Group
Back in the mid-2010s, expanding into China seemed like the logical move.
The country’s population was ageing, disposable incomes were rising, and healthcare reforms were gradually opening the door to private healthcare providers.
Rather than stopping at outpatient clinics, Raffles Medical doubled down on its China ambitions by investing in full-service hospitals.
The Raffles Hospital in Beijing./ Image Credit: Raffles Medical Group
Together, the projects required years of planning, construction, regulatory approvals, specialist recruitment and investment in medical equipment before they could even begin seeing patients.
Advertisement
Unlike retail stores or restaurants, hospitals can’t simply open their doors and expect customers to flood in.
Patients need to trust the brand. Doctors need to establish referral networks. Insurance partnerships have to be secured. Operating theatres, diagnostic equipment and inpatient wards all have to be utilised before a hospital starts generating meaningful profits.
In other words, healthcare is a long game.
The investment is huge, but so is the gap
The Raffles Hospital in Chongqing./ Image Credit: Raffles Medical Group
That long game is becoming increasingly visible in Raffles Medical’s financials.
Ahead of its 2026 AGM, shareholders questioned why China’s business had grown so slowly despite years of investment. Between FY2018 and FY2025, revenue from China increased by only S$25.4 million, reaching S$65.4 million.
Advertisement
The disparity becomes even more striking when compared with the group’s asset base. China accounts for around 30% of Raffles Medical’s total assets, yet contributes only 10% of group revenue.
By comparison, Singapore’s asset base is only about 2.2 times larger than China’s, but generates more than 10 times the revenue.
The figures suggest that while Raffles Medical has built a sizeable presence in China, its overseas assets have yet to achieve the same level of utilisation and productivity as its mature Singapore operations.
The company says patience is part of the plan
Raffles Medical doesn’t dispute that its overseas operations are taking time. Instead, management argues that’s simply how hospital investments work.
Advertisement
Building a hospital isn’t the hardest part—building patient volumes is.
Image Credit: Getty Images
The group says overseas operations typically require years to develop clinical capabilities, improve utilisation and reach sufficient scale before becoming meaningfully profitable.
China has also become a tougher operating environment than many expected. The company cited geopolitical tensions, technological restrictions and broader economic challenges as factors weighing on its performance.
Even so, management continues to view China as a strategic market, pointing out that around 30% of the country’s population can already afford higher-quality healthcare, giving it a sizeable addressable market.
More importantly, Raffles Medical has gradually secured access to China’s public insurance system, allowing it to treat more local patients instead of relying primarily on expatriates—a key milestone that could improve patient volumes over time.
Advertisement
China was the exception, not the rule
Image Credit: Raffles Medical Group
Despite more than a decade of overseas expansion, Singapore still remains Raffles Medical’s financial backbone. In FY2025, the group’s local operations generated nearly 90% of its revenue, effectively funding its regional ambitions while newer markets continue to mature.
Not all of its overseas markets, however, have followed the same playbook.
While China saw Raffles Medical invest heavily in building full-fledged tertiary hospitals, its expansion elsewhere has been far more measured.
In markets such as Vietnam, Cambodia and Japan, the group has focused on outpatient clinics, specialist centres and partnerships with local healthcare providers instead of embarking on similarly capital-intensive hospital projects.
That more cautious approach is reflected in its balance sheet. As at FY2025, Raffles Medical’s non-current assets in Greater China stood at about S$304 million, compared with just S$13.4 million across the rest of Asia.
Advertisement
This makes China the group’s biggest regional bet and the market that will likely determine whether its international expansion ultimately pays off.
So, was the gamble worth it?
The Raffles Hospital in Shanghai./ Image Credit: Raffles Medical Group
Hospital investments are unlike most businesses. They take years to generate sustainable returns, but there are signs that Raffles Medical’s China operations are beginning to gain traction.
In FY2025, both its Shanghai and Chongqing hospitals reported higher patient volumes, while Shanghai also recorded revenue and profit growth. The group has also expanded partnerships with leading public hospitals and secured access to China’s National Health Insurance Programme for its Shanghai hospital, moves aimed at broadening its local patient base.
Still, there’s no denying that its financials are still catching up.
If Raffles Medical succeeds in improving utilisation and profitability, years of investment could prove worthwhile. If not, its China expansion could become a costly reminder that succeeding overseas is much harder than replicating a proven business model.
Advertisement
For now, Raffles Medical appears committed to seeing the strategy through.
After spending a decade—and hundreds of millions of dollars—building its regional footprint, turning back is no longer really an option.
Read other articles we’ve written on Singaporean businesses here.
Earlier this year we wrote about the ridiculous thin-skinned executives at Palantir suing a small independent Swiss online magazine, Republik, that had reported on the great lengths the company had gone to, trying to get the Swiss government to purchase Palantir’s surveillance technology. Palantir knew they couldn’t sue for defamation because, you know, everything Republik reported was true. Instead, they sued, trying to invoke a Swiss “right of reply” law, claiming that because Republik refused to publish the press release Palantir wanted to run in response to the reporting, the magazine had violated the law.
As we said at the time, this is the height of entitlement. Palantir doesn’t get to tell Republik how and what it must publish.
And, thankfully, a court has agreed. Zurich’s commercial court rejected 22 of 23 claims that Palantir made.
The data analytics company lost on 22 out of 23 counts of the suit. In a ruling on Friday, Zurich’s commercial court dismissed the majority of counterstatement requests filed by the company and its Swiss subsidiary finding that only a single passage in one article warranted a published response from the company.
While the court agrees that there is a “right of reply” law in Switzerland, it has limitations:
Advertisement
While Swiss media law allows the subjects of a story to request a right of reply, this has caveats: the right of reply has to be concise and stick to the facts of the story.
The one count that stuck: the court found that a single passage in just one article warranted a limited published reply from Palantir.
Also, the court told Palantir to pay Republik for its legal expenses wasted on this SLAPP suit:
The court on Friday ordered Palantir to bear 95% of the 9,000 Swiss francs ($11,300; £8,400) court costs and to pay Republik 9,900 francs in legal expenses.
Of course, this case was always less about the ‘right of reply’ than about making it clear to anyone who reports critically on Palantir that the company will go to war with them, seeking any legal theory, no matter how ridiculous, to tie them up in court — the textbook logic of a SLAPP suit. Republik has said that defending the case cost the small organization quite a lot in time and resources:
Balz Oertli, a journalist with WAV research collective, said: “We invested a great deal of effort into this case, and we are very pleased with the outcome.”
Anyway, given that Palantir seems really upset about Republik’s reporting, it sure would be a shame if you decided to go read this critical reporting of Palantir’s relentless attempts to win business from the Swiss government.
Many of us remember back in our school days taking tests and filling out answers on a Scantron sheet, those long rows of A, B, C, D, and E that had to be filled in with a #2 pencil. Ever wonder why it needed a #2 pencil, or what the point of using a Scantron was at all? That question is answered in the latest video from [SimonRetro], where he takes a look at the Scantron and how it works.
One of the more interesting things about the Scantron is that it’s such a standalone device. No software needed, no keypad to mess with just two rocker switches. The on/off switch is also the way you tell it to forget the last answer sheet and allow you to program in a new test. Upon booting, you feed in a Scantron sheet with some specific boxes filled in, and then it’s programmed and ready to take in and grade all the students’ answers. Opening up the Scantron reveals it’s pretty interesting inside: one control board with early-’90s-era chips. There’s also a lightbulb (no LEDs) shining through the six reading sections of the card, as well as an arrangement of belts and motors to move the card through the machine. The printer is a seven-pin printer used in conjunction with a pair of ink rollers to print out the results on the cards.
[SimonRetro] also went ahead and tried different ways to mark the sheets including pens, Sharpies, colored pencils, and different thicknesses of pencils besides the #2 to see which would and wouldn’t work in the Scantron. Thanks [SimonRetro] for exploring this machine from many of our childhoods and sharing its inner workings. Be sure to check out some of our other reverse engineering articles that explore how classic devices work.
GMKtec has, however, made significant changes to the chassis, abandoning the flat square box typical of most mini PCs entirely.
Latest Videos From
A tower-style redesign built to fix old complaints
The EVO-X3 trades the EVO-X2’s flat footprint for a tall, triple-fan tower that resembles a steel-wrapped graphics card more than a conventional mini PC.
Advertisement
Despite the added height, the footprint remains compact, comparable in size to a PS4 console sitting upright, with GMKtec saying the redesign balances performance, efficiency, and thermal stability across continuous professional workloads.
Reviewers had criticized the EVO-X2 mainly for build quality issues, citing a cheap-feeling case, difficult internal access, and persistent fan noise under load.
This probably informed the design changes on the EVO-X3, though whether the new chassis actually resolves those issues remains to be seen.
Advertisement
Sign up to the TechRadar Pro newsletter to get all the top news, opinion, features and guidance your business needs to succeed!
GMKtec crushed the expectations of enthusiasts when it snubbed AMD’s newer Ryzen AI Max+ 495 chip for the Ryzen AI Max+ 395 silicon.
The processor combines CPU, GPU, and a large NPU rated at 50 TOPS, comfortably above the 40 TOPS threshold required for Microsoft‘s Copilot+ designation.
The EVO-X3 will be available in two storage configurations — 2 TB or 4 TB — and both versions carry the same 128 GB of LPDDR5X-8000 memory.
Advertisement
The device will also feature two M.2 2280 PCIe Gen4x4 slots, allowing total storage to scale up to 8 TB on either configuration.
GMKtec bundles its proprietary Claw+Wrangler suite directly onto the EVO-X3, a local-inference toolkit built for one-click setup and round-the-clock AI agents.
The company claims the 128 GB memory configuration can run models as large as 235 billion parameters entirely on-device, and none of that inference relies on cloud servers, which means no per-token fees and no user data ever leaving the machine.
Advertisement
A steep price jump for a familiar chip
GMKtec lists pre-launch pricing at $3,600 for the 128 GB and 2 TB configuration, rising to $3,849 for the 4 TB version, both described as discounted early figures.
Early access registration opened on June 22, offering a further $20 discount, with the global launch and shipping date both set for July 6.
For comparison, the EVO-X2 launched at $1,999 with 64 GB of memory and a 1 TB drive, making the jump considerable even accounting for the EVO-X3’s larger memory and storage allowances.
Advertisement
It is even a higher jump from the EVO-X1, the model that began GMKtec’s mini PC lineage in late 2024, priced near $900 with a Ryzen AI 9 HX 370 processor.
This means GMKtec has roughly quadrupled its mini PC pricing within two years, a jump of close to 300% from the EVO-X1’s original $900 price point.
It is even a high jump from where GMKtec’s mini PC lineage began with the EVO-X1 in late 2024, a Ryzen AI 9 HX 370 machine priced near $900
The EVO-X3 will face direct competition from other Strix Halo devices carrying the same 128 GB memory ceiling, including the MINIX ER939-AI Pro and the ONEXStation.
Researchers have found a never-before-seen piece of macOS malware that combines a series of clever tradecraft to infect Macs with stealthy, custom-developed credential-stealing code.
The malware is delivered in two stages. The first is distributed in a disk image that masquerades as Maccy, a clipboard manager for Macs. It’s compiled as AppleScript that is notable for the way it delivers the second stage. The malware is named PamStealer because the Rust-written infostealer uses the Pluggable Authentication Modules interface built into macOS to validate the target’s login password before sending it to an attacker-controlled server.
A quieter execution chain
The use of both disk image and AppleScript is common in malware for Macs. More unusual is the way PamStealer combines them to gain stealth. When the AppleScript is double-clicked, it’s opened in the macOS Script Editor, where the malicious functionality is buried deep within the file.
“Rather than relying on shell commands such as curl or zsh, the AppleScript executes a self-contained JavaScript for Automation (JXA) downloader that retrieves and stages the payload using native Objective-C APIs,” researchers from Jamf, a security firm for macOS users, wrote. “Combined with a Rust-based second stage and a password capture workflow that validates credentials locally through PAM, the result is a quieter execution chain than we typically observe in commodity macOS stealers.”
Advertisement
When a user, expecting to install a trustworthy clipboard manager, encounters the disk image, they’re prompted to press Command-R immediately after double-clicking it. This command executes malicious code inside the AppleScript directly. It also allows the execution to bypass com.apple.quarantine, a macOS attribute that provides warnings and restrictions when executable files have been downloaded from the Internet.
As Jamf explained:
PamStealer combines a recently emerging delivery surface with a less familiar payload. While the clickable .scpt and Script Editor lure build on tradecraft that is already gaining adoption across the macOS threat landscape, the malware distinguishes itself through a self-contained JXA dropper, a Rust-based second stage, and a password capture workflow that validates credentials locally through PAM before harvesting them. That second stage puts considerable effort into staying hidden, masquerading as Finder, encrypting its command-and-control traffic, and holding back prompts like the Full Disk Access request for as long as forty minutes so its activity does not line up with launch. Together, these behaviors illustrate how commodity macOS stealers continue to evolve, adopting quieter execution chains and native implementations that reduce traditional detection opportunities while remaining compatible with standard macOS features.
The first stage puts its payload inside an app bundle that impersonates real components built into macOS. The component changes from sample to sample of the malware. Finder.app under com.apple.finder.core or com.apple.finder.monitor, and a Software Update.app under com.apple.security.daemon, are two examples. In either case, they run hidden. They also display macOS’s genuine Finder.icns as its icon.
The idea of an AI-powered device that’s not a smartphone is weird, but not unheard of. According to a report from The Wall Street Journal on Wednesday, SpaceX has already shown investors an early prototype of one.
The report says that Elon Musk’s SpaceX — which includes the social media platform X and the artificial intelligence startup xAI — has developed a handset-like device that’s sleeker and slimmer than an iPhone and runs a proprietary operating system that integrates xAI’s own technologies. The device reportedly runs on a Qualcomm Snapdragon chip, a common feature in many Android phones today.
On Thursday, Musk publicly denied the existence of such a device, calling the claims “utterly false” in a post on X.
Advertisement
In February, Musk publicly stated that a phone was not being developed. Earlier, during an event last October, Musk said, “the idea of making a phone makes me want to die,” while adding, “if we have to make a phone, we will.” However, there’s enough rumored evidence to believe that such a device may exist, even if Musk refuses to call it a phone.
SpaceX began being publicly traded earlier this month. Whether we see a device with its branding remains to be seen, but it wouldn’t be too much of a surprise.
SpaceX did not immediately respond to a request for comment.
Artificial intelligence is already everywhere on our smartphones, but tech companies are racing to build entirely new AI gadgets. OpenAI and Jony Ive are said to be working on a screenless AI device that might be worn on your ear as an always-on assistant.
In a world saturated with “smart” and AI technologies, creating a new device running a different operating system would free Musk from the potential restrictions imposed by Apple and Google’s ecosystems. It could allow SpaceX and xAI to rely on their own technology rather than the big players.
And given Apple and Google’s stranglehold on the smartphone industry, breaking away from the phone format would also let SpaceX’s new device escape strict app store rules.
When shown to institutional investors, SpaceX reportedly said the device was in the early stages of development and that the design could change over time. Although it’s not called a “phone,” it’s logical to assume the device could connect to SpaceX’s Starlink satellite network for connectivity.
In fact, while a physical smartphone has been denied, a branded consumer mobile service is likely. Last week, The Financial Times reported that SpaceX is actively weighing a Starlink-branded retail mobile plan, directly competing with T-Mobile, AT&T and Verizon.
Qualified immunity — crafted out of thin air by the US Supreme Court — has rarely been anything but an easy way for government employees to duck out of lawsuits before they’re actually asked to defend themselves against allegations of rights violations.
The Supreme Court has continually narrowed this doctrine, pretty much ensuring that if every single fact of an allegation doesn’t perfectly align with precedential rulings, qualified immunity will be awarded. The Supreme Court has ensured no further movement will take place by continually refusing to establish rights violations, even when it (very rarely!) disagrees with a lower court’s granting of qualified immunity.
The doctrine has been memorably pilloried more than once by appellate judges. Most famously, Judge Don Willett of the Fifth Circuit Appeals Court had this to say about the qualified immunity doctrine — something tends to reward rights violators just because they happened to find a slightly different way to violate someone’s rights.
To some observers, qualified immunity smacks of unqualified impunity, letting public officials duck consequences for bad behavior—no matter how palpably unreasonable—as long as they were the first to behave badly.
That was the wind-up. Here’s the pitch:
Advertisement
Section 1983 meets Catch-22. Plaintiffs must produce precedent even as fewer courts are producing precedent. Important constitutional questions go unanswered precisely because those questions are yet unanswered. Courts then rely on that judicial silence to conclude there’s no equivalent case on the books. No precedent = no clearly established law = no liability. An Escherian Stairwell. Heads defendants win, tails plaintiffs lose.
Justice Sotomayor’s dissent [PDF] isn’t as immediately quotable, but it still delivers a stinging indictment of the qualified immunity doctrine. The facts of the case are unpleasant, as they almost always are when government defendants start invoking qualified immunity.
Green Bay, Wisconsin jail staff responded to prisoner Antonio Smith’s refusal to submit to a wellness check (on day 46 of his hunger strike) by pepper spraying him in the face, ordering him to strip naked, and taking him to the health unit. When Smith refused the wellness check, he was dumped clothed in nothing but a small towel into an unheated, unfurnished “control cell” for the next 23 hours. The temperature in the cell ranged from “25 to 57 degrees Farenheit,” according to uncontested testimony.
When Smith was first placed in the cell around noon, Van Lanen told Smith that Smith could request a shower any time and that he would come back to discuss “‘clothing and stuff,’” but he never returned. Ibid. Three and a half hours later, Smith requested clothing, bedding, and a mattress from Lieutenant Timothy Retzlaff and asked to be moved to a warmer cell given the cold. Retzlaff said he would check with Van Lanen. Twelve additional hours went by with no word from Van Lanen or Retzlaff. Then, around 3 o’clock in the morning, a different officer told Smith that if he submitted to future wellness checks, he could have a smock, but that otherwise, “he would remain naked and cold.” Ibid. Smith declined. Another eight hours came and went without any word from Van Lanen or Retzlaff. Smith remained naked and frigid overnight as the temperature dropped below freezing to 25 degrees. After 23 hours, prison staff removed Smith from the cell. Smith later stated that he stayed on his feet for most of those 23 hours because it was too painful to sit, lie down, or sleep.
The Seventh Circuit Appeals Court actually said exactly this in its ruling granting qualified immunity to the defendants.
The Seventh Circuit held that the officers violated Smith’s Eighth Amendment right to be free from cruel and unusual punishment but nevertheless granted them qualified immunity, reasoning that the Circuit “had never held it unconstitutional on closely analogous facts to house an inmate in a cell that ranged in temperature from 25 to 57 degrees over a 23-hour period without clothes or a way to keep warm.”
Yep, that’s how fucking insane this doctrine is. The court even said this was a rights violation, but since it hadn’t said the same thing earlier about a nearly exactly matching set of circumstances, the defendants apparently had no way of knowing tossing someone naked in a freezing cell for nearly 24 hours would violate the prisoner’s rights.
Advertisement
As Sotomayor points out, the Seventh Circuit appeared to willfully disregard its own precedent when handing down this ruling.
As Judge Hamilton explained in dissent, the Seventh Circuit has itself held that intentionally subjecting prisoners to extreme cold conditions without any way to stay warm violates the Eighth Amendment. In Gillis v. Litscher(2006), for example, the Circuit held that a reasonable jury could find that prison officials violated a prisoner’s Eighth Amendment right when they deliberately left him naked in a cell blowing cool air for five days as part of an effort to “conform [his conduct] to the rules.” [S]ee Del Raine v. Williford,(1994) (officers deliberately strip-searched prisoner in cell for 15 to 30 minutes when windchill was 40 to 50 degrees below zero). The Seventh Circuit has also held that, when cold conditions are the product of heating-system failures, officers violate the Eighth Amendment if they are aware of such conditions and fail to take corrective measures such as providing an alternative way to keep warm.
That should have been enough for SCOTUS to review this one and, hopefully, send it back with a reminder that QI readings need to be narrow, but perhaps not so narrow they provoke gasps of disbelief.
But that’s not how this Supreme Court majority operates. Sotomayor calls them out for only reviewing certain QI cases. You know the ones.
This Term… the Court has exercised its discretion to summarily reverse supposed errors that were far less clear than the one here. See, e.g., McCarthy v. Hernandez, 607 U. S. _ (2026) (per curiam); Zorn v. Linton, 607 U. S.(2026) (per curiam); see also Smith v. Scott, 608 U. S. __ (2026) (summarily vacating and remanding denial of qualified-immunity in light of Zorn). If those cases were clear enough for summary action, the Court here should have readily concluded, based on precedent and basic human decency, that it is beyond debate that it is cruel and unusual to lock someone intentionally in a freezing prison cell completely naked for 23 hours.
The Court’s decision not to do so today exacerbates its asymmetrical trend of declining to intervene when courts wrongly afford officers the benefit of qualified immunity, but unflinchingly summarily reversing when it believes courts have wrongly denied officers the protection of qualified immunity.
Advertisement
This would be hypocrisy if it were being carried out by people who actually maintained a pretense of judicial fairness. But it’s being carried out by people who actively believe in the message they’re sending to the public, as well as to the administration they are so clearly devoted to pleasing.
Reversing only denials of qualified immunity sends the regrettable message that, when choosing between shielding government officials from liability and vindicating individuals’ constitutional rights, this Court will almost always choose the former.
Sotomayor is right. The message being sent is “regrettable.” Unfortunately for America, the people sending it have no regrets at all.
As enterprise AI systems scale to handle complex workflows, practitioners face the challenge of routing subtasks to the right tools and skills. Agents can have hundreds of tools and skills and get confused on which one to use for each step of a workflow.
To address this challenge, researchers at Alibaba developed SkillWeaver, a framework that creates an execution graph for a given task and chooses the right skills for each of the nodes. They also introduce Skill-Aware Decomposition (SAD), a novel technique that uses a feedback loop to enable the agent to fetch and vet relevant tool candidates iteratively. This compositional approach and feedback loop mechanism distinguishes SkillWeaver from other tool-routing frameworks that choose tools in a one-shot fashion.
SkillWeaver relates to real-world AI applications where agents autonomously orchestrate multi-tool ecosystems, such as the Model Context Protocol (MCP), to execute multi-step business operations like downloading datasets, transforming information, and creating visual reports.
In practice, the researchers’ experiments with SkillWeaver show that implementing this retrieve-and-route approach significantly increases accuracy while reducing token consumption by over 99% compared to naively exposing agents to an entire tool library.
Advertisement
For practitioners building AI agents, the main takeaway is that the granularity of task decomposition is the biggest bottleneck to accurate tool retrieval.
The challenge of skill routing
Skills are a key pattern in modern LLM agent architectures. A skill is a modular, reusable tool specification that uses structured natural language documentation.
As enterprise agents integrate with massive tool ecosystems, accurately routing user queries to the right skills becomes a difficult task. Exposing an entire library to an LLM to find the right tool is highly inefficient, quickly overwhelms context limits, and consumes hundreds of thousands of tokens.
Most current tool-use frameworks attempt to solve this through API retrieval, documentation matching, or hierarchical structures that treat routing strictly as a single-skill selection or per-step problem.
Advertisement
However, this single-skill paradigm is insufficient for enterprise environments because real-world queries are inherently compositional. A standard business request such as “Download the dataset, transform it, and create visual reports” cannot be fulfilled by one tool. It requires breaking the prompt down and sequencing an API client, a data processor, and a visualization tool into a cohesive, multi-step execution plan.
How SkillWeaver and SAD work
To tackle this, the researchers frame the problem of handling complex tasks that require multiple skills as “compositional skill routing.” Given a complex user prompt and a vast library of tools, an agent must simultaneously figure out how to break the request into a sequence of atomic sub-tasks, how to map each sub-task to the single best available skill, and how to compose those skills into an executable plan.
SkillWeaver orchestrates this process through three distinct stages: Decompose, Retrieve, and Compose. In the first stage, an LLM acts as a task decomposer, breaking the user’s complex query down into a sequence of sub-tasks that each require one skill. Once the sub-tasks are clearly defined, the system uses an embedding model to compare each subtask against the skill library to pull a shortlist of the top candidate tools for each step.
In the final stage, a planner evaluates the retrieved candidates based on how well they work together. It checks for inter-skill compatibility to ensure the outputs of one tool naturally flow into the inputs of the next. It then creates a final execution plan as a Directed Acyclic Graph (DAG) that maps out dependencies so independent tasks can potentially execute in parallel.
Advertisement
For example, consider a user asking an AI agent to “Download the dataset, transform it, and create visual reports.” In the decompose stage, the decomposer LLM breaks this into three distinct sub-tasks: downloading the dataset, transforming the data, and creating the reports.
In the retrieve stage, the system searches the library and finds candidates like “api-client” or “http-fetch” for task one, “csv-parser” or “etl-pipeline” for task two, and so on. Finally, the compose stage evaluates these options, selects the specific combination of “api-client,” “csv-parser,” and “chart-gen” that are most compatible, and wires them together into a final, ready-to-execute workflow.
A key challenge of this pipeline is that LLMs often produce generic step descriptions that fail to match the specific, technical vocabulary of the actual skills available in the library. To fix this, SkillWeaver introduces Iterative Skill-Aware Decomposition (SAD), a novel feedback loop. SAD works by having the LLM draft an initial plan, conducting a preliminary search to find loosely matching skills, and then feeding those retrieved skills back into the LLM as hints. This allows the LLM to rewrite its decomposition so the granularity and vocabulary perfectly align with the actual tools that exist.
SkillWeaver in action
To evaluate how SkillWeaver performs in realistic enterprise scenarios, the researchers created a custom benchmark called CompSkillBench. It consists of 300 multi-step queries of different difficulty levels. To mirror real-world environments, they used a library of 2,209 real-world skills sourced from the public MCP ecosystem, covering 24 functional categories like cloud infrastructure, finance, and databases.
For the core engine, the researchers primarily used a lightweight 7-billion parameter model (Qwen2.5-7B-Instruct) for task decomposition, paired with a standard semantic search retriever (MiniLM with a FAISS index) to find the tools. SkillWeaver was evaluated against three main setups: a brute-force “LLM-Direct” method where they stuffed all the tool names into the prompt of a large model, a vanilla LLM-based decomposition without SAD, and a ReAct-style agent loop.
Advertisement
The experiments indicate that task decomposition is the main bottleneck. Standard LLM behavior falls short when dealing with large tool libraries, but the SAD feedback loop dramatically moves the needle. In the vanilla setup, the 7B model achieved a decomposition accuracy (i.e., predicting the correct number of steps) only 51.0% of the time. By activating the SAD feedback loop, accuracy jumped to 67.7% (with the larger Qwen-Max model, the accuracy reached 92%). On “hard” tasks requiring four to five distinct skills, SAD improved accuracy by 50%.
In comparison to the naive approach, SkillWeaver reduces token consumption by more than 99% (source: arXiv)
One fascinating finding was that larger models can actually perform worse when unguided. When tested in the vanilla setup, a larger 14-billion parameter model saw its accuracy plummet below the 7B model’s accuracy because it tended to over-decompose tasks into microscopic, unnecessary steps. Once SAD was introduced, the retrieved tool hints anchored the model back to reality and increased its accuracy. This suggests that aligning an agent with the vocabulary of specific tools is often more impactful than paying for a larger, more expensive LLM.
Another important takeaway is token savings. The LLM-Direct baseline, which used the very large Qwen-Max model, showed that feeding all tools into the prompt of a large model fails. Despite near-perfect task breakdown capabilities, the massive model only retrieved the right tool category 21.1% of the time when flooded with tool options. SkillWeaver’s targeted retrieve-and-route approach vastly outperformed this in accuracy while slashing context window consumption from an estimated 884,000 tokens down to roughly 1,160 tokens per query, a 99.9% reduction. For practitioners, this translates directly to drastically lower API costs and faster response times.
Advertisement
Finally, the traditional ReAct baseline completely failed, achieving 0% decomposition accuracy. Its loop naturally collapses multi-step plans into isolated actions rather than explicitly mapping out a cohesive, multi-tool sequence.
Considerations for developers
While the researchers have not yet released the source code for SkillWeaver, their work was built on off-the-shelf tools that can easily be reproduced.
Skill-Aware Decomposition (SAD), which is the key innovation at the heart of the framework, is a clever prompt-engineering and retrieval loop. The authors have shared the prompt templates in their paper, and developers can implement it themselves quite easily using standard orchestration libraries like LangChain, LlamaIndex, or even raw Python scripts.
As for the retrieval component, the authors built the core framework using all-MiniLM-L6-v2, an open-source embedding model. They found that swapping in a slightly stronger off-the-shelf encoder (BGE-base-en-v1.5) immediately boosted accuracy without any fine-tuning. While an off-the-shelf bi-encoder is great at getting a relevant tool into the top 10 candidates nearly 70% of the time, it struggles to consistently rank the perfect tool at exactly number one, achieving that only about 37% of the time. To bridge this gap, teams will likely need to implement a secondary cross-encoder or LLM-based reranker to re-order those top 10 candidates.
Advertisement
One upfront preparation requirement is vectorizing the tool library and building a FAISS index in advance. In practice, this is a negligible hurdle. Embedding and indexing all 2,209 skills in the benchmark took a mere 15 seconds. Once built, retrieving tools from the index adds less than 15 milliseconds of latency per query. For enterprise environments, syncing the tool index is a trivial background job.
A current limitation in SkillWeaver is the lack of error recovery. While SkillWeaver successfully maps out a compatible DAG for execution, the authors’ pilot study revealed the challenges of multi-step tool chains. For example, if an API call fails in step two, the entire chain breaks. The paper’s core contribution is limited to the routing and planning phase. For a true production deployment, practitioners must build their own error recovery, fallback, and retry mechanisms on top of the compose stage to handle real-world API timeouts or malformed outputs.
Six new F-35Bs entered service carrying ballast inside their noses instead of radar
The APG-85 delay created stealth fighters without their primary sensor
Lot 17 redesign decisions eliminated compatibility with older radar hardware
The United States Marine Corps has accepted delivery of six newly built F-35B stealth fighters carrying ballast weights where a radar should be installed.
The aircraft left production lines without the AN/APG-85 radar that future F-35 variants are expected to rely upon for combat operations.
Instead of delaying delivery, officials accepted aircraft configured with dead weight occupying the nose section reserved for the missing equipment.
Latest Videos From
A new radar arrives before the aircraft can actually use it
The unusual situation emerged because Lot 17 aircraft were redesigned around the forthcoming AN/APG-85 radar architecture and mounting structure.
Advertisement
Those modifications prevent installation of the older AN/APG-81 radar, leaving no interim option while the replacement remains unavailable.
The radar is to be supplied by Northrop Grumman rather than prime contractor Lockheed Martin, further complicating delivery schedules.
Marine Corps Lieutenant General Gregory Masiello informed lawmakers on June 23 2026 that only six Marine aircraft currently lack installed radars.
Advertisement
Sign up to the TechRadar Pro newsletter to get all the top news, opinion, features and guidance your business needs to succeed!
Acceptance testing for those aircraft began in February 2026 after successfully completing production earlier in the year.
Although the Air Force and Navy have not yet received comparable aircraft, similar deliveries are reportedly expected later this year.
Without radars, the aircraft can support basic flight familiarisation and pilot instruction activities while remaining unsuitable for combat operations.
Advertisement
The Joint Program Office defended the decision, stating the Pentagon “deliberately undertook a highly concurrent development and production program for advanced capabilities.”
“This decision was made with full understanding of the risk of having production aircraft ready ahead of the capabilities.”
Advertisement
Delay exposes larger questions around F-35 readiness
The missing radar arrives amid wider concerns surrounding operational readiness across the broader F-35 fleet during fiscal year 2025.
Aircraft capable of performing at least one assigned mission reached 44.1%, although this remained considerably below historical expectations.
Masiello said he would not “dispute their numbers or how they do it” during congressional testimony discussing the findings.
Advertisement
Using Joint Program Office calculations, he argued the mission capable figure stood closer to 56% across operational fleets instead.
The delayed APG-85 radar forms part of the wider Block 4 modernization package that continues encountering schedule and integration challenges.
Future F-35 models will require more cooling capacity, as new systems will draw between 62 and 80 kilowatts, more than double the 32 kilowatts current hardware consumes.
A next-generation engine that could have addressed this cooling gap was developed but ultimately defunded after proving too costly.
Advertisement
Current plans indicate that the APG-85 will enter service around 2028 but meaningful cooling will not arrive until after 2031.
The sight of stealth fighters carrying lead ballast instead of radars may therefore become an enduring symbol of contemporary defence procurement realities.
The Godot Foundation will stop accepting AI-authored code, agent-submitted pull requests, and AI-generated text in contributor communications after maintainers were overwhelmed by low-effort submissions. “It is time for us to recognize that these problems aren’t going away and therefore we need to take steps to reduce the burden on maintainers while ensuring we still have a pipeline to mentor new contributors to become future maintainers,” the Godot Foundation said in a blog post. Contributors may still use AI for limited “menial things” if they disclose it, but humans must understand, own, and be able to fix the code they submit. PC Gamer reports: The Foundation says the pileup of Godot pull requests pending review isn’t all bad: It’s a sign that interest in using and contribution to Godot is increasing. But the influx of contributions authored or submitted by AI is sapping the projects’ maintainers of their willingness to confront the “already tedious” work of reviewing pull requests. “If your feedback on PRs is just being absorbed by a machine and not going towards mentoring a potential future maintainer, it becomes much harder to justify spending your free time on PR review,” the Foundation said.
As the problem becomes increasingly unsustainable, the Godot Foundation says it’s in the process of updating its contribution policies, focusing on “adding barriers to low-effort slop” contributions, encouraging maintainers to review code, developing new contributors into future maintainers, and crucially, requiring that all contributions come from humans who are accountable for their code — and fixing it if it fails. “AI cannot take responsibility, and we can’t trust heavy users of AI to understand their code enough to fix it,” the Foundation said.
The Foundation says we can expect Godot’s contributing policy to soon include explicit rejections of AI-authored code, noting that contributors should only use AI assistance for “menial things” and must disclose its use. Additionally, the Foundation will reject any AI-generated text in human-to-human communications, saying it’s “a basic principle of respect” — though it says machine translations “are still acceptable” if the original text was human-authored. “Things change every day with respect to the current suite of AI tools available,” the Foundation said. “We will continue taking a conservative approach in our policies towards them, but we will re-evaluate as things evolve.”
Sysdig says it has documented the first ransomware attack carried out end to end by an AI agent, which autonomously exploited exposed systems, stole credentials, established persistence, compromised a production database, and destroyed data. The research team named the attacker “JadePuffer” and said it gained initial access to an internet-facing Langflow instance by exploiting CVE-2025-3248. “The most striking characteristic, however, was the LLM’s behavior,” Sysdig director of threat research Michael Clark said in a blog post. An anonymous reader quotes an excerpt from The Register: JadePuffer’s “self-narrating” payloads “contained natural language reasoning, target prioritization, and the kind of detailed annotations that human operators don’t often write but LLM-generated code produces reflexively,” Clark added. “The operation also adapted in real time, retrying failed steps within refined parameters. In one sequence, it went from a failed login to a working fix in 31 seconds.” After exploiting CVE-2025-3248, a missing authentication vulnerability in Langflow that allows remote, unauthenticated attackers to execute arbitrary Python on the host, the AI agent began scanning for and collecting secrets, including LLM provider API keys, cloud credentials “with explicit coverage of Chinese providers” including Alibaba, Aliyun, Tencent, and Huawei, while also scanning for AWS, Azure and Google Cloud Platform, cryptocurrency wallets, and database credentials.
The AI also installed a crontab entry on the Langflow server to maintain persistence and call back to the attacker’s infrastructure every 30 minutes. JadePuffer’s intended target was a separate internet-exposed production server running a MySQL database and an Alibaba Nacos configuration service, we’re told. Nacos is an open-source service-discovery and dynamic configuration platform developed by Alibaba and used in the cloud provider’s microservices applications. The agent connected to the server’s exposed MySQL port using root credentials, although Sysdig doesn’t know how the attacker obtained them. These credentials weren’t stolen from the victim’s environment.
JadePuffer then attacked Nacos via multiple vectors including an authorization bypass flaw (CVE-2021-29441) and forging a valid JSON web token (JWT) using Nacos’s default signing key. Additionally, using its root database access, the LLM injected a backdoor administrator into the Nacos backing database. It ultimately encrypted all 1,342 Nacos service configuration items using MySQL’s built-in AES encryption function, and created an extortion demand, ransom note, Bitcoin payment address, and a Proton Mail contact […]. However, according to the threat hunters, the victim can’t recover the encrypted data, even if they paid the ransom demand, because the agent escalated “from row-level deletion to dropping entire database schemas, narrating its own targeting rationale,” without backing up any of the encrypted data.
You must be logged in to post a comment Login