
Tech

Webb Telescope Reveals Stars Coming to Life Within FS Tau in a Scene Perfect for July Fourth


Webb FS Tau Star System
Right before crowds across the country prepare to mark the Fourth of July with displays of light and sound, NASA’s James Webb Space Telescope offered a view from far beyond Earth that carries a similar sense of energy and new activity. Two protostars in the FS Tau system sit near the center of the frame. Both remain young enough that they still draw in gas and dust while pushing excess material outward through strong flows.


One of the two shoots off large orange streams that fan out and become entangled in the surrounding cloud. Like a big cosmic bulldozer, these streams compress the gas and dust, leaving ridges visible in lighter blue where the material has been pushed together. Viewing the stars in near-infrared light, as Webb does, reveals what is actually going on. Dust hides this activity in visible light, but infrared lets us discern the shape of the flows and the textures in the cloud surrounding the primary stars in much greater detail.



In the background, you can see faraway galaxies of various colors. Some are jumbled and appear redder because there is a lot of dust in their path. Others have a clear path to the camera and sparkle in yellow or white tones. They appear to be dispersed around the area, with brilliant spots sporadically appearing.

Releasing the image on July 2 was well timed: it lands just ahead of the holiday, in a year when the country is commemorating a significant milestone of its founding. Many people will soon be looking up at the sky to watch the fireworks, which makes it a fitting moment to reflect on history. Systems like this one are valuable to researchers because they show how lower-mass stars form without an overwhelming amount of surrounding activity. The scene stays clean and clear, making it easier to see what is going on and to track changes over time.

The obvious gaps in the orange streams are telling: material appears to be pulled in and then blown out in stages rather than all at once. Webb keeps returning to places like FS Tau because each observation adds another brick to our understanding of how simple clouds transform into stars and planets.



Author
Jackson Chung

A technology, gadget and video game enthusiast who loves covering the latest industry news. Favorite trade show? Mobile World Congress in Barcelona.


Tech

Meta Has Released An App For Making Generative AI Games


Vibe-coding right in your Pocket.

Meta appears to have soft-launched a new app called Pocket that’s aimed at getting people to vibe-code their own minigames. Mobile developer and reverse engineer Alessandro Paluzzi spotted Pocket and posted about it to X today, but reporting platform AppFigures told TechCrunch that the app has been available on both iOS and Android since June 29. Though the app is listed publicly, it’s not available in the US on any of the half dozen phone models associated with our Google accounts, and a help page on Meta’s site says “the Pocket app is not yet available everywhere.” 

The company has not made any public announcement yet about the launch or where the app is being trialed. We’ve reached out for comment and will update this post if we receive a response.

From cosmetic tweaks to a standalone app for AI slop, Meta has been going gangbusters on getting artificial intelligence into its services in the past year. TechCrunch suggested that Pocket may be the result of the company hiring, earlier this year, the entire team behind Gizmo, an app that used AI to create interactive experiences based on prompts from users. Pocket uses that exact same nomenclature, dubbing itself “a creative platform for making and sharing gizmos” in the app listing, and the Play Store shortcode of “com.facebook.gizmo” does little to dispel the notion either.


Tech

Claude Fable relaunch disappoints users with nerfed performance


Claude Fable, the company’s most powerful model, is now available to all users, but early impressions are disappointing, as it appears to be nowhere near the original release.

When the Department of Commerce announced that it was lifting the ban on Claude Fable, I was holding my breath and counting seconds for the model to show up on Claude Code. I had also loaded up my usage-based credit wallet, just in case the model debuted as strictly usage-based.

To our surprise, Claude Fable shipped for everyone, including those with a $100 Max subscription, but there are multiple restrictions.


According to Anthropic, while Fable 5 is included in Max, Pro, and Team plans, it is heavily capped.

For example, you can use Fable for up to 50% of your weekly usage limits, which is not significant for such a powerful model. But it’ll get worse after July 7, as the model will transition entirely to a pay-to-play system via usage credits.


However, the real gut punch is the degraded performance, or, as the AI community likes to put it, the “nerfed” performance.

On Reddit, users are reporting that the restored Fable 5 feels weaker, or is simply being routed through stricter safety systems more often than before.

“The new guardrails are kicking in on way too many tasks and falling back to Opus 4.8,” one user wrote in a Reddit post. “This is not the model that got banned.”

The problem is not just limited to Claude desktop, as Claude Code is also struggling with similar issues.


One user said Fable “didn’t even let me search for dead code without switching to Opus,” while another said it was “very very obvious” when the fallback triggers because Claude tells the user and visibly shifts to Opus.

Another developer claimed the model was unusable for some systems-level coding work, saying that C, C++, Rust, Win32 API references, memory-related work, and files mentioning words like “security,” “vulnerable,” “unsafe,” or “hook” appeared to trigger a fallback or block.

Fable 5 may still be powerful when it actually handles the task, but the restored version appears to be far more sensitive to prompts, project files, and security-adjacent language.

However, BleepingComputer understands that the model itself has not been nerfed. Instead, it is likely that Anthropic is being extra careful with the safety guardrails, which is negatively affecting Fable’s daily use cases.


In fact, we observed that Fable is sometimes routed to Opus 4.8 even when the task does not appear to be a safety risk.

Anthropic has said that its updated safeguards rely on a large “safety margin,” which could explain the subpar experience some users are seeing with Fable.

Anthropic hasn’t acknowledged the reports of false positives yet, but it’s likely the company is aware of the problem and will address it in a future update.



Tech

Thin-Skinned Palantir Loses Its Bid To Bully A Swiss Magazine Into Publishing Its Rebuttals To Embarrassing Reporting


from the swiss-slapp-suits dept

Earlier this year we wrote about the ridiculous thin-skinned executives at Palantir suing a small independent Swiss online magazine, Republik, that had reported on the great lengths the company had gone to, trying to get the Swiss government to purchase Palantir’s surveillance technology. Palantir knew they couldn’t sue for defamation because, you know, everything Republik reported was true. Instead, they sued, trying to invoke a Swiss “right of reply” law, claiming that because Republik refused to publish the press release Palantir wanted to run in response to the reporting, the magazine had violated the law.

As we said at the time, this is the height of entitlement. Palantir doesn’t get to tell Republik how and what it must publish.

And, thankfully, a court has agreed. Zurich’s commercial court rejected 22 of 23 claims that Palantir made.

The data analytics company lost on 22 out of 23 counts of the suit. In a ruling on Friday, Zurich’s commercial court dismissed the majority of counterstatement requests filed by the company and its Swiss subsidiary, finding that only a single passage in one article warranted a published response from the company.

While the court agrees that there is a “right of reply” law in Switzerland, it has limitations:


While Swiss media law allows the subjects of a story to request a right of reply, this has caveats: the right of reply has to be concise and stick to the facts of the story.

The one count that stuck: the court found that a single passage in just one article warranted a limited published reply from Palantir.

Also, the court told Palantir to pay Republik for its legal expenses wasted on this SLAPP suit:

The court on Friday ordered Palantir to bear 95% of the 9,000 Swiss francs ($11,300; £8,400) court costs and to pay Republik 9,900 francs in legal expenses.

Of course, this case was always less about the ‘right of reply’ than about making it clear to anyone who reports critically on Palantir that the company will go to war with them, seeking any legal theory, no matter how ridiculous, to tie them up in court — the textbook logic of a SLAPP suit. Republik has said that defending the case cost the small organization quite a lot in time and resources:

Balz Oertli, a journalist with WAV research collective, said: “We invested a great deal of effort into this case, and we are very pleased with the outcome.”

Anyway, given that Palantir seems really upset about Republik’s reporting, it sure would be a shame if you decided to go read this critical reporting of Palantir’s relentless attempts to win business from the Swiss government.


Filed Under: chilling effects, free speech, right of reply, switzerland

Companies: palantir, republik


Tech

Bubbles, Belts, And Bulbs: How The Scantron Works


Many of us remember back in our school days taking tests and filling out answers on a Scantron sheet, those long rows of A, B, C, D, and E that had to be filled in with a #2 pencil. Ever wonder why it needed a #2 pencil, or what the point of using a Scantron was at all? That question is answered in the latest video from [SimonRetro], where he takes a look at the Scantron and how it works.

One of the more interesting things about the Scantron is that it’s such a standalone device. No software needed and no keypad to mess with, just two rocker switches. The on/off switch is also how you tell it to forget the last answer sheet so you can program in a new test. Upon booting, you feed in a Scantron sheet with some specific boxes filled in, and then it’s programmed and ready to take in and grade all the students’ answers. Opening up the Scantron reveals it’s pretty interesting inside: one control board with early-’90s-era chips. There’s also a lightbulb (no LEDs) shining through the six reading sections of the card, as well as an arrangement of belts and motors to move the card through the machine. The printer is a seven-pin printer used in conjunction with a pair of ink rollers to print out the results on the cards.
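The program-then-grade cycle described above can be sketched in a few lines of Python. This is only an illustration, not the machine's actual logic: the light readings, the `MARK_THRESHOLD` value, and the function names are all hypothetical, and the sketch assumes a filled bubble blocks most of the light that would otherwise reach the sensor.

```python
# Hypothetical sketch of optical mark grading; every value here is illustrative.
MARK_THRESHOLD = 0.5  # assumed: readings below this count as a filled bubble
CHOICES = "ABCDE"

def read_row(light_levels):
    """Return the marked choice for one row, or None if blank or ambiguous.

    light_levels holds one transmitted-light reading per bubble
    (0.0 = fully blocked, 1.0 = unobstructed).
    """
    marked = [c for c, level in zip(CHOICES, light_levels) if level < MARK_THRESHOLD]
    return marked[0] if len(marked) == 1 else None

def grade(key_sheet, answer_sheet):
    """Count rows where the student's mark matches the key sheet's mark."""
    key = [read_row(row) for row in key_sheet]
    answers = [read_row(row) for row in answer_sheet]
    return sum(1 for k, a in zip(key, answers) if k is not None and k == a)

# The first sheet fed in "programs" the key; later sheets are graded against it.
key_sheet = [[0.9, 0.1, 0.9, 0.9, 0.9],    # B
             [0.9, 0.9, 0.9, 0.2, 0.9]]    # D
student = [[0.9, 0.15, 0.9, 0.9, 0.9],     # B (matches the key)
           [0.1, 0.9, 0.9, 0.9, 0.9]]      # A (does not match)
score = grade(key_sheet, student)
print(score)  # 1
```

A threshold like this would also be one plausible reason some marking tools fail: a mark that does not block enough light never drops below the cutoff.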

[SimonRetro] also went ahead and tried different ways to mark the sheets including pens, Sharpies, colored pencils, and different thicknesses of pencils besides the #2 to see which would and wouldn’t work in the Scantron. Thanks [SimonRetro] for exploring this machine from many of our childhoods and sharing its inner workings. Be sure to check out some of our other reverse engineering articles that explore how classic devices work.


Tech

GMKtec turned its AI mini PC into a tower and nearly quadrupled the price along the way



  • GMKtec EVO-X3 abandoned flat mini PC designs for a vertical tower layout
  • The Ryzen AI Max+ 395 survives despite newer silicon already existing
  • Triple fan cooling replaces the thermal approach used by the EVO-X2

GMKtec has detailed the EVO-X3, an AI mini PC workstation built around AMD‘s Ryzen AI Max+ 395 ‘Strix Halo’ processor.

The company is retaining the same silicon used in its predecessor, the EVO-X2, which AMD CEO Lisa Su personally signed as a mark of approval.


Tech

Newly discovered PamStealer isn’t your typical macOS malware


Researchers have found a never-before-seen piece of macOS malware that combines several clever pieces of tradecraft to infect Macs with stealthy, custom-developed credential-stealing code.

The malware is delivered in two stages. The first is distributed in a disk image that masquerades as Maccy, a clipboard manager for Macs. It’s compiled as AppleScript that is notable for the way it delivers the second stage. The malware is named PamStealer because the Rust-written infostealer uses the Pluggable Authentication Modules interface built into macOS to validate the target’s login password before sending it to an attacker-controlled server.

A quieter execution chain

The use of both disk image and AppleScript is common in malware for Macs. More unusual is the way PamStealer combines them to gain stealth. When the AppleScript is double-clicked, it’s opened in the macOS Script Editor, where the malicious functionality is buried deep within the file.

“Rather than relying on shell commands such as curl or zsh, the AppleScript executes a self-contained JavaScript for Automation (JXA) downloader that retrieves and stages the payload using native Objective-C APIs,” researchers from Jamf, a security firm for macOS users, wrote. “Combined with a Rust-based second stage and a password capture workflow that validates credentials locally through PAM, the result is a quieter execution chain than we typically observe in commodity macOS stealers.”


When a user, expecting to install a trustworthy clipboard manager, encounters the disk image, they’re prompted to press Command-R immediately after double-clicking it. This command executes malicious code inside the AppleScript directly. It also allows the execution to bypass com.apple.quarantine, a macOS attribute that provides warnings and restrictions when executable files have been downloaded from the Internet.

As Jamf explained:

PamStealer combines a recently emerging delivery surface with a less familiar payload. While the clickable .scpt and Script Editor lure build on tradecraft that is already gaining adoption across the macOS threat landscape, the malware distinguishes itself through a self-contained JXA dropper, a Rust-based second stage, and a password capture workflow that validates credentials locally through PAM before harvesting them. That second stage puts considerable effort into staying hidden, masquerading as Finder, encrypting its command-and-control traffic, and holding back prompts like the Full Disk Access request for as long as forty minutes so its activity does not line up with launch. Together, these behaviors illustrate how commodity macOS stealers continue to evolve, adopting quieter execution chains and native implementations that reduce traditional detection opportunities while remaining compatible with standard macOS features.

The first stage puts its payload inside an app bundle that impersonates real components built into macOS. The impersonated component changes from sample to sample. Two examples are a Finder.app under com.apple.finder.core or com.apple.finder.monitor, and a Software Update.app under com.apple.security.daemon. In either case, they run hidden. They also display macOS’s genuine Finder.icns as their icon.


Tech

SpaceX Secretly Unveiled New AI Device to Investors. Is It a Phone or Not?


The idea of an AI-powered device that’s not a smartphone is weird, but not unheard of. According to a report from The Wall Street Journal on Wednesday, SpaceX has already shown investors an early prototype of one. 

The report says that Elon Musk’s SpaceX — which includes the social media platform X and the artificial intelligence startup xAI — has developed a handset-like device that’s sleeker and slimmer than an iPhone and runs a proprietary operating system that integrates xAI’s own technologies. The device reportedly runs on a Qualcomm Snapdragon chip, a common feature in many Android phones today. 

On Thursday, Musk publicly denied the existence of such a device, calling the claims “utterly false” in a post on X. 


In February, Musk publicly stated that a phone was not being developed. Earlier, during an event last October, Musk said, “the idea of making a phone makes me want to die,” while adding, “if we have to make a phone, we will.” However, there’s enough rumored evidence to believe that such a device may exist, even if Musk refuses to call it a phone.

SpaceX began trading publicly earlier this month. Whether a device with its branding ever appears remains to be seen, but it wouldn’t be too much of a surprise.

SpaceX did not immediately respond to a request for comment. 


A phone by any other name? 

Artificial intelligence is already everywhere on our smartphones, but tech companies are racing to build entirely new AI gadgets. OpenAI and Jony Ive are said to be working on a screenless AI device that might be worn on your ear as an always-on assistant. 

In a world saturated with “smart” and AI technologies, creating a new device running a different operating system would free Musk from the potential restrictions imposed by Apple and Google’s ecosystems. It could allow SpaceX and xAI to rely on their own technology rather than the big players. 

And given Apple and Google’s stranglehold on the smartphone industry, breaking away from the phone format would also let SpaceX’s new device escape strict app store rules. 

When shown to institutional investors, SpaceX reportedly said the device was in the early stages of development and that the design could change over time. Although it’s not called a “phone,” it’s logical to assume the device could connect to SpaceX’s Starlink satellite network for connectivity. 

In fact, while a physical smartphone has been denied, a branded consumer mobile service is likely. Last week, The Financial Times reported that SpaceX is actively weighing a Starlink-branded retail mobile plan, directly competing with T-Mobile, AT&T and Verizon.


Tech

Sotomayor Trashes SCOTUS Majority For Cherry-Picking Qualified Immunity Cases To Reverse


from the get-out-of-lawsuit-free-card dept

Qualified immunity — crafted out of thin air by the US Supreme Court — has rarely been anything but an easy way for government employees to duck out of lawsuits before they’re actually asked to defend themselves against allegations of rights violations.

The Supreme Court has continually narrowed this doctrine, pretty much ensuring that if every single fact of an allegation doesn’t perfectly align with precedential rulings, qualified immunity will be awarded. The Supreme Court has ensured no further movement will take place by continually refusing to establish rights violations, even when it (very rarely!) disagrees with a lower court’s granting of qualified immunity.

The doctrine has been memorably pilloried more than once by appellate judges. Most famously, Judge Don Willett of the Fifth Circuit Appeals Court had this to say about the qualified immunity doctrine, which tends to reward rights violators just because they happened to find a slightly different way to violate someone’s rights.

To some observers, qualified immunity smacks of unqualified impunity, letting public officials duck consequences for bad behavior—no matter how palpably unreasonable—as long as they were the first to behave badly. 

That was the wind-up. Here’s the pitch:


Section 1983 meets Catch-22. Plaintiffs must produce precedent even as fewer courts are producing precedent. Important constitutional questions go unanswered precisely because those questions are yet unanswered. Courts then rely on that judicial silence to conclude there’s no equivalent case on the books. No precedent = no clearly established law = no liability. An Escherian Stairwell. Heads defendants win, tails plaintiffs lose.

Justice Sotomayor’s dissent [PDF] isn’t as immediately quotable, but it still delivers a stinging indictment of the qualified immunity doctrine. The facts of the case are unpleasant, as they almost always are when government defendants start invoking qualified immunity.

Green Bay, Wisconsin jail staff responded to prisoner Antonio Smith’s refusal to submit to a wellness check (on day 46 of his hunger strike) by pepper spraying him in the face, ordering him to strip naked, and taking him to the health unit. When Smith refused the wellness check, he was dumped clothed in nothing but a small towel into an unheated, unfurnished “control cell” for the next 23 hours. The temperature in the cell ranged from “25 to 57 degrees Fahrenheit,” according to uncontested testimony.

When Smith was first placed in the cell around noon, Van Lanen told Smith that Smith could request a shower any time and that he would come back to discuss “‘clothing and stuff,’” but he never returned. Ibid. Three and a half hours later, Smith requested clothing, bedding, and a mattress from Lieutenant Timothy Retzlaff and asked to be moved to a warmer cell given the cold. Retzlaff said he would check with Van Lanen. Twelve additional hours went by with no word from Van Lanen or Retzlaff. Then, around 3 o’clock in the morning, a different officer told Smith that if he submitted to future wellness checks, he could have a smock, but that otherwise, “he would remain naked and cold.” Ibid. Smith declined. Another eight hours came and went without any word from Van Lanen or Retzlaff. Smith remained naked and frigid overnight as the temperature dropped below freezing to 25 degrees. After 23 hours, prison staff removed Smith from the cell. Smith later stated that he stayed on his feet for most of those 23 hours because it was too painful to sit, lie down, or sleep.

The Seventh Circuit Appeals Court actually said exactly this in its ruling granting qualified immunity to the defendants.

The Seventh Circuit held that the officers violated Smith’s Eighth Amendment right to be free from cruel and unusual punishment but nevertheless granted them qualified immunity, reasoning that the Circuit “had never held it unconstitutional on closely analogous facts to house an inmate in a cell that ranged in temperature from 25 to 57 degrees over a 23-hour period without clothes or a way to keep warm.”

Yep, that’s how fucking insane this doctrine is. The court even said this was a rights violation, but since it hadn’t said the same thing earlier about a nearly exactly matching set of circumstances, the defendants apparently had no way of knowing tossing someone naked in a freezing cell for nearly 24 hours would violate the prisoner’s rights.


As Sotomayor points out, the Seventh Circuit appeared to willfully disregard its own precedent when handing down this ruling.

As Judge Hamilton explained in dissent, the Seventh Circuit has itself held that intentionally subjecting prisoners to extreme cold conditions without any way to stay warm violates the Eighth Amendment. In Gillis v. Litscher (2006), for example, the Circuit held that a reasonable jury could find that prison officials violated a prisoner’s Eighth Amendment right when they deliberately left him naked in a cell blowing cool air for five days as part of an effort to “conform [his conduct] to the rules.” [S]ee Del Raine v. Williford (1994) (officers deliberately strip-searched prisoner in cell for 15 to 30 minutes when windchill was 40 to 50 degrees below zero). The Seventh Circuit has also held that, when cold conditions are the product of heating-system failures, officers violate the Eighth Amendment if they are aware of such conditions and fail to take corrective measures such as providing an alternative way to keep warm.

That should have been enough for SCOTUS to review this one and, hopefully, send it back with a reminder that QI readings need to be narrow, but perhaps not so narrow they provoke gasps of disbelief.

But that’s not how this Supreme Court majority operates. Sotomayor calls them out for only reviewing certain QI cases. You know the ones.

This Term… the Court has exercised its discretion to summarily reverse supposed errors that were far less clear than the one here. See, e.g., McCarthy v. Hernandez, 607 U. S. ___ (2026) (per curiam); Zorn v. Linton, 607 U. S. ___ (2026) (per curiam); see also Smith v. Scott, 608 U. S. ___ (2026) (summarily vacating and remanding denial of qualified-immunity in light of Zorn). If those cases were clear enough for summary action, the Court here should have readily concluded, based on precedent and basic human decency, that it is beyond debate that it is cruel and unusual to lock someone intentionally in a freezing prison cell completely naked for 23 hours.

The Court’s decision not to do so today exacerbates its asymmetrical trend of declining to intervene when courts wrongly afford officers the benefit of qualified immunity, but unflinchingly summarily reversing when it believes courts have wrongly denied officers the protection of qualified immunity.


This would be hypocrisy if it were being carried out by people who actually maintained a pretense of judicial fairness. But it’s being carried out by people who actively believe in the message they’re sending to the public, as well as to the administration they are so clearly devoted to pleasing.

Reversing only denials of qualified immunity sends the regrettable message that, when choosing between shielding government officials from liability and vindicating individuals’ constitutional rights, this Court will almost always choose the former.

Sotomayor is right. The message being sent is “regrettable.” Unfortunately for America, the people sending it have no regrets at all.

Filed Under: 7th circuit, 8th amendment, police misconduct, qualified immunity, rights violations, sonia sotomayor, supreme court


Tech

New Alibaba AI framework skips loading every tool, cutting agent token use 99%


As enterprise AI systems scale to handle complex workflows, practitioners face the challenge of routing subtasks to the right tools and skills. Agents can have hundreds of tools and skills and get confused about which one to use for each step of a workflow.

To address this challenge, researchers at Alibaba developed SkillWeaver, a framework that creates an execution graph for a given task and chooses the right skills for each of the nodes. They also introduce Skill-Aware Decomposition (SAD), a novel technique that uses a feedback loop to enable the agent to fetch and vet relevant tool candidates iteratively. This compositional approach and feedback loop mechanism distinguishes SkillWeaver from other tool-routing frameworks that choose tools in a one-shot fashion. 

SkillWeaver relates to real-world AI applications where agents autonomously orchestrate multi-tool ecosystems, such as the Model Context Protocol (MCP), to execute multi-step business operations like downloading datasets, transforming information, and creating visual reports. 

In practice, the researchers’ experiments with SkillWeaver show that implementing this retrieve-and-route approach significantly increases accuracy while reducing token consumption by over 99% compared to naively exposing agents to an entire tool library.


For practitioners building AI agents, the main takeaway is that the granularity of task decomposition is the biggest bottleneck to accurate tool retrieval. 

The challenge of skill routing

Skills are a key pattern in modern LLM agent architectures. A skill is a modular, reusable tool specification that uses structured natural language documentation. 

As enterprise agents integrate with massive tool ecosystems, accurately routing user queries to the right skills becomes a difficult task. Exposing an entire library to an LLM to find the right tool is highly inefficient, quickly overwhelms context limits, and consumes hundreds of thousands of tokens.
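A back-of-the-envelope estimate shows why exposing a whole library gets expensive. The sketch below is not the researchers' measurement: the per-skill token count and the shortlist size are assumptions chosen only for illustration, and the library size is the 2,209-skill figure from the benchmark described later in the article.

```python
# Rough token estimate; TOKENS_PER_SKILL and TOP_K are assumed values.
LIBRARY_SIZE = 2209       # skills in the benchmark library (from the article)
TOKENS_PER_SKILL = 150    # assumption: average tokens per skill description
SUBTASKS = 3              # e.g. download, transform, visualize
TOP_K = 5                 # assumption: candidates retrieved per sub-task

full_library_tokens = LIBRARY_SIZE * TOKENS_PER_SKILL
shortlist_tokens = SUBTASKS * TOP_K * TOKENS_PER_SKILL
reduction = 1 - shortlist_tokens / full_library_tokens

print(full_library_tokens)   # 331350
print(shortlist_tokens)      # 2250
print(f"{reduction:.1%}")    # 99.3%
```

Under these assumed numbers the full library lands in the hundreds of thousands of tokens, while a per-step shortlist stays in the low thousands.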

Most current tool-use frameworks attempt to solve this through API retrieval, documentation matching, or hierarchical structures that treat routing strictly as a single-skill selection or per-step problem. 


However, this single-skill paradigm is insufficient for enterprise environments because real-world queries are inherently compositional. A standard business request such as “Download the dataset, transform it, and create visual reports” cannot be fulfilled by one tool. It requires breaking the prompt down and sequencing an API client, a data processor, and a visualization tool into a cohesive, multi-step execution plan.

How SkillWeaver and SAD work

To tackle this, the researchers frame the problem of handling complex tasks that require multiple skills as “compositional skill routing.” Given a complex user prompt and a vast library of tools, an agent must simultaneously figure out how to break the request into a sequence of atomic sub-tasks, how to map each sub-task to the single best available skill, and how to compose those skills into an executable plan.

SkillWeaver orchestrates this process through three distinct stages: Decompose, Retrieve, and Compose. In the first stage, an LLM acts as a task decomposer, breaking the user’s complex query down into a sequence of sub-tasks that each require one skill. Once the sub-tasks are clearly defined, the system uses an embedding model to compare each subtask against the skill library to pull a shortlist of the top candidate tools for each step. 
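The retrieve stage can be sketched as a similarity search over skill descriptions. This is a minimal illustration, not SkillWeaver's code: a toy bag-of-words vector stands in for a real embedding model, a plain sort stands in for a vector index, and the skill descriptions and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical skill library: descriptions are made up for illustration.
SKILLS = {
    "api-client": "download a dataset from a remote http api",
    "csv-parser": "parse and transform csv data into clean tables",
    "chart-gen": "create charts and visual reports from tables",
}

def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(subtask, k=2):
    """Return the k skill names whose descriptions best match the sub-task."""
    query = embed(subtask)
    ranked = sorted(SKILLS, key=lambda name: cosine(query, embed(SKILLS[name])), reverse=True)
    return ranked[:k]

subtasks = ["download the dataset", "transform the data", "create visual reports"]
shortlists = {task: retrieve(task) for task in subtasks}
for task, candidates in shortlists.items():
    print(task, "->", candidates)
```

Each sub-task gets its own shortlist, so the later planning step only ever sees a handful of candidates per step rather than the whole library.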

In the final stage, a planner evaluates the retrieved candidates based on how well they work together. It checks for inter-skill compatibility to ensure the outputs of one tool naturally flow into the inputs of the next. It then creates a final execution plan as a Directed Acyclic Graph (DAG) that maps out dependencies so independent tasks can potentially execute in parallel.
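To show how a DAG exposes parallelism, the sketch below groups a plan into stages using Python's standard `graphlib` module. The plan itself is hypothetical: the skill names follow the article's example, and `summary-stats` is an invented extra step added only so that two tasks can run side by side.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each skill maps to the set of skills whose output it needs.
plan = {
    "api-client": set(),
    "csv-parser": {"api-client"},
    "chart-gen": {"csv-parser"},
    "summary-stats": {"csv-parser"},  # invented step, independent of chart-gen
}

def parallel_stages(dag):
    """Group nodes into stages; every node in a stage has all dependencies met."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the plan contains a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)
    return stages

stages = parallel_stages(plan)
print(stages)  # [['api-client'], ['csv-parser'], ['chart-gen', 'summary-stats']]
```

The last stage holds two skills because neither depends on the other, which is the situation in which independent tasks could execute in parallel.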


For example, consider a user asking an AI agent to “Download the dataset, transform it, and create visual reports.” In the decompose stage, the decomposer LLM breaks this into three distinct sub-tasks: downloading the dataset, transforming the data, and creating the reports. 

In the retrieve stage, the system searches the library and finds candidates like “api-client” or “http-fetch” for task one, “csv-parser” or “etl-pipeline” for task two, and so on. Finally, the compose stage evaluates these options, selects the combination of “api-client,” “csv-parser,” and “chart-gen” as the most compatible, and wires the three skills together into a final, ready-to-execute workflow.
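The compose stage described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the input/output types attached to each skill and the "count matching hand-offs" scoring rule are assumptions made for the example, and only the skill names come from the article.

```python
from graphlib import TopologicalSorter
from itertools import product

# Hypothetical (input type, output type) metadata; the paper's real schema is not shown here.
SKILL_IO = {
    "api-client": ("url", "csv"),
    "http-fetch": ("url", "bytes"),
    "csv-parser": ("csv", "table"),
    "etl-pipeline": ("json", "table"),
    "chart-gen": ("table", "image"),
}

# Shortlists from the retrieve stage, one list per sub-task.
candidates = [
    ["api-client", "http-fetch"],
    ["csv-parser", "etl-pipeline"],
    ["chart-gen"],
]

def compatible_links(chain):
    """Count adjacent pairs whose output type matches the next input type."""
    return sum(SKILL_IO[a][1] == SKILL_IO[b][0] for a, b in zip(chain, chain[1:]))

def execution_batches(dag):
    """Return steps grouped into batches that could run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches

# Pick the combination with the most compatible hand-offs (assumed scoring rule).
best_chain = max(product(*candidates), key=compatible_links)

# Express the plan as a DAG: each step maps to the set of steps it depends on.
plan = {step: set() for step in best_chain}
for upstream, downstream in zip(best_chain, best_chain[1:]):
    plan[downstream].add(upstream)

print(best_chain)
print(execution_batches(plan))
```

With these assumed types, the selected chain is api-client, csv-parser, chart-gen, and each step lands in its own batch because the plan is a straight line. A plan with independent branches would put several steps in the same batch.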

A key challenge of this pipeline is that LLMs often produce generic step descriptions that fail to match the specific, technical vocabulary of the actual skills available in the library. To fix this, SkillWeaver introduces Iterative Skill-Aware Decomposition (SAD), a novel feedback loop. SAD works by having the LLM draft an initial plan, conducting a preliminary search to find loosely matching skills, and then feeding those retrieved skills back into the LLM as hints. This allows the LLM to rewrite its decomposition so the granularity and vocabulary perfectly align with the actual tools that exist.
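The feedback loop can be sketched as follows. This is a rough outline under stated assumptions: `decompose` stands in for the LLM call, `fake_llm` is a canned stub rather than a real model, and the word-overlap retriever is a toy replacement for the embedding search. The prompt templates from the paper are not reproduced here.

```python
from typing import Callable, List, Optional

def retrieve(step: str, library: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank skills by word overlap with the step description."""
    words = set(step.lower().split())

    def overlap(skill: str) -> int:
        return len(words & set(skill.lower().replace("-", " ").split()))

    return sorted(library, key=overlap, reverse=True)[:k]

def skill_aware_decompose(
    query: str,
    decompose: Callable[[str, Optional[List[str]]], List[str]],
    library: List[str],
    rounds: int = 1,
) -> List[str]:
    """Draft a plan, retrieve loosely matching skills, then redraft with hints."""
    steps = decompose(query, None)  # initial draft, no hints
    for _ in range(rounds):
        hints = sorted({s for step in steps for s in retrieve(step, library)})
        steps = decompose(query, hints)  # rewrite using library vocabulary
    return steps

# Stub "LLM" (hypothetical): generic wording without hints, tool-aligned wording with them.
def fake_llm(query: str, hints: Optional[List[str]]) -> List[str]:
    if hints is None:
        return ["get the data", "clean the data", "make pictures"]
    return ["fetch dataset with api client", "parse csv", "generate chart"]

library = ["api-client", "http-fetch", "csv-parser", "etl-pipeline", "chart-gen"]
final_steps = skill_aware_decompose(
    "Download the dataset, transform it, and create visual reports", fake_llm, library
)
print(final_steps)
```

In a real setup, `decompose` would wrap a call to the decomposer model and `retrieve` would query the embedding index. The loop structure is the part that matters.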

SkillWeaver in action

To evaluate how SkillWeaver performs in realistic enterprise scenarios, the researchers created a custom benchmark called CompSkillBench. It consists of 300 multi-step queries of different difficulty levels. To mirror real-world environments, they used a library of 2,209 real-world skills sourced from the public MCP ecosystem, covering 24 functional categories like cloud infrastructure, finance, and databases. 

For the core engine, the researchers primarily used a lightweight 7-billion-parameter model (Qwen2.5-7B-Instruct) for task decomposition, paired with a standard semantic search retriever (MiniLM with a FAISS index) to find the tools. SkillWeaver was evaluated against three main setups: a brute-force “LLM-Direct” method that stuffs all the tool names into the prompt of a large model, a vanilla LLM-based decomposition without SAD, and a ReAct-style agent loop.

The experiments indicate that task decomposition is the main bottleneck. Standard LLM behavior falls short when dealing with large tool libraries, but the SAD feedback loop improves it substantially. In the vanilla setup, the 7B model reached a decomposition accuracy (i.e., the share of queries for which it predicted the correct number of steps) of only 51.0%. With the SAD feedback loop activated, accuracy rose to 67.7% (with the larger Qwen-Max model, accuracy reached 92%). On “hard” tasks requiring four to five distinct skills, SAD improved accuracy by 50%.
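To make the metric concrete, here is a small sketch of decomposition accuracy as the article defines it: a plan counts as correct when it has the reference number of steps. The sample plans are invented for illustration and are not taken from CompSkillBench. The last line only restates the reported 7B numbers as an absolute gain.

```python
def decomposition_accuracy(predicted_plans, gold_step_counts):
    """Fraction of queries whose predicted plan has the reference number of steps."""
    if len(predicted_plans) != len(gold_step_counts):
        raise ValueError("need one reference step count per predicted plan")
    if not gold_step_counts:
        return 0.0
    hits = sum(len(plan) == gold for plan, gold in zip(predicted_plans, gold_step_counts))
    return hits / len(gold_step_counts)

# Toy data (invented for illustration).
predicted = [
    ["download", "transform", "report"],                        # 3 steps, reference 3
    ["open url", "read bytes", "save file", "parse", "plot"],   # over-decomposed, reference 3
    ["query database", "send email"],                           # 2 steps, reference 2
]
gold_counts = [3, 3, 2]
accuracy = decomposition_accuracy(predicted, gold_counts)

# Absolute gain reported for the 7B model: 51.0% to 67.7%.
gain_points = round(67.7 - 51.0, 1)
print(accuracy, gain_points)
```

Two of the three toy plans match their reference length, so the toy accuracy is two thirds. The reported move from 51.0% to 67.7% is a gain of 16.7 percentage points.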

SkillWeaver results

In comparison to the naive approach, SkillWeaver reduces token consumption by more than 99% (source: arXiv)

One notable finding was that larger models can actually perform worse when unguided. When tested in the vanilla setup, a larger 14-billion-parameter model saw its accuracy fall below that of the 7B model because it tended to over-decompose tasks into needlessly fine-grained steps. Once SAD was introduced, the retrieved tool hints anchored the model to the tools that actually exist and increased its accuracy. This suggests that aligning an agent with the vocabulary of specific tools is often more impactful than paying for a larger, more expensive LLM.

Another important takeaway is token savings. The LLM-Direct baseline, which used the very large Qwen-Max model, showed that feeding all tools into the prompt of a large model fails. Despite near-perfect task breakdown capabilities, the massive model retrieved the right tool category only 21.1% of the time when flooded with tool options. SkillWeaver’s targeted retrieve-and-route approach vastly outperformed this in accuracy while cutting context window consumption from an estimated 884,000 tokens to roughly 1,160 tokens per query, a 99.9% reduction. For practitioners, this translates directly to drastically lower API costs and faster response times.
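The reduction figure follows directly from the two token counts quoted above. A quick check, using only the numbers in the article:

```python
direct_tokens = 884_000   # estimated tokens per query for LLM-Direct
routed_tokens = 1_160     # approximate tokens per query for SkillWeaver

reduction = 1 - routed_tokens / direct_tokens
ratio = direct_tokens / routed_tokens

print(f"reduction: {reduction:.1%}")
print(f"context is about {ratio:.0f}x smaller")
```

The reduction rounds to 99.9%, which corresponds to a context roughly 762 times smaller per query.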

Finally, the traditional ReAct baseline completely failed, achieving 0% decomposition accuracy. Its loop naturally collapses multi-step plans into isolated actions rather than explicitly mapping out a cohesive, multi-tool sequence.

Considerations for developers

While the researchers have not yet released the source code for SkillWeaver, their work was built on off-the-shelf tools that can easily be reproduced. 

Skill-Aware Decomposition (SAD), which is the key innovation at the heart of the framework, is a clever prompt-engineering and retrieval loop. The authors have shared the prompt templates in their paper, and developers can implement it themselves quite easily using standard orchestration libraries like LangChain, LlamaIndex, or even raw Python scripts.

As for the retrieval component, the authors built the core framework using all-MiniLM-L6-v2, an open-source embedding model. They found that swapping in a slightly stronger off-the-shelf encoder (BGE-base-en-v1.5) immediately boosted accuracy without any fine-tuning. While an off-the-shelf bi-encoder is great at getting a relevant tool into the top 10 candidates nearly 70% of the time, it struggles to consistently rank the perfect tool at exactly number one, achieving that only about 37% of the time. To bridge this gap, teams will likely need to implement a secondary cross-encoder or LLM-based reranker to re-order those top 10 candidates.
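The gap between "in the top 10" and "ranked first" is what a reranker is meant to close. The sketch below shows the idea with a hit-rate metric and a second-pass sort. The skill lists, the queries, and the word-overlap scorer are all hypothetical stand-ins. In practice the `score` function would be a cross-encoder or an LLM judgment.

```python
def hit_at_k(ranked_lists, gold, k):
    """Share of queries whose correct skill appears in the first k candidates."""
    hits = sum(g in ranked[:k] for ranked, g in zip(ranked_lists, gold))
    return hits / len(gold)

def rerank(query, candidates, score):
    """Re-order a shortlist with a second, more precise scoring function."""
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)

def toy_score(query, skill):
    """Stand-in scorer (assumption): word overlap between query and skill name."""
    return len(set(query.lower().split()) & set(skill.lower().split("-")))

# Hypothetical first-pass shortlists where the right skill is present but not first.
queries = ["parse csv file", "render bar chart"]
first_pass = [
    ["etl-pipeline", "csv-parser", "json-loader"],
    ["image-resize", "chart-gen", "pdf-export"],
]
gold = ["csv-parser", "chart-gen"]

second_pass = [rerank(q, c, toy_score) for q, c in zip(queries, first_pass)]

print(hit_at_k(first_pass, gold, 1), hit_at_k(first_pass, gold, 3))
print(hit_at_k(second_pass, gold, 1))
```

In this toy case the correct skill is always in the shortlist but never first, and the second pass moves it to the top. Real gains depend on the quality of the reranker.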

One upfront preparation requirement is vectorizing the tool library and building a FAISS index in advance. In practice, this is a negligible hurdle. Embedding and indexing all 2,209 skills in the benchmark took a mere 15 seconds. Once built, retrieving tools from the index adds less than 15 milliseconds of latency per query. For enterprise environments, syncing the tool index is a trivial background job. 
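The build-once, query-many pattern looks roughly like this. To keep the sketch dependency-free, a bag-of-words vector stands in for the MiniLM embedding and a plain list stands in for the FAISS index. The skill descriptions are invented, and a real deployment would swap in the actual encoder and index.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a sentence encoder: L2-normalised bag-of-words counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {word: v / norm for word, v in counts.items()}

class SkillIndex:
    """Embeds every skill description once, then answers similarity queries."""

    def __init__(self, skills):
        # skills: dict mapping skill name -> description (descriptions are hypothetical)
        self.names = list(skills)
        self.vectors = [embed(skills[name]) for name in self.names]

    def search(self, query, k=10):
        q = embed(query)
        scores = [
            sum(q.get(word, 0.0) * v for word, v in vec.items())  # cosine similarity
            for vec in self.vectors
        ]
        order = sorted(range(len(self.names)), key=lambda i: scores[i], reverse=True)
        return [(self.names[i], scores[i]) for i in order[:k]]

index = SkillIndex({
    "api-client": "download data from a rest api",
    "csv-parser": "parse and transform csv data",
    "chart-gen": "create charts and visual reports",
})
top = index.search("transform the csv dataset", k=2)
print(top)
```

Re-syncing the index when the tool library changes amounts to rebuilding `SkillIndex`, which is the background job the article refers to.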

A current limitation in SkillWeaver is the lack of error recovery. While SkillWeaver successfully maps out a compatible DAG for execution, the authors’ pilot study revealed the challenges of multi-step tool chains. For example, if an API call fails in step two, the entire chain breaks. The paper’s core contribution is limited to the routing and planning phase. For a true production deployment, practitioners must build their own error recovery, fallback, and retry mechanisms on top of the compose stage to handle real-world API timeouts or malformed outputs.
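A recovery layer of the kind practitioners would need to add could start from something like the wrapper below. It is not part of SkillWeaver. The function name, retry policy, and simulated failures are all assumptions chosen for illustration.

```python
import time

def run_with_recovery(primary, fallbacks=(), retries=2, backoff_s=0.0):
    """Try `primary` up to retries + 1 times, then each fallback once.

    Re-raises the last error if every attempt fails.
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            return primary()
        except Exception as exc:  # in production, catch narrower error types
            last_error = exc
            if attempt < retries:
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    for fallback in fallbacks:
        try:
            return fallback()
        except Exception as exc:
            last_error = exc
    raise last_error

# Simulated step that times out twice before succeeding.
calls = {"n": 0}

def flaky_api_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "rows"

# Simulated step that never succeeds, paired with a fallback skill.
def always_down():
    raise ConnectionError("simulated outage")

result = run_with_recovery(flaky_api_call, retries=2)
backup = run_with_recovery(always_down, fallbacks=[lambda: "cached rows"], retries=1)
print(result, backup)
```

Wrapping each node of the execution plan this way keeps a single failed call from breaking the whole chain. Validating malformed outputs would need a separate check on each step's return value.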

Tech

The US Marines just accepted six F-35Bs carrying lead weights where their radars should have been installed

  • Six new F-35Bs entered service carrying ballast inside their noses instead of radar
  • The APG-85 delay created stealth fighters without their primary sensor
  • Lot 17 redesign decisions eliminated compatibility with older radar hardware

The United States Marine Corps has accepted delivery of six newly built F-35B stealth fighters carrying ballast weights where a radar should be installed.

The aircraft left production lines without the AN/APG-85 radar that future F-35 variants are expected to rely upon for combat operations.

Copyright © 2025