LLMs Fail to Match Specialized AI Trading Bots That Adjust for Risk

Published 13 December 2025

AI-powered trading hasn’t yet reached an “iPhone moment,” when everyone is carrying around an algorithmic, reinforcement learning portfolio manager in their pocket, but something like that is coming, experts say.

For now, though, the power of AI meets its match in the dynamic, adversarial arena of trading markets. A self-driving car’s AI can learn to recognize traffic signals accurately from endless laps of training data, but no amount of data and modeling will ever be able to tell the future.

This makes refining AI trading models a complex, demanding process. Success has typically been measured by profit and loss (P&L). But advances in algorithm customization are producing agents that continually learn to balance risk and reward across a multitude of market conditions.

Allowing risk-adjusted metrics such as the Sharpe Ratio to inform the learning process makes a test far more sophisticated, said Michael Sena, chief marketing officer at Recall Labs. The firm has run 20 or so AI trading arenas, in which a community submits AI trading agents that compete over a four- or five-day period.

“When it comes to scanning the market for alpha, the next generation of builders are exploring algo customization and specialization, taking user preferences into account,” Sena said in an interview. “Being optimized for a particular ratio and not just raw P&L is more like the way leading financial institutions work in traditional markets. So, looking at things like, what is your max drawdown, how much was your value at risk to make this P&L?”
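To make the metrics Sena mentions concrete, here is a minimal Python sketch of a per-period Sharpe ratio, maximum drawdown and a simple historical value at risk, computed from a list of periodic returns. The function names, the zero risk-free rate, the 95% confidence level and the sample return series are illustrative assumptions; this is not Recall Labs' scoring method, and the Sharpe figure is not annualized.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return over its sample std dev.

    Assumes risk_free is expressed per period; no annualization is applied.
    """
    excess = [r - risk_free for r in returns]
    spread = stdev(excess)
    return mean(excess) / spread if spread > 0 else float("nan")


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def historical_var(returns, level=0.95):
    """Historical value at risk: the loss at the given empirical quantile.

    Returned as a positive number for a loss; a negative value means the
    series showed a gain even at that quantile.
    """
    losses = sorted(-r for r in returns)
    index = min(len(losses) - 1, math.ceil(level * len(losses)) - 1)
    return losses[index]


# Hypothetical agents with similar total P&L but very different risk profiles.
steady = [0.01, 0.012, 0.008, 0.011, 0.009]
volatile = [0.10, -0.08, 0.09, -0.07, 0.02]

for name, series in (("steady", steady), ("volatile", volatile)):
    print(name, sharpe_ratio(series), max_drawdown(series), historical_var(series))
```

Ranked on raw P&L alone, the two hypothetical agents look close. Ranked on Sharpe ratio or drawdown, the steadier one comes out well ahead, which is the distinction a risk-adjusted competition is designed to surface.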

Taking a step back, a recent trading competition on decentralized exchange Hyperliquid set a baseline for where AI stands in the trading world. It involved several large language models (LLMs), such as GPT-5, DeepSeek and Gemini Pro, which were all given the same prompt and left to trade autonomously. But they weren’t that good, according to Sena, barely outperforming the market.

“We took the AI models used in the Hyperliquid contest and we let people submit their trading agents that they had built to compete against those models. We wanted to see if trading agents are better than the foundational models, with that added specialization,” Sena said.

The top three spots in Recall’s competition were taken by customized models. “Some models were unprofitable and underperformed, but it became obvious that specialized trading agents that take these models and apply additional logic and inference and data sources and things on top, are outperforming the base AI,” he said.

The democratization of AI-based trading raises interesting questions about whether there will be any alpha left to capture if everyone is using the same level of sophisticated machine-learning tech.

“If everyone’s using the same agent and that agent is executing the same strategy for everyone, does that sort of collapse into itself?” Sena said. “Does the alpha it’s detecting go away because it’s trying to execute it at scale for everyone else?”

That’s why those best positioned to benefit from the advantage AI trading will eventually bring are the ones with the resources to invest in developing custom tools, Sena said. As in traditional finance, the highest-quality tools that generate the most alpha are typically not public, he added.

“People want to keep these tools as private as possible, because they want to protect that alpha,” Sena said. “They paid a lot for it. You saw that with hedge funds buying data sets. You can see that with proprietary algos developed by family offices.

“I think the magical sweet spot will be where there’s a product that is a portfolio manager but the user still has some say in their strategy. They can say, ‘This is how I like to trade and here are my parameters, let’s implement something similar, but make it better.’”