Crypto World

OpenAI Pits AI Agents Against Each Other to Red-Team Smart Contracts

OpenAI has unveiled a benchmarking framework aimed at measuring how effectively AI agents can detect, mitigate, and even exploit security vulnerabilities in crypto smart contracts. The project, titled “EVMbench: Evaluating AI Agents on Smart Contract Security,” was released in collaboration with Paradigm and OtterSec, two organizations with deep exposure to blockchain security and investment. The study assesses AI agents against a curated set of 120 potential weaknesses drawn from 40 smart contract audits, seeking to quantify not just detection and patching capabilities but also the theoretical exploit potential of these agents in a controlled environment.

Key takeaways

  • EVMbench tests AI agents against 120 vulnerabilities culled from 40 smart contract audits, many of them sourced from open-source audit competitions.
  • Among the models tested, Anthropic’s Claude Opus 4.6 led with an average detect award of $37,824, followed by OpenAI’s OC-GPT-5.2 at $31,623 and Google’s Gemini 3 Pro at $25,112.
  • OpenAI frames the benchmark as a step toward measuring AI performance in “economically meaningful environments,” not just toy tasks, highlighting the real-world implications for attackers and defenders in the crypto security landscape.
  • The researchers note that smart contracts secure billions of dollars in assets, underscoring the strategic value of AI-enabled tooling for both offensive and defensive activities.
  • Industry observers have tied these developments to broader discussions about AI-driven payments and the role of stablecoins in everyday transactions, with major executives predicting growing agentic usage in the coming years.
  • The context for such work is underscored by 2025’s crypto-security incident data, which puts losses to attackers at roughly $3.4 billion, reinforcing the demand for robust AI-enabled auditing and defense mechanisms.

Detect awards for AI agents are detailed in the OpenAI PDF accompanying the study, which also describes the evaluation methodology and the scenarios used to simulate real-world smart-contract risk. The authors emphasize that while AI agents have evolved to automate a wide range of routine tasks, assessing their performance in “economically meaningful environments” is essential to understanding how they’ll perform under pressure in production systems.

“Smart contracts secure billions of dollars in assets, and AI agents are likely to be transformative for both attackers and defenders.”

OpenAI notes that it expects agentic technologies to broaden the scope of payments and settlement, including stablecoins used in automated workflows. The discussion around AI-enabled payments extends beyond security testing to the broader question of how autonomous systems will participate in daily financial activity. The company’s own projections suggest that agentic payments could become more commonplace, grounding AI capabilities in practical use cases that touch everyday consumer transactions.

In tandem with the benchmark results, Circle CEO Jeremy Allaire has publicly forecast that billions of AI agents could be transacting with stablecoins for everyday payments within the next five years. That view intersects with a recurring theme in crypto circles: the potential for crypto to become the native currency of AI agents, a narrative that has gained notable attention from industry leaders and investors alike. While such predictions remain speculative, the underlying trend is clear—AI automation is moving from the lab to the transaction layer, where it could reshape how value moves across networks.

The study arrives at a moment when crypto security continues to be a significant risk factor for investors. Attackers pulled roughly $3.4 billion from crypto funds in 2025, a figure that highlights the urgency of improved tooling and faster, more reliable patching mechanisms. The EVMbench framework is positioned, in part, as a way to measure whether AI agents can meaningfully contribute to defensive capabilities at scale, reducing exploitation opportunities and accelerating threat mitigation.

To build the benchmark, researchers drew on 120 curated vulnerabilities spanning 40 smart contract audits, with many weaknesses traced back to open-source audit challenges. OpenAI argues the benchmark will help track AI progress in recognizing and mitigating contract-level weaknesses at scale, offering a standardized way to compare future AI models as they evolve. The study also provides a lens into how AI might be applied to normalizing risk assessment across a wide range of smart-contract architectures, rather than focusing solely on isolated cases.

Smart contracts weren’t built for humans: Dragonfly

In a contemporaneous thread on X, Haseeb Qureshi, a partner at Dragonfly, argued that crypto’s promise of replacing property rights and traditional contracts never materialized not because the technology failed, but because it was never designed with human intuition in mind. He has highlighted the persistent fear associated with signing large transactions in an environment where drainer wallets and other attack vectors remain a constant threat, in stark contrast to the comparatively smoother experience of traditional bank transfers.

Qureshi contends that the next phase of crypto transactions could be enabled by AI-intermediated, self-driving wallets. Such wallets would monitor risk, manage complex operations, and autonomously respond to threats on behalf of users, potentially reducing the friction and fear that characterize large transfers today.

“A technology often snaps into place once its complement finally arrives. GPS had to wait for the smartphone, TCP/IP had to wait for the browser. For crypto, we might just have found it in AI agents.”

The broader takeaway from this thread is that AI agents may play a critical role in transforming how people interact with crypto—shifting from manual, error-prone transactions to automated, risk-aware processes that can scale with adoption. As AI agents begin to demonstrate more competence in handling security concerns, users could see improved reliability and resilience in decentralized finance workflows, even as the underlying technologies continue to mature.

What to watch next

  • Publication and independent replication of the full EVMbench dataset across additional AI models and architectures.
  • Broader adoption of AI-assisted auditing workflows by auditors, exchanges, and DeFi projects looking to bolster security postures.
  • Explorations into agentic wallets and autonomous payment flows, including regulatory and compliance considerations for AI-managed assets.
  • Follow-up benchmarks comparing more AI systems as new versions roll out, tracking improvements in detection accuracy and patching speed.

Sources & verification

  • OpenAI: EVMbench: Evaluating AI Agents on Smart Contract Security — PDF: https://cdn.openai.com/evmbench/evmbench.pdf
  • OpenAI: Introducing EVMbench — https://openai.com/index/introducing-evmbench/
  • Crypto security losses in 2025 (reporting coverage): https://cointelegraph.com/news/crypto-3-4-billion-losses-2025-wallet-hacks
  • Dragonfly: Haseeb Qureshi on AI and crypto UX (X post): https://x.com/hosseeb/status/2024136762424185208
  • China’s AI lead and crypto implications (analysis): https://cointelegraph.com/news/china-ai-lead-future
  • AI Eye — IronClaw and AI bot developments in Polymarket coverage: https://cointelegraph.com/magazine/ironclaw-secure-private-sounds-cooler-openclaw-ai-eye/

Key figures and next steps

The EVMbench study demonstrates that large language models and related AI agents are beginning to perform meaningful security work in the smart contract space, with clearly quantifiable differences across models. Claude Opus 4.6’s lead in average detect awards signals that certain architectures may be more adept at spotting and mitigating vulnerabilities within complex contract logic, while others trail, offering a spectrum of capabilities that researchers will likely want to refine. The inclusion of multiple industry partnerships in the project underscores the growing consensus that AI-enabled security and automated risk management could become essential to scale in decentralized environments.

As the field evolves, observers will be watching for how quickly AI agents can transition from detection to remediation, and whether these agents can operate reliably in live systems without introducing new risks. The conversation about AI-driven wallets and autonomous payments touches on a broader set of questions around security governance, user consent, and regulatory alignment. If the trajectory suggested by OpenAI and its partners continues, AI-assisted tools could become a core component of future crypto infrastructure, changing both the risk calculus and the user experience in meaningful ways. The next round of benchmarks, alongside real-world deployments, will help determine how quickly this vision materializes and what safeguards must accompany it.

Risk & affiliate notice: Crypto assets are volatile and capital is at risk. This article may contain affiliate links. Read full disclosure


BTC can bounce but market still lacks fuel for a real run

Bitcoin is finding space to bounce, but not yet the fuel to run.

The macro backdrop has improved just enough to give bulls something to work with. Cooling headline inflation has strengthened expectations for three rate cuts this year, reviving the familiar playbook in which easier monetary policy supports risk assets.

And it could signal the possibility of liquidity slowly returning after months of tight financial conditions for crypto markets.

But it would be a mistake to read too much into that shift. The Federal Reserve is unlikely to embark on an aggressive easing cycle. Instead, it appears set for a measured approach that rebuilds liquidity gradually. That creates an environment where bitcoin can stage tactical rallies yet struggle to hold them.

Bitfinex analysts describe the market as one that tends to move in waves rather than clean breakouts.

“In this environment, volatility remains likely,” the firm said in a note shared with CoinDesk. “Tactical upside moves can occur when positioning becomes overly defensive, but a durable structural advance will require clearer confirmation from both macro disinflation trends and sustained spot demand.”

Spot recoveries continue to meet steady selling. Each bounce is absorbed more smoothly than earlier in the quarter, suggesting some stabilization.

The overnight tape is a good example. Bitcoin traded as high as $68,500 before rolling over during the U.S. afternoon and sliding under $66,000, a move that lined up with a stronger dollar and hawkish Fed minutes. That kind of intraday reversal is the market’s way of saying rallies are still fragile, and that traders are quick to sell the moment macro conditions turn even slightly less friendly.

“It is alarming that Bitcoin’s dynamics mirror the recent strengthening of the dollar. When investors become convinced that the rise of the dollar is a trend, there may be a sharp increase in volatility,” Alex Kuptsikevich, FxPro chief market analyst, said in an email.

“Volatility seems to have been turned off in this market, while stock indices are much livelier. There, investors are actively buying up dips, relying on support in the form of important moving averages: 50-day for the Dow Jones and Russell 2000 and 200-day for the Nasdaq 100. The crypto market is now below its 50- and 200-day curves by 17% and 31%, respectively,” he added.

Sentiment remains fragile, meanwhile, as a crypto fear gauge has printed single digits on nine of the past fourteen days, territory rarely seen outside prior cycle lows.

At the same time, stablecoin outflows from major exchanges point to tighter liquidity, and long-term holders have shown signs of stress comparable to late bear-market phases in 2022, according to Glassnode.

For now, bitcoin appears caught between improving macro optics and stubborn supply. Tactical upside remains possible, especially when positioning leans too defensive.

A durable advance, however, likely requires clearer evidence of disinflation, a softer dollar and consistent spot demand. Until then, the path higher may be uneven.

Fueling Saudi Arabia’s Vision 2030

Editor’s note: Global Games Show Riyadh 2026 signals a turning point for Saudi Arabia’s digital entertainment ecosystem as the kingdom accelerates growth across gaming, esports, and Web3. This press release outlines a multi-day program that combines live demonstrations, developer workshops, and high-profile panels, underscoring Riyadh’s emergence as a regional hub for interactive technology. The show also reinforces collaboration among startups, creators, and investors through dedicated networking spaces and matchmaking sessions. By bringing together leaders from across the industry, the event aims to catalyze partnerships and accelerate the creative economy envisioned in Vision 2030.

Key points

  • Global Games Show Riyadh 2026 brings together gaming, esports, and Web3 within Saudi Vision 2030.
  • The event features live demos, workshops, panels, and networking with industry leaders, from indie developers to global publishers.
  • It is organized by VAP Group and powered by Times of Games, with parallel events Global AI Show and Global Blockchain Show on a single ticket.

Why this matters

By concentrating expertise and investment in Riyadh, the Global Games Show aims to accelerate Saudi Arabia’s creative economy and position the Kingdom as a regional and global hub for interactive technology. The conference highlights trends in immersive gaming, cloud gaming, and monetization strategies, and emphasizes collaboration across startups, developers, and publishers, aligning with Vision 2030’s diversification goals.

What to watch next

  • Updates on Day 1 and Day 2 sessions and key speakers.
  • Public announcements of participating companies and partnerships.
  • Ticketing details for the Global AI Show and Global Blockchain Show, accessible with one ticket.

Disclosure: The content below is a press release provided by the company/PR representative. It is published for informational purposes.

Global Games Show Riyadh 2026: Fueling Saudi Arabia’s Vision 2030

Global Games Show Riyadh 2026 is poised to become the ultimate destination for gaming enthusiasts, developers, and investors alike. Organized by VAP Group and powered by the Times of Games, the event promises a vibrant lineup of discussions and engaging experiences that reflect the rapidly changing gaming sphere.

Participants can explore the latest in game development, esports, and interactive entertainment, with live demonstrations, workshops, and panels led by industry leaders. From indie developers to global publishers, companies will present their most innovative games and technologies, providing attendees with insights into the future of gaming.

Educational and strategic sessions focus on trends such as immersive gaming, cloud gaming, and monetization strategies. These discussions equip participants with knowledge to navigate challenges, leverage opportunities, and scale their ventures effectively.

Day 1 is all about the future of gaming technology, with talks on Saudi Arabia becoming a world esports capital, the next phase of gaming engines with Unreal Engine 6, brain–computer interfaces, and AI-generated game design. Experts will also discuss what the future of esports will look like in the Kingdom and how it is increasingly driving Vision 2030’s creative economy.

Day 2, entitled “Gameconomics,” explores the business of gaming, from crowdfunded games and mobile gaming opportunities to player-coined communities and the developer–investor partnerships that drive industry expansion.

By bringing a diverse mix of professionals under one roof, the Global Games Show strengthens Riyadh’s position as a hub for interactive technology and digital entertainment. Attendees also get access to other parallel flagship events, the Global AI Show and the Global Blockchain Show with just one ticket. GGS is a convergence of ideas, creativity, and opportunity in the gaming world.

Media enquiries:

Press contact: media@globalblockchainshow.com

Moonwell’s AI-coded oracle glitch misprices cbETH at $1, drains $1.78M

Moonwell’s lending pools racked up about $1.78M in bad debt after a cbETH oracle priced the token at nearly $1 instead of around $2,200. Bots and liquidators drained collateral within hours of the misconfigured Chainlink-based update, which reportedly used AI-generated logic.

Summary

  • Misconfigured cbETH oracle set price near $1 vs roughly $2.2k, triggering a ~99% valuation gap that broke Moonwell’s collateral math.
  • Liquidators repaid around $1 per position to seize over 1,096 cbETH, leaving Moonwell with roughly $1.78M in protocol-level bad debt.
  • Faulty formula and scaling logic were reportedly co-authored by AI model Claude Opus 4.6, spotlighting new DeFi risk around AI-written oracle and pricing code.

Decentralized finance lending protocol Moonwell suffered a $1.78 million exploit due to a pricing oracle bug that misvalued Coinbase-wrapped ETH (cbETH), according to reports from the platform.

The vulnerability originated in oracle calculation logic reportedly generated by the AI model Claude Opus 4.6, which introduced an incorrect scaling factor in the asset price feed, according to the protocol’s disclosure. Attackers borrowed against severely underpriced collateral, extracting funds before the error was detected and corrected.

The cbETH mispricing effectively collapsed the collateral requirement for borrowing within affected pools. Because lending systems rely on accurate collateral ratios, the incorrect price allowed attackers to extract assets with minimal backing value, according to the protocol’s technical analysis.
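The effect of a feed that reports about $1 for an asset worth about $2,200 can be sketched with a simplified lending model. The example below is illustrative only and is not Moonwell's code: the position size, the 0.8 collateral factor, the $12,000 loan, and the function names are all assumptions chosen to make the arithmetic visible.

```python
from decimal import Decimal


def valuation_gap(reported: Decimal, reference: Decimal) -> Decimal:
    """Fraction by which the reported price falls short of the reference price."""
    return (reference - reported) / reference


def health_factor(units: Decimal, price_usd: Decimal,
                  collateral_factor: Decimal, debt_usd: Decimal) -> Decimal:
    """Borrowing capacity divided by debt; below 1 the position can be liquidated."""
    return units * price_usd * collateral_factor / debt_usd


# Hypothetical position (assumed values, not taken from the incident):
# 10 cbETH backing a 12,000 USD loan with a collateral factor of 0.8.
UNITS = Decimal("10")
FACTOR = Decimal("0.8")
DEBT = Decimal("12000")

healthy = health_factor(UNITS, Decimal("2200"), FACTOR, DEBT)
broken = health_factor(UNITS, Decimal("1"), FACTOR, DEBT)
gap = valuation_gap(Decimal("1"), Decimal("2200"))

print(f"health factor at $2,200: {healthy:.2f}")
print(f"health factor at $1:     {broken:.5f}")
print(f"valuation gap:           {gap:.2%}")
```

In this toy model, a position that is comfortably over-collateralized at the correct price looks almost entirely unbacked at the faulty one, which is the kind of gap that bots and liquidators can act on.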

Price oracles represent critical security components in DeFi lending systems. Incorrect asset valuation can enable under-collateralized borrowing or liquidation failures. Many major DeFi exploits have historically involved oracle manipulation or pricing errors rather than core protocol flaws, according to industry security reports.

The Moonwell incident differs from traditional oracle exploits in that the faulty logic appears linked to automated AI code generation rather than malicious oracle data feeds, according to the protocol’s preliminary investigation.

The exploit highlights risks associated with AI-assisted smart-contract development in financial applications. Language models can accelerate coding workflows, but financial protocols require precise numerical correctness, unit handling and edge-case validation, according to blockchain security experts.

In DeFi systems, small arithmetic or scaling mistakes can translate into systemic vulnerabilities affecting collateral valuation and solvency. The incident raises questions about whether AI-generated contract components may require stricter auditing standards than manually written code, according to security researchers.

AI-assisted development is increasingly used across Web3 engineering workflows, from contract templates to integration logic. Security models and audit frameworks have not yet fully adapted to AI-generated contract code, according to industry observers.

The broader implications center on how automated code generation errors in financial logic represent a new category of DeFi risk. Oracle math, scaling factors and unit conversions remain high-precision domains where automation failures can propagate into protocol-level vulnerabilities, according to technical analysis of the incident.

As AI-assisted smart-contract development expands, audit methodologies will likely need to evolve toward verifying not only code correctness but generation provenance and numerical invariants, according to blockchain security firms.
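One example of a numerical invariant is a bound on how far a new oracle price may move from the last accepted one. The sketch below is a hypothetical guard, not any protocol's actual implementation: the function names and the 20% threshold are assumptions. A production system might pause the market rather than fall back to the previous price.

```python
def price_within_bounds(new_price: float, reference_price: float,
                        max_deviation: float = 0.2) -> bool:
    """Return True if new_price is positive and within max_deviation of the reference."""
    if new_price <= 0 or reference_price <= 0:
        return False
    return abs(new_price - reference_price) / reference_price <= max_deviation


def accept_update(new_price: float, last_good_price: float) -> float:
    """Keep the last accepted price when the update fails the invariant."""
    if price_within_bounds(new_price, last_good_price):
        return new_price
    return last_good_price


# A small move passes; a drop from about $2,200 to $1 is rejected.
print(accept_update(2250.0, 2200.0))
print(accept_update(1.0, 2200.0))
```

A check like this would not say which price is correct, only that an update is implausible enough to need review before it reaches collateral calculations.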

Kalshi Data Could Inform Fed Reserve Policy, Say Researchers

Three researchers at the US Federal Reserve argue that prediction market Kalshi can better measure macroeconomic expectations in real time than existing solutions and thus should be incorporated into the Fed’s decision-making process.

The “Kalshi and the Rise of Macro Markets” paper was released on Feb. 12 by Federal Reserve Board principal economist Anthony Diercks, Federal Reserve research assistant Jared Dean Katz and Johns Hopkins research associate Jonathan Wright.

Kalshi data was compared with traditional surveys and market-implied forecasts to examine how beliefs about future economic outcomes change in response to macroeconomic news and statements from policymakers.

“Managing expectations is central to modern macroeconomic policy. Yet the tools that are often relied upon—surveys and financial derivatives—have many drawbacks,” the researchers said, adding that Kalshi can capture the market’s “beliefs directly and in real time.”

“Kalshi markets provide a high-frequency, continuously updated, distributionally rich benchmark that is valuable to both researchers and policymakers.”

Kalshi traders can bet on a range of markets tied to the Federal Reserve’s decision-making, including consumer price index inflation and payrolls, in addition to other macroeconomic outcomes such as gross domestic product growth and gas prices.

The Fed researchers said Kalshi data should be used to provide a risk-neutral probability density function, which shows all possible outcomes of Fed interest rate decisions and how likely each one is. 
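As a rough illustration of the idea, the prices of mutually exclusive binary contracts can be normalized into a discrete probability distribution over rate outcomes. This is a minimal sketch and not the method used in the paper: the contract prices are invented for the example, the function names are hypothetical, and fees and bid-ask spreads are ignored.

```python
def implied_distribution(prices: dict[int, float]) -> dict[int, float]:
    """Normalize contract prices so the implied probabilities sum to 1."""
    total = sum(prices.values())
    if total <= 0:
        raise ValueError("contract prices must sum to a positive number")
    return {outcome: price / total for outcome, price in prices.items()}


def expected_change_bps(distribution: dict[int, float]) -> float:
    """Probability-weighted average rate change in basis points."""
    return sum(outcome * prob for outcome, prob in distribution.items())


# Hypothetical prices in dollars per $1 contract, keyed by rate change in basis points.
example_prices = {-25: 0.62, 0: 0.36, 25: 0.04}

distribution = implied_distribution(example_prices)
mean_change = expected_change_bps(distribution)

for outcome, prob in distribution.items():
    print(f"{outcome:+d} bps: {prob:.1%}")
print(f"expected change: {mean_change:.1f} bps")
```

Because the raw prices rarely sum to exactly $1, the normalization step matters: it turns a set of quotes into probabilities that can be compared across meetings.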

“Overall, we argue that Kalshi should be used to provide risk-neutral [probability density functions] concerning FOMC decisions at specific meetings,” the researchers wrote, adding that the current benchmark is “too far removed from the monetary policy interest rate decision.”

However, Fed research papers are only “preliminary materials circulated to stimulate discussion” and do not impact the central bank’s decision-making.