Tech

Cursor’s new coding model Composer 2 is here: It beats Claude Opus 4.6 but still trails GPT-5.4

Cursor, the San Francisco AI coding platform from startup Anysphere, which is valued at $29.3 billion, has launched Composer 2, a new in-house coding model now available inside its agentic AI coding environment. The model is a fine-tuned variant of the Chinese open source model Kimi K2.5, and it posts drastically improved benchmark scores over Cursor’s prior in-house model.

Cursor is also launching Composer 2 Fast, a higher-priced but faster variant, and making it the default experience for users.

Here’s the cost breakdown:

That’s a big drop from Cursor’s previous in-house model, Composer 1.5, released in February, which cost $3.50 per million input tokens and $17.50 per million output tokens. Composer 2 is about 86% cheaper on both counts.

Composer 2 Fast is also roughly 57% cheaper than Composer 1.5.

There are also discounts for “cache-read pricing,” which applies when some of the same tokens in a prompt are sent to the model again: $0.20 per million tokens for Composer 2 and $0.35 per million for Composer 2 Fast, versus $0.35 per million for Composer 1.5.
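
The percentage claims can be sanity-checked with a few lines of arithmetic. The Python sketch below works backward from the Composer 1.5 prices quoted above to the prices that the rounded “86% cheaper” and “57% cheaper” figures imply. The derived numbers are estimates based on rounded percentages, not Cursor’s published price list, and the function names and the sample workload are illustrative assumptions.

```python
# Composer 1.5 prices quoted in the article, in USD per million tokens.
COMPOSER_1_5 = {"input": 3.50, "output": 17.50}


def implied_prices(percent_cheaper: float) -> dict[str, float]:
    """Return the per-million-token prices implied by an 'X% cheaper' claim.

    The percentages in the article are rounded, so the results are estimates.
    """
    factor = 1 - percent_cheaper / 100
    return {kind: price * factor for kind, price in COMPOSER_1_5.items()}


def job_cost(prices: dict[str, float], input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a job with the given input and output token counts."""
    total = prices["input"] * input_tokens + prices["output"] * output_tokens
    return total / 1_000_000


composer_2 = implied_prices(86)       # implied by "about 86% cheaper"
composer_2_fast = implied_prices(57)  # implied by "roughly 57% cheaper"

if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 200k output tokens.
    for name, prices in [
        ("Composer 1.5", COMPOSER_1_5),
        ("Composer 2 (implied)", composer_2),
        ("Composer 2 Fast (implied)", composer_2_fast),
    ]:
        print(f"{name}: ${job_cost(prices, 2_000_000, 200_000):.2f}")
```

Under these assumptions, the implied Composer 2 prices land close to $0.50 and $2.50 per million input and output tokens, and the implied Composer 2 Fast prices close to $1.50 and $7.50.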

It also matters that this appears to be a Cursor-native release, not a broadly distributed standalone model. In the company’s announcement and model documentation, Composer 2 is described as available in Cursor, tuned for Cursor’s agent workflow and integrated with the product’s tool stack.

The materials provided do not indicate separate availability through external model platforms or as a general-purpose API outside the Cursor environment.

Cursor is pitching long-horizon coding, not just better completions

The deeper technical claim in this release is not merely that Composer 2 scores higher than Composer 1.5. It is that Cursor says the model is better suited to long-horizon agentic coding.

In its blog, Cursor says the quality gains come from its first continued pretraining run, which gave it a stronger base for scaled reinforcement learning. From there, the company says it trained Composer 2 on long-horizon coding tasks and that the model can solve problems requiring hundreds of actions.

That framing is important because it addresses one of the biggest unresolved issues in coding AI. Many models are good at isolated code generation. Far fewer remain reliable across a longer workflow that includes reading a repository, deciding what to change, editing multiple files, running commands, interpreting failures and continuing toward a goal.

Cursor’s documentation reinforces that this is the use case it cares about. It describes Composer 2 as an agentic model with a 200,000-token context window, tuned for tool use, file edits and terminal operations inside Cursor.

It also notes training techniques such as self-summarization for long-running tasks. For developers already using Cursor as their main environment, that tighter tuning may matter more than a generic leaderboard claim.
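
Cursor’s materials, as described here, do not spell out how self-summarization works, so the following is only a generic illustration of the underlying idea: when an agent’s history outgrows a budget, older steps are folded into a summary while the most recent steps stay verbatim. Everything in this Python sketch is a hypothetical stand-in, including the word-count “tokenizer” and the `summarize` stub, which in a real agent would be a call to the model itself.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def summarize(messages: list[str]) -> str:
    """Placeholder summarizer: keeps the first five words of each message.

    A real agent would ask the model to write this summary.
    """
    snippets = [" ".join(m.split()[:5]) for m in messages]
    return f"SUMMARY of {len(messages)} earlier steps: " + "; ".join(snippets)


def compact(history: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Fold older messages into one summary when the history exceeds the budget."""
    total = sum(count_tokens(m) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return list(history)  # nothing to compact
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    return [summarize(older)] + recent


if __name__ == "__main__":
    steps = [f"step {i}: ran tests and read the failure output in detail" for i in range(6)]
    for line in compact(steps, budget=30):
        print(line)
```

The design choice worth noting is that recent steps are preserved exactly, since an agent usually needs the latest command output verbatim, while older context only needs to survive in compressed form.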

The benchmark gains are substantial, even if GPT-5.4 still leads on one key chart

Cursor Composer 2 comparison to other leading models on third-party benchmarks. Credit: Cursor

Cursor’s published results show a clear improvement over prior Composer models. The company lists Composer 2 at 61.3 on CursorBench, 61.7 on Terminal-Bench 2.0, and 73.7 on SWE-bench Multilingual.

That compares with Composer 1.5 at 44.2, 47.9 and 65.9, and Composer 1 at 38.0, 40.0 and 56.9.
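
To put those scores side by side, the short Python sketch below computes the point gain and the relative gain between any two Composer versions. The scores are the ones listed above; the dictionary layout and function name are illustrative choices.

```python
# Scores as listed in the article, per benchmark.
SCORES = {
    "Composer 1": {
        "CursorBench": 38.0,
        "Terminal-Bench 2.0": 40.0,
        "SWE-bench Multilingual": 56.9,
    },
    "Composer 1.5": {
        "CursorBench": 44.2,
        "Terminal-Bench 2.0": 47.9,
        "SWE-bench Multilingual": 65.9,
    },
    "Composer 2": {
        "CursorBench": 61.3,
        "Terminal-Bench 2.0": 61.7,
        "SWE-bench Multilingual": 73.7,
    },
}


def gains(new: str, old: str) -> dict[str, tuple[float, float]]:
    """Return (point gain, relative gain in percent) for each benchmark."""
    result = {}
    for bench, new_score in SCORES[new].items():
        old_score = SCORES[old][bench]
        delta = new_score - old_score
        result[bench] = (delta, 100 * delta / old_score)
    return result


if __name__ == "__main__":
    for bench, (points, percent) in gains("Composer 2", "Composer 1.5").items():
        print(f"{bench}: +{points:.1f} points ({percent:.1f}% relative)")
```

Against Composer 1.5, the point gains work out to 17.1 on CursorBench, 13.8 on Terminal-Bench 2.0 and 7.8 on SWE-bench Multilingual.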

The release is more measured than some model launches because Cursor is not claiming universal leadership.

On Terminal-Bench 2.0, which measures how well an AI agent performs tasks in command-line, terminal-style interfaces, GPT-5.4 still leads at 75.1, while Composer 2 scores 61.7, ahead of Opus 4.6 at 58.0, Opus 4.5 at 52.1 and Composer 1.5 at 47.9.

Cursor Composer 2 score on Terminal-Bench 2.0 compared to other leading models. Credit: Cursor

That makes Cursor’s pitch more pragmatic and arguably more useful for buyers. The company is not saying Composer 2 is the single best model at everything. It is saying the model has moved into a more competitive quality tier while offering more attractive economics and stronger integration with the product developers are already using.

Cursor also included a performance-versus-cost chart on its CursorBench benchmarking suite that appears designed to make a Pareto-style argument for Composer 2.

Cursor Composer 2 Performance vs. Cost model comparison chart. Credit: Cursor

In that graphic, Composer 2 sits at a stronger cost-to-performance point than Composer 1.5 and compares favorably with higher-cost GPT-5.4 and Opus 4.6 settings shown by Cursor. The company’s message is not simply that Composer 2 scores higher than its predecessor, but that it may offer a more efficient cost-to-intelligence tradeoff for everyday coding work inside Cursor.
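
A Pareto-style argument can be made concrete: a model sits on the cost-performance frontier if no other model is at least as cheap and at least as good while being strictly better on one of the two axes. The Python sketch below shows that test. The data points are made-up placeholders, not figures read from Cursor’s chart, and the names are illustrative.

```python
from typing import NamedTuple


class ModelPoint(NamedTuple):
    name: str
    cost: float   # lower is better
    score: float  # higher is better


def dominates(a: ModelPoint, b: ModelPoint) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.cost <= b.cost and a.score >= b.score
    strictly_better = a.cost < b.cost or a.score > b.score
    return no_worse and strictly_better


def pareto_frontier(points: list[ModelPoint]) -> list[ModelPoint]:
    """Return the points not dominated by any other, sorted by cost."""
    frontier = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(frontier, key=lambda p: p.cost)


# Hypothetical placeholder data for illustration only.
EXAMPLE = [
    ModelPoint("A", cost=1.0, score=50.0),
    ModelPoint("B", cost=2.0, score=60.0),
    ModelPoint("C", cost=3.0, score=55.0),  # costs more than B, scores lower
    ModelPoint("D", cost=4.0, score=70.0),
]

if __name__ == "__main__":
    print([p.name for p in pareto_frontier(EXAMPLE)])
```

In the placeholder data, model C drops off the frontier because B is both cheaper and higher scoring, which is the kind of comparison a performance-versus-cost chart invites.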

Why the “locked to Cursor” point matters for buyers

For readers deciding whether to use Composer 2, the most important question may not be benchmark performance alone. It may be whether they want a model optimized for Cursor’s own product experience.

That can be a strength. According to the documentation, Composer 2 can access Cursor’s agent tool stack, including semantic code search, file and folder search, file reads, file edits, shell commands, browser control and web access.

That kind of integration can be more valuable than raw model quality if the goal is to complete real software tasks rather than produce impressive one-shot answers.

But it also narrows the addressable audience. Teams looking for a model they can deploy broadly across multiple external tools and platforms should recognize that Cursor is presenting Composer 2 as a model for Cursor users, not as a generally available standalone foundation model.

The bigger picture: Cursor is making an operational argument

The significance of Composer 2 is not that Cursor has suddenly taken the top spot on every coding benchmark. It has not. The more important point is that Cursor is making an operational argument: its model is getting better, its pricing is low enough to encourage broader use, and its faster tier is responsive enough that the company is comfortable making it the default despite the higher cost.

That combination could resonate with engineering teams that increasingly care less about abstract model prestige and more about whether an assistant can stay useful across long coding sessions without becoming prohibitively expensive.

Cursor’s broader pricing structure helps frame the competitive pressure around this launch. On its current pricing page, Cursor offers a free Hobby tier, a Pro plan at $20 per month, Pro+ at $60 per month, and Ultra at $200 per month for individual users, with higher tiers offering more usage across models from OpenAI, Anthropic and Google.

On the business side, Teams costs $40 per user per month, while Enterprise is custom-priced and adds pooled usage, centralized billing, usage analytics, privacy controls, SSO, audit logs and granular admin controls. In other words, Cursor is not just charging for access to a coding model. It is charging for a managed application layer that sits on top of multiple model providers while adding team features, governance and workflow tooling.

That model is increasingly under pressure as first-party AI companies push deeper into coding itself. OpenAI and Anthropic are no longer just selling models through third-party products; they are also shipping their own coding interfaces, agents and evaluation frameworks — such as Codex and Claude Code — raising the question of how much room remains for an intermediary platform.

Posts on X, while unverified and not necessarily representative of the broader market, increasingly describe developers moving from Cursor to Anthropic’s Claude Code, especially power users drawn to terminal-first workflows, longer-running agent behavior and lower perceived overhead.

Some of those posts describe frustration with Cursor’s pricing, context loss or editor-centric experience, while praising Claude Code as a more direct and fully agentic way to work. Even treated cautiously, that kind of social chatter points to the strategic problem Cursor faces: it has to prove that its integrated platform, team controls and now its own in-house models add enough value to justify sitting between developers and the model makers’ increasingly capable coding products.

That makes Composer 2 strategically important for Cursor.

By offering a much cheaper in-house model than Composer 1.5, tuning it tightly to Cursor’s own tool stack and making a faster version the default, the company is trying to show that it provides more than a wrapper around outside systems.

The challenge is that as first-party coding products improve, developers and enterprise buyers may increasingly ask whether they want a separate AI coding platform at all, or whether the model makers’ own tools are becoming sufficient on their own.

Tech

Reworked Apple Watch avoids ban, but Masimo battle escalates

The decision, made public on Thursday, concludes that Apple’s latest implementation of pulse-oximetry functionality falls outside the scope of Masimo’s asserted rights. The full ITC commission will now review the judge’s ruling and decide whether to adopt it – a step that will determine whether the redesigned watches remain protected…

Tech

Daily Deal: The 2026 C# Course Bundle

from the good-deals-on-cool-stuff dept

The 2026 C# Course Bundle offers 8 courses that cover everything C#. You’ll master the fundamentals, explore object-oriented programming, and start building your own apps in no time. It’s on sale for $40.

Note: The Techdirt Deals Store is powered and curated by StackCommerce. A portion of all sales from Techdirt Deals helps support Techdirt. The products featured do not reflect endorsements by our editorial team.

Filed Under: daily deal

Tech

‘We should regard it as a privilege to be stepping stones to higher things’: How Arthur C Clarke predicted the rise of AGI and the looming demise of humanity back in 1964

While debate over the timeline – or even the potential – for artificial general intelligence (AGI) rages on in 2026, one futurist may have predicted the breakthrough more than 60 years ago.

Noted British science fiction writer and futurist Arthur C. Clarke touted the arrival of AGI during an interview at the 1964 World’s Fair in New York City.

Tech

This monitor claims paper-like viewing and huge energy savings by using ambient light instead of relying entirely on traditional backlighting

  • Hannspree Hybri monitor uses ambient light to significantly reduce energy consumption
  • Reflective display design aims to mimic paper-like readability and comfort
  • Automatic switching enables backlight use in low ambient light conditions

The Hannspree Hybri monitor attempts to merge paper-like readability with modern display performance, claiming an 80% reduction in energy use through innovative use of ambient light.

At illumination levels above 1000lux, common in offices, classrooms, and outdoor-adjacent spaces, the monitor reflects surrounding light instead of relying solely on a backlight.

Tech

Reddit wants to check if you’re using the iPhone’s Face ID camera

Reddit may soon ask users to prove they’re human, and it might involve your face. During a TBPN podcast, Reddit’s CEO, Steve Huffman, confirmed that the platform is exploring new identity verification methods, including using Face ID or Touch ID-style authentication, to tackle its growing bot problem.

RDDT requiring Face ID was not something I had on my bingo card but something has got to be done about all the fake / botted content — I just don’t know how to sell face-scanning to redditors or even lurkers. https://t.co/7e7K3Di4ip

— Alexis Ohanian 🗽 (@alexisohanian) March 21, 2026

The idea is simple: as AI-generated accounts become more convincing, Reddit wants stronger ways to confirm that users are real people and not bots pretending to be one.

Why is Reddit considering Face ID-style verification?

Unfortunately, bots are getting too good. Huffman has previously emphasized keeping the platform “human,” and this move fits right into that strategy. AI-generated content and automated accounts are becoming harder to detect, making moderation more challenging and threatening the authenticity of discussions.

As such, verification methods like Face ID or biometric checks could act as a quick way to confirm a real person is behind an account, without requiring traditional ID uploads. But of course, it’s not that simple.

So… are we really scanning faces now?

Reddit isn’t going full sci-fi just yet. The company is still “weighing” its options, which could mean optional verification for certain features, regions, or accounts rather than forcing everyone to scan their face. We’ve already seen a preview of this in places like the UK, where Reddit uses selfies or ID checks for age verification.

The next step could make things feel a lot more seamless and a bit more invasive. Instead of uploading IDs, Reddit may lean on device-level tools like Face ID to confirm you’re human, turning verification into something that happens in the background rather than a full process. Of course, that’s where things get messy.

Biometric checks raise big questions around privacy, data security, and consent, and users aren’t exactly thrilled about handing over their face to prove they’re not a bot. Reddit may be solving one problem, but it opens up another: how much verification is too much? Especially on a platform where anonymity is kind of the whole point?

Tech

Google isn't backing away from Pentagon AI work, it's doubling down

According to Business Insider, the issue came up during a January Google DeepMind town hall, where VP of Global Affairs Tom Lue said the company was “leaning more” into national security work.

Tech

Scientists find all five genetic building blocks for life in asteroid Ryugu

Researchers are still studying samples of Ryugu collected by the Japan Aerospace Exploration Agency during its Hayabusa2 mission. After the first papers focused on the composition of the recovered material, a Japanese team has now found a “complete” set of genetic bases belonging to both DNA and RNA.

Tech

Today’s NYT Strands Hints, Answer and Help for March 22 #749

Looking for the most recent Strands answer? Click here for our daily Strands hints, as well as our daily answers and hints for The New York Times Mini Crossword, Wordle, Connections and Connections: Sports Edition puzzles.


Today’s NYT Strands puzzle is an intriguing one. It helps if you know a little bit about famous products throughout history. Some of the answers are difficult to unscramble, so if you need hints and answers, read on.

I go into depth about the rules for Strands in this story.

If you’re looking for today’s Wordle, Connections and Mini Crossword answers, you can visit CNET’s NYT puzzle hints page.

Read more: NYT Connections Turns 1: These Are the 5 Toughest Puzzles So Far

Hint for today’s Strands puzzle

Today’s Strands theme is: Trademarked no more

If that doesn’t help you, here’s a clue: Brand names that became generic terms.

Clue words to unlock in-game hints

Your goal is to find hidden words that fit the puzzle’s theme. If you’re stuck, find any words you can. Every time you find three words of four letters or more, Strands will reveal one of the theme words. These are the words I used to get those hints but any words of four or more letters that you find will work:

  • SPIT, SPITE, SPITES, SPITS, PIER, PIERS, GAME, SAME, POPE, POPES, GASP

Answers for today’s Strands puzzle

These are the answers that tie into the theme. The goal of the puzzle is to find them all, including the spangram, a theme word that reaches from one side of the puzzle to the other. When you have all of them (I originally thought there were always eight but learned that the number can vary), every letter on the board will be used. Here are the nonspangram answers:

  • ZIPPER, ASPIRIN, THERMOS, DUMPSTER, ESCALATOR

Today’s Strands spangram

The completed NYT Strands puzzle for March 22, 2026.

NYT/Screenshot by CNET

Today’s Strands spangram is GENERICTERM. To find it, start with the G that is three letters down on the far-left row, and wind across and then up again.

Tech

MacBook Neo review: the new king of budget laptops

Don’t call it compromised. The MacBook Neo is an amazing new entry point in Apple’s lineup that easily eclipses the base iPad and will be a revolution in the education market.

An open MacBook Neo viewed from the back on an outdoor table
MacBook Neo review: A18 Pro is more than enough compute

Apple is no stranger to attempting new and interesting budget products like the entry iPhone 17e or base iPad. While it thrives in the premium market, Apple’s best sellers are at the bottom of the lineup, and that bottom just dropped again for the MacBook.
MacBook Neo is yet another move towards a more affordable Mac that echoes previous attempts, like the iBook. Though, even in 2006, the iBook was a closer relation to today’s MacBook Air than to the MacBook Neo.

Tech

Broadcom's VMware shake-up triggers EU antitrust complaint by cloud providers

CISPE claims Broadcom’s actions have excluded most European cloud infrastructure partners, sharply reduced competition, and forced smaller firms out of the VMware ecosystem altogether.