TL;DR
Cloudflare, Mozilla, Google, Microsoft, and Shopify are building PACT, a privacy-first protocol to verify web traffic legitimacy.

Valve Software abruptly opened reservations for its latest Steam Machine on Monday, but due to the ongoing PC component shortage, did so at a significantly higher price than expected.
The company, headquartered in Bellevue, Wash., first announced the new version of the Steam Machine late last year. It’s a small-scale, high-powered gaming PC that’s designed for your living room, which runs the same Linux-based SteamOS as Valve’s Steam Deck.
The 2026 Steam Machine starts at a whopping $1,049 through Valve’s digital storefront Steam, which gets you the base model with an internal 512GB SSD. A higher-end model with a 2TB drive costs $1,349, and both also come in bundles with one of Valve’s new Steam Controllers.
It is, on paper, an impressive overall device, particularly as a sort of gateway product for anyone who’d like to break into gaming on PCs and/or Linux. However, its price tag is a significant barrier. A comparably powerful PC would cost as much or more, but Valve’s strategy with the Steam Deck was to practically give it away.
As it turns out, Valve isn’t particularly happy about the price either, preemptively addressing concerns via a post on the official Steam blog. The short version is that the planned launch of the Machine has been complicated by the ongoing component crisis that surrounds SSDs and RAM.
The prices “reflect the state of the world for manufacturing; or, more accurately, it reflects the price of the components as we’ve secured them over the past 6 months,” the company said in the post.
The two Steam Machine models’ internal storage capacity is the only difference between them. Both are gaming PCs that pack “semi-custom” AMD CPUs and GPUs, 16 GB RAM, Bluetooth capability, an ethernet port, and a MicroSD card slot into a 6” black cube, complete with a removable faceplate.

The high cost of entry for the Steam Machine is another knock-on effect from the ongoing global RAM and SSD shortages, which were initially created by high demand from the burgeoning AI industry. The same problems have resulted in multiple price hikes for current-generation gaming consoles and spiked the costs for new-built gaming PCs. It’s been a bad time for the hobby overall, especially for newcomers and players on a budget.
The Machine isn’t likely to fail, but its costs may mean that for the time being, it turns into little more than an expensive toy for gadgetheads. One of Valve’s quiet ambitions for years has been to bring more people into PC gaming, and especially PC gaming on Linux, but for a thousand bucks a throw, the Machine isn’t likely to draw in any new customers.
If even a company like Valve, which controls roughly half the PC gaming on Earth via Steam, is having problems like this, then it’s wise to expect further disruption for the foreseeable future. Xbox in particular was talking about launching a new console at the end of 2027, but with RAM and SSD costs on the rise, it looks like the next generation of hardware will either be prohibitively expensive or best pushed off for a few years.
As with the Deck, you get games onto the Machine via direct download from Valve’s digital storefront Steam. Also as with the Deck, the Machine is designed so it can also be used as a desktop computer, with no particular guardrails to keep out tinkerers and modders.
Even with its high cost, and with fewer units available at launch than Valve had planned, the 2026 Steam Machine was listed as “out of stock” within 10 minutes of the store page opening, before Valve itself had officially announced that reservations were open.
However, Valve has implemented a lottery system in order to stymie resellers and attempt to make the process as fair as possible. Any interested buyers can sign up for a Steam Machine reservation at any time before this coming Thursday, at which point Valve will randomize the queue. Anyone who doesn’t get in on Thursday will be added to a waiting list.
Cloudflare has announced a joint initiative with Mozilla Firefox, Google Chrome, and Microsoft Edge to develop a new internet protocol that verifies whether web traffic is legitimate without tracking users. The protocol, called Private Access Control Tokens, is designed to replace CAPTCHAs and forced logins with anonymous tokens that prove a visitor is human or an authorised bot. Shopify co-developed the technology and the group plans to submit it for formal standardisation.
The announcement comes as bot traffic has officially overtaken human activity online. Cloudflare Radar data shows automated systems now account for roughly 58 percent of HTTP requests to web content worldwide, against 42 percent from people. Cloudflare CEO Matthew Prince shared the milestone on June 3, noting that agentic AI programs browsing on behalf of assistants like ChatGPT and Gemini had brought the crossover about 18 months earlier than he had predicted.
PACT works by allowing websites with strong knowledge of a visitor’s identity to issue anonymous tokens. A user’s browser stores the token and can present it to other websites as proof that a real person is behind the session, reducing the need for repeated identity checks. The protocol is designed so that the token cannot be used to track users or reconstruct their browsing history.
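The announcement does not spell out PACT’s cryptography, so the following is only a sketch of the general idea behind anonymous tokens, using a textbook RSA blind signature with a toy key. Every name here (`blind`, `issuer_sign`, `unblind`, `verify`) and the key itself are assumptions made for illustration, not part of PACT. The point is that the issuer signs a blinded value it cannot later link to the token that a third-party site verifies.

```python
import hashlib
import secrets
from math import gcd

# Toy issuer key built from two Mersenne primes. Illustration only:
# far too small to be secure, and not PACT's actual parameters.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # issuer's private exponent


def h(msg: bytes) -> int:
    """Hash a message to an integer modulo N."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % N


def blind(msg: bytes) -> tuple[int, int]:
    """Browser side: hide the token hash behind a random factor r."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if gcd(r, N) == 1:
            break
    return (h(msg) * pow(r, E, N)) % N, r


def issuer_sign(blinded: int) -> int:
    """Issuer side: sign without learning the underlying token."""
    return pow(blinded, D, N)


def unblind(blind_sig: int, r: int) -> int:
    """Browser side: strip the blinding factor to get a normal signature."""
    return (blind_sig * pow(r, -1, N)) % N


def verify(msg: bytes, sig: int) -> bool:
    """Any site holding the public key (N, E) can check the token."""
    return pow(sig, E, N) == h(msg)


nonce = secrets.token_bytes(16)          # the anonymous token value
blinded, r = blind(nonce)
signature = unblind(issuer_sign(blinded), r)
print(verify(nonce, signature))
```

In this sketch the issuer only ever sees `blinded`, while the verifying site only sees `nonce` and `signature`, which is what prevents the two from being linked.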
“The way we interact with the Internet is facing a fundamental shift,” Cloudflare CTO Dane Knecht said in the announcement. “As AI-powered traffic becomes widespread, existing tools to support its use are too generic and coarse.” He said the collaboration would eliminate the friction caused by security protocols for every visitor, whether human or agent, without sacrificing privacy.
The initiative does not aim to block all automated traffic. Cloudflare has itself embraced agentic AI, cutting 1,100 jobs earlier this year after declaring that AI agents now perform work previously done by humans. For many AI agents there is still a human somewhere in the loop with a legitimate reason to access a website.
PACT is meant to distinguish those authorised agents from malicious scrapers and abuse bots, not to shut down automation entirely.
The browser makers framed the effort as essential to the open web. Bobby Holley, CTO for Firefox at Mozilla, said an “avalanche of automated traffic” was pushing sites toward blunt defences like paywalls, identity checks, and invasive tracking. Erik Anderson, director of engineering for the web platform at Microsoft Edge, called effective privacy-preserving tools critical to combating abuse without unnecessary user friction.
Shopify’s involvement reflects the commercial stakes. Ilya Grigorik, a distinguished engineer at the company, said every extra challenge or false positive in ecommerce can turn a purchase into an abandoned cart. Covert browser fingerprinting and extension scanning have emerged as the default tools for platforms trying to identify users, a practice that privacy advocates and regulators have pushed back against.
PACT would offer a standardised alternative that does not require harvesting device characteristics or tracking browsing behaviour.
The protocol builds on earlier work in the same space. Apple already uses a related system called Privacy Pass, which works with a device’s secure enclave to attest to a user’s identity, and Cloudflare uses Privacy Pass as a signal in its bot management products. The IETF published the Privacy Pass Architecture as RFC 9576, and PACT extends that foundation with broader browser support and a focus on the agentic AI traffic that has reshaped the composition of the web in the past year.
No deployment timeline has been announced. The partners have committed to developing the protocol and submitting it for standardisation, but turning a specification into something that works across billions of browser sessions will take time. Users are already migrating away from platforms that impose AI features without consent, and the question of how to manage automated traffic without alienating human visitors is becoming more urgent by the quarter.
Whether PACT arrives fast enough to matter depends on how quickly the standards process moves and how willing websites are to adopt a system that, by design, gives them less data about their visitors rather than more.
Well, this is very disappointing. Over the first half of this year, we’ve talked about the resurgence of the Stop Killing Games movement, which aims to push various governments to legislate out the practice of video game publishers sunsetting their games and making them unplayable afterwards. The aims of the movement are simple: publishers can certainly sunset their games that require backend servers to work, but they should make them hostable and playable through fan-run servers if they do, should notify customers well in advance of the sunset date, or should make or alter the games so they can be played independent of the company keeping services running.
To that end, the movement managed to get enough signatures in the EU to get a parliamentary hearing, which was reported to have gone quite well. That’s why it’s surprising to learn that the EU just ruled out issuing that kind of mandate to publishers. Instead, the EU wants this to be a voluntary process, and what it’s citing as the reason it can’t be done by mandate is breaking my brain.
The European Commission said on Tuesday it cannot require video games to remain playable after they are withdrawn from sale, but will work with industry and consumer groups on a voluntary code of conduct for managing games’ “end of life”. The Commission said copyright and other intellectual property rules prevent it from imposing an obligation to keep games playable. It added it would work with consumer organisations and authorities to raise awareness of existing rights.
“Active enforcement of these existing consumer rights can also incentivise the providers to offer video games with longer lifespans and explore solutions for meeting consumer expectations,” the Commission said in a statement.
Copyright law is precisely why mandates like this should exist. Like American law, EU copyright law offers protection for a work for the author’s life plus 70 years. After that, the work enters the public domain. Unless, that is, we’re talking about video games that require backend support, in which case it never enters the public domain and instead just vanishes into vapor. And that, I have repeatedly argued, breaks the copyright bargain entirely. In fact, it seems to me that it breaks it so completely that works like that shouldn’t even get copyright protections without rules such as exactly what Stop Killing Games is advocating for.
And this plan for publishers to do all of this voluntarily? Don’t make me laugh. The very gaming companies that the EU wants to take this sort of preservation effort on voluntarily lobbied against codifying preservation efforts. Why would they do that if they were willing to do this all voluntarily?
Finally, the point of all of this is not merely to make games playable for longer. It’s to preserve them as close to infinitely as possible. That should be the aim of any cultural output.
Stop Killing Games hasn’t commented publicly on the decision yet. I doubt the movement will take this defeat lying down, however.
With Microsoft seemingly intent on turning Windows into malware, Macs are increasingly appealing. But while the Apple tax is increasingly diminishing in the consumer laptop space, it’s rife when it comes to compatible monitors.
PC-focused alternatives have different colors, pixel densities and features that rarely play well with Macs, meaning users frequently fight losing battles matching what they see on their MacBook screen with a third-party monitor. But BenQ has taken note.
There are several models in BenQ’s new specialist Mac range. Most have 4K resolutions, are 27 or 32 inches in size, have 60Hz refresh rates and offer glossy or matte finishes.
There are two outliers: the 120Hz MA320UG and this, the glossy, 5K-resolution MA270S.
Setup is simple, with the clip-on stem affixing to the base with a single thumbscrew. There’s a generous amount of adjustment (including 150mm (5.9in) height and 90° bi-directional pivot) which, unlike Apple, BenQ doesn’t charge extra for.
There’s also an unApple-like plethora of ports including two HDMI, two Thunderbolt 4 and four USB 3.2 Gen 2 ports (two USB-C and two USB-A), with up to 96 watts of USB-C power delivery — so one cable can connect and charge a laptop.
The USB-A ports also offer 7.5W charging. They also facilitate KVM functionality to connect multiple devices. Apple’s monitors don’t.
Once connected, BenQ’s factory-calibrated screen instantly resembled the display of the MacBook sitting next to it. Like Apple’s own monitors, the MA270S has a native 5K resolution of 5120 x 2880, giving it a much higher pixel density (218 PPI) than 27-inch, 1440p PC equivalents (~109 PPI).
By default, macOS scales the interface to look like 2560 x 1440, which keeps text crisp without making everything tiny.
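The pixel-density figures above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes both panels measure exactly 27 inches on the diagonal; the function name `pixels_per_inch` is ours.

```python
import math


def pixels_per_inch(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Pixel density: diagonal length in pixels divided by diagonal in inches."""
    return math.hypot(width_px, height_px) / diagonal_in


ppi_5k = pixels_per_inch(5120, 2880, 27)      # the MA270S's native resolution
ppi_1440p = pixels_per_inch(2560, 1440, 27)   # a typical 27-inch 1440p PC monitor

print(round(ppi_5k), round(ppi_1440p))  # 218 109
```

Because 5120 x 2880 is exactly double 2560 x 1440 in each dimension, macOS can draw every interface point with a 2 x 2 block of physical pixels, which is why the scaled desktop stays sharp.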
The IPS screen is very impressive with near-OLED levels of color saturation and LED-backlight-derived (almost completely) true blacks. Multimedia looks good at default settings (vibrant colors, respectable contrast and minimal noise in gradients), but turning on HDR significantly improves everything: more details simultaneously show up in shadows and highlights and all transitions become smooth.
The brightest highlights can blow out (with no easy fix), though, and note that the glossy coating can turn into a black mirror when displaying dark content.
A common curse of Retina displays is their sluggish speeds and the MA270S is no different. The slow, 5ms response time marries with a 60Hz refresh rate (it actually goes up to a Spinal Tap-esque 70Hz) to smear fast-moving objects across the screen, so forget about eye-friendly, fast-and-frantic gaming.
There’s no integrated webcam, but the two 3-watt speakers (surprisingly for a monitor) have well-rounded fidelity and sound good, despite not getting loud.
Despite having a joystick button, most advanced image settings are controlled by BenQ’s impressive DisplayPilot 2 app. The customizable options can easily swap between color modes and toggle settings like Low Blue Light.
Its FocuSync settings match Mac Focus adjustments and enable you to change core monitor settings using Mac settings. You can also auto-sync different color modes with different applications.
While it’s not a cheap monitor, the BenQ MA270S is significantly cheaper than Apple’s own Studio displays and, therefore, serves as a much-needed, more affordable, third-party alternative in a monopoly market.
| Screen size | 27-inch |
| Aspect ratio | 16:9 |
| Resolution | 5120 x 2880 (5K) |
| Brightness | 450 cd/m² typical |
| Refresh rate | 70Hz |
| Response time | 5ms GTG |
| Viewing angle | 178°(H)/178°(V) |
| Contrast ratio | 2,000:1 |
| Color coverage | 99% sRGB, 99% P3 |
| Inputs | 2x HDMI 2.1, 1x Thunderbolt 4 (96W PD), 1x Thunderbolt 4 out (15W PD), 1x USB-C 3.2 Gen 2 (35W PD), 1x USB-C 3.2 Gen 2 (15W PD), 2x USB-A 3.2 Gen 2 (7.5W charging), headphone jack |
| Dimensions | 43.0-58.0 x 61.4 x 22.0cm with stand (16.9-22.8 x 24.2 x 8.7in); 36.8 x 61.4 x 7.6cm without stand (14.5 x 24.2 x 3.0in) |
| Weight | 8.64kg with stand (19.1lb); 5.7kg without stand (12.6lb) |
Apple users have always been an aesthetically appreciative bunch, and so they’ll warm to the color scheme of the BenQ MA270S, which apes that of a standard silver MacBook.
The stand is simple to assemble and offers a generous amount of movement in every direction. It’s remarkable that Apple charges more for a feature like this — it’s standard on many PC displays.
The multiple ports (which offer different degrees of charging power) mean multiple devices can be simultaneously connected. Furthermore, a single keyboard and mouse can be shared across them using KVM functionality — unholy magic in the eyes of some Apple users.
The joystick button at the base of the screen only provides access to brightness, volume and input selections. To access more comprehensive settings, you’ll need the BenQ DisplayPilot 2 app and the OSD handily provides a QR code to locate it on BenQ’s own website (it’s not in the App Store).
It provides access to basic brightness settings, eye comfort and HDR, but also FocuSync settings on your Mac.
Another feature is the rubberized pad on the base of the stand. It provides slightly superior softness and friction compared to the plastic stand, so you may be more tempted to rest your phone on it.
The only element that some users might miss is a built-in webcam. Apple’s own monitors have them, but purchasers of the BenQ MA270S will have to buy a separate unit or use the one in their MacBook.
The best thing you can say about the BenQ MA270S is that it just works. Just connecting it to your MacBook provides you with a matching image of your MacBook’s screen without having to fiddle with countless settings.
The screen displays very sharp text, colors are very bright and vibrant, and contrast is generally impressive. However, bright areas and highlights can blow out rather easily when HDR is engaged.
The big drawback is that the 70Hz refresh rate is nowhere near enough to stop the sluggish 5ms pixel response time from smearing most moving objects across the screen. As such, it’s not good for gaming.
Unusually for a monitor, the two 3-watt speakers offer well-rounded fidelity with a modicum of bass. They don’t get particularly loud, though.
| Value | In terms of monitors, it’s not cheap. In terms of Apple monitors, it represents extraordinary value. | 4 / 5 |
| Design | The MA270S looks and feels like it belongs in Apple’s world, and that’s hard to achieve for third parties. | 4 / 5 |
| Performance | The colors, brightness, sharpness and contrast are everything we’d expect from an Apple monitor. Unfortunately, the sluggish speed is too. | 4 / 5 |
| Final score | At last, MacBook users can afford a compatible external monitor without breaking the bank. | 4 / 5 |
The JaredFromSubway Ethereum MEV (Maximal Extractable Value) bot suffered a $15 million loss after an attacker manipulated the opportunity-detection logic by creating fake cryptocurrency trading opportunities.
The drain was detected on Saturday by blockchain security firm Blockaid, and today, JaredFromSubway confirmed that the attacker used fake pools and tokens to trick the bot into approving helper contracts.
According to Blockaid, the attacker deployed contracts designed to appear as profitable MEV opportunities to JaredFromSubway’s automated execution system.
The bot automatically analyzed routes and trade opportunities that seemed financially rewarding. It then generated the transactions needed to execute them, granting ERC-20 token approvals to contracts controlled by the attacker.
It appears that the attacker planned the heist carefully, as early transactions served as harmless tests to help confirm the bot’s action routines. Later, the threat actor changed the route so that the allowance was not consumed or revoked after the bot granted approvals.
The attacker accumulated valid spending permissions without immediately using them, reaching up to 92.1614 WETH approved to an attacker-controlled helper contract.
Finally, the attacker used the open approvals to withdraw WETH, USDC, and USDT from the JaredFromSubway MEV bot contract via the transferFrom function.
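The approval mechanics described above can be modeled in a few lines. This is a simplified sketch of ERC-20 style allowances, not the bot’s or the attacker’s actual code: the class `ToyToken`, the account labels, and the bot’s 100 WETH starting balance are hypothetical, while the 92.1614 WETH allowance is the figure Blockaid reported.

```python
from decimal import Decimal


class ToyToken:
    """Minimal model of ERC-20 balances and allowances (illustration only)."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.allowances = {}  # (owner, spender) -> approved amount

    def approve(self, owner, spender, amount):
        # The owner lets the spender move up to `amount` of the owner's tokens.
        self.allowances[(owner, spender)] = amount

    def transfer_from(self, caller, owner, recipient, amount):
        # Succeeds only if the caller still holds enough allowance from the owner.
        allowed = self.allowances.get((owner, caller), Decimal(0))
        if amount > allowed:
            raise PermissionError("allowance exceeded")
        if amount > self.balances.get(owner, Decimal(0)):
            raise ValueError("insufficient balance")
        self.allowances[(owner, caller)] = allowed - amount
        self.balances[owner] -= amount
        self.balances[recipient] = self.balances.get(recipient, Decimal(0)) + amount


weth = ToyToken({"bot": Decimal("100")})  # hypothetical starting balance

# Step 1: the bot, chasing a fake opportunity, approves the attacker's helper.
weth.approve("bot", "helper", Decimal("92.1614"))

# Step 2: the fake route completes without spending or revoking the allowance,
# so the approval stays open.

# Step 3: later, the helper drains the approved amount to the attacker.
weth.transfer_from("helper", "bot", "attacker", Decimal("92.1614"))
print(weth.balances)
```

The sketch suggests why dangling approvals are dangerous: nothing in the token itself ties an allowance to the trade it was granted for, so it remains spendable until it is used up or revoked.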
MEV bots are ultra-fast automated trading systems that scan Ethereum and other blockchains for opportunities to make money by exploiting the order and timing of transactions before they are included in a block.
JaredFromSubway is a private MEV operation with no publicly available code, known as one of Ethereum’s most aggressive and visible “sandwich”-bot operations.
In a sandwich attack, the bot detects a user’s pending trade, places a buy order immediately before it, and then sells immediately afterward, profiting from the price movement caused by the victim’s transaction.
The practice is controversial because it often results in worse prices for regular traders while generating profits for the bot operator.
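A toy simulation can show why the victim ends up with a worse price. It assumes a fee-free constant-product pool (reserves whose product stays fixed), ignores gas costs, and uses made-up reserve and trade sizes; `Pool` and `run` are illustrative names, not anything from JaredFromSubway’s private code.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Toy constant-product pool: token * cash stays constant, no fees."""
    token: float
    cash: float

    def buy(self, cash_in: float) -> float:
        """Swap cash for tokens; returns tokens received."""
        k = self.token * self.cash
        new_cash = self.cash + cash_in
        new_token = k / new_cash
        out = self.token - new_token
        self.token, self.cash = new_token, new_cash
        return out

    def sell(self, token_in: float) -> float:
        """Swap tokens for cash; returns cash received."""
        k = self.token * self.cash
        new_token = self.token + token_in
        new_cash = k / new_token
        out = self.cash - new_cash
        self.token, self.cash = new_token, new_cash
        return out


def run(sandwich: bool, victim_cash: float = 100_000.0, bot_cash: float = 100_000.0):
    """Return (tokens the victim receives, bot profit in cash)."""
    pool = Pool(token=1_000.0, cash=2_000_000.0)  # hypothetical reserves
    bot_tokens = pool.buy(bot_cash) if sandwich else 0.0        # front-run
    victim_tokens = pool.buy(victim_cash)                        # victim's trade
    bot_profit = pool.sell(bot_tokens) - bot_cash if sandwich else 0.0  # back-run
    return victim_tokens, bot_profit


victim_alone, _ = run(sandwich=False)
victim_hit, bot_profit = run(sandwich=True)
print(f"victim alone: {victim_alone:.2f} tokens, sandwiched: {victim_hit:.2f} tokens")
print(f"bot profit: {bot_profit:.2f}")
```

In this model the bot’s front-run pushes the price up before the victim buys, and the victim’s own purchase pushes it higher still, so the bot sells back for more than it paid.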
Initially, JaredFromSubway offered a $3 million bounty to the attacker for the full return of the stolen funds, promising no further action would be taken.
After receiving no response, JaredFromSubway increased the bounty to $7.5 million for the return of just 50% of the stolen amount, with $1 million to be given to the community.
JaredFromSubway is also negotiating with “a white-hat hacking group” over the stolen $15 million, but there is no confirmation of a deal yet.

A short exchange on X pulled antimatter propulsion back into view. Elon Musk posted that in the future a trillion times a trillion dollars will go toward making antimatter so people can travel to other star systems. He added that later civilizations may measure wealth in mass and energy rather than currency. NASA Administrator Jared Isaacman replied that he supports antimatter propulsion.
Antimatter is essentially the mirror image of normal matter. Every electron has a counterpart, the positron, which is identical to it but carries the opposite charge; every proton likewise has an antiproton. When a particle meets its antiparticle, the two annihilate and, per E = mc², their mass is converted into energy. When one gram of antimatter annihilates with one gram of normal matter, it releases energy equivalent to around 43 kilotons of TNT, nearly three times that of the Hiroshima bomb. Chemical rockets convert only a tiny fraction of their fuel’s mass into energy, and even nuclear fusion is nowhere near as efficient as antimatter’s near-total conversion.
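The 43-kiloton figure follows directly from E = mc². Here is a quick sketch of the arithmetic, assuming complete conversion of both the antimatter and an equal mass of ordinary matter, and taking roughly 15 kilotons as the Hiroshima yield (a commonly cited estimate, used here as an assumption).

```python
C = 299_792_458.0                   # speed of light in m/s
JOULES_PER_KILOTON_TNT = 4.184e12   # conventional definition of a kiloton of TNT
HIROSHIMA_KT = 15.0                 # assumed yield, commonly cited estimate


def annihilation_yield_kt(antimatter_grams: float) -> float:
    """Energy released when antimatter annihilates with an equal mass of matter."""
    total_mass_kg = 2 * antimatter_grams / 1000.0  # both masses are converted
    energy_joules = total_mass_kg * C ** 2          # E = m * c^2
    return energy_joules / JOULES_PER_KILOTON_TNT


yield_kt = annihilation_yield_kt(1.0)
print(round(yield_kt))                    # 43
print(round(yield_kt / HIROSHIMA_KT, 1))  # 2.9
```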
The fact that you can get a lot more oomph from a lot less mass has a significant impact on the type of spacecraft you can build. Current engines must carry the majority of their weight in propellant, which is then discarded as soon as it is spent. An antimatter system might provide a lot more kick for a lot less weight. A trip to Mars that takes six to nine months today could be done in weeks, and reaching the nearest stars, which would take tens of thousands of years with current technology, could be done within a human lifetime, or even a few decades, at a few percent of lightspeed. Less time in space means less exposure to radiation and weightlessness for any crew, which is a significant benefit.
I support antimatter propulsion.
— NASA Administrator Jared Isaacman (@NASAAdmin) June 19, 2026
Several ideas have been proposed to turn annihilation into thrust. One approach is to combine streams of antimatter and matter in a dedicated chamber, where the resultant particles and radiation are blasted out the rear. Another does the same thing, but uses the energy to heat up propellant such as hydrogen, which then expands and generates thrust. Then there’s the concept of using tiny amounts of antimatter to initiate larger nuclear fusion or fission reactions, which would stretch the limited antimatter supply even further. Every one of these approaches is trying to solve the same problem: how to take the energy release and turn it into push without wiping out the ship.
The trouble is that creating antimatter is an absolute nightmare. Particle accelerators generate it by smashing particles together, but the method yields extremely small amounts. After years of experimentation, facilities such as CERN have only been able to produce nanograms. Reaching the amounts required for even a small probe would mean discovering new ways to manufacture it and much more effective ways to collect it.
No matter your lifestyle or budget, Samsung has a range of phones to fit what you’re looking for. From baseline options such as the Galaxy S25 and S25 Plus (or the newer S26 and S26 Plus) to more premium picks like the Galaxy S26 Ultra, you can get solid performance, good cameras and long-lasting batteries regardless of how much you spend. Recently, the Galaxy S26 Ultra received a CNET Lab Award for having the fastest wired charging of 33 phones we tested. For a novel design, you can choose Samsung’s pricier Galaxy Z Flip 7 or Z Fold 7, or even the super-slim Galaxy S25 Edge. And for a more affordable option, Samsung’s A26, A36 or A56 might be a good fit. Our roundup can help you find the phone that best fits your needs.
The Samsung Galaxy S26 Ultra isn’t a radical upgrade from the S25 Ultra, but there are improvements where it counts. It’s the thinnest and lightest Ultra, at 7.9mm thick and 214 grams, an admittedly minor slim-down that’s still noticeable.
The $900 Samsung Galaxy S26 is a leading flagship with a price hike, and while it’s extremely likely that Samsung is just the first phonemaker in 2026 to give its phone a price hike, it still stings to have to pay $100 above last year’s Galaxy S25. Still, there are some notable software and AI upgrades, including the impressive Horizontal Lock feature that super-stabilizes recorded footage, no matter how you twist the phone around while shooting video.
With the Galaxy Z Fold 7, Samsung has finally addressed some of the key issues with its previous book-style foldables. The impressively thin build and wider, 6.5-inch cover screen make this feel like a standard phone when closed, and that wider 8-inch inside display is great for multitasking, with the ability to run up to three apps simultaneously. Perhaps most notably, the camera gets a major upgrade with the addition of a 200-megapixel main camera, which takes shots on par with the top-of-the-line S25 Ultra.
When I first got my hands on Samsung’s new Galaxy Z Flip 7, I was delighted to discover that it has a smaller crease, larger cover screen, thinner design and bigger battery compared to last year’s Galaxy Z Flip 6. But as I tested the new clamshell phone, I became enthralled by its inner screen. At 6.9 inches, this is the biggest screen on any Samsung phone aside from the Galaxy Z Fold 7, which has an 8-inch foldable display.
The Galaxy S25 Edge is a unique offering — one that doesn’t necessarily cater to most people’s top priorities like longer battery life and an affordable price tag, yet it still presents an alluring option with its slim frame and lightweight body. Thankfully, it doesn’t scale back too much in the way of features and capabilities; it has a powerful Snapdragon 8 Elite processor and the same 200-megapixel main camera you’ll find on the top-of-the-line S25 Ultra (although there’s no telephoto lens).
The Samsung Galaxy S26 Ultra, priced at $1,300, comes packed with maximum features that, for most people, are more than necessary. For the rest of us, last year’s $800 Galaxy S25 is a standout among its Galaxy counterparts. Even with the release of the S26, the S25 remains a solid pick because it has much of the same hardware, software and AI capabilities as its pricier (and newer) peers.
The Galaxy S25 has a very capable triple rear camera setup that is versatile in capturing bright outdoor scenes and candid moments inside under mixed lighting. Overall, the Galaxy S26 Ultra is ideal for Android fans who prioritize fast performance, versatile cameras and a spacious screen.
Deciding which Samsung phone is right for you comes down to what you want in a phone and how much you’re willing to spend. If you want the largest screen available on a standard Samsung phone, enjoy note-taking with a stylus and need a camera with a significantly closer zoom, the Galaxy S26 Ultra is the right choice. You’ll also have to spend $1,300 unless you score a trade-in deal.
Those who don’t need the stylus, prefer a more compact phone and still want a good camera should consider the Galaxy S25 or Galaxy S25 Plus. If you just want the basics, like a spacious screen, 5G and a decent camera, consider the Galaxy S25 FE. Those looking for the flashiest tech around, and who also have deep pockets, should consider the Galaxy Z Fold 7, Z Flip 7 or even the slightly cheaper Z Flip 7 FE.
Finding the best Samsung phone will ultimately come down to preference. Choosing among so many options can get complicated, so here’s how to decide which Samsung phone is best for you. Refer to our phone buying guide for more tips on how to choose the right device.
The Samsung Galaxy S26 Ultra isn’t a radical upgrade from the S25 Ultra, but there are improvements where it counts. It’s the thinnest and lightest Ultra, at 7.9mm thick and 214 grams — an admittedly minor slim-down that’s still noticeable.
The hardware advancement that steals the spotlight is the Privacy Display, which prevents others from seeing what’s on your screen. Unlike a $10 screen protector you can buy from Amazon, you can toggle Privacy Display on for certain apps, like a banking app or your email, as well as your lock screen so no one sees your password or PIN. You can also enable it just for incoming notifications, so only part of your screen gets blacked out.
The S26 Ultra carries over the same camera specs as last year, but it consistently delivers high-quality images. Plus, a neat new Horizontal Lock feature when recording videos keeps the horizon level even as you rotate your phone 360 degrees, leading to astonishingly stable footage.
Why we like it
The S26 Ultra prioritizes hardware and software. Along with a thinner design and the Privacy Display, there’s also a handful of new intuitive AI features. For instance, Now Nudge surfaces real-time suggestions based on what’s on your screen, so if someone asks for photos from your trip, it’ll automatically point you toward those images in your Gallery so you don’t have to dig for them. And Document Scan will automatically appear when you’re snapping a photo of a document to remove shadows and creases, then let you export the final product as a PDF. Plus, the S26 Ultra’s battery can last well over a day and a half, which is a major perk.
Who it’s best for
If you’re a power user who likes having a larger display, a bigger battery and top-notch cameras — as well as the signature S Pen — the S26 Ultra is the way to go. It’s a great choice for anyone who doesn’t want to worry about charging their phone at the end of each day, as the battery can last well over a day and a half.
Who shouldn’t get it
The S26 Ultra keeps its $1,300 price, even amid a RAM shortage that threatens to raise phone prices. But that’s still not pocket change. If you don’t need the most high-end cameras and prefer a smaller device, the baseline S26 shares many of the same features as the Ultra, including the Snapdragon 8 Elite Gen 5 processor, plus all those AI features.
The $900 Samsung Galaxy S26 is a leading flagship with a price hike, and while it’s extremely likely that Samsung is just the first phonemaker in 2026 to give its phone a price hike, it still stings to have to pay $100 above last year’s Galaxy S25. Still, there are some notable software and AI upgrades, including the impressive Horizontal Lock feature that super-stabilizes recorded footage, no matter how you twist the phone around while shooting video.
Why we like it
The Galaxy S26 is a leading smartphone, a jack-of-all-trades that is blisteringly fast, takes good photos, shoots great videos, runs games well and has a decent battery. It’s not the absolute best at any of these among today’s top phones, but makes the podium for most of them, so it’s an easy all-around choice. The new AI features are fun if situational, and aside from Horizontal Lock, there aren’t any standouts. If one has to pay more for the phone, at least it starts at 256GB of storage.
Who it’s best for
The Galaxy S26 is a reliable pick for anyone who just wants a great phone that can do anything. While it won’t win any battery longevity awards (especially compared to the OnePlus 15), its cameras remain stellar, and its Snapdragon 8 Elite Gen 5 chip with 12GB of RAM delivers smooth performance. It still gets seven years of software and security updates, so buyers will be able to keep it around for years and expect some AI features from newer Galaxy phones to trickle down, too.
Who shouldn’t get it
The Galaxy S26 doesn’t quite excel in anything — not battery life (OnePlus 15), photo AI features (Google Pixel 10 Pro) or display tech (its Galaxy S26 Ultra sibling). And the $100 price hike is more a reflection of the state of the industry than a result of upgrades, so if you’re looking for a cheaper but still powerful option, look for the Galaxy S25 or other Android phones from last year.
With the Galaxy Z Fold 7, Samsung has finally addressed some of the key issues with its previous book-style foldables. The impressively thin build and wider, 6.5-inch cover screen make this feel like a standard phone when closed, and the wider 8-inch inside display is great for multitasking, with the ability to run up to three apps simultaneously. Perhaps most notably, the camera gets a major upgrade with the addition of a 200-megapixel main camera, which takes shots on par with the top-of-the-line S25 Ultra.
Altogether, it’s a great choice if you want a bigger, tablet-like display without the bulk or a compromise on camera quality.
Why we like it
The Z Fold 7 does a solid job combining what’s great about standard slate phones and what’s great about foldables. It feels wonderfully normal to hold when closed, thanks to its sleek design and lightweight build. It also packs great cameras and has an expansive main display that’s 11 percent bigger than last year’s Z Fold 6.
Thankfully, a slimmer build doesn’t force the battery to take a hit; the Z Fold 7 maintains that same 4,400-mAh battery as last year’s foldable. That pales in comparison to batteries from Chinese competitors, but at least it’s not a downgrade. The Z Fold 7 also packs a Snapdragon 8 Elite processor to power the many AI features you’ll get onboard, from Galaxy AI photo and audio editing tools to Google’s Gemini Live and Circle to Search. The phone also supports seven years of software and security updates.
Who it’s best for
If you’re bored with standard slate phones and want something that feels a little more exciting, the Galaxy Z Fold 7 is a great choice. The slim design and wider cover screen help it feel as normal as possible when closed, with the added perk of an expansive main display that’s great for multitasking and watching videos. The cameras are also impressive for a foldable that’s so thin.
Who shouldn’t get it
The Z Fold 7’s $2,000 price tag is perhaps its biggest caveat. Also, if you don’t need a bigger display, it may not be worth the splurge. Ironically, the cover screen is so practical that you’ll rarely need to open the phone unless you’re watching movies or multitasking. If you seldom do either, a phone like the Galaxy S25 Ultra might be a better fit.
When I first got my hands on Samsung’s new Galaxy Z Flip 7, I was delighted to discover that it has a smaller crease, larger cover screen, thinner design and bigger battery compared to last year’s Galaxy Z Flip 6. But as I tested the new clamshell phone, I became enthralled by its inner screen. At 6.9 inches, this is the biggest screen on any Samsung phone aside from the Galaxy Z Fold 7, which has an 8-inch foldable display.
The Z Flip 7’s large screen size makes content feel more immersive and colors look lovely and vivid. This led to epic TikTok and Instagram sessions, watching widescreen films such as A Working Man and Back to the Future, as well as jumping between two apps stacked vertically on the screen thanks to One UI 8’s 90:10 split tool.
Every time I open the Flip 7, I’m consistently dumbfounded by how such a large display can unfurl from something about the size of a makeup compact. And when it’s closed, there’s a 4.1-inch cover screen that’s fantastic in its own ways, with new clever animations for when you’re recording a video, charging the phone or taking a selfie, all efficiently using the extra display real estate. In terms of functionality, though, the cover screen’s software is about the same as the 3.4-inch one on the Flip 6.
The Flip 7 impressed me in nearly every way but one: its battery life. It has a larger battery than the Flip 6, but it doesn’t last any longer in daily use. It did consistently get me through a day on a single charge, often having 15% to 20% left, but there were also a few days where it needed an early evening top-off.
Why we like it
The Galaxy Z Flip 7 is the most fully realized version of Samsung’s ideal of a flip phone since the launch of the original Galaxy Z Flip in 2020. The Flip 7’s appeal is simple: It’s a thin phone with a big, bold screen that folds in half into a coaster-sized square. The larger cover screen and inner screen make content more immersive. Its design is thin (for a clamshell foldable) and comfortable to hold. Plus, you get twice the storage this year compared to last.
Who it’s best for
If you’ve been tempted by a clamshell-style foldable, you should definitely consider the Flip 7. If you have a Galaxy Z Flip 4 or older, the Flip 7 will be an upgrade in every way. It’s harder to make that same recommendation for Flip 5 owners unless your phone is showing its age. And if you have a Galaxy Z Flip 6, you can sit this one out unless you really want those larger screens.
Who shouldn’t get it
If you spend a ton of time around dirt or sand, this phone isn’t for you.
The Galaxy S25 Edge is a unique offering — one that doesn’t necessarily cater to most people’s top priorities like longer battery life and an affordable price tag, yet it still presents an alluring option with its slim frame and lightweight body. Thankfully, it doesn’t scale back too much in the way of features and capabilities; it has a powerful Snapdragon 8 Elite processor and the same 200-megapixel main camera you’ll find on the top-of-the-line S25 Ultra (although there’s no telephoto lens).
The main sacrifice is battery life, as the S25 Edge has a 3,900-mAh battery, the lowest amount across the S25 series. It also only supports 25-watt wired charging. But it still offers enough juice to get you through the day, even if you’re a notoriously heavy phone user. Plus, using something so remarkably feather-light feels like such a breath of fresh air, you may not mind making some compromises. Read our full Galaxy S25 Edge review.
Why we like it
The S25 Edge is surprisingly enjoyable to use and hold, given its lightweight design (it weighs 163 grams) and generous 6.7-inch screen. And despite its thinner frame, it feels surprisingly sturdy, thanks to its Gorilla Glass Ceramic 2 display and Victus 2 backing. That means it doesn’t feel like it’s going to snap in your pocket — and you’ll hardly even feel it in there.
Who it’s best for
If you want a phone that feels light and can slip easily into your pocket, without compromising too much on functionality, the S25 Edge is a great option. Plus, if you’re looking for a fresh form factor but aren’t interested in venturing into foldables territory, this unique phone is a solid — and more familiar-feeling — choice.
Who shouldn’t get it
If you’re looking for something more budget-friendly, the S25 Edge might not appeal to you. Also, if battery life is your top priority, the S25 Edge leaves something to be desired, as it only lasts about 24 hours before needing a recharge.
| | Samsung Galaxy S26 Ultra | Samsung Galaxy S26 | Samsung Galaxy Z Fold 7 | Samsung Galaxy Z Flip 7 | Samsung Galaxy S25 Edge |
|---|---|---|---|---|---|
| Display size, tech, resolution, refresh rate | 6.9-inch AMOLED; 3,120×1,440 pixels; 1-120Hz adaptive refresh rate | 6.3-inch AMOLED; 2,340×1,080 pixels; 1-120Hz adaptive refresh rate | 6.5-inch AMOLED, 2,520×1,080p, 1 to 120Hz refresh rate; 8-inch AMOLED, 2,184×1,968p, 1 to 120Hz refresh rate | 4.1-inch AMOLED; 1,048×948 pixels; 120Hz refresh rate; 6.9-inch AMOLED; 2,520×1,080 pixels; 1 to 120Hz refresh rate | 6.7-inch QHD+ AMOLED display; 120Hz refresh rate |
| Pixel density | 500 ppi | 411 ppi | Cover: 422 ppi; Internal: 368 ppi | Cover: 342 ppi; Internal: 397 ppi | 513 ppi |
| Dimensions (inches) | 6.44×3.07×0.31 | 5.89×2.82×0.28 | Open: 5.63 x 6.24 x 0.17 in; Closed: 2.87 x 6.24 x 0.35 in | Open: 2.96 x 6.56 x 0.26 in; Closed: 2.96 x 3.37 x 0.54 in | 2.98 x 6.23 x 0.23 in |
| Dimensions (millimeters) | 163.6×78.1×7.9 | 149.6×71.7×7.2 | Open: 143.2 x 158.4 x 4.2mm; Closed: 72.8 x 158.4 x 8.9mm | Open: 75.2 x 166.7 x 6.5mm; Closed: 75.2 x 85.5 x 13.7mm | 75.6 x 158.2 x 5.8mm |
| Weight (grams, ounces) | 214 g (7.55 oz.) | 167g (5.89 oz.) | 215g (7.58 oz.) | 188g (6.63 oz.) | 163g (5.75 oz) |
| Mobile software | Android 16 | Android 16 | Android 16 | Android 16 | Android 15 |
| Camera | 200-megapixel (wide), 50-megapixel (ultrawide), 10-megapixel (3x telephoto), 50-megapixel (5x telephoto) | 50-megapixel (wide), 12-megapixel (ultrawide), 10-megapixel (3x telephoto) | 200-megapixel (wide), 12-megapixel (ultrawide), 10-megapixel (telephoto) | 50-megapixel (wide), 12-megapixel (ultrawide) | 200-megapixel (wide), 12-megapixel (ultrawide) |
| Front-facing camera | 12-megapixel | 12-megapixel | 10-megapixel (inner screen); 10-megapixel (outer screen) | 10-megapixel | 12-megapixel |
| Video capture | 8K | 8K | 8K | 4K | 8K |
| Processor | Qualcomm Snapdragon 8 Elite Gen 5 for Galaxy | Qualcomm Snapdragon 8 Elite Gen 5 for Galaxy | Qualcomm Snapdragon 8 Elite for Galaxy | Samsung Exynos 2500 | Snapdragon 8 Elite |
| RAM + storage | 12GB RAM + 256GB; 16GB RAM + 512GB, 1TB | 12GB RAM + 256GB, 512GB | 12GB + 256GB, 12GB + 512GB, 16GB + 1TB | 12GB + 256GB, 12GB + 512GB | 12GB RAM + 256GB, 512GB |
| Expandable storage | None | None | None | None | None |
| Battery | 5,000 mAh | 4,300 mAh | 4,400 mAh | 4,300 mAh | 3,900 mAh |
| Fingerprint sensor | Under display | Under display | Yes | Yes | Under display |
| Connector | USB-C | USB-C | USB-C | USB-C | USB-C |
| Headphone jack | None | None | None | None | None |
| Special features | Aluminum frame; 7 years of OS and security updates; IP68 water and dust resistance; wireless PowerShare to charge other devices; integrated S Pen; UWB for finding other devices; 60W wired charging (charger not included); 25W wireless charging; no magnets for accessories; Galaxy AI; Gorilla Glass Armor 2 cover glass; privacy display | 2,600-nit peak brightness; 7 years of OS and security updates; IP68 water and dust resistance; wireless PowerShare to charge other devices; 25W wired charging (charger not included); 15W wireless charging; lacks built-in magnets; Gorilla Glass Victus 2 cover screen; Galaxy AI | One UI 8, 25W wired charging speed, Qi wireless charging, 2,600-nit peak brightness, Galaxy AI, NFC, Wi-Fi 7, Bluetooth 5.4, IP48 water resistance | One UI 8, IP48 water resistance, 25W wired charging, Qi wireless charging, Wi-Fi 7, Bluetooth 5.4, Galaxy AI | IP68 rating, 5G, One UI 7, 25-watt wired charging, 15-watt wireless charging, Galaxy AI, Gemini, Circle to Search, Wi-Fi 7 |
| US price starts at | $1,300 (256GB) | $900 (256GB) | $2,000 (256GB) | $1,100 (256GB) | $1,100 |
In March 2026, we added the Samsung Galaxy S26 Ultra to our list.
Get more for less with cheap phones: For a fraction of the cost, you can get a solid phone that does almost everything a pricier flagship phone can do. The Galaxy S25 FE packs a good camera and costs only $650 before discounts or trade-in offers.
Test your phone: It’s worth going to a store and trying out a phone before you shell out hundreds of dollars for it.
Find peace of mind with a case: You spent all this time picking a phone, now protect it from damage with a case.
We test the battery, screen, performance, cameras and more on every phone we review.
We test every phone in real-world scenarios focusing on its features, design, performance, cameras, battery life and overall value. We document our findings in an initial review that is periodically updated when there are new software updates or to compare against new phones from competitors like Apple, Google, OnePlus and Samsung.
Photography is a major focus for most phones these days, so we take pictures and videos of various subjects in a variety of settings and lighting levels. We try out any new camera modes such as 4K 120fps slow motion video that debuted with the iPhone 16 Pro and 16 Pro Max or AI reframe and focus on the Motorola Razr Plus (2024).
Battery testing is conducted in a variety of ways. We assess how long a phone lasts during a typical day of use, and note how it performs during more focused sessions of video calls, media streaming and gaming. We also conduct a video playback test, which isn’t always included in the initial review and is added later in an update.
We perform processor-heavy tasks like editing photos, exporting videos and playing games. We evaluate whether a newer version of a particular phone includes enough features to make it worth upgrading from older models.
We use benchmarking apps to measure the performance, alongside our own anecdotal experiences using the phone for our review. Of particular note are how graphics and animations look. Are they smooth? Or do they lag or stutter? We also look at how quickly the phone switches between horizontal and vertical orientations and how fast the camera app opens and is ready to take a photo.
Read more: How We Test Phones
An anonymous reader quotes a report from Ars Technica: Dozens of new robot arms have been installed at General Motors’ flagship electric vehicle factory in Detroit — even as 1,300 workers remain out of work following what was supposed to be a temporary layoff. The latest automation push has spurred union pushback over a potentially existential issue for automakers and their workers. General Motors installed approximately 50 robot arms at GM’s Factory Zero plant in Detroit, Michigan, according to reporting by Crain’s Detroit Business. Made by the Japanese robotics company FANUC, the robots are designed to help attach various components to vehicles during the assembly line process. But leaders at United Auto Workers (UAW), the primary US union for autoworkers, reacted with anger to the new robotic presence, given how GM has not yet called back any of the workers affected by supposedly temporary layoffs in March.
More than 1,000 union members are still “laid off indefinitely,” James Cotton, president of UAW Local 22, told The Detroit News. He said that the company could bring some of those members back to work instead of installing the 50 robots. The temporary layoffs were preceded by permanent layoffs involving another 1,200 workers at GM’s Factory Zero in October 2025. Many automakers, including Stellantis NV and Ford Motor Company, have deployed assembly-line robots, such as Fanuc robot arms, as they push to automate more of their US operations. Hyundai Motor Company plans to deploy Atlas humanoid robots made by Boston Dynamics — which Hyundai acquired in 2020 — to start working in the automaker’s flagship EV facility in Georgia by 2028. “Technological development has the capability of making work safer for the working class and enabling workers to have a shorter work week without losing pay,” said Andrew Bergman, a Local 22 member and union organizer who was among those laid off by GM. “But in the bosses’ and billionaires’ hands it’s used to pad profits and lay off workers.”
Camping can take many forms depending on who you ask. Some limit the term to mean a bedroll under a tarp or using sleeping bags and tents. However, there is a growing trend called “glamping,” which has gained momentum in both Europe and the U.S.
Glamping is like camping mixed with the amenities of a luxury resort. You don’t necessarily need an expensive RV to participate; high-tech camping accessories can take a glamping trip to the next level. According to Grand View Research, the U.S. glamping market reached $3.79 billion in 2025. Accordingly, the RV industry is providing those interested in a more extravagant outdoor experience with plenty of options, at a price.
One model line capturing a lot of attention is Kropf’s Eldorado Series of towable units. What sets these apart from many other RVs, and pushes them into tiny-home territory, is their permanent-structure look and high-end features. Standards like a full residential front door, king-size bed, full-size bathtub, tiled shower and kitchen backsplash, and shiplap-laden walls take things to another level. In addition, you can select upgrades like OSB (oriented strand board) and Tyvek (weather-resistant barrier) construction, which are used on many permanent home builds. There’s even an option for a porch complete with outdoor ceiling fan and railing.
Within the Eldorado series, Kropf has 10 different layouts, the largest being 12 by 45.5 feet and the smallest 12 by 34 feet. That works out to roughly 408 to 546 square feet, depending on the model. Some, like the Eldorado 9092, come equipped with two-bedroom setups, though you’ll lose out on the opportunity for a porch.
In terms of appliances, these units come with 22-cubic-foot refrigerators, on par with some full-size units you might find in a typical home. While the manufacturer doesn’t specify the brand, some RV dealers have noted the appliances are GE Café, a high-end appliance brand.
Of course, all of this comes at a cost, and it’s higher than you might think. For example, you can find the 2026 Kropf Eldorado 9101PWD on sale for around $160,000, assuming a dealer’s website will even provide the price upfront; many require you to submit information and receive a quote via email. That brings to mind the old adage attributed to JP Morgan: “If you have to ask how much it costs, you can’t afford it.”
Even though these models are sold by RV retailers, they aren’t really meant to be moved around as frequently as a fifth wheel. In fact, these units are referred to as “park model homes,” which are meant to remain in place for extended periods rather than something you’d pull around to different destinations. Think more along the lines of a semi-permanent vacation property; it’s not something suitable for a new destination every few months.
In fact, these units are closer to traditional mobile homes than RVs and require some additional consideration when moving. For instance, the best dually trucks offer plenty of torque to pull a fifth-wheel RV. However, a park model home like the Eldorado is too bulky for just anyone to move. To transport it legally, it must be labeled a “wide load” or even an “oversized load,” because its width exceeds 8.5 feet and its length is extended. Transporting anything that large requires special permits. Plus, your destination must also offer suitable space and access for drop-off and pick-up.
Looking for the most recent Mini Crossword answer? Click here for today’s Mini Crossword hints, as well as our daily answers and hints for The New York Times Wordle, Strands, Connections and Connections: Sports Edition puzzles.
Need some help with today’s Mini Crossword? It wasn’t too tricky today, but read on for all the answers. And if you could use some hints and guidance for daily solving, check out our Mini Crossword tips.
If you’re looking for today’s Wordle, Connections, Connections: Sports Edition and Strands answers, you can visit CNET’s NYT puzzle hints page.
Read more: Tips and Tricks for Solving The New York Times Mini Crossword
Let’s get to those Mini Crossword clues and answers.
The completed NYT Mini Crossword puzzle for June 23, 2026.
1A clue: Baseball officials
Answer: UMPS
5A clue: Singer/songwriter Kahan with the 2026 #1 album “The Great Divide”
Answer: NOAH
6A clue: Shade of orangeish pink
Answer: CORAL
8A clue: Voice of the smartphone generation?
Answer: SIRI
9A clue: Partner channel of ABC
Answer: ESPN
1D clue: Someone giving off old guy vibes, in Gen Z slang
Answer: UNC
2D clue: Tallest animal in North America
Answer: MOOSE
3D clue: Setting for “Ratatouille”
Answer: PARIS
4D clue: #, in music
Answer: SHARP
7D clue: ___-Manuel Miranda
Answer: LIN
Not every company can or should build its own frontier AI language model. However, the harness controlling the model is something that most enterprises can and should customize for their specific purposes.
Of course, this is easier said than done. Agent harnesses are still largely tuned through manual, ad hoc debugging — a process that relies heavily on intuition rather than systematic feedback loops, making it difficult to keep pace with rapidly evolving LLMs.
To solve this challenge, researchers at the Shanghai Artificial Intelligence Laboratory have introduced “Self-Harness,” a new paradigm in which an LLM-based agent systematically improves its own operating rules. By examining its own execution traces to apply edits, the system trades manual guesswork for empirical evidence.
Self-improving harnesses can enable development teams to deploy robust custom agents that continually adapt their own execution protocols to overcome model-specific weaknesses.
An LLM-based agent’s performance is not determined solely by its underlying base model, but also by its harness: the surrounding system that provides context and enables the model to interact with the environment. A harness includes components like system prompts, tools, memory, verification rules, runtime policies, orchestration logic, and failure-recovery procedures.
This layer is crucial because many common agent failures stem from the harness rather than the model. For example, an agent may report success without checking the model’s response (e.g., running the code to see if it passes the tests), or it might retry a failed action repeatedly. The harness is also responsible for preventing context rot or overload when the agent’s interaction history grows very large. Examples of popular harnesses include SWE-agent, Claude Code, Codex, and OpenHands.
Harness engineering remains a significant challenge, but the bottleneck isn’t necessarily that humans are too slow or incapable.
In fact, Hangfan Zhang, lead author of the Self-Harness paper, told VentureBeat that “in many cases, an experienced engineer with deep domain knowledge can still propose better changes than an LLM can today.”
Instead, the true bottleneck of manual engineering is that it relies heavily on ad hoc debugging rather than a verifiable, empirical feedback loop. “The deeper issue is that the current harness-engineering paradigm often lacks a systematic feedback loop,” Zhang explained. “Many edits are made based on intuition, a few observed failures, or ad hoc debugging.”
With new models being released at a rapid pace, depending on human intuition to manually tune model-specific harnesses becomes increasingly costly and untenable. While some approaches use stronger models to improve the harnesses of weaker target agents, this dependence on external guidance has its own challenges, as these models may be costly, unavailable for frontier models, or mismatched to the target model’s failure modes.
The Self-Harness paradigm enables an LLM-based agent to improve its own harness without relying on human engineers or stronger external models.
This continuous self-evolution is driven by a three-stage iterative loop that turns behavioral evidence into harness updates:
Weakness mining: Starting from an initial harness, the agent runs a set of tasks, producing execution traces with verifiable outcomes. The agent categorizes failed traces and tries to detect model-specific failure patterns.
Harness proposal: Based on these failure patterns, the agent uses a “proposer” role to generate a set of diverse yet minimal harness modifications, each tied to a specific failure mechanism to avoid overly general corrections.
Proposal validation: The system evaluates candidate modifications through regression tests. An edit is promoted only if it improves performance without causing measurable degradation on held-out tasks. If multiple candidate modifications pass the regression tests, they are merged into the next version of the harness, which then serves as the starting point for the next iteration.
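The three stages above can be sketched as one round of a loop. This is a minimal illustration under stated assumptions, not the researchers' implementation: the harness is modeled as a tuple of rule names, and `Trace`, `mine_weaknesses`, `self_harness_round`, `toy_run`, and `toy_propose` are all hypothetical names. The acceptance rule here assumes "improves performance" is measured on the tasks that exposed the failure, while "no degradation" is checked on held-out tasks.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

Harness = Tuple[str, ...]  # hypothetical: a harness as an ordered tuple of rule names


@dataclass(frozen=True)
class Trace:
    task_id: str
    passed: bool
    failure_kind: str = ""  # empty when the task passed


def pass_rate(traces: Sequence[Trace]) -> float:
    return sum(t.passed for t in traces) / len(traces) if traces else 0.0


def mine_weaknesses(traces: Sequence[Trace]) -> Dict[str, int]:
    """Stage 1: count failed traces per failure pattern."""
    counts: Dict[str, int] = {}
    for t in traces:
        if not t.passed:
            counts[t.failure_kind] = counts.get(t.failure_kind, 0) + 1
    return counts


def self_harness_round(harness, run_tasks, propose, train_tasks, heldout_tasks) -> Harness:
    train_traces = run_tasks(harness, train_tasks)
    base_train = pass_rate(train_traces)
    base_heldout = pass_rate(run_tasks(harness, heldout_tasks))
    accepted = []
    # Stage 2: one small candidate edit per mined failure pattern.
    for rule in propose(mine_weaknesses(train_traces)):
        candidate = harness + (rule,)
        improves = pass_rate(run_tasks(candidate, train_tasks)) > base_train
        no_regression = pass_rate(run_tasks(candidate, heldout_tasks)) >= base_heldout
        # Stage 3: promote only edits that help and do not regress held-out tasks.
        if improves and no_regression:
            accepted.append(rule)
    # Accepted edits are merged into the next harness version.
    return harness + tuple(accepted)


# Toy stand-ins for the agent runtime and the proposer (assumptions for illustration).
def toy_run(harness, tasks):
    traces = []
    for task_id, needed_rule in tasks:
        ok = needed_rule == "" or needed_rule in harness
        traces.append(Trace(task_id, ok, "" if ok else needed_rule))
    return traces


def toy_propose(weaknesses):
    # One targeted rule per failure pattern, plus a generic rule that fixes nothing.
    return sorted(weaknesses) + ["be_verbose"]


TRAIN = [("t1", ""), ("t2", "loop_breaker"), ("t3", "no_duplicate_retry")]
HELDOUT = [("h1", ""), ("h2", "loop_breaker")]

next_harness = self_harness_round((), toy_run, toy_propose, TRAIN, HELDOUT)
print(next_harness)
```

In this toy run the two targeted rules are promoted and the generic `be_verbose` rule is rejected because it does not raise the pass rate, which mirrors the point that each edit should be tied to a specific failure mechanism.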
To visualize why an enterprise would need this, imagine an automated issue-fixing agent that reads internal documentation, writes patches, and opens pull requests. If the company updates its documentation style, the agent might suddenly fail, pulling the wrong context or writing bad patches.
On the surface, the agent simply looks broken. But Self-Harness turns this ambiguous failure into a solvable problem. “The failure traces expose where the agent is misusing the new documentation format; the proposer can generate a targeted harness edit… and the evaluator can decide whether that edit improves the failing cases without regressing other cases,” Zhang said.
The researchers evaluated Self-Harness on Terminal-Bench-2.0, a benchmark that tests general tool-based execution, including artifact management, command use, verification behavior, and recovery from execution errors. They applied Self-Harness with MiniMax M2.5, Qwen3.5-35B-A3B, and GLM-5.
To isolate the impact of the self-evolving harness, they started with a minimal harness built upon the DeepAgent SDK, containing only the benchmark-facing system prompt, and the default filesystem and shell tools. The model backend, tool set, benchmark environment, and evaluator were kept unchanged while only the harness was allowed to vary.
The quantitative results show that agents improved their performance through automated harness edits. On held-out tasks, performance jumped significantly across the board, ranging from 33 to 60 percent relative improvements for different models.
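A relative improvement is measured against the baseline score, not in absolute percentage points. The short sketch below shows the arithmetic; the pass rates in it are hypothetical and are not figures from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain as a fraction of the baseline score."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical pass rates: 30% before the harness edits, 40% after.
gain = relative_improvement(0.30, 0.40)
print(f"{gain:.0%}")  # prints "33%": a 10-point absolute gain is a 33% relative gain
```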
Importantly, an explicit acceptance rule promotes only those edits that improve performance without introducing unacceptable regressions. What makes Self-Harness powerful for enterprise applications is that it doesn’t simply make the prompt longer or add generic instructions. Instead, it introduces targeted changes that reflect the recurring problems each model encounters during execution.
For example, under the baseline harness, MiniMax M2.5 would get stuck endlessly exploring dataset configurations until the execution environment timed out, failing to produce any deliverables. Through Self-Harness, the system identified this specific flaw and wrote a “loop breaker” into its runtime policy, forcing the agent to stop and redirect its approach after 50 tool calls. It also added a rule to create an initial version of required artifacts as early as possible.
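A runtime policy of this kind could look roughly like the following sketch. The class name, method name, and redirect message are hypothetical; only the 50-call budget and the "create artifacts early" rule come from the description above.

```python
from typing import Optional


class LoopBreaker:
    """Interrupts the agent once it exceeds a tool-call budget."""

    def __init__(self, max_tool_calls: int = 50) -> None:
        self.max_tool_calls = max_tool_calls
        self.calls = 0

    def before_tool_call(self) -> Optional[str]:
        """Return None to allow the call, or a redirect message to inject instead."""
        self.calls += 1
        if self.calls > self.max_tool_calls:
            self.calls = 0  # start a fresh budget for the redirected approach
            return (
                "Tool-call budget reached. Stop exploring, write a first version "
                "of the required artifacts, then try a different approach."
            )
        return None


# Small budget so the behavior is easy to see.
breaker = LoopBreaker(max_tool_calls=3)
results = [breaker.before_tool_call() for _ in range(4)]
print(results[3])
```

The first three calls are allowed and the fourth returns the redirect message, after which the counter resets.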
On the other hand, Qwen3.5 had a habit of hitting a file overwrite error and then blindly retrying the same command repeatedly, eventually deleting necessary files out of confusion before stopping. Self-Harness fixed this by introducing a strict command-retry discipline (forbidding exact duplicate commands) and a mechanism that forced the agent to immediately recreate any missing artifacts if a file error occurred.
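The two rules described above can be sketched as follows. This is an illustration with hypothetical names (`RetryGuard`, `missing_artifacts`), and it assumes the discipline blocks an exact repeat of a command that has already failed; the article does not say how the generated harness implements it.

```python
import os
from typing import Iterable, List, Set


class RetryGuard:
    """Blocks exact repeats of commands that have already failed."""

    def __init__(self) -> None:
        self._failed: Set[str] = set()

    def allows(self, command: str) -> bool:
        return command.strip() not in self._failed

    def record(self, command: str, exit_code: int) -> None:
        if exit_code != 0:
            self._failed.add(command.strip())


def missing_artifacts(required: Iterable[str]) -> List[str]:
    """List required output files that do not exist, so they can be recreated."""
    return [path for path in required if not os.path.exists(path)]


guard = RetryGuard()
guard.record("cp out.csv results/", exit_code=1)
print(guard.allows("cp out.csv results/"))     # False: the same command already failed
print(guard.allows("cp -f out.csv results/"))  # True: a changed command may be tried
```

After any file error, the harness would call `missing_artifacts` with the task's required outputs and instruct the agent to recreate whatever is listed before it continues.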
GLM-5 struggled to preserve environment changes across different commands, and would often waste time on massive downloads or finalize tasks even when sanity checks were failing. Its self-generated harness introduced rules instructing the agent to persist PATH variables across shell sessions, limit external compute, and repair any failed sanity checks before concluding its run.
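Two of those rules lend themselves to a short sketch: carrying environment changes from one command to the next, and refusing to finish while sanity checks fail. The names `ShellSession` and `failing_checks` are hypothetical, and the directory `/opt/tools/bin` is an arbitrary example value.

```python
import os
import subprocess
import sys
from typing import Dict, List


class ShellSession:
    """Keeps one environment dict and passes it to every command it runs."""

    def __init__(self) -> None:
        self.env = dict(os.environ)

    def prepend_path(self, directory: str) -> None:
        self.env["PATH"] = directory + os.pathsep + self.env.get("PATH", "")

    def run(self, argv: List[str]) -> str:
        done = subprocess.run(
            argv, env=self.env, capture_output=True, text=True, check=True
        )
        return done.stdout


def failing_checks(results: Dict[str, bool]) -> List[str]:
    """Names of sanity checks that must be repaired before the run may conclude."""
    return [name for name, ok in results.items() if not ok]


session = ShellSession()
session.prepend_path("/opt/tools/bin")
# A later command still sees the PATH change made earlier in the session.
show_first = "import os; print(os.environ['PATH'].split(os.pathsep)[0])"
first_entry = session.run([sys.executable, "-c", show_first]).strip()
print(first_entry)

print(failing_checks({"unit_tests": True, "output_schema": False}))
```

The agent would be allowed to finalize only when `failing_checks` returns an empty list.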
While Self-Harness automates the tedious work of tracking down idiosyncratic model failures, decision-makers must be realistic about the trade-offs. Replacing human engineering with automated trial-and-error requires significant computational overhead.
“Self-Harness replaces part of the human engineering burden with repeated proposal generation, parallel candidate evaluation, and regression testing,” Zhang said. “That can mean more API tokens, more latency during optimization, and more infrastructure for running evaluation tasks.”
Also, this system relies on the accuracy of its evaluation pipeline. During their experiments on Terminal-Bench-2.0, the researchers relied on strict, deterministic verifiers to ensure the agent’s edits were actually helpful. Without this rigorous ground truth, an automated system risks promoting bad updates. “[The] evaluation system is not an optional component; it is what lets us trade human intuition for empirical evidence,” Zhang said.
This reliance on strict verifiers also dictates where Self-Harness should be deployed. “The best deployment targets today are environments where failures can be measured and where trial-and-error is relatively safe,” Zhang said, pointing to coding, internal workflow automation, and DevOps data pipelines as ideal use cases.
Conversely, enterprises should avoid fully automating harnesses in high-stakes or subjective fields. “The clearest red flags are domains where evaluation is subjective, delayed, non-deterministic, or costly to get wrong, such as medical decision-making, safety-critical infrastructure, or legal decisions.”
The introduction of self-improving agents does not mean coding or enterprise workflows will suddenly become human-free. The quality of collaboration between the human engineer and the AI is still paramount and difficult to capture with automated benchmarks.
Instead, the engineering profession is moving up a layer of abstraction. “The role of enterprise engineers will shift from manually patching individual prompts or tool calls toward designing the feedback systems that make agent improvement possible,” Zhang predicted. Moving forward, “the engineer becomes less of a prompt tweaker and more of a feedback architect.”
As foundational models grow more capable, they will naturally absorb many capabilities that currently require manual harness engineering. “But once that happens, the harness will not disappear; its scope will move outward to connect the model to richer external environments,” Zhang said. “Until that boundary moves beyond what humans can evaluate, humans will remain critical providers of feedback.”