Caitlin Kalinowski spent 16 months building OpenAI’s physical AI programme. On Saturday, she said the company moved too fast on something too important.
The week that began with Anthropic being blacklisted by the Pentagon and ended with OpenAI taking its contract has now claimed OpenAI’s most senior hardware executive.
Caitlin Kalinowski, who joined OpenAI in November 2024 to lead its robotics and consumer hardware division, announced her resignation on Saturday on X. Her statement was short, direct, and more candid than anything OpenAI itself has said about the deal.
“AI has an important role in national security,” she wrote. “But surveillance of Americans without judicial oversight and lethal autonomy without human authorization are lines that deserved more deliberation than they got.”
In a subsequent post, she was more precise about the nature of the complaint. “It’s a governance concern first and foremost,” she wrote. “These are too important for deals or announcements to be rushed.”
Kalinowski was careful to frame her departure in personal terms. “This was about principle, not people,” she wrote. “I have deep respect for Sam and the team.”
That last note carries some weight: Sam Altman has himself acknowledged that the Pentagon deal was “definitely rushed,” and that the rollout produced significant backlash.
What Kalinowski’s resignation adds to that admission is a name and a title: the most senior person at OpenAI, whose job was to bring AI into physical systems, has decided that the process by which it will now enter weapons systems and surveillance infrastructure was not good enough.
What the deal involved
The sequence of events that led here unfolded over roughly a week. Anthropic, which had been the only AI company cleared to operate on the Pentagon’s classified networks after a $200 million contract awarded in July 2025, spent several weeks in tense negotiations with the Defense Department over the terms of continued use.
Anthropic’s position was that its models should not be deployed for mass domestic surveillance or fully autonomous weapons. The Pentagon, under Defense Secretary Pete Hegseth, insisted on language permitting use “for all lawful purposes,” without specific carve-outs.
On 28 February, with negotiations collapsed, President Trump directed all federal agencies to stop using Anthropic’s technology and called the company “radical woke” on Truth Social.
Hegseth formally designated Anthropic a supply-chain risk to national security, a classification previously reserved for foreign adversaries, and one that requires DoD vendors and contractors to certify they do not use Anthropic’s models.
Hours later, Altman posted on X that OpenAI had reached its own agreement to deploy its models on the Pentagon’s classified network.
OpenAI’s stated position is that its deal includes the same core protections Anthropic sought: no mass domestic surveillance, no autonomous weapons.
The company published a blog post outlining its approach and arguing that its cloud-only deployment architecture, retained safety stack, and contractual provisions, anchored to existing US law rather than bespoke prohibitions, make its agreement more robust than any previous classified AI deployment, including Anthropic’s.
What Kalinowski’s departure means for OpenAI
Kalinowski’s career before OpenAI was unusual in its breadth. She spent nearly six years at Apple as a technical lead on the Mac Pro and MacBook Air programmes, including the original unibody MacBook Pro, before moving to Meta’s Oculus division, where she led virtual reality hardware for more than nine years.
Her final role at Meta was heading Project Nazare, later named Orion, the augmented reality glasses initiative Meta unveiled as a prototype in September 2024 and described as the most advanced AR glasses ever made.
She joined OpenAI two months later.
During her 16 months at OpenAI, Kalinowski built out what the company describes as its physical AI programme, including a San Francisco lab employing roughly 100 data collectors training a robotic arm on household tasks.
Her departure leaves that effort without its most experienced hardware leader at a moment when OpenAI has staked considerable ambition on moving beyond software.
OpenAI confirmed her resignation on Saturday and said in a statement: “We believe our agreement with the Pentagon creates a workable path for responsible national security uses of AI while making clear our red lines: no domestic surveillance and no autonomous weapons. We recognise that people have strong views about these issues and we will continue to engage in discussion with employees, government, civil society, and communities around the world.”
The wider picture
The fallout from OpenAI’s Pentagon deal has not been limited to internal dissent. ChatGPT uninstalls reportedly surged 295% following the announcement, and Anthropic’s Claude climbed to the number-one position in the US App Store, displacing ChatGPT. As of Saturday afternoon, the two apps remained first and second, respectively.
What Saturday’s resignation of the company’s robotics chief confirms is that the deal’s costs for OpenAI are still being counted. Altman wanted to de-escalate a confrontation between the government and the AI industry. He may yet have succeeded. Whether the price of that de-escalation, in talent, in trust, and in the specific question of who was right about the guardrails, was worth paying is a question that will take longer to answer.
At CanJam NYC 2026 this past weekend, Grell’s latest design, the Grell OAE2 open back headphones, made their U.S. public debut. The $599 model builds on the original OAE1 and continues Grell’s pursuit of tonal accuracy, mechanical precision, and long-term listening comfort. The open back, over ear design incorporates a newly optimized dynamic driver and an acoustically refined housing intended to improve airflow and project a more speaker-like soundstage presentation in front of the listener.
Grell OAE2
If you’ve spent enough time walking the halls at hi-fi shows, you know the routine. Every year a handful of startups promise they’ve reinvented the wheel and that what you’re about to hear will change everything you thought you knew about speakers or headphones. You nod politely, sit down, listen for a few minutes, and try not to roll your eyes when the demo playlist inevitably lands on Diana Krall, the Eagles, or Norah Jones.
But when the name on the badge is Axel Grell, you stop joking around and actually pay attention.
Grell is hardly another “trust us, it’s revolutionary” startup voice on a crowded show floor. Before launching his own brand, the veteran headphone engineer spent decades at Sennheiser, where he was responsible for developing some of the most respected high end headphones of the past few decades. Small German company. You might have heard of it. They also make a decent strudel.
How the Grell OAE2 Tries to Move the Soundstage Out of Your Head
Drop + Grell prototype headphone with forward mounted driver.
One of the biggest limitations of traditional headphones is the so-called “in-head” effect. Because the drivers sit millimeters from the ear and fire directly into the ear canal, most headphones create a listening perspective where instruments appear to originate from inside the listener’s head rather than from a believable space in front of them. While open back designs can widen the presentation and improve air and separation, they rarely change the fundamental geometry of how the sound reaches the ear. The result is often a presentation that feels spacious but still anchored inside the listener’s skull rather than resembling the externalized imaging produced by loudspeakers.
The Grell OAE2 open back headphones were engineered to address that specific limitation. Instead of following the conventional layout where the driver points straight into the ear canal, Axel Grell designed the acoustic structure so the output interacts more deliberately with the outer ear before entering the ear canal. This approach allows the pinna and surrounding ear structures to contribute to spatial cues in a way that more closely resembles how we hear speakers in a room.
In speaker listening, sound reaches the ear only after interacting with the head, shoulders, and outer ear, creating small timing, phase, and tonal variations that the brain uses to interpret direction, distance, and placement. By preserving more of those interactions inside the headphone structure, the OAE2 attempts to shift the listening perspective forward so that instruments appear positioned in front of the listener rather than inside the head.
The goal is not to artificially exaggerate soundstage width or create gimmicky spatial effects, but to maintain stable imaging, natural treble perception, and controlled low frequency behavior while presenting music in a way that resembles nearfield loudspeaker listening.
For listeners accustomed to the traditional headphone presentation, the perspective can initially feel unfamiliar, but the intention is that the brain adapts to the spatial cues over time, making the presentation feel more natural and less fatiguing during long listening sessions.
German Engineering, Replaceable Parts, and None of That Disposable Headphone Nonsense
Beyond the acoustic design, the Grell OAE2 reflects Axel Grell’s long standing belief that premium headphones should be built to last. Instead of chasing short product cycles, the design emphasizes durability, serviceability, and long term ownership. In other words, the opposite of the sealed plastic approach that dominates much of the modern headphone market.
At the center of the OAE2 is a 40 mm wideband dynamic driver built around a bio cellulose diaphragm, paired with a carefully tuned damping system. Part of that system includes a precision manufactured stainless steel acoustic mesh produced in Germany, which helps regulate airflow and maintain consistent driver behavior. The goal is controlled acoustics rather than brute force tuning, supporting the headphone’s spatial presentation without introducing unwanted resonances or instability.
Construction follows a modular all metal architecture with replaceable components that can be serviced if parts wear out. The idea is simple: headphones should not become disposable because one component fails. Connectivity is equally straightforward. The OAE2 ships with two detachable 1.8 m cables, including a 3.5 mm single ended cable and a 4.4 mm balanced cable, along with a screw on 3.5 mm to 6.3 mm adapter for traditional headphone amplifiers and a protective carry case.
From a technical standpoint, the OAE2 remains close to the OAE1 but with small refinements. The circumaural open back design uses a dynamic transducer rated from 12 Hz to 34 kHz within ±3 dB, extending from 6 Hz to 46 kHz at -10 dB. Nominal impedance is 38 ohms with 100 dB sensitivity at 1 kHz (1 VRMS), making it compatible with portable players while still benefiting from a capable amplifier. Total harmonic distortion is rated at 0.05 percent at 1 kHz and 100 dB, and the headphone weighs 378 g (13.3 oz) without the cable attached. Slightly heavier than the OAE1 by three grams. German engineering apparently does not skip arm day.
The Germans Ran the Numbers. Now We Listen.
Grell OAE2 Headphones at CanJam NYC 2026
Grell kept things refreshingly simple at CanJam NYC 2026. No $30,000 source chain, no mystical demo playlist, and no attempt to overwhelm people with exotic gear. Just three pairs of the Grell OAE2, a few source devices, and a small stack of EarMen ST-Amp headphone amplifiers priced around $400.
The setup was about as straightforward as it gets for a show floor demo. The EarMen ST-Amp offers both single ended and balanced output options, rated at 0.5W into 32 ohms (4V) single ended and 1.85W into 32 ohms (7.75V) balanced, which is more than enough for a 38 ohm headphone like the OAE2. In other words, plenty of clean power but nothing exotic that might artificially inflate the listening experience.
Even the music selection avoided the usual trade show clichés. There were no sacred audiophile demo tracks looping endlessly in the background. Attendees could simply plug in their own phone and listen to whatever they wanted. Which I appreciated. I even spent a little time surfing through German tracks on the playlist. My German is… limited. Although if knowing Yiddish counts as partial credit, I was doing just fine.
A short run through Deadmau5, Daft Punk, and Aphex Twin was enough to get my attention. The OAE2 leans toward a neutral presentation without obvious boosts or dips across the spectrum. Clean, but not the sterile kind of clean that some German designs fall into. Bass is not tuned for exaggerated punch, but the definition and speed are excellent, especially with electronic music that depends on tight timing. The top end is equally well behaved. Detailed and open without any glare or hardness.
Then came the part that really mattered. Soundstage. The presentation is clearly wide, but not absurdly wide in the artificial sense. Think more East River to the IAC Building on the West Side Highway wide, not across the Hudson into New Jersey for chili dogs at Hiram’s wide. More important than width, however, is placement in front of the listener. If Axel Grell’s goal was to push the image outside the head and create what I would call a “nearface” listening perspective about 6 to 10 inches in front of you, the OAE2 largely succeeds.
Switching over to vocals confirmed that impression. Tracks from Amy Winehouse, Billie Holiday, Bjork, and Belinda Carlisle showed the same spatial behavior. Voices did not collapse into the center of the skull as they often do with headphones. Some appeared slightly closer, others further back, but the image remained focused and stable. Never diffuse. Always locked in dead center. And frankly, that was pretty impressive.
The Bottom Line
The Grell OAE2 stands out for one reason: it actually delivers on the promise of moving the soundstage out of your head and placing it slightly in front of the listener. The effect is not gimmicky or exaggerated. Instead, it feels closer to a nearfield speaker presentation with stable imaging and natural placement. Sonically the tuning leans neutral with clean treble, fast and well controlled bass, and no obvious peaks designed to impress in a quick demo.
Comfort was also encouraging. Clamping force is moderate, the headband is well padded, and the overall build quality feels appropriate for a headphone expected to retail for $599 / £499 / €499. Based on our first listen at CanJam NYC 2026, Axel Grell’s latest design shows real promise. A full review is coming later this month once we spend more time with a production unit.
Gone are the days when you had to be attached to the nearest wall whenever you needed to use a power tool. These days, the rise of electric power tools has introduced not just the convenience of a cordless workflow but also the benefits of a swappable battery system. With batteries that can work across multiple product lines, electric power tool fans can save time, effort, and storage space. Because of this, it’s no wonder that everyone from regular homeowners to professionals has invested in their own electric power tool systems, like DeWalt. However, it’s important to note that, while cordless power tools offer a lot of convenience, they still require maintenance.
While DeWalt is known for its trustworthiness, some of the most common issues with DeWalt power tool batteries include premature failure, overheating, and charging problems. And while some of these issues are just mildly annoying, others can cause harm both to you and your property. Like other power tool brands, DeWalt batteries are also at risk of typical lithium battery issues, including degradation, swelling, and fire. Apart from using them correctly and charging them only with legitimate chargers and tools, one of the most important things you can do to make your DeWalt batteries last longer is to keep them in the right place at the right time. So, if you’re committed to doing so, here are the places you should avoid.
1. Inside your car (and other super hot places)
HEGEDARIA/Shutterstock
Leaving your power tool batteries inside vehicles can lead to many problems, from minor efficiency issues to permanent damage. The vehicle itself isn’t really the problem, though; it’s how quickly a parked car can turn into an oven. DeWalt states that anything above 105°F is a big no, because at those temperatures the chemicals inside the battery can’t produce the reactions needed to charge properly. In line with this, it’s best to keep batteries away from anything that generates excessive heat, such as fireplaces or space heaters. It also goes without saying that you should avoid placing them anywhere they’re exposed to direct sunlight. For example, it’s best not to leave them out on your workbench in the afternoon sun for too long.
If you do need to bring your DeWalt batteries with you for any reason, it’s also important to prepare them properly for transit, especially when traveling on a commercial plane. The United States Federal Aviation Administration (FAA) notes that, due to the inherent risk of lithium batteries, there are many restrictions on them. Because of this, DeWalt cautions FlexVolt battery users not to forget the red transport cap. Thankfully, if you do lose it, you can buy these separately. On Amazon, you can snag a replacement DeWalt 60V battery cap for under $10, which buyers have confirmed meets airline requirements.
2. Unheated garages or sheds
Dmytro Zinkevych/Shutterstock
While heat can be a problem, the cold can also ruin your power tool batteries. In fact, DeWalt notes that anything below 40°F can cause problems similar to exposure to extreme heat. So, if you live somewhere that regularly experiences freezing temperatures, you’ll need to be more mindful about moving your batteries to more stable spaces. This can mean investing in insulation for your garage, shed, or workspace.
If you have no choice but to leave your DeWalt batteries in a place with poor insulation, there are some things you can do that won’t break the bank. For example, some budget-friendly hacks for keeping your tool batteries safe in cold temperatures include using insulated bags or even a closed cooler. Similar to how these are designed to keep your beer cool in the summer, they’re also an ideal way to maintain a steady temperature in the winter. Just make sure no leftover ice or water is inside when you chuck your batteries in it.
But take note, it’s never a good idea to revive a power tool battery when it doesn’t seem to be charging, since the process can be both complicated and dangerous. Unless you’re a professional with access to the right parts or know the ins and outs of modern batteries, including the software, you’re likely better off sending it to the service center or replacing it with a fresh product.
3. Utility rooms
Solstock/Getty Images
One common warning that comes with any lithium battery is to keep it away from moisture and other liquids, which means humid environments are among the worst places to store one. Humidity can lead to corrosion that degrades the battery’s ability to function over time, causing issues like deformation or even short circuiting. Because of this, places prone to fluctuating temperatures and high humidity, like utility rooms, should be avoided.
Like most of DeWalt’s portfolio, its batteries aren’t rated for use once they get wet. While there are cases of users claiming their batteries still work after dropping them in buckets or getting soaked in the rain, this isn’t always the case. To help counter this, you can invest in something like DeWalt’s ToughSystem 2.0 Charger Box. Apart from being a charger for small electronic devices and your power tool batteries, the box itself is rated IP55. Although it still can’t be submerged and isn’t fully dust-tight, it does offer protection from low-pressure water jets (and a little bit of rain).
If you spot any corrosion you suspect is affecting your tool’s performance, it doesn’t mean you have to dispose of the batteries yet. After unplugging the battery from your tool or charger, you can use a baking soda paste to help remove it. But if it looks too far gone, you might as well send it over to the DeWalt service center for professional guidance (or at least free, guilt-free disposal).
4. Near potential fire hazards
Constantinis/Getty Images
For many of us, our garages house more than just our vehicles. Unless you’re a committed minimalist, it’s not uncommon to accumulate a lifetime’s worth of stuff, much of it functional and some of it sentimental. Because of this, it’s quite common to use the garage as general storage, including for tools. Unsurprisingly, many of us end up storing power tool batteries alongside a ton of other items they probably shouldn’t be anywhere near.
In general, there are certain things you should avoid storing in your garage or shed entirely, including your DeWalt batteries. Apart from temperature-sensitive perishables, like wine or food, it’s also good to find a home elsewhere for flammable items. If your DeWalt batteries do catch fire, things like paint cans, propane tanks, or even old paperwork can make the damage even more terrifying.
That said, this can be easier said than done, especially if you have a small space to begin with. To avoid this, it’s always a good idea to keep your garage organized to keep fires at bay, whether they are caused by your DeWalt batteries or not. By conducting regular inventories, creating designated workspaces, storing similar items together, and setting a maintenance schedule, you can both keep the clutter at bay and catch any issues with your power tools and their batteries.
5. Near conductive or corrosive items
Bukhta Yurii/Shutterstock
In an ideal world, DeWalt batteries are supposed to last up to three years. Among the many factors that can shorten that lifespan are the number of charge cycles and small habits that accelerate degradation, like being careless with the terminals. If you’re wondering how to mess up this sensitive part of the battery, DeWalt warns against several hazards, chief among them contact with conductive materials. In layman’s terms, this means things that can channel electricity, like keys or coins, should be stored away from the battery. For people who work with DeWalt batteries professionally, this can also mean hand tools or loose hardware, such as nails, bolts, and screws. While this can be a problem if you haphazardly throw your things into a random bag, it won’t be such a big concern if you travel with DeWalt’s TStak or ToughSystem.
Apart from this, it will also be a good idea to clean your DeWalt battery terminal regularly. Although things like sawdust, drywall dust, dirt, or oils are pretty common when you work with power tools, it’s important not to leave them on for too long. With just a few minutes of your time and a damp (not wet) cloth, you can remove particle buildup from your batteries. Just make sure to avoid any unnecessary solvents, so you don’t accidentally damage them.
The hype for Gang of Dragon, the debut game from Nagoshi Studio, may already be getting derailed. According to a Bloomberg report, Chinese tech giant NetEase is going to stop financing Nagoshi Studio starting in May. Bloomberg confirmed the news with the studio’s employees and a NetEase spokesperson.
The report explained that NetEase decided to cut funding to Nagoshi Studio, which was founded in 2021 by Yakuza franchise creator Toshihiro Nagoshi, after finding out the studio needed $44.4 million to complete the project. Bloomberg reported that Nagoshi Studio is trying to find new sponsors but hasn’t had any success so far. The report also added that the studio can continue the project on its own, but would be responsible for paying NetEase for any associated costs to hold onto the brand or assets.
While Nagoshi Studio may have been working on Gang of Dragon since the studio’s creation, the general public got a better look at the title through a trailer announcement during The Game Awards 2025. The action-adventure game set in Tokyo would star Ma Dong-Seok, a South Korean actor who starred in Train to Busan and Marvel’s Eternals. As of now, Nagoshi Studio might be at risk of joining other casualties stemming from NetEase’s executive decisions, like when the tech giant decided to shut down Ouka Studio in 2024.
While inventors haven’t quite given us a “Star Trek” replicator, 3D printers are the next best thing. After success with titanium printing, Apple is reportedly set to tackle the challenges of 3D printing with aluminum to make products like the iPhone.
iPhone’s aluminum chassis could one day be 3D printed
The Apple Watch Ultra 3 uses a 3D-printed titanium unibody case. It was Apple’s first 3D-printed product, but more are on the way as innovations make the process more efficient. According to the Power On newsletter, Apple is working to increase its use of 3D printing in product manufacturing. It is likely going to be used in Apple Watch models first, but the goal is to eventually print iPhones.
Colliding black holes were detected through spacetime ripples for the first time in 2015 by the Laser Interferometer Gravitational-Wave Observatory (LIGO), notes Space.com:
Since then, LIGO and its partner gravitational wave detectors Virgo in Italy and KAGRA (Kamioka Gravitational Wave Detector) in Japan have detected a multitude of gravitational waves from colliding black holes, merging neutron stars, and even the odd “mixed merger” between a black hole and a neutron star… During the first three observing runs of LIGO, Virgo and KAGRA, scientists had only “heard” 90 potential gravitational wave sources.
But now they’ve published new data from the LIGO-Virgo-KAGRA (LVK) Collaboration that includes 128 more gravitational wave sources — some incredibly distant:
[Gravitational-Wave Transient Catalog-4.0, or GWTC-4] was collected during the fourth observational run of these gravitational wave detectors, which was conducted between May 2023 and Jan. 2024… Excitingly, GWTC-4 could technically have been even larger, as around 170 other gravitational wave detections made by LIGO, Virgo and KAGRA haven’t yet made their way into the catalog.
One aspect of GWTC-4 that really stands out is the variety of events that created these signals. Within this catalog are gravitational waves from mergers between the heaviest black hole binaries yet, each about 130 times as massive as the sun, lopsided mergers between black holes with seriously mismatched masses, and black holes that are spinning at incredible speeds of around 40% the speed of light. In these cases, scientists think the extreme characteristics of the black holes involved in these mergers are the result of prior collisions, providing evidence of merger chains that explain how some black holes grow to masses billions of times that of the sun… GWTC-4 also includes two new mixed mergers involving black holes and neutron stars.
[LVK member Daniel Williams, of the University of Glasgow in the U.K., said in their statement] “We are really pushing the edges, and are seeing things that are more massive, spinning faster, and are more astrophysically interesting and unusual.” The catalog also demonstrates just how sensitive the LVK detectors have become. Some of the neutron star mergers occurred up to 1 billion light-years away, while some of the black hole mergers occurred up to 10 billion light-years away. Einstein’s theory of general relativity can be tested with these detections, and “So far, the theory is passing all our tests,” says LVK member Aaron Zimmerman, of the University of Texas at Austin. “But we’re also learning that we have to make even more accurate predictions to keep up with all the data the universe is giving us.” And LVK member Rachel Gray, a lecturer at the University of Glasgow, says “every merging black hole gives us a measurement of the Hubble constant, and by combining all of the gravitational wave sources together, we can vastly improve how accurate this measurement is.”
In short, says LVK member Lucy Thomas of the California Institute of Technology (Caltech), “Each new gravitational-wave detection allows us to unlock another piece of the universe’s puzzle in ways we couldn’t just a decade ago.”
Coffee is the original office biohack and the nation’s most popular productivity tool. As we lose sleep to the changeover to daylight saving time, the caffeine-addicted WIRED Reviews team is writing about our favorite coffee brewing routines and devices that’ll keep us alert and maybe even happy in the morning. Today, operations manager Scott Gilbertson expounds on the perfect simplicity of the moka pot. In the days after, we’ll add stories about other WIRED writers’ favorite brewing methods.
Years of travel and a love of repair have given me a special appreciation for simple devices. A pen and paper is still the simplest way to write. A cast-iron pan is the simplest way to cook. And a moka pot is the simplest way to brew coffee.
What I love about the moka pot isn’t just the results I get from it. I do love the flavor, especially when paired with a nice dark, chocolatey, smoky roast, but the moka pot is about more than flavor. It’s also about ingenious simplicity and a design that has lasted nearly a century.
Simple Beginning
Photograph: Scott Gilbertson
The moka pot’s exact origins depend on who you ask, but it was first manufactured and popularized in 1933 by an aluminum manufacturer named Alfonso Bialetti, whose son Renato later took over the business and mass-produced the pot worldwide. Today, Bialetti Industries still makes the Moka Express. The iconic logo image of a short, squat, heavily mustached man is indeed based on Bialetti himself.
If you want some idea of Renato Bialetti’s commitment to the device that made him famous, consider that when he died in 2016, his ashes were interred in a large moka pot. He isn’t the only one who revered the design. The moka pot is featured in museums around the world, including the Museum of Modern Art. Its iconic octagonal shape makes it one of the most recognized coffee brewers in the world.
The moka pot is a pressure-driven stovetop (or campfire top, though this requires close attention) coffee brewer that works something like a percolator. The Moka Express consists of four parts, split into two chambers. The bottom is the water reservoir, which heats up on the stove. Into this, you put the brewing basket, which holds your grounds. The top consists of a long tube in the center of a holding chamber. On the bottom of the top piece, there’s a metal filter ringed by a rubber (or silicone on some models) gasket. The top and bottom screw together.
As the water heats, it is forced upward through the basket of grounds and eventually out of the tube. The brewed coffee collects in the top chamber, and the metal filter keeps the grounds in place. It’s ingeniously simple.
“When you get a demo and something works 90% of the time, that’s just the first nine.” — Andrej Karpathy
The “March of Nines” frames a common production reality: You can reach the first 90% reliability with a strong demo, and each additional nine often requires comparable engineering effort. For enterprise teams, the distance between “usually works” and “operates like dependable software” determines adoption.
The compounding math behind the March of Nines
“Every single nine is the same amount of work.” — Andrej Karpathy
Agentic workflows compound failure. A typical enterprise flow might include: intent parsing, context retrieval, planning, one or more tool calls, validation, formatting, and audit logging. If a workflow has n steps and each step succeeds with probability p, end-to-end success is approximately p^n.
In a 10-step workflow, per-step failures compound: even high per-step reliability leaves a much lower end-to-end success rate. Correlated outages (auth, rate limits, connectors) will dominate unless you harden shared dependencies.
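The compounding arithmetic is easy to check directly. A minimal sketch (the step count and probabilities are illustrative, not from any specific system):

```python
def end_to_end_success(p: float, n: int = 10) -> float:
    """Probability an n-step workflow completes if each step
    succeeds independently with probability p."""
    return p ** n

if __name__ == "__main__":
    for p in (0.90, 0.99, 0.999, 0.9999):
        print(f"per-step {p:.2%} -> 10-step {end_to_end_success(p):.2%}")
```

Running this reproduces the table below: 90% per-step reliability collapses to roughly 35% end to end.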
| Per-step success (p) | 10-step success (p^10) | Workflow failure rate | At 10 workflows/day | What this means in practice |
| --- | --- | --- | --- | --- |
| 90.00% | 34.87% | 65.13% | ~6.5 interruptions/day | Prototype territory; most workflows get interrupted. |
| 99.00% | 90.44% | 9.56% | ~1 every 1.0 days | Fine for a demo, but interruptions are still frequent in real use. |
| 99.90% | 99.00% | 1.00% | ~1 every 10.0 days | Still feels unreliable because misses remain common. |
| 99.99% | 99.90% | 0.10% | ~1 every 3.3 months | This is where it starts to feel like dependable enterprise-grade software. |
Define reliability as measurable SLOs
“It makes a lot more sense to spend a bit more time to be more concrete in your prompts.” — Andrej Karpathy
Teams achieve higher nines by turning reliability into measurable objectives, then investing in controls that reduce variance. Start with a small set of SLIs that describe both model behavior and the surrounding system:
Workflow completion rate (success or explicit escalation).
Tool-call success rate within timeouts, with strict schema validation on inputs and outputs.
Schema-valid output rate for every structured response (JSON/arguments).
Policy compliance rate (PII, secrets, and security constraints).
p95 end-to-end latency and cost per workflow.
Fallback rate (safer model, cached data, or human review).
Set SLO targets per workflow tier (low/medium/high impact) and manage an error budget so experiments stay controlled.
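One way to make the error budget operational is a small counter per workflow tier. This is a sketch under assumed conventions (the `ErrorBudget` class and its fields are hypothetical, not a standard API):

```python
from dataclasses import dataclass

@dataclass
class ErrorBudget:
    slo_target: float   # e.g. 0.99 means 99% of workflows must complete or escalate cleanly
    total: int = 0      # workflows observed in the current window
    failures: int = 0   # workflows that neither completed nor escalated

    def record(self, success: bool) -> None:
        self.total += 1
        if not success:
            self.failures += 1

    @property
    def budget_remaining(self) -> float:
        """Fraction of the allowed-failure budget still unspent
        (1.0 = untouched, 0.0 = exactly at budget, negative = blown)."""
        allowed = self.total * (1 - self.slo_target)
        return 1.0 if allowed == 0 else 1 - self.failures / allowed
```

A team might freeze risky experiments on a workflow whenever `budget_remaining` drops below zero, which keeps iteration speed tied to measured reliability.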
Nine levers that reliably add nines
1) Constrain autonomy with an explicit workflow graph
Reliability rises when the system has bounded states and deterministic handling for retries, timeouts, and terminal outcomes.
Model calls sit inside a state machine or a DAG, where each node defines allowed tools, max attempts, and a success predicate.
Persist state with idempotent keys so retries are safe and debuggable.
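A minimal shape for such a bounded graph might look like the following (node fields and the linear execution order are illustrative assumptions; a real system would add persistence and idempotency keys):

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Node:
    name: str
    run: Callable[[dict], Any]
    allowed_tools: frozenset = frozenset()
    max_attempts: int = 2
    success: Callable[[Any], bool] = lambda out: out is not None

def run_workflow(nodes: list[Node], state: dict) -> dict:
    """Execute a linear DAG of nodes; each node gets bounded retries,
    and every run ends in an explicit terminal outcome."""
    for node in nodes:
        for attempt in range(node.max_attempts):
            out = node.run(state)
            if node.success(out):
                state[node.name] = out
                break
        else:
            # deterministic terminal outcome instead of silent partial failure
            state["terminal"] = f"failed:{node.name}"
            return state
    state["terminal"] = "completed"
    return state
```

Because every path ends in an explicit `terminal` status, completion rate becomes directly measurable against the SLOs defined earlier.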
2) Enforce contracts at every boundary
Most production failures start as interface drift: malformed JSON, missing fields, wrong units, or invented identifiers.
Use JSON Schema/protobuf for every structured output and validate server-side before any tool executes.
Use enums, canonical IDs, and normalize time (ISO-8601 + timezone) and units (SI).
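A stripped-down sketch of server-side validation before a tool executes (in production this would typically be JSON Schema or protobuf; the field names and the refund/credit action set here are hypothetical):

```python
import json
from datetime import datetime

ALLOWED_ACTIONS = {"refund", "credit"}  # enum, not free text

def parse_tool_args(raw: str) -> dict:
    """Validate a model-produced tool call before anything executes."""
    args = json.loads(raw)  # syntax: must be well-formed JSON
    missing = {"action", "amount_cents", "ts"} - args.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if args["action"] not in ALLOWED_ACTIONS:  # canonical values only
        raise ValueError(f"unknown action {args['action']!r}")
    if not isinstance(args["amount_cents"], int):  # canonical units: integer cents
        raise ValueError("amount_cents must be an integer")
    datetime.fromisoformat(args["ts"])  # time must be ISO-8601
    return args
```

Rejecting the call before execution turns interface drift into a visible, retryable error rather than a corrupted downstream write.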
3) Layer validators: syntax, semantics, business rules
Schema validation catches formatting. Semantic and business-rule checks prevent plausible answers that break systems.
Semantic checks: referential integrity, numeric bounds, permission checks, and deterministic joins by ID when available.
Business rules: approvals for write actions, data residency constraints, and customer-tier constraints.
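The three layers compose naturally into a pipeline. A sketch with invented checks (the discount/customer fields and the 20% approval threshold are illustrative):

```python
def check_syntax(out: dict) -> None:
    if "customer_id" not in out or "discount_pct" not in out:
        raise ValueError("schema: missing required fields")

def check_semantics(out: dict, known_customers: set) -> None:
    if out["customer_id"] not in known_customers:  # referential integrity
        raise ValueError("semantic: unknown customer_id")
    if not 0 <= out["discount_pct"] <= 100:        # numeric bounds
        raise ValueError("semantic: discount out of range")

def check_business_rules(out: dict) -> None:
    if out["discount_pct"] > 20:                   # writes above threshold need approval
        raise ValueError("business: discount requires approval")

def validate(out: dict, known_customers: set) -> dict:
    """Run syntax, then semantic, then business-rule checks in order."""
    check_syntax(out)
    check_semantics(out, known_customers)
    check_business_rules(out)
    return out
```

Ordering matters: cheap structural checks run first, so the expensive or stateful checks only see well-formed inputs.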
4) Route by risk using uncertainty signals
High-impact actions deserve higher assurance. Risk-based routing turns uncertainty into a product feature.
Use confidence signals (classifiers, consistency checks, or a second-model verifier) to decide routing.
Gate risky steps behind stronger models, additional verification, or human approval.
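The routing decision itself can be a few lines of policy. A sketch with assumed thresholds (the 0.95 and 0.80 cutoffs and the path names are illustrative):

```python
def route(confidence: float, impact: str) -> str:
    """Pick an assurance path from an uncertainty signal plus action impact."""
    if impact == "high" and confidence < 0.95:
        return "human_approval"   # gate high-impact, uncertain actions
    if confidence < 0.80:
        return "verifier_model"   # second-model check before executing
    return "auto"                 # confident and low-impact: proceed
```

Keeping this logic deterministic and outside the model makes the assurance policy auditable and easy to tighten per tenant.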
5) Engineer tool calls like distributed systems
Connectors and dependencies often dominate failure rates in agentic systems.
Apply per-tool timeouts, backoff with jitter, circuit breakers, and concurrency limits.
Version tool schemas and validate tool responses to prevent silent breakage when APIs change.
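Two of the standard distributed-systems primitives, sketched minimally (threshold and backoff constants are illustrative; a production breaker would also implement a half-open recovery state):

```python
import random

class CircuitBreaker:
    """Open after `threshold` consecutive failures; callers fail fast while open."""
    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.threshold

    def record(self, ok: bool) -> None:
        # any success resets the streak; failures accumulate
        self.failures = 0 if ok else self.failures + 1

def backoff_with_jitter(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """'Full jitter' delay: uniform over [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Jitter spreads retries from many callers over time, which prevents the synchronized retry storms that turn a brief upstream blip into a sustained outage.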
6) Make retrieval predictable and observable
Retrieval quality determines how grounded your application will be. Treat it like a versioned data product with coverage metrics.
Track empty-retrieval rate, document freshness, and hit rate on labeled queries.
Ship index changes with canaries, so you know if something will fail before it fails.
Apply least-privilege access and redaction at the retrieval layer to reduce leakage risk.
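Computing those retrieval SLIs from query logs is straightforward. A sketch under an assumed log shape (the `docs`/`expected_doc` keys are hypothetical):

```python
def retrieval_slis(events: list) -> dict:
    """Compute empty-retrieval rate and labeled-query hit rate from query logs."""
    total = len(events)
    empty = sum(1 for e in events if not e["docs"])
    labeled = [e for e in events if "expected_doc" in e]
    hits = sum(1 for e in labeled if e["expected_doc"] in e["docs"])
    return {
        "empty_retrieval_rate": empty / total if total else 0.0,
        "hit_rate": hits / len(labeled) if labeled else None,
    }
```

Tracking these per index version is what makes a canaried index rollout meaningful: a rising empty-retrieval rate on the canary blocks the promotion.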
7) Build a production evaluation pipeline
The later nines depend on finding rare failures quickly and preventing regressions.
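The core of such a pipeline is a golden set replayed on every change, seeded from real incidents. A minimal sketch (the case shape and `agent` callable are illustrative assumptions):

```python
def run_golden_set(cases: list, agent) -> dict:
    """Replay labeled cases (including past incidents) and report regressions."""
    failures = [c["id"] for c in cases if agent(c["input"]) != c["expected"]]
    return {
        "pass_rate": 1 - len(failures) / len(cases),
        "regressions": failures,  # case IDs to triage before shipping
    }
```

Every production incident becomes a new case, so the golden set grows exactly along the system's historical failure surface.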
8) Invest in observability and operational response
Once failures become rare, the speed of diagnosis and remediation becomes the limiting factor.
Emit traces/spans per step, store redacted prompts and tool I/O with strong access controls, and classify every failure into a taxonomy.
Use runbooks and “safe mode” toggles (disable risky tools, switch models, require human approval) for fast mitigation.
9) Ship an autonomy slider with deterministic fallbacks
Fallible systems need supervision, and production software needs a safe way to dial autonomy up over time. Treat autonomy as a knob, not a switch, and make the safe path the default.
Default to read-only or reversible actions, require explicit confirmation (or approval workflows) for writes and irreversible operations.
Build deterministic fallbacks: retrieval-only answers, cached responses, rules-based handlers, or escalation to human review when confidence is low.
Expose per-tenant safe modes: disable risky tools/connectors, force a stronger model, lower temperature, and tighten timeouts during incidents.
Design resumable handoffs: persist state, show the plan/diff, and let a reviewer approve and resume from the exact step with an idempotency key.
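The "knob, not a switch" idea reduces to an ordered set of levels plus a deterministic gate. A sketch (the level names and decision strings are illustrative):

```python
from enum import IntEnum

class Autonomy(IntEnum):
    READ_ONLY = 0   # safe default: no writes at all
    REVERSIBLE = 1  # writes allowed if they can be rolled back
    CONFIRMED = 2   # irreversible actions need explicit approval
    FULL = 3        # fully autonomous (per-tenant opt-in)

def decide(level: Autonomy, is_write: bool, irreversible: bool) -> str:
    """Gate an action against the tenant's current autonomy level."""
    if not is_write:
        return "allow"  # reads are always safe
    if level is Autonomy.READ_ONLY:
        return "block"
    if irreversible:
        return "allow" if level is Autonomy.FULL else "require_approval"
    return "allow"
```

During an incident, dropping a tenant back to `READ_ONLY` is a one-line config change rather than a redeploy, which is what makes the safe mode usable in practice.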
Implementation sketch: a bounded step wrapper
A small wrapper around each model/tool step converts unpredictability into policy-driven control: strict validation, bounded retries, timeouts, telemetry, and explicit fallbacks.
```python
# Helpers such as deadline, sleep, metric, span, jittered_backoff,
# UpstreamError, ValidationError, and EscalateToHuman are assumed to
# exist in the surrounding codebase.
def run_step(name, attempt_fn, validate_fn, timeout_s, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            # bound latency so one step can't stall the workflow
            with deadline(timeout_s):
                out = attempt_fn()
            # gate: schema + semantic + business invariants
            validate_fn(out)
            # success path
            metric("step_success", name, attempt=attempt)
            return out
        except (TimeoutError, UpstreamError) as e:
            # transient: retry with jitter to avoid retry storms
            span.log({"attempt": attempt, "err": str(e)})
            sleep(jittered_backoff(attempt))
        except ValidationError as e:
            # bad output: retry once in "safer" mode (lower temp / stricter prompt)
            span.log({"attempt": attempt, "err": str(e)})
            out = attempt_fn(mode="safer")
            try:
                validate_fn(out)
                metric("step_success", name, attempt=attempt)
                return out
            except ValidationError:
                pass  # fall through to the next attempt or the fallback below
    # fallback: keep system safe when retries are exhausted
    metric("step_fallback", name)
    return EscalateToHuman(reason=f"{name} failed")
```
Why enterprises insist on the later nines
Reliability gaps translate into business risk. McKinsey’s 2025 global survey reports that 51% of organizations using AI experienced at least one negative consequence, and nearly one-third reported consequences tied to AI inaccuracy. These outcomes drive demand for stronger measurement, guardrails, and operational controls.
Closing checklist
Pick a top workflow, define its completion SLO, and instrument terminal status codes.
Add contracts + validators around every model output and tool input/output.
Treat connectors and retrieval as first-class reliability work (timeouts, circuit breakers, canaries).
Route high-impact actions through higher assurance paths (verification or approval).
Turn every incident into a regression test in your golden set.
The nines arrive through disciplined engineering: bounded workflows, strict interfaces, resilient dependencies, and fast operational learning loops.
Nikhil Mungel has been building distributed systems and AI teams at SaaS companies for more than 15 years.
The restored PET/CBM 3032. (Credit: Drygol, retrohax.net)
The Commodore CBM 3032 is a successor to the original Commodore PET 2001, though due to a trademark conflict with Philips these first European PETs were sold as ‘CBM’ instead. Hence the labeling on the CBM 3032 that [Drygol] had in for a restoration, a machine produced somewhere between 1979 and the end of its manufacturing run a few years later. This former University of Szczecin machine had languished in a basement in Poland until a local demoscene group came across it and wanted to use it, after a restoration.
Although from the front it didn’t look too shabby at first glance, a walkaround made the problems apparent, including rusty and buckled paneling, showing that the time spent in storage had done it no favors. Internally there were decades’ worth of dust, a dodgy potentiometer, cold solder joints, and some PCB-level bodges that may or may not have been there from the factory.
The main case was disassembled by drilling out the rivets to gain full access to every nook and cranny, allowing for a good cleaning and repainting prior to putting in fresh rivets. On the PCB side of things, a potentiometer and an LM340KC-12 linear regulator in a TO-3 package had to be replaced, after which the system would still boot only about once in every three attempts.
Fixing this came down to cleaning all contacts and IC sockets and refurbishing the keyboard, with corrosion and the occasional broken trace causing a lot of grief. Ultimately the system was restored and ready to be put into demoscene service.
DeepRare, an agentic AI system integrating 40 specialised tools, outperformed medical specialists in identifying rare conditions in a head-to-head study published in Nature.
For millions of people with rare diseases, the path to diagnosis is a labyrinth. Patients bounce between GPs and specialists over years, sometimes decades, piecing together symptoms that fall outside textbook presentations.
Eighty per cent of rare diseases have a genetic origin, yet most go undiagnosed until too much biological damage has occurred. The bottleneck is not a lack of data; it’s finding the needle in the medical haystack.
A new study published in Nature this month suggests that artificial intelligence may accelerate that hunt. Researchers at Shanghai Jiao Tong University’s School of Artificial Intelligence and Xinhua Hospital developed DeepRare, an AI system designed to mimic how human doctors reason through diagnostic uncertainty.
In a head-to-head comparison with five experienced physicians, each with more than a decade of practice, the system achieved higher accuracy across the board.
The numbers are striking. DeepRare correctly identified the disease on its first suggestion 64.4 per cent of the time, compared to 54.6 per cent for the doctors. When given three suggestions instead of one, the AI system achieved diagnostic success in 79 per cent of cases versus 66 per cent for the human specialists.
Crucially, the physicians endorsed the AI’s reasoning 95.4 per cent of the time, suggesting the system not only reaches correct conclusions, but does so in ways that experienced clinicians find persuasive and medically sound.
What distinguishes DeepRare from earlier diagnostic AI is its architecture. Rather than applying a black-box classification model, the system integrates 40 specialised digital tools and follows an explicitly reasoned workflow.
It forms diagnostic hypotheses, tests them against patient evidence, searches global medical literature databases, analyses genetic variants, and revises its conclusions iteratively before ranking possibilities.
The process mirrors the cognitive steps a human diagnostician takes, but with access to the entirety of medical knowledge and computational speed humans cannot match.
The system has already moved beyond the laboratory. Since July 2025, DeepRare has been deployed on an online diagnostic platform, with more than 600 medical institutions worldwide registered to use it.
The research team plans to validate the system further using 20,000 real-world cases and to launch a global rare disease diagnostic alliance. Notably, the authors emphasise that the system is not intended to replace clinicians, but to augment diagnostic workflows, a safeguard that acknowledges both the technical limits of AI and the irreducible human element in medicine.
The implications for patients are profound. Approximately 300 million people worldwide are affected by rare diseases, and the average diagnostic odyssey stretches to five years or longer.
Each year of diagnostic delay is a year of uncertainty, wrong treatments, and accumulating organ damage. An AI system that can trim weeks or months from that timeline, and surface possibilities that might otherwise be overlooked, could reshape the early experience of living with a rare condition.
The rapidly improving speed and versatility of digital computers, along with the relative difficulty of programming an analogue machine, have mostly driven analogue computers out of use in modern systems. There is a kind of art, though, in weaving together a series of op-amps to perform mathematical calculations; between this, historical interest in the machines, and their rarity value, it’s no wonder that new analogue computers are still being designed, such as [Markus Bindhammer]’s system.
The computer is built around a combined circuit board and patch panel, based on the designs included in three papers in an online library of analogue computer references. The housing around the patch panel took design cues from the Polish AKAT-1 analogue computer, including the two dial voltage indicators and an oscilloscope display, in this case an inexpensive DSO-138. The patch panel uses banana connectors, and the jumper wires use stackable connectors, so several wires can be connected to the same socket.
The computer itself has a summing amplifier circuit, a multiplier circuit, an integrator, and square, triangle, and sine wave generators. This simple set of tools is enough to model both simple and complex mathematics; for example, [Markus] squared five volts with the multiplier and got 2.5 volts out (the multiplier divides its result by ten). A more advanced example is a leaky-integrator model of a neuron, which simulates a differential equation.