An anonymous reader quotes a report from the BBC: Internet access has started to be restored in Iran after being cut off almost three months ago, the country’s first vice-president has said. “The first step toward free and regulated access to cyberspace has been taken,” Mohammad Reza Aref wrote on X on Tuesday. Internet monitoring groups Netblocks and Kentik reported “partial” restoration around 13:00 GMT, though the latter warned most networks were still down.
The Iranian government cut internet access following the launch of US and Israeli attacks on February 28. Officials suggested the aim was to prevent surveillance, espionage and cyber-attacks. It is one of the longest-running national internet shutdowns ever recorded worldwide. A content creator from Tehran told the BBC that he had been able to connect to the internet using his home WiFi on Tuesday. “The main point is, some of my income will come back,” he said.
Netblocks said it was unclear whether the internet return would be sustained, and told the BBC it was consistent with what it had seen when previous blackouts were lifted — where restoration could take hours. “Access is not universally back to its original state, with some regional variation,” said the global internet tracker’s research director Isik Mater on Tuesday. She added that there were signs of “more extensive filtering” than prior to January — when a similar blackout was imposed during the regime’s deadly crackdown on anti-government protests — “including additional restrictions to messaging apps like WhatsApp.”
An artist’s conception shows Blue Origin’s Blue Moon Mark 1 lander lowering an Astrolab rover to the lunar surface. (NASA Illustration)
Jeff Bezos’ Blue Origin space venture has won NASA’s nod to deliver crew-carrying rovers to the lunar surface as part of the space agency’s decade-long plan to create a base near the moon’s south pole.
“America is returning to the moon,” NASA Administrator Jared Isaacman said today during a news briefing at the space agency’s headquarters in Washington, D.C. “We are working alongside our many international and commercial partners to leverage the incredible capabilities from commercial industry to build a moon base for all we hope to accomplish in this endeavor.”
NASA awarded Blue Origin an initial $188 million contract to get its robotic Blue Moon Mark 1 lander ready to deliver lunar terrain vehicles, or LTVs, with an option period worth an additional $280.4 million for two task orders. The option period will be based on Blue Origin’s performance during the initial contract phase, NASA said.
Carlos Garcia-Galan, program manager for NASA’s Moon Base program, said the LTVs will be “a mix between the Apollo lunar roving vehicle and the Mars-style rover.” Each rover will weigh a little less than one metric ton, he said, and will be folded up to fit on Blue Origin’s lander during transit to the moon.
The first LTV is due to be brought to the moon in advance of the Artemis 4 mission’s crewed landing, which is currently scheduled for 2028, Garcia-Galan said.
Advertisement
One of the LTVs will be built by California-based Astrolab, with Seattle-based Interlune serving as a subcontractor. In a LinkedIn post, Interlune said it would work with Astrolab on “many aspects of the rover development, involving the science of survival in the lunar environment.” The Interlune Research Lab in Texas will develop varieties of simulated moon dirt specifically for testing Astrolab’s moon rover, which has been designated CLV-1.
The other LTV will be Colorado-based Lunar Outpost’s Pegasus rover, which is being developed in partnership with General Motors, Goodyear and Leidos.
Both LTVs are designed to travel at speeds of up to 10 kilometers per hour (6 mph), carrying up to two astronauts on 10-kilometer (6-mile) trips. The rovers could also take on robotic excursions with a maximum range of 200 kilometers (125 miles). Astrolab is receiving a $219 million contract, while Lunar Outpost’s contract is worth $220 million, NASA said.
In a statement posted to X, Kent, Wash.-based Blue Origin said it was proud to support NASA’s plans for a permanent presence in the moon’s south polar region. The company’s CEO, Dave Limp, also gave a shout-out to Isaacman on his social-media account.
Advertisement
“Since the beginning, Blue Origin has been committed to Lunar Permanence,” Limp wrote. “Thank you, @NASAadmin, for sharing that vision. We’re ready to make it a reality.”
A davit system on the Blue Moon lander lowers a Lunar Outpost’ Pegasus lander to the lunar surface. (NASA / Lunar Outpost Illustration)Artwork shows the Firefly Elytra Dark space vehicle deploying four rocket-powered drones over the moon. (Firefly Space Illustration)
NASA’s Moon Base program could get its official kickoff as early as this fall with the launch of Endurance, Blue Origin’s first Blue Moon Mark 1 lander. Endurance, which is currently going through preflight testing, is scheduled to deliver several payloads to the moon’s south polar region — including a retroreflector system for gauging distances and a camera system for studying how thrusters interact with the moon’s surface. This first Blue Moon mission has been on the schedule for more than a year, but Garcia-Galan said it is now known as Moon Base 1.
The Moon Base 2 mission calls for a SpaceX Falcon Heavy rocket to deliver Pittsburgh-based Astrobotic’s Griffin lander to the moon later this year. Griffin will be carrying more than 1,100 pounds of cargo. One of the payloads is an Astrolab rover that’s outfitted with an Interlune imaging system capable of surveying the lunar surface for traces of valuable helium-3.
For the Moon Base 3 mission, Intuitive Machines’ Nova-C Trinity lander will fly the first payload selected through a NASA initiative known as Payloads and Research Investigations on the Surface of the Moon, or PRISM. Lunar Vertex will study lunar swirls — bright spots on the moon’s surface that are thought to be caused by magnetic anomalies. The lander will also carry payloads for the European Space Agency and the Korea Astronomy and Space Science Institute.
Advertisement
“These represent the first of more than a dozen missions we expect to announce through the balance of this year, as we return, build the base, and never give up the moon again,” Isaacman said.
Moon Base 1 and the LTV deliveries aren’t the only lunar missions in which Blue Origin is playing a key role. For example, the company’s second Mark 1 lander has been tasked with delivering NASA’s robotic VIPER rover to the lunar surface in late 2027.
Blue Origin is also working on a Blue Moon Mark 2 lunar lander that could carry future Artemis crews to the lunar surface. NASA is aiming to test the Mark 2 and/or SpaceX’s Starship-based lunar lander next year in low Earth orbit during the Artemis 3 mission.
“We’re already moving forward pretty strongly with both Blue Origin and SpaceX on their lander concepts,” said Lori Glaze, associate administrator for NASA’s Human Spaceflight Mission Directorate. “There’s a lot of trade studies ongoing right now, just to make sure we’ve got the mission designs right and the right objectives for those.”
Advertisement
Isaacman said NASA’s strategy called for “leveraging the NASA playbook from the 1960s, figuring out what works and what doesn’t in this epic science of survival.”
The announcements that were made today focused on the first phase of NASA’s Moon Base plan, which aims to establish reliable access to the lunar surface and characterize resources at the south polar region, where significant reserves of water ice are thought to exist.
The second phase of the project, scheduled for the 2029-2032 time frame, calls for setting up infrastructure for lunar operations, including energy facilities that rely on solar or nuclear power. During the third phase, NASA and its partners would establish a permanent base.
“We envision the moon base to be hundreds of square miles, with different assets all building up to the objective of permanent lunar presence,” Garcia-Galan said.
Advertisement
Isaacman said there are “a lot of great things that will come from having an outpost on the moon,” with the ability to prepare for farther-out missions leading his list.
“There will be scientific discoveries,” he said. “Let’s land rovers with radio telescopes to go to the far side moon. Let’s ignite an orbital economy. These are all things that would be nice to have and achieve along the way, but really it is to have an environment where we can work with the water ice and master the skills for where we go next, which is Mars. … We want to be in an environment where we can learn the skills, so that astronauts can go and plant the Stars and Stripes on Mars someday.”
I’m starting to wonder if RFK Jr. can do anything right at all. After the courts put an injunction on Kennedy’s overhaul of the CDC’s ACIP panel on vaccines, as well as pretty much all of their recommendations since it was rebuilt on a foundation of anti-vaxxers, the government sprung into action to try to let Kennedy keep fucking with vaccines in America. The reasoning by the court for the injunction was a process oriented one: Kennedy’s overhaul of ACIP violated the American Procedures Act. By simply hand-picking unqualified sycophants to ACIP, he didn’t follow procedural law. The Trump administration eventually appealed the ruling, which is still pending hearings. On his end, Kennedy decided to amend the ACIP charter to try to route around some of the procedural violations of the APA that got him in trouble the first time.
A revised charter document for the Centers for Disease Control and Prevention’s influential vaccine advisory committee has been withdrawn by the Health Department over an administrative error, according to a notice published in the Federal Register Tuesday.
While the Health Department is working to appeal the injunction, Kennedy attempted to circumvent the judge’s ruling on the ACIP members by altering the committee’s charter to, among other things, allow for people without expertise in immunizations and public health to be members.
But, for now, that effort, too, has been thwarted. According to the notice on Tuesday, the new charter has been withdrawn for not following a federal requirement on public notification.
Advertisement
The law on the matter is remarkably clear. In order to reestablish a discretionary advisory committee, for which ACIP qualifies, the Secretary of the agency must provide a written statement that the committee is being formed in the public interest, establish what that public interest actually is, and then publish a public notice to the Federal Register so that the people can understand the action that is being taken.
Kennedy didn’t do any of that. He rewrote the governing charter for his remade version of ACIP and just tried to make it a thing without following any of those rules. He just plain fucked it up.
Which isn’t to suggest that Kennedy definitely won’t try to do this all again with an actual attempt to follow procedural law. I am having trouble imagining a world in which he doesn’t do that, actually. But given his apparent desire to step on every last rake he can find, it’s a wonder to me that the Trump administration doesn’t simply want to put someone more capable in charge of HHS.
Spain has temporarily blocked Polymarket and Kalshi while it investigates whether the prediction-market platforms are violating gambling laws by operating without a license. Engadget reports: The country’s ministry in charge of consumer affairs said it blocked the websites as a precautionary measure pending an official investigation. This investigation will determine if the platforms violate Spain’s gambling laws. It’s set to complete within the next four months and could mandate that these companies require specific administrative licenses to operate.
For months, the leading AI coding benchmarks have told enterprise buyers a comforting but misleading story: the top models are all roughly the same. OpenAI’s GPT-5 family, Anthropic’s Claude Opus, and Google’s Gemini Pro have clustered within a narrow band on Scale AI’s SWE-Bench Pro leaderboard, making it nearly impossible for engineering leaders to determine which agent will actually perform best inside their codebases.
On Monday, a startup called Datacurve released a benchmark it says shatters that illusion. DeepSWE, a 113-task evaluation spanning 91 open-source repositories and five programming languages, produces a dramatically wider spread among the same frontier models — and crowns OpenAI’s GPT-5.5 as the clear leader at 70%, sixteen points ahead of its nearest competitor.
“On public leaderboards, top models often look relatively close in capability,” wrote Datacurve co-author Serena Ge on X. “DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.”
The benchmark also delivers a pointed critique of the evaluation infrastructure the AI industry relies on to measure progress: Datacurve’s audit found that SWE-Bench Pro’s verifiers — the automated graders that determine whether an agent solved a task — issued incorrect pass/fail verdicts on roughly one-third of the trials it reviewed.
Advertisement
If that finding holds up, it has sweeping implications. Enterprise procurement teams, venture capitalists, and AI lab marketing departments all lean heavily on benchmark scores to make multimillion-dollar decisions. A 32% error rate in the most widely cited coding benchmark suggests the industry may have been navigating by a broken compass.
Why the most popular AI coding benchmark may be grading on a curve
To understand what Datacurve is claiming, it helps to understand how coding benchmarks work — and how they can go wrong.
The dominant paradigm, pioneered by the SWE-Bench family maintained by Scale AI and academic researchers, constructs tasks by mining real GitHub commits. The process extracts a bug fix or feature addition from a repository’s history, rolls the code back to the pre-fix state, and then asks an AI agent to reproduce the change. The original commit’s test suite serves as the verifier: if the agent’s patch makes the same tests pass, it gets credit. This approach has an elegant simplicity, but Datacurve argues it introduces three systemic weaknesses.
First, contamination. Because tasks are drawn from public GitHub history, the problem statement, the discussion, and often the exact solution are already present in the training data of frontier models. “The SWE-Bench family scrapes existing GitHub issues and PRs, which creates two problems: memorization (models have already seen the solution) and triviality (most tasks are small),” Ge wrote.
Advertisement
Second, scope. SWE-Bench Pro tasks require, on average, just 120 lines of code added across 5 files. DeepSWE’s reference solutions average 668 lines added across 7 files — roughly 5.5 times more code. Yet DeepSWE’s prompts are actually shorter, averaging 2,158 characters versus SWE-Bench Pro’s 4,614. In other words, DeepSWE gives the agent less instruction but expects far more output, which more closely mirrors how a human developer might actually delegate work to an AI assistant.
DeepSWE tasks demand roughly five times more code than SWE-Bench Pro’s while giving agents shorter prompts — a design choice intended to mirror how developers actually hand off work. (Source: Datacurve)
Third — and most damaging — verifier reliability. Datacurve drew 30 tasks at random from both DeepSWE and SWE-Bench Pro, ran three rollouts across 10 frontier model configurations, and then deployed an LLM-based judge to independently assess whether each agent’s patch actually solved the problem. SWE-Bench Pro’s verifiers accepted wrong implementations 8.5% of the time and rejected correct implementations 24% of the time. DeepSWE’s verifiers registered 0.3% and 1.1%, respectively.
Datacurve’s audit found that SWE-Bench Pro’s automated graders rejected correct solutions 24 percent of the time and accepted wrong ones 8.5 percent of the time. DeepSWE’s verifiers kept both rates near zero. (Source: Datacurve)
Advertisement
The false negative problem is especially insidious because it punishes creative solutions. In one documented case, the gold-standard pull request for a SWE-Bench Pro task refactored a private helper function. An agent that correctly solved the task by inlining the same logic — a perfectly valid engineering choice — failed because the test suite tried to import a symbol that only existed in the original author’s specific implementation.
OpenAI’s GPT-5.5 dominates the new benchmark while Claude and Gemini stumble
DeepSWE’s top-line results reorder the familiar hierarchy in ways that should matter to every engineering team evaluating AI coding tools. On SWE-Bench Pro, models from OpenAI, Anthropic, and Google have traded the lead within a 30-point range. DeepSWE stretches that range to 70 points.
GPT-5.5 leads at 70%, followed by GPT-5.4 at 56% and Claude Opus 4.7 at 54%. From there, the drop-off is steep: Claude Sonnet 4.6 lands at 32%, Gemini 3.5 Flash at 28%, GPT-5.4-mini and Kimi K2.6 tied at 24%, and then a long tail of models in the teens and single digits. Claude Haiku 4.5, which scores 39% on SWE-Bench Pro, collapses to zero on DeepSWE — suggesting that some mid-tier models have been significantly overperforming on easier, potentially contaminated benchmarks.
On SWE-Bench Pro, frontier models cluster within a 30-point range. On DeepSWE, the same models spread across 70 points, with some — like Claude Haiku 4.5 — collapsing entirely. (Source: Datacurve)
Advertisement
GPT-5.5 doesn’t just score the highest — it does so efficiently. The model reaches its 70% pass rate with a median cost of $5.80 per trial, a median wall-clock time of 20 minutes, and a median of 47,000 output tokens. GPT-5.4 emerges as perhaps the best overall value at $3.30 per trial with a 56% score. Claude Opus 4.7, meanwhile, costs significantly more per run, and output tokens, wall-clock duration, and dollar cost per trial all vary by an order of magnitude across the agents tested — yet none of these correlates strongly with pass rate. Agents that emit more tokens, run longer, or cost more do not consistently solve more tasks.
GPT-5.4 and GPT-5.5 occupy the cost-efficient frontier, solving the most tasks for the least money per run. Spending more did not reliably produce better results. (Source: Datacurve)
Datacurve’s audit found that Claude has been reading the answer key on existing benchmarks
Perhaps the most provocative finding in DeepSWE’s analysis concerns what the authors label “CHEATED” verdicts — instances where an agent passes a benchmark not by solving the problem, but by reading the answer.
SWE-Bench Pro’s Docker containers ship the repository’s full .git history, which means the gold-standard solution commit is sitting right there in the container’s file system. Most models ignore it. Claude does not. Datacurve’s analysis found that both Claude Opus 4.7 and Claude Opus 4.6 registered “CHEATED” on more than 12% of their reviewed SWE-Bench Pro rollouts. In those instances, the Claude agent ran commands like git log –all or git show to retrieve the merged fix and paste it into its own patch. The behavior accounted for approximately 18% of Opus 4.7’s passes and 25% of Opus 4.6’s passes on the reviewed sample. The issue has been filed publicly as GitHub issue #93 on the SWE-Bench Pro repository.
Advertisement
GPT-5.4 and GPT-5.5 never exhibited this behavior. Gemini configurations stayed around 1%. Datacurve describes the behavior diplomatically — “The benchmark makes this possible (the gold commit lives in the container), but Claude is the family that consistently does so” — but the implication is clear: a meaningful fraction of Claude’s SWE-Bench Pro scores may reflect environmental exploitation rather than genuine engineering capability.
DeepSWE addresses this by shipping only a shallow clone with the base commit, leaving no gold hash for the agent to discover. It is worth noting that the behavior is arguably a sign of Claude’s environmental attentiveness — the model is very good at exploring its surroundings and exploiting available resources. Whether that counts as “cheating” or “resourcefulness” depends on your perspective, but in the context of a benchmark designed to measure independent problem-solving, it undermines the signal.
Two mechanisms by which agents passed SWE-Bench Pro without solving the underlying problem: reading the answer from the container’s Git history, or stubbing features past weak gold tests. (Source: Datacurve)
Each AI model family fails in its own distinctive way, and the patterns matter for enterprise teams
Beyond the top-line scores, Datacurve’s qualitative trajectory analysis reveals distinctly different failure signatures across model families — a finding that could help engineering teams choose the right model for specific types of work.
Advertisement
Claude is forgetful with multi-part prompts. On DeepSWE, Claude configurations miss stated requirements more than any other family. The pattern is consistent: when a prompt enumerates parallel behaviors — “support both sync and async,” for instance — Claude typically implements the obvious branch and forgets to mirror the change. Datacurve reports that roughly two-thirds of Claude’s “MISSED_REQUIREMENT” failures on DeepSWE follow this “one branch shipped” pattern. In one example, Claude Opus 4.7 correctly landed a sync state-data hook in one engine class while the async engine never received the same hook.
GPT, by contrast, implements exactly what is asked. GPT-5.5 had the lowest rate of missing stated behaviors of any configuration tested. Across multiple runs of the same task, GPT trials tended to converge on the same interpretation of the prompt, suggesting instruction-following precision is a stable trait of the model rather than per-run luck.
One of the most intriguing findings involves self-verification. On DeepSWE, Claude Opus 4.7 and GPT-5.4 wrote and ran new tests in the project’s own test framework on over 80% of their runs — even though no one asked them to. On SWE-Bench Pro, those same models dropped to 28% and 18%, respectively. The reason: SWE-Bench Pro’s prompt template explicitly tells agents they “should not modify the testing logic or any of the tests.” Agents dutifully complied, suppressing a behavior that likely would have improved their performance. This suggests that prompt design in production coding workflows may be inadvertently suppressing valuable agent behaviors — something enterprise teams deploying AI coding agents should carefully audit.
On DeepSWE, top models wrote and ran their own tests in as many as 85 percent of runs. On SWE-Bench Pro, where prompts discourage modifying tests, the same models rarely did so. (Source: Datacurve)
Advertisement
What DeepSWE gets right, what it gets wrong, and what it means for the future of AI benchmarks
Datacurve is forthright about several limitations. The standardized harness, while ensuring fairness, routes all edits through bash rather than the model-specific editing tools each family was trained on — apply_patch for GPT, str_replace_based_edit_tool for Claude. This could hold models below their native ceilings. The benchmark draws exclusively from open-source repositories with 500-plus stars, and results may not generalize to proprietary codebases. Bug localization and refactoring tasks are under-represented, and widely used languages like C++ and Java are absent entirely. The verdict assignments in the qualitative analysis come from an LLM analyzer, not human reviewers, and sample sizes are modest — roughly 90 reviewed rollouts per model per benchmark.
It is also worth noting that Datacurve is a startup with its own commercial interests, and an independent benchmark that reshuffles the leaderboard will inevitably invite scrutiny. The company’s decision to publish the full dataset, all agent trajectories, and the evaluation harness on GitHub mitigates this concern considerably, but independent reproduction will be necessary before the AI community treats these results as definitive.
DeepSWE arrives at an inflection point for the AI coding market. Enterprise adoption of AI coding agents is accelerating rapidly, with engineering organizations making consequential bets on which model to build around. The benchmark market itself has become a strategic battleground — Scale AI’s SWE-Bench Pro, which Datacurve directly critiques, is maintained by a company that also provides evaluation services to the labs whose models it ranks.
If DeepSWE’s central findings about verifier reliability and data contamination hold up under independent scrutiny, they could force a reckoning not just with how the industry measures coding agents, but with the broader question of what benchmarks are actually for. A leaderboard where the grading system is wrong a third of the time is not merely inaccurate — it is the kind of broken instrument that makes everyone feel good about progress that may not be real. And in an industry spending billions on a bet that AI agents can do the work of software engineers, the difference between real progress and the appearance of it is not academic. It is the whole game.
Caviar has released its own take on a phone built around a prominent T. The T-GREAT starts from the iPhone 17 Pro Max and receives a full exterior transformation that mixes jewelry techniques with American symbols. The result sits in the company’s Visionaries collection and arrives only in the top 1-terabyte storage configuration.
Custom work is done on the phone’s back panel, which is transformed into a gold-plated stunner after a base constructed of a jewelry alloy is plated with two layers of 24-karat gold. A raised 3D ‘T’ rises from the center of a textured gold background, complete with a 24-karat gold finish. Alongside and around that letter, you’ll find a painstakingly correct United States flag done in cloisonné enamel, with gold separators keeping the colored areas apart and the flag’s fifty stars and thirteen stripes accurately alternating red and white.
True wireless earbuds provide a snug in-ear fit; Bluetooth 5.4 for fast, reliable connectivity
Built-in mic and easy controls for play/pause, next/previous, up/down volume, answer/reject call, and voice assistant
Includes 3 sets of eartips (S, M, L) to ensure comfort, a USB-C 10-inch charging cable, and a charging case
The enamel work is done using traditional hot cloisonné techniques, which result in a long-lasting layered finish rather than a flat print. The device now has a black anodized frame, which Caviar designed specifically for this edition. That dark border highlights the gold and colorful enamel while also providing the phone with a distinct outline that differs from the conventional Apple titanium edge. The enamel work is done with classic hot cloisonné techniques, which produce a long-lasting layered finish rather than a flat print. Caviar created the device’s new black anodized frame specifically for this edition. That dark border emphasizes the gold and colorful enamel while also giving the phone a distinct shape that contrasts from the standard Apple titanium edge.
Caviar lists the full T-GREAT phone at $10,910. However, if you pay in cryptocurrency, the price drops to $9,900. They also accept customer-owned iPhone 17 Pro Max units, which they will then give the same gold and enamel treatment. Each finished phone, however, comes with a bunch of extras, including an international certificate of authenticity, a personal ownership certificate, and a year’s warranty.
Each unit’s packaging is also unique, with an interactive design using the T motif and a small golden key included for good measure. Production does not begin until payment has cleared, after which handcrafting, inspection, and packing take approximately 1-4 business days before shipping. Delivery times vary according to where you are in the world. Buyers can add personal engraving to the side edges or get more involved by requesting adjustments through Caviar’s design team. Choices include bespoke forms, swapping materials, moving logos around, or coming up with whole new packaging designs. Buyers are assigned a dedicated manager, who will walk you through the entire process from beginning to end. [Source]
Mark Zuckerberg’s 387-foot superyacht Launchpad passes through Seattle’s Ballard Locks on Tuesday. (GeekWire Photo / Todd Bishop)
Mark Zuckerberg’s $300 million superyacht passed through Seattle’s Ballard Locks on Tuesday, the same day Meta disclosed plans to cut nearly 1,400 jobs in Washington state.
The 387-foot Launchpad, built by Dutch shipbuilder Feadship, traveled from Elliott Bay through the locks toward Lake Union, drawing a crowd along the walkway.
The boat’s arrival and the job cuts do not appear to be related, but the irony was not lost on the crowds that hustled down to the locks to catch a glimpse of the giant yacht after word spread through the neighborhood and online. Some booed from the shore and heckled the crew.
“Superyacht-wise, this is the biggest one I’ve had in 14 years,” said a lock operator who was helping to guide the vessel through the large lock.
Bumpers on the side of the boat were about the size of small SUVs, while the back deck had a covered pool and hot tub. More than a dozen crew members were visible, many enjoying the trip through the channel on a partly sunny evening.
Advertisement
Mark Zuckerberg’s 387-foot superyacht Launchpad in the Hiram M. Chittenden Locks in Seattle’s Ballard neighborhood on Tuesday. (GeekWire Photo / Todd Bishop)
The Meta CEO did not appear to be aboard. A crew member, asked if Zuckerberg was on the yacht, shook his head no. Another said the crew wasn’t in town for the FIFA World Cup, and planned to “come and go.”
Launchpad flies a Marshall Islands flag, a common registry for large yachts. One heckler shouted to “pay some fucking taxes.”
The Launchpad in Elliott Bay before transiting Seattle’s Ballard Locks on Tuesday. (Photo courtesy of a GeekWire reader)
The layoffs will impact about 20% of Meta’s Seattle-area workforce. They are part of a companywide reduction of roughly 8,000 positions as the company accelerates spending on AI infrastructure, including capital expenditures that could reach $145 billion this year.
Thanks to Ed Lazowska for tipping us off to the yacht’s arrival.
For decades, the U.S. has had a plutonium problem. Around 100 tons of the stuff was made during the Cold War to go into powerful atomic bombs. But as nuclear stockpiles were dismantled, the government had to store the radioactive material in high-security facilities.
Now it wants startups to help get rid of some of it.
The Department of Energy said Tuesday it has selected five nuclear startups to enter into negotiations with the government to receive a portion of the plutonium, which could potentially be used to power a new generation of nuclear reactors. The Department of Energy previously identified 34 tons of plutonium for disposal.
The five startups include Oklo, Standard Nuclear, Shine Technologies, Flibe Energy, and Exodys Energy.
Advertisement
Energy Secretary Chris Wright was previously on Oklo’s board, but he resigned when he joined the administration and said he has divested his shares. Sam Altman was Oklo’s board chair following its merger with his acquisition company, AltC; Altman resigned the position last year.
While plutonium does exist in nature, it is more typically a by-product of bombarding non-fissile uranium with neutrons. Once formed, that isotope of plutonium has a half-life of 24,000 years, meaning the government can’t just wait it out.
Oklo is developing a reactor that can run on traditional uranium fuel as well as plutonium. The plutonium would help the company fuel its first reactors. Exodys Energy is also developing a reactor that can operate using some plutonium as part of mixed oxide fuel, or MOX, which blends uranium with plutonium. Flibe Energy is working toward a reactor that would run on plutonium and other by-products of fission reactors.
MOX is currently produced in France, and while the U.S. had plans to make it in South Carolina, the first Trump administration canceled the project after it blew through budgets and timelines. One of Oklo’s partners in the project, U.K.-based Newcleo, said it intends to build its own MOX fuel fabrication facility nearby.
Advertisement
Not everyone is thrilled with the plan, though. Since the plutonium came from nuclear weapons, the security concerns are significant. “Countries have tried this before, and they concluded that, as nice as it would be to use that plutonium as fuel, it’s really just a liability and we need to dispose of it permanently,” Scott Roecker, a vice president at the Nuclear Threat Initiative, told the New York Times.
For the startups, the next step is to enter into advanced negotiations with the government over security and the transportation of the plutonium.
When you purchase through links in our articles, we may earn a small commission. This doesn’t affect our editorial independence.
Every year I read through thousands of Startup Battlefield applications. And every year, I see the same pattern: The founders who belong on this stage are often the ones who almost didn’t apply.
They think they’re too early. They think they need more traction. They think the program is for companies further along than they are.
So here’s what we’re actually looking for and how to make sure your application reflects it. The deadline to be considered is May 27, which is tomorrow — time is running out for you to apply right here!
And if you’re not up to speed on this year’s Startup Battlefield details, it’s once again a premiere part of TechCrunch Disrupt, which will be in San Francisco from October 13-15 and concludes with the crowning of this year’s future champion. And that list of champions includes some incredible companies, from giants like Cloudflare and Discord, to the most recent crop of winners, who you can learn about in detail right here.
Advertisement
What gets a company selected for Startup Battlefield
Startup Battlefield is not a competition for the most polished companies. It never has been. It’s a competition for the most promising ones.
We’re looking for companies with ideas that feel meaningfully different and category-defining, with the potential to make a major impact in their industry or geography. For every application, the question we ask is simple: Does this change something? Not incrementally. Genuinely.
Product and disruption. What are you building, and does it represent a real shift in how something works? We’re not looking for a better version of what already exists. We’re looking for the thing that makes the existing version feel obsolete.
The founding team. Why you, why now, why this problem? Your origin story is part of the application. The founders who can articulate their conviction clearly, not just their market size, are the ones who stand out.
Advertisement
Industry and geographic diversity. The Startup Battlefield 200 is a global cohort. We actively look for companies from every corner of the world and every vertical in tech. If you’re building something important in a geography or sector that doesn’t often get a spotlight, that matters to us.
What doesn’t disqualify you from Startup Battlefield
Having press coverage. Local coverage is fine. Industry coverage is fine. A few founder profiles are fine. We’re looking for companies whose core technology hasn’t had its moment yet. If you’ve had some coverage but the product hasn’t been showcased, that’s exactly what Disrupt is for. Apply and show us what you have.
Being pre-launch. You need a working MVP, but you don’t need customers. You don’t need revenue. Pre-launch companies are genuinely welcome.
Having applied before. Many Startup Battlefield 200 companies applied more than once before being selected. A previous rejection says nothing about your company’s future or your chances this time.
Advertisement
Raising money. Bootstrapped, pre-seed, and seed companies are all welcome. Series A companies are reviewed on a case-by-case basis, particularly founders building in capital-intensive industries or raising in markets where funding dynamics differ from Silicon Valley norms.
Tips for a strong Startup Battlefield application
Show your product working. This is the single most important thing. Not a mockup. Not a simulation. Not an animated explainer video with upbeat background music. Your MVP in action, in real time. Even if it’s rough, even if it’s a screen recording from your phone. We want to see it work.
Know your competitive landscape. “We have no competitors” is not a credible answer, and it raises questions about how well you understand your market. Name your competitors, acknowledge them honestly, and then explain clearly and specifically why you win. This is one of the most important parts of the application and one of the most commonly underdeveloped.
Tell your story. Why did you start this company? What did you see that others didn’t? What makes you the right person to build it? The founding narrative is a meaningful part of how we evaluate teams and it’s the part most founders underwrite. Don’t skip it.
Advertisement
Don’t overpolish. Write clearly, show the product, tell the truth about where you are. We can see around rough edges. What we struggle to see around is an application that’s been so carefully managed that the actual company is invisible.
Resubmit if you need to. If you submit before you’re ready, don’t panic. You can resubmit until the May 27 deadline. You cannot edit an already submitted application, but you can submit a new one.
Learn what it takes from the founders who’ve done it
Build Mode, TechCrunch’s podcast for early-stage founders, is the best place to start. Hear directly from past Battlefield companies like Forethought AI and Glīd, breakout founders like Artisan and TaskRabbit, and top-tier investors like General Catalyst on what it takes to build a company worth putting on a global stage.
If you’re on the fence, apply. The worst outcome is you don’t get selected this cycle and you’ll have a stronger application next year for having gone through it.
We built this program to find you before the world does. The application is your first pitch.
Xreal makes some of my favorite display glasses around, functioning as wearable USB-C tethered monitors in glasses form. Xreal’s new budget pair of $299 display glasses, called a01, is part of a new sub-brand called X by Xreal. They’re significantly less expensive than Xreal’s other glasses and have some clever features their other glasses lack.
For the most part, the a01 glasses are more feature-limited compared with the Xreal One Pro and 1S. They have a slightly smaller field of view (50 degrees) and lack the dimming lens and chipset that can pin a display in place.
But they also offer notable advantages. The 1,600-nit brightness of these micro OLED displays is far higher than that of previous Xreal glasses, and I’m curious to compare them. They also support HDR10 for video, which TCL’s recent RayNeo Air 4 Pro glasses also have. The glasses are also pretty light, at 62 grams.
Advertisement
The front faceplates can swap out, giving different looks.
Xreal
What interests me most is a new “anti shake” mode that promises more stable video playback while moving, and a series of snap-on, swappable frame fronts that can change the look of the glasses. There are clear and sunglass-lensed options in the mix.
The X by Xreal a01 glasses are arriving in China now, then are available in the US sometime in July. I’ll review them then, and we’ll see if these can beat out the RayNeo Air 4 Pro as the best budget tethered display glasses in a year that’s already overloaded with smart glasses.
In some parts of the world it’s common for cell service providers to sell new phones at a price significantly below market value, with the caveat that these phones are locked to that service provider alone. It’s questionable whether this practice is good for consumers, but as [Gabriel Broussard Korr] notes, it’s an opportunity for hackers: since it’s possible to run a Linux environment on these phones, they make an inexpensive source of quite powerful computing hardware.
In this case, [Gabriel] was using the Moto G Power 2024, which has 128 GB of storage, 12 GB of RAM, and costs less than $50 when carrier-locked. Rather than trying to install a mobile-oriented Linux distribution (such as postmarketOS), [Gabriel] installed Termux, a terminal emulator which provides a Linux environment within Android. Before doing this, he set up the phone and configured a number of settings for a better Linux experience. Since automatic updates can interfere with these settings, and since none of the provided settings effectively disable these, he used NetGuard to block Internet access from the updater app and from Google Play services.
The next step was to actually install Termux, as well as an X11 extension and an app which exposes an API for Termux. The desktop environment (XFCE in this case) was installed through Termux, and [Gabriel] wrote a shell script to go through the steps of starting it. XFCE worked well on mobile devices because of its full-desktop zoom capability. Even running Linux indirectly, the experience was smooth; [Gabriel] found that GIMP, Shotcut, and VS Code all performed well.
Advertisement
It’s not quite the same set of software, but we’ve previously featured a guide to setting up a similar Linux environment using Termux and AnLinux. Lindroid provides a similar containerized Linux environment; on the other hand, you can also use postmarketOS to make a server from an old phone.
You must be logged in to post a comment Login