Tech

SpaceX S-1 warns orbital AI data centres may not be viable, months after Musk called space-based AI a no-brainer

Published

23 April 2026

Summary: SpaceX’s confidential S-1 pre-IPO filing warns that its orbital AI data centre plans “involve significant technical complexity and unproven technologies, and may not achieve commercial viability,” contradicting Elon Musk’s January claim at Davos that space-based AI was a “no-brainer” achievable within two to three years. The filing comes as SpaceX targets a $1.75 trillion IPO valuation and has applied to the FCC for one million data centre satellites, while competitors Starcloud, Google (Project Suncatcher), and Blue Origin pursue their own orbital compute programmes.

SpaceX told prospective investors in its confidential S-1 pre-IPO filing that its plans for orbital AI data centres “involve significant technical complexity and unproven technologies, and may not achieve commercial viability.” The company warned that any future space-based compute infrastructure will operate “in the harsh and unpredictable environment of space, exposing them to a wide and unique range of space-related risks that could cause them to malfunction or fail.” The disclosure, first reported by Reuters on Monday, is legally standard for a company approaching what could be the largest initial public offering in history. It is also a remarkable piece of bureaucratic candour from the same organisation whose chief executive described data centres in orbit as a “no-brainer” three months ago.

At the World Economic Forum in Davos in January, Elon Musk said the lowest-cost place to put AI would be in space “within two years, maybe three at the latest.” He called space-based solar “10 times cheaper than terrestrial solar” because “you don’t need any batteries,” described the cooling problem as solved by simply pointing a radiator away from the sun at three degrees Kelvin, and predicted that more AI capacity would sit in orbit than on Earth within five years. In February, SpaceX filed with the Federal Communications Commission to launch and operate up to one million satellites as the “SpaceX Orbital Data Center system” at altitudes between 500 and 2,000 kilometres. The filing described satellites that would “directly harness near-constant solar power with little operating or maintenance cost.” The S-1, filed confidentially with the Securities and Exchange Commission ahead of a targeted June listing at a $1.75 trillion valuation and a $75 billion raise, says something different.

The physics of the problem

The contradiction between Musk’s public statements and SpaceX’s legal disclosures maps onto a set of engineering constraints that have not changed since Davos. In vacuum, all heat dissipation happens through radiation. There is no convection, no liquid cooling, no fans. To radiate just one megawatt of heat at 20 degrees Celsius, an orbital data centre would need roughly 1,200 square metres of radiator surface, the area of four tennis courts. The International Space Station’s entire electrical system produces only 0.2 megawatts; ground-based hyperscale data centres are racing toward gigawatt scale. The three-degree background temperature of space is irrelevant if the radiators needed to exploit it weigh more than the servers they are cooling.
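The radiator figure can be sanity-checked with the Stefan-Boltzmann law. The sketch below is a rough lower bound under assumptions that are ours, not the article's: an ideal flat panel (emissivity 1) that radiates from both faces and absorbs no sunlight or Earth-shine. The function name `radiator_area_m2` is likewise illustrative.

```python
# Stefan-Boltzmann constant in W / (m^2 K^4)
SIGMA = 5.670374419e-8


def radiator_area_m2(heat_w, temp_c, emissivity=1.0, sides=2):
    """Panel area needed to radiate heat_w watts to deep space.

    Assumption: the panel absorbs nothing from the Sun or Earth and the
    ~3 K background is neglected, so the result is a lower bound.
    """
    temp_k = temp_c + 273.15
    flux_per_face = emissivity * SIGMA * temp_k ** 4  # W per m^2 per face
    return heat_w / (flux_per_face * sides)


area_two_sided = radiator_area_m2(1e6, 20.0)           # both faces radiate
area_one_sided = radiator_area_m2(1e6, 20.0, sides=1)  # one face only
print(f"two-sided: {area_two_sided:.0f} m^2")
print(f"one-sided: {area_one_sided:.0f} m^2")
```

Under these assumptions the two-sided panel comes out just under 1,200 square metres, in line with the figure quoted above, and a panel that can radiate from only one face needs twice that. Real radiators have emissivity below 1 and pick up heat from sunlight, so the practical area would be larger.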

Power is equally constrained. Solar panels in orbit receive roughly five times more energy than on the ground, with no atmosphere, weather, or nighttime in certain orbits. But it would take approximately one square mile of solar array in Earth orbit to produce one gigawatt at 30% cell efficiency. The ISS produces 0.2 megawatts from arrays that span the length of a football field. Scaling to the gigawatts that a single hyperscale data centre consumes on Earth would require deploying and maintaining solar infrastructure orders of magnitude larger than anything humans have built in space.
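The square-mile estimate follows from simple arithmetic on the solar constant. This is a minimal sketch, assuming roughly 1,361 watts per square metre of sunlight above the atmosphere, panels facing the Sun continuously, and no losses beyond the 30% cell efficiency; the function name `array_area_m2` is illustrative.

```python
SOLAR_CONSTANT = 1361.0            # W/m^2 above the atmosphere, near Earth
SQ_M_PER_SQ_MILE = 1609.344 ** 2   # square metres in one square mile


def array_area_m2(power_w, efficiency):
    """Sun-facing array area needed to generate power_w watts."""
    return power_w / (SOLAR_CONSTANT * efficiency)


area = array_area_m2(1e9, 0.30)    # one gigawatt at 30% efficiency
print(f"{area / 1e6:.2f} km^2, {area / SQ_M_PER_SQ_MILE:.2f} square miles")
```

On these idealised numbers a gigawatt needs roughly 2.4 square kilometres of array, a little under one square mile. Wiring losses, degradation, and any time spent in Earth's shadow would push the requirement higher.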

Hardware obsolescence may be the most underappreciated constraint. GPUs depreciate as new architectures emerge every two to three years. On Earth, racks are swapped continuously. In orbit, every hardware replacement requires a launch, docking, or robotic servicing mission. Radiation exposure causes bit flips and permanent circuit damage. Radiation-hardened chips lag multiple generations behind commercial processors. Triple modular redundancy, running three parallel systems and taking the majority vote, would triple the hardware requirements. AI’s soaring energy demands, which the IEA projects will push data centre electricity consumption past 1,000 terawatt-hours by the end of 2026, are real. The question is whether solving them in orbit creates more problems than it solves.
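The majority vote at the heart of triple modular redundancy is simple to state in code. The following is an illustrative sketch of a bitwise voter over three redundant copies of a word, not a description of any flight system; the names are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant words.

    Each output bit takes the value that at least two inputs agree on,
    so a bit flip in any single copy is masked.
    """
    return (a & b) | (a & c) | (b & c)


good = 0b10110010
upset = good ^ 0b00001000          # one copy suffers a single bit flip
recovered = majority_vote(good, upset, good)
print(recovered == good)
```

The voter masks a fault in one copy at a time. Two copies corrupted in the same bit position would outvote the good one, which is part of why redundancy is usually paired with error-correcting memory and periodic scrubbing.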

The competitive landscape in orbit

SpaceX is not the only company pursuing orbital compute, which makes the S-1 disclaimer more strategically significant than a standard risk factor. Starcloud, formerly Lumen Orbit, launched the first high-powered GPU into orbit in November 2025, an Nvidia H100 that represented 100 times more compute than had ever operated in space. In December, Starcloud became the first company to run a large language model in orbit, Google’s Gemma, and the first to perform in-orbit LLM training. By March 2026 it had raised $170 million at a $1.1 billion valuation, the fastest unicorn in Y Combinator’s history. Its next satellite, targeting 200 kilowatts and a cost of roughly $0.05 per kilowatt-hour, is planned for October.

Google’s Project Suncatcher, a partnership with Planet Labs, plans to launch two test satellites carrying Google TPUs by early 2027 and envisions one-kilometre arrays of 81-satellite compute clusters in dawn-dusk sun-synchronous orbit. Google’s analysis suggests launch costs may fall below $200 per kilogram by the mid-2030s, making space data centres cost-comparable to terrestrial energy costs at that point. Nvidia announced Vera Rubin Space-1, a chip system designed specifically for orbital data centres. Blue Origin filed its own FCC application for 51,600 data centre satellites. The a16z-funded startup Orbital is building an AI satellite constellation. The idea is not fringe. It is attracting serious capital and serious engineering talent. SpaceX’s S-1 is notable precisely because the company that controls the launch vehicles and the satellite internet constellation, the company best positioned to make orbital compute work, is the one telling investors it might not.

The terrestrial alternatives

The S-1 disclosure arrives in a week when the terrestrial alternatives are absorbing enormous investment. Massive AI infrastructure deals like Meta’s $27 billion commitment to Nebius illustrate the scale of spending on ground-based compute. Nuclear-powered AI data centres are attracting dedicated funding, with Valar Atomics raising $450 million at a $2 billion valuation to build small modular reactors purpose-built for AI workloads. The US Department of Energy has identified 16 federal sites for data centre construction adjacent to existing nuclear facilities. By 2026, 18 nuclear-powered AI facilities with a combined capacity of 31.2 gigawatts are tracked globally. Microsoft’s Project Natick deployed an undersea data centre capsule designed for AI workloads in February 2025. The tech industry spent roughly $580 billion in 2025 turning deserts and abandoned factories into GPU-packed facilities.

The pattern is consistent: every approach to the AI power problem that keeps the servers on Earth, or at most underwater, is attracting more capital and progressing faster than the orbital alternatives. Nuclear reactors are a proven technology being adapted to a new use case. Orbital data centres are an unproven technology being proposed for a use case that may not require them. The S-1 language suggests SpaceX’s own engineers and lawyers recognise the distinction, even if the company’s public messaging has not caught up.

The IPO context

The S-1 filing serves two masters. SpaceX needs to present orbital data centres as a credible growth story to justify a $1.75 trillion valuation, the highest ever for a pre-IPO company. It also needs to disclose the risks clearly enough to protect itself from securities litigation if the plans do not materialise. The result is a document that simultaneously promotes and disclaims the same initiative. This is not unusual in IPO filings. It is unusual when the chief executive has spent the preceding three months describing the initiative as inevitable, obvious, and cheaper than the alternatives.

The SpaceX-xAI merger in February, an all-stock transaction valuing the combined entity at $1.25 trillion, was explicitly motivated by orbital data centres. Musk said integrating Starlink’s global satellite mesh with xAI’s large language models was a primary rationale. Musk’s AI chip ambitions through the Terafab project with Intel include dedicated processors for orbital deployments. The one million satellites in the FCC filing would represent a hundred-fold increase over the current population of low Earth orbit. Ars Technica estimated the barebones deployment cost at “at least $1 trillion.” The vast majority of more than 1,000 public comments to the FCC opposed the plan, citing debris, light pollution, and the risk of Kessler syndrome, a cascading chain of collisions that could render entire orbital altitudes unusable.

SpaceX may eventually prove that orbital compute works. The company has a record of achieving what others said was impossible, most notably reusable rockets. But the S-1 filing is not the language of a company that has solved the problem. It is the language of a company that wants credit for trying and protection if it fails. The gap between Davos in January and the SEC in April is the gap between a pitch and a prospectus. Both are real. Only one carries legal liability.



All the Android phones getting the new Gemini Intelligence

Published

14 May 2026

NewsAdmin

Google has announced plenty of new features and upcoming products, but one of the most intriguing is undoubtedly Gemini Intelligence.

Gemini Intelligence promises to bring the “best of Gemini” to compatible devices, by integrating premium hardware and software to help users in everyday life. For an overview of what the new system specifically includes, visit our Gemini Intelligence explainer.

Google has revealed that Gemini Intelligence features will roll out in waves from the summer, but which phones are expected to see the upgrade?

We’ve rounded up the Android phones that should see Gemini Intelligence and, where possible, we detail when handsets are likely to receive the upgrade.

For more on Google’s recent Android 17 announcements, make sure you visit our guides on the new Pause Point feature and the Emoji revamp. Finally, the best Android phones list reveals our current favourite handsets on the market.

Which Android phones are expected to see Gemini Intelligence?

At the time of writing, Google hasn’t revealed exactly when the Android 17 update will launch. Instead, the company has only stated that it will begin the rollout this summer.

Android 17 Gemini Intelligence. Image Credit (Google)

We should also note that, at the time of writing, Google hasn’t officially announced the specific phones that will support Gemini Intelligence. Instead, Google states that the first Android devices to see Gemini Intelligence will be the latest Samsung Galaxy and Google Pixel phones. With this in mind, we can assume the entire Pixel 10 series, including the Pixel 10 Pro Fold and potentially even the affordable Pixel 10a, will see the feature.

Pixel 10a in hand. Image Credit (Trusted Reviews)

We don’t currently know if the Pixel 9 series will benefit from Gemini Intelligence, so we’ll have to wait and see.

Similarly, we can reasonably expect that the Galaxy S26 series, including the Galaxy S26 Ultra, will sport Gemini Intelligence. Plus, considering the upcoming Z Fold 8 and Z Flip 8 are rumoured to launch sometime in the summer, it’s likely that the foldables will also support Gemini Intelligence, though that’s speculation on our part.

Which other Android devices will see Gemini Intelligence?

Google has teased that other Android devices will include Gemini Intelligence features. Such devices will include “your watch, car, glasses and laptops.” Wear OS will see features like Create my Widget, while Android Auto will soon be able to pair and integrate with Gemini Intelligence-compatible Android phones.

Finally, we know that Google’s new Googlebook line-up will also benefit from Gemini Intelligence features, including the Magic Pointer and Create my Widget.



AWS patched Quick auth bypass, says customers weren’t using control

Published

14 May 2026

NewsAdmin

Most users put up with AWS the way you put up with the DMV. I say this with love, but it’s hard to disagree that the UI is awful. The console is a UX time capsule if time capsules weren’t allowed to ever look like other time capsules. The pricing pages were designed by someone who hates you personally, and you accept all of it because the one thing AWS has historically gotten right is the boring, important stuff. The security model. The IAM language no one likes, but everyone trusts. The boundary between your account and someone else’s. Get that wrong, and the whole bargain collapses.

So when Fog Security disclosed an authorization bypass in Amazon Quick on May 12 (that’s the BI service formerly known as QuickSight, briefly known as Quick Suite, and now apparently just Quick, but check back next week) and AWS responded with a statement claiming “no customer data was at risk,” it’s fair to ask which definition of customer data they’re using. Because it isn’t an obvious one, and it certainly isn’t mine.

What Fog found

Fog reports that when an Amazon Quick administrator (which is an absolutely devastating personal insult) uses “custom permissions” to explicitly deny access to AI Chat Agents, the UI correctly hides the feature. Great! Awesome! I sure wish to hell I could do that with S3 buckets to which I do not have access! Notably, there’s no other way for an admin to do this – it’s custom permissions or naught.

The API, however, was perfectly willing to keep answering chat requests for any user in the account who knew how to send them. Fog’s proof-of-concept was a non-admin asking the agent “Tell me about mangoes” from a session that was, on paper, locked out of the agent entirely. The agent told them about mangoes.
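The gap Fog describes, a control enforced in the interface but not on the server, is a familiar pattern. The sketch below is purely illustrative: every name in it (`User`, `ui_menu`, `handle_chat_unchecked`, `handle_chat_checked`, the `chat_agent` feature key) is hypothetical and none of it reflects Amazon Quick's actual code or API.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    denied_features: set = field(default_factory=set)


def ui_menu(user, features):
    # Client-side filtering: hides denied features but enforces nothing.
    return [f for f in features if f not in user.denied_features]


def handle_chat_unchecked(user, prompt):
    # Flawed pattern: the handler trusts that the UI already hid the feature.
    return f"agent reply to {prompt!r}"


def handle_chat_checked(user, prompt):
    # Server-side validation: the deny rule is re-evaluated on every request.
    if "chat_agent" in user.denied_features:
        raise PermissionError("chat_agent denied by custom permissions")
    return f"agent reply to {prompt!r}"


contractor = User("contractor", {"chat_agent"})
print(ui_menu(contractor, ["dashboards", "chat_agent"]))
print(handle_chat_unchecked(contractor, "Tell me about mangoes"))
```

In this toy version the menu hides the chat agent from the contractor, yet the unchecked handler still answers anyone who calls it directly. Only the checked handler treats the deny rule as a boundary.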

AWS deployed the fix between March 11 and March 12, eight days after Fog reported it via HackerOne. So far, so coordinated. Seriously, for a company of this scale, that’s underpants-outside-the-pants superhero speed. Good for you; gold star.

What came next

Where this gets uncomfortable is the response. AWS classified the severity as “none.” It issued no customer notification. It published no advisory.

After Fog disclosed the HackerOne report and published a blog post, AWS provided a statement to Fog Security reading, in full: “We appreciate Fog Security’s coordinated disclosure. This issue was addressed in March 2026. No customer data was at risk and there is no customer action required. As always, customers can contact AWS Support with any questions or concerns about the security of their account.”

Take that sentence apart and see how much work “no customer data was at risk” is doing.

Amazon Quick is described on its own product page as an AI assistant that “connects Slack, Microsoft Teams and Outlook, CRMs, databases, and documents in one place” and “grounds every answer in your real business data.” The default chat agent, which is automatically and annoyingly provisioned the instant Quick is enabled whether the customer wants those AI features or not, is the front end for that data. It is the whole point of the front end for that data.

Now consider the actual scenario AWS just patched. An administrator at, say, a regulated bank (an unregulated bank is called “a criminal enterprise that hasn’t been caught yet”) configures custom permissions denying chat agent access to a large group of users. Maybe those users are contractors. Maybe they’re in a business unit that isn’t cleared for AI tools. Maybe the bank’s compliance posture flat-out prohibits shadow AI usage on top of internal data. Until two months ago, every one of those users could send an HTTP request directly to the agent endpoint and get a response.

Fog asked about mangoes because they’re a security firm doing a clean disclosure, not a malicious insider. A malicious insider would not have asked about mangoes.

The question to AWS, with no rhetoric attached: In what sense was customer data not at risk? Either the chat agent doesn’t actually have access to the data the product page says it does (in which case the marketing department has some serious splainin’ to do) or unauthorized users could query an agent wired into customer data, in which case “customer data was at risk” is the correct English-language description of the situation.

AWS clarifies, and says the quiet part out loud

After this story started circulating, AWS offered a follow-up comment that I sincerely appreciate, because it’s so much more honest than the first one. Per a hounded-looking AWS spokesperson: “The researcher was using the Admin Control capability that no customers were actively using when the server side validation was not present.”

Reading that twice doesn’t help. Let me translate.

AWS is saying: Yes, the server-side authorization check was missing. Yes, an authenticated user in your Quick account could bypass the only access control mechanism the service offers. The reason this is fine, apparently, is that no real customer had bothered to configure that access control during the window when it didn’t work.

Um … what?

The defense isn’t “the bug wasn’t real,” which you could be forgiven for hearing in AWS’s first statement. The defense also isn’t “the bug couldn’t have done what Fog says it could have done,” which is the even stronger implication of their first statement. The defense is “the access control didn’t enforce what we said it did, but luckily nobody was relying on it.” This is the corporate-comms equivalent of “the lock on the front door didn’t work, but nobody had locked it anyway, so why are you upset?”

It’s also a surprisingly specific telemetry claim. AWS is asserting that they know zero customers had configured custom permissions to deny chat agent access during the exposure window. That’s a confident thing to say, and an even more interesting thing to volunteer as a defense, because it doubles as a withering review of Quick’s access management model: the only knob the service provides for this purpose, the one AWS’s own documentation explicitly tells administrators to use, has zero recorded uptake.

The same follow-up also pointed back to the HackerOne thread to demonstrate that AWS told Fog throughout the disclosure window that “user-based authorization remained enforced.” Translation: you needed authenticated credentials in the same Quick account to exploit this. Yes. That’s intra-account scope, which Fog documented in their writeup, and which is precisely the scope in which custom permissions are supposed to function as a security boundary. AWS saying “user-based authorization was fine” is saying “you couldn’t exploit this anonymously from the internet,” which was never the threat model in question. The threat model is the contractor with valid SSO credentials whose admin tried to lock them out of some datasets.

Why this matters more than it sounds

Amazon Quick’s access model is already an outlier: IAM policies don’t govern Quick’s AI Chat Agent, SCPs don’t apply, and RCPs don’t apply. Custom permissions are the only knob the service provides. If those don’t enforce, nothing else does. And per AWS’s own follow-up, literally nobody was using them anyway. Both halves of that sentence should be alarming, and AWS is offering them as reassurance.

AWS’s competitive moat for the last decade hasn’t been pricing. It sure as poop hasn’t been developer experience, documentation, console design, or the inscrutable poetry of service names. It’s been the well-earned belief that AWS gets the foundational things right: boundaries, identity, durability, reliability, and the parts customers can’t easily verify themselves. Customers have paid the AWS premium because they trusted the boring stuff.

This year that trust is being tested in a way it hasn’t been before. The 2025–2026 cadence of AWS security advisories has noticeably increased, for reasons that are as yet unclear. Coordinated disclosures from independent researchers keep surfacing missing authorization checks in newer, AI-adjacent services.

The fixes are landing fast, which is good. The customer communication isn’t landing at all, which is, charitably, a choice. A “severity: none” rating on a bypass of the only access control a service offers is not an objective security finding so much as it is a communication decision. And the communication decision now reads, with the benefit of AWS’s follow-up: “We’ll fix the bug, we won’t tell you it existed, and if you ask we’ll explain that you weren’t using the feature anyway.”

AWS gets a lot of forgiveness on the small stuff because they own the big stuff. They might want to reconsider how much of the big stuff they keep classifying as “none.” ®



New critical Exim mailer flaw allows remote code execution

Published

14 May 2026

NewsAdmin

A critical vulnerability affecting certain configurations of the Exim open-source mail transfer agent could be exploited by an unauthenticated remote attacker to execute arbitrary code.

Identified as CVE-2026-45185, the security issue impacts some Exim versions before 4.99.3 that use the default GNU Transport Layer Security (GnuTLS) library for secure communication. It is a use-after-free (UAF) flaw triggered during the TLS shutdown while handling BDAT chunked SMTP traffic.

Exim frees a TLS transfer buffer but later continues using stale callback references that can write data into the freed memory region, which can lead to unauthenticated remote code execution (RCE).

Exim is a widely deployed open-source mail transfer agent (MTA) used to send, receive, and route email on Linux and Unix servers. It is common in shared hosting environments and enterprise mail systems, and on Debian- and Ubuntu-based distributions, where it has historically been the default mail server.

CVE-2026-45185 was discovered and reported by XBOW researcher Federico Kirschbaum. It impacts Exim versions 4.97 through 4.99.2 on builds compiled with GnuTLS that have STARTTLS and CHUNKING advertised. OpenSSL-based builds are not affected.
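The affected-configuration conditions above can be expressed as a quick triage check. This is a minimal sketch, assuming a plain dotted upstream version string and that the caller already knows which TLS library the build uses and which SMTP extensions it advertises; the function names are hypothetical. Distribution packages often backport fixes without changing the upstream version, so a result of "affected" here is a prompt to check the vendor advisory, not a verdict.

```python
def parse_version(text):
    """Turn a dotted version such as '4.99.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def is_affected(version, tls_library, starttls_advertised, chunking_advertised):
    """Apply the reported conditions: Exim 4.97 through 4.99.2, built with
    GnuTLS, with both STARTTLS and CHUNKING advertised."""
    in_range = (4, 97) <= parse_version(version) <= (4, 99, 2)
    return (
        in_range
        and tls_library.lower() == "gnutls"
        and starttls_advertised
        and chunking_advertised
    )


print(is_affected("4.98.1", "GnuTLS", True, True))
print(is_affected("4.99.3", "GnuTLS", True, True))
print(is_affected("4.98.1", "OpenSSL", True, True))
```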

Attackers exploiting the vulnerability could execute commands on the server as well as access Exim data and emails, and potentially pivot further into the environment depending on server permissions and configuration.

XBOW reported the vulnerability to the Exim maintainers on May 1st and received an acknowledgment on May 5th. Impacted Linux distributions were notified three days later.

A fix for CVE-2026-45185 was released in Exim version 4.99.3.

AI-assisted exploit build

XBOW reports that creating the proof-of-concept (PoC) exploit was a seven-day challenge between the company’s autonomous AI-driven development system, XBOW Native, and a human researcher assisted by a large language model.

XBOW Native successfully produced a working exploit for a simplified target: an Exim server with no Address Space Layout Randomization (ASLR) and a non-PIE (Position Independent Executable) binary.

In a second attempt, the LLM achieved an exploit on a machine with ASLR enabled, but still with a non-PIE binary.

“[…] instead of continuing to attack glibc’s allocator with off-the-shelf mechanisms, XBOW Native had taken on Exim’s own allocator,” XBOW researchers say.

Despite that surprising result, it was the human researcher who won the race, with assistance from the LLM for tasks such as assembling files and testing exploitation avenues.

While the researcher acknowledged the impressive speed of the LLM, they realized the need to shape the work environment instead of letting the model create its own space.

“Honestly, I don’t think LLMs alone are quite ready to write exploits against real-world software yet. After this experience, I think it can solve something CTF-shaped, but I don’t see them reaching the level of real production targets just yet.”

Still, the researcher acknowledged the crucial role of AI tools in helping humans understand unfamiliar code and dig deeper into suspicious areas much faster than without them.

To mitigate the risk, users of Ubuntu and Debian-based Linux distributions should apply the available Exim updates (v4.99.3) through their package managers.



Android Auto's biggest update in years delivers edge-to-edge Maps, Gemini, and HD video streaming

Published

14 May 2026

NewsAdmin

The updated Android Auto brings a complete Material 3 Expressive design overhaul, including expressive typography, smooth animations, and vibrant wallpapers. It is the biggest update to the platform since the 2023 “Coolwalk” redesign, which introduced a dynamic interface, split-screen multitasking, a revamped dock, and enhanced safety features.


AI IQ is here: a new site scores frontier AI models on the human IQ scale. The results are already dividing tech.

Published

14 May 2026

NewsAdmin

For decades, the IQ test has been one of the most familiar — and most contested — yardsticks for human intelligence. Now, a startup project called AI IQ is applying the same metaphor to artificial intelligence, assigning estimated intelligence quotients to more than 50 of the world’s most powerful language models and plotting them on a standard bell curve.

The result is a set of interactive visualizations at aiiq.org that have ricocheted across social media in the past week, drawing praise from enterprise technologists who say the charts make an impossibly complex market legible — and sharp criticism from researchers and commentators who warn the entire framework is misleading.

“This is super useful,” wrote Thibaut Mélen, a technology commentator, on X. “Much easier to understand model progress when it’s mapped like this instead of another giant leaderboard table.”

Brian Vellmure, a business strategist, offered a similar endorsement: “This is helpful. Anecdotally tracks with personal experience.”

But the backlash arrived just as quickly. “It’s nonsense. AI is far too jagged. The map is not the territory,” posted AI Deeply, an artificial intelligence commentary account, crystallizing a worry shared by many researchers: that reducing a language model’s sprawling, uneven capabilities to a single number creates a dangerous illusion of precision.

More than 50 AI language models, plotted on a standard IQ bell curve by the site AI IQ. The most capable models crowd the right tail of the distribution. (Credit: AI IQ)

Twelve benchmarks, four dimensions, and one controversial number: how AI IQ actually works

AI IQ was created by Ryan Shea, an engineer, entrepreneur, and angel investor best known as a co-founder of the blockchain platform Stacks. Shea also co-founded Voterbase and has invested in the early stages of several unicorns, including OpenSea, Lattice, Anchorage, and Mercury. He holds a Bachelor of Science in Mechanical Engineering from Princeton University.

The site’s methodology rests on a deceptively simple formula. AI IQ groups 12 benchmarks into four reasoning dimensions: abstract, mathematical, programmatic, and academic. The composite IQ is a straight average of those four dimension scores: IQ = ¼ (IQ_Abstract + IQ_Math + IQ_Prog + IQ_Acad).

The abstract reasoning dimension draws from ARC-AGI-1 and ARC-AGI-2, the notoriously difficult pattern-recognition benchmarks designed to test general fluid intelligence. Mathematical reasoning includes FrontierMath (Tiers 1–3 and Tier 4), AIME, and ProofBench. Programmatic reasoning uses Terminal-Bench 2.0, SWE-Bench Verified, and SciCode. Academic reasoning pulls from Humanity’s Last Exam, CritPt, and GPQA Diamond.

Each raw benchmark score gets mapped to an implied IQ through what the site describes as “hand-calibrated difficulty curves.” Crucially, the methodology compresses ceilings for benchmarks considered easier or more susceptible to data contamination, preventing them from inflating scores above 100. Harder, less gameable benchmarks retain higher ceilings. The system also handles missing data conservatively: models need scores on at least two of the four dimensions to receive a derived IQ, and when benchmarks are absent, the pipeline deliberately pulls scores down rather than up. The site states that “every derived IQ averages all four dimensions, so missing coverage cannot make a model look better by omission.”
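The averaging rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not AI IQ's pipeline: the site does not publish the exact value used for missing dimensions in the material quoted here, so the `missing_fill` of 80 is a hypothetical placeholder, capped at the model's lowest observed dimension so that a gap can only pull the average down.

```python
DIMENSIONS = ("abstract", "math", "prog", "acad")


def composite_iq(scores, missing_fill=80.0):
    """Average four dimension IQs; return None if fewer than two are present.

    missing_fill is a hypothetical penalty value, not the site's number.
    """
    present = [scores[d] for d in DIMENSIONS if scores.get(d) is not None]
    if len(present) < 2:
        return None
    # Never let the fill exceed the weakest observed dimension.
    fill = min(missing_fill, min(present))
    total = sum(
        scores[d] if scores.get(d) is not None else fill for d in DIMENSIONS
    )
    return total / len(DIMENSIONS)


full = composite_iq({"abstract": 120, "math": 130, "prog": 110, "acad": 140})
partial = composite_iq({"abstract": 120, "math": 130})
print(full, partial)
```

With all four dimensions present the result is the plain average. With only two, the filled-in dimensions drag the composite well below the average of the scores that were actually measured, which is the behaviour the site says it wants.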

OpenAI leads the bell curve, but the gap between the top AI models has never been smaller

As of mid-May 2026, the AI IQ charts tell a story of rapid convergence at the top of the frontier — and widening diversity in the tiers below.

According to the Frontier IQ Over Time chart, GPT-5.5 from OpenAI currently sits at the peak of the bell curve, with an estimated IQ near 136 — the highest of any model tracked. It is closely followed by GPT-5.4 (approximately 131), Opus 4.7 from Anthropic (approximately 132), and Opus 4.6 (approximately 129). Google’s Gemini 3.1 Pro lands near 131, making the top cluster extraordinarily tight.

That compression is not unique to AI IQ’s framework. Visual Capitalist, drawing from a separate Mensa-based ranking by TrackingAI, recently observed the same dynamic, noting that “the biggest takeaway is how compressed the top of the leaderboard has become.” On that scale, Grok-4.20 Expert Mode and GPT 5.4 Pro tied at 145, with Gemini 3.1 Pro at 141.

Below the frontier cluster, the AI IQ charts show a crowded midfield. Models from Chinese labs — Kimi K2.6, GLM-5, DeepSeek-V3.2, Qwen3.6, MiniMax-M2.7 — bunch between roughly 112 and 118, making the cost-performance tier increasingly competitive for enterprise buyers who don’t need the absolute best model for every task. One X user, ovsky, noted that the data “confirms experience with sonnet 4.6 being an absolute workhorse as opposed to opus 4.5” — pointing to the way the charts can validate practitioner intuitions that headline rankings often miss.

The trajectory of frontier AI models from October 2023 to mid-2026, as tracked by AI IQ. Provider-colored step-lines connect each lab’s flagship releases, showing roughly 60 points of estimated IQ improvement in 30 months. (Credit: AI IQ)

Why emotional intelligence scores are becoming the new battleground in AI model rankings

What distinguishes AI IQ from most other benchmarking efforts is its inclusion of an “EQ” — emotional intelligence — score. The site maps each model’s EQ-Bench 3 Elo score and Arena Elo score to an estimated EQ using calibrated piecewise-linear scales, then takes a 50/50 weighted composite of the two.

The EQ scores produce a meaningfully different ranking than IQ alone. On the IQ vs. EQ scatter plot, Anthropic’s Opus 4.7 leads on EQ with a score near 132, pushing it into the upper-right quadrant — the most desirable position, signaling both high cognitive and high emotional intelligence. OpenAI’s GPT-5.5 and GPT-5.4 cluster in the high-IQ zone but lag slightly on EQ. Google’s Gemini 3.1 Pro sits in a strong middle position on both axes.

One notable methodological choice has drawn attention: EQ-Bench 3 is judged by Claude, an Anthropic model, which the site acknowledges “creates potential scoring bias in favor of Anthropic models.” To correct for this, AI IQ subtracts a 200-point Elo penalty from the EQ-Bench component for all Anthropic models before mapping to implied EQ. The Arena component is unaffected since it uses human judges. That self-correction is unusual in the benchmarking world, and it suggests Shea is aware of the methodological minefield he has entered. Still, the EQ dimension captures something IQ alone cannot: the growing importance of conversational quality, collaboration, and trust in models deployed for user-facing work.
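As a rough illustration of the EQ calculation described above, the sketch below maps two Elo scores to an estimated EQ with piecewise-linear scales, applies the 200-point penalty to the EQ-Bench component for Anthropic models, and averages the two components 50/50. The anchor points and function names are hypothetical, since AI IQ does not publish its exact calibration curves; only the penalty size and the equal weighting come from the site's description.

```python
from bisect import bisect_right

# Hypothetical calibration anchors as (Elo, estimated EQ) pairs.
# These are placeholders, not AI IQ's actual curves.
EQ_BENCH_ANCHORS = [(800.0, 85.0), (1200.0, 105.0), (1600.0, 125.0), (2000.0, 140.0)]
ARENA_ANCHORS = [(1100.0, 85.0), (1300.0, 105.0), (1450.0, 125.0), (1550.0, 140.0)]

# Penalty applied to the EQ-Bench Elo of Anthropic models, per the article.
ANTHROPIC_ELO_PENALTY = 200.0


def piecewise_linear(x, anchors):
    """Interpolate linearly between sorted (x, y) anchors, clamping at both ends."""
    xs = [a[0] for a in anchors]
    if x <= xs[0]:
        return anchors[0][1]
    if x >= xs[-1]:
        return anchors[-1][1]
    i = bisect_right(xs, x)  # index of the first anchor strictly above x
    (x0, y0), (x1, y1) = anchors[i - 1], anchors[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def estimated_eq(eq_bench_elo, arena_elo, provider):
    """Return a 50/50 composite of the two Elo-derived EQ estimates."""
    if provider == "Anthropic":
        # Only the Claude-judged component is penalised; Arena uses human judges.
        eq_bench_elo -= ANTHROPIC_ELO_PENALTY
    eq_from_bench = piecewise_linear(eq_bench_elo, EQ_BENCH_ANCHORS)
    eq_from_arena = piecewise_linear(arena_elo, ARENA_ANCHORS)
    return 0.5 * eq_from_bench + 0.5 * eq_from_arena
```

With these placeholder anchors, two models with identical raw Elo scores end up with different estimated EQs if one of them is an Anthropic model, which is the intended effect of the correction.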

Plotting IQ against EQ reveals that the smartest models aren’t always the most emotionally intelligent. Anthropic’s Opus 4.7 dominates the upper-right quadrant. (Credit: AI IQ)

The AI cost-performance chart that enterprise buyers actually need to see

Perhaps the most practically useful chart on the site is not the bell curve but the IQ vs. Effective Cost scatter plot. It maps each model’s estimated IQ against an “effective cost” metric — defined as the token cost for a task using 2 million input tokens and 1 million output tokens, multiplied by a usage efficiency factor.
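The effective-cost definition can be written out as a short calculation. This is a minimal sketch under stated assumptions: prices are quoted in dollars per million tokens, and the usage efficiency factor is treated as a simple multiplier (values above 1 meaning a model burns more tokens than a baseline for the same work). The article does not say how AI IQ derives that factor, so the parameter here is a placeholder.

```python
# Reference workload from the article: 2 million input and 1 million output tokens.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 1_000_000


def effective_cost(input_price_per_mtok, output_price_per_mtok, usage_efficiency=1.0):
    """Dollar cost of the reference task, scaled by a usage efficiency factor.

    Prices are in dollars per million tokens. `usage_efficiency` is an
    assumed multiplier; the site's actual derivation is not published.
    """
    token_cost = (
        (INPUT_TOKENS / 1_000_000) * input_price_per_mtok
        + (OUTPUT_TOKENS / 1_000_000) * output_price_per_mtok
    )
    return token_cost * usage_efficiency
```

For a hypothetical model priced at $5 per million input tokens and $25 per million output tokens with a factor of 1.2, the reference task comes to about $42.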

The chart reveals a familiar pattern in enterprise technology: the best models are not always the best value. GPT-5.5 and Opus 4.7 sit in the upper-left corner — high IQ, high cost, with effective per-task costs north of $30 and $50 respectively. Meanwhile, models like GPT-5.4-mini, DeepSeek-V3.2, and MiniMax-M2.7 occupy a sweet spot in the middle: respectable IQ scores between 112 and 120, at effective costs ranging from roughly $1 to $5 per task. At the cheapest extreme, GPT-oss-20b (an open-source OpenAI model) appears near $0.20 effective cost with an IQ around 107 — potentially the most economical option for bulk classification or extraction workloads.

The site also offers a 3D visualization mapping IQ, EQ, and effective cost simultaneously. A dashed line running through the cube points toward the ideal: higher IQ, higher EQ, and lower cost. Models near the “green end” of that axis are stronger all-around deals; those near the “red end” sacrifice capability, cost efficiency, or both. For CIOs staring at API invoices, the implication is clear: the intelligence gap between a $50 model and a $3 model has narrowed enough that routing — using expensive models for hard problems and cheap ones for everything else — is no longer optional. It is the dominant architecture for serious AI deployments.
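The routing idea can be sketched in a few lines: pick the cheapest model whose estimated IQ clears what the task needs, and fall back to the strongest model when nothing qualifies. The catalog entries below are hypothetical placeholders rather than figures from AI IQ, and a real router would need some way to estimate task difficulty, which this sketch takes as an input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    est_iq: float          # estimated IQ on an AI IQ-style scale
    effective_cost: float  # dollars per reference task


# Hypothetical catalog; names and numbers are illustrative only.
CATALOG = [
    Model("frontier-model", 135.0, 40.0),
    Model("mid-tier-model", 116.0, 3.0),
    Model("budget-model", 107.0, 0.2),
]


def route(required_iq, catalog=CATALOG):
    """Return the cheapest model that meets the required capability level."""
    capable = [m for m in catalog if m.est_iq >= required_iq]
    if not capable:
        # Nothing clears the bar, so use the strongest model available.
        return max(catalog, key=lambda m: m.est_iq)
    return min(capable, key=lambda m: m.effective_cost)


if __name__ == "__main__":
    for need in (100, 110, 130):
        print(need, "->", route(need).name)
```

The design choice is deliberately simple: capability acts as a hard threshold and cost as the tiebreaker, so expensive models are used only when the cheaper ones fall short.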

Critics say AI’s “jagged” capabilities make a single IQ score dangerously misleading

The loudest objection to AI IQ is philosophical, and it cuts deep. Critics argue that collapsing a model’s uneven capabilities into a single score obscures more than it reveals.

“IQ as a proxy is fading — we’re seeing reasoning density spikes that don’t map to g-factor,” posted Zaya, a technology commentator, on X. “GPT-5.5 already hit saturation on MMLU-Pro, but still fails ClockBench 50% of the time.”

That observation touches on what AI researchers call the “jaggedness” problem: large language models often exhibit wildly uneven capabilities, excelling at graduate-level physics while failing at tasks a child could do. A composite score can paper over those gaps.

Pressureangle, another X user, posted a more granular critique, calling out “complete lack of transparency” and arguing the site never fully discloses how its calibration curves were created or validated. In fairness, AI IQ does list its 12 benchmarks and shows the shape of each calibration curve in its methodology modal. But the raw data and precise mathematical transformations are not published as open datasets — a gap that matters to researchers accustomed to fully reproducible methods.

Others questioned the premise itself. “As useless as human IQ testing,” wrote haashim on X. Shubham Sharma, an AI and technology writer, offered a constructive alternative: “Why not having the Models take an official (MENSA-Grade) test? Wouldn’t this be the most accurate and most ‘human-comparable’ way to benchmark intelligence?” That approach already exists through TrackingAI, which administers the Mensa Norway IQ test to language models. But Mensa-style tests measure only abstract pattern recognition, while AI IQ attempts a broader composite across coding, mathematics, and academic reasoning. As Visual Capitalist noted, “an IQ-style benchmark captures only one slice of capability.” Each approach has tradeoffs — and neither has won the argument yet.

The real race isn’t for the highest score — it’s for the smartest model stack

For all the debate about methodology, the most important signal in AI IQ’s data may not be any single model’s score. It is the shape of the market the charts reveal.

There are now more than 50 frontier-class models available through APIs, from at least 14 major providers spanning the United States, China, and Europe. Each provider publishes its own benchmarks, often cherry-picked to showcase strengths. The result is a Tower of Babel where no two companies measure the same thing in the same way. Academic research has highlighted that “most benchmarks introduce bias by focusing on a particular type of domain,” and the Frontier IQ Over Time chart on AI IQ shows just how fast the targets are moving: in October 2023, GPT-4-turbo sat near an estimated IQ of 75. By early 2026, the top models were brushing 135 — roughly 60 points of improvement in 30 months.

That pace raises a fundamental question about whether any scoring system can keep up. The site compresses ceilings for saturated benchmarks, but as models continue to max out even the hardest tests — ARC-AGI-2, FrontierMath Tier 4, Humanity’s Last Exam — the framework will face the same ceiling effects that have plagued every AI evaluation before it. Connor Forsyth pointed to this dynamic on X: “ARC AGI 3 disagrees,” he wrote, referencing a next-generation benchmark that may already be undermining current scores.

AI IQ is not perfect. Its methodology is partially opaque. Its IQ metaphor can mislead. And its creator acknowledges known biases while likely missing others. But the alternative — wading through dozens of provider-specific benchmark tables, each using different test suites and scoring conventions — is worse. The site offers enterprise buyers something genuinely scarce: a single framework for comparing models across providers, dimensions, and price points, updated regularly, with enough nuance to show that the right answer to “which model is best?” is almost always “it depends on the task.”

As Debdoot Ghosh mused on X after viewing the charts: “Now a human’s role is just to orchestrate?”

Maybe. But if the AI IQ data shows anything clearly, it is that orchestration — knowing which model to deploy, when, and at what price — has become its own form of intelligence. And for that, there is no benchmark yet.

Tech

Man Who Stole Beyonce’s Hard Drives Gets Five-Year Sentence

Published

1 hour ago

14 May 2026

NewsAdmin

A man accused of stealing hard drives containing unreleased Beyonce music, tour plans, and other materials from a rental car in Atlanta has pleaded guilty and accepted a five-year sentence, including two years in custody. Slashdot reader Bruce66423 shares a report from The Guardian: Kelvin Evans was named by the Atlanta police department in September in connection to a July 2025 car robbery where two suitcases containing Beyonce music and tour plans were stolen from a rental car. […] According to a July police report, Beyonce choreographer Christopher Grant and dancer Diandre Blue called 911 to report a theft from their rental vehicle, a 2024 Jeep Wagoneer, before Beyonce’s Cowboy Carter tour dates in Atlanta. An October indictment stated that Evans entered the car on July 8 “with the intent to commit theft.”

The stolen hard drives contained “watermarked music, some unreleased music, footage plans for the show and past and future set list,” according to a police report. Clothing, designer sunglasses, laptops and AirPods headphones were also stolen, Grant and Blue said. Local law enforcement searched for the location of one of the stolen laptops and the AirPods to try and locate the property. One police officer wrote in the report: “I conducted a suspicious stop in the area, due to the information that was relayed to me. There were several cars in the area also that the AirPods were pinging to in that area also. After further investigation, a silver [redacted], which had traveled into zone 5 was moving at the same time as the tracking on the AirPods.”

Evans was arrested several weeks after Grant and Blue filed a report, and was publicly named as the suspect in September. He was released on a $20,000 bond a month later. At the time of his arrest, Atlanta police said that the stolen property had not been recovered. It is unclear whether it has since been found.

Bruce66423 commented: “Just for stealing a couple of suitcases from a car. Funny how the elite punish those who inconvenience them. Can you imagine an ordinary victim see their offender get that sort of sentence?”

Tech

Microsoft’s multi-agent AI system tops Anthropic’s Mythos on cybersecurity benchmark

Published

1 hour ago

14 May 2026

NewsAdmin

CyberGym benchmark scores over time, showing the rapid improvement in AI vulnerability discovery capabilities. Microsoft’s multi-model MDASH system (top right) tops the leaderboard at 88.4%. (CyberGym / UC Berkeley)

Mythos has been MDASH’d.

A new AI-powered system from Microsoft surpassed a headline-grabbing rival from Anthropic on a leading cybersecurity benchmark, using more than 100 specialized AI agents working together across multiple AI models to find real-world software vulnerabilities.

Microsoft’s system, codenamed MDASH, was introduced this week alongside the disclosure of 16 new vulnerabilities it found in different versions of Windows, including four “critical” remote code execution flaws fixed in this month’s Patch Tuesday release.

The company, which has faced persistent criticism over security lapses, is betting that multiple models can discover vulnerabilities at a pace that individual models can’t match.

MDASH, derived from the term “multi-model agentic scanning harness,” works by running specialized AI agents through a staged pipeline. Different agents scan code for potential vulnerabilities, then a separate set of agents debate whether each finding is real and exploitable, and a final stage constructs proof-of-concept attacks to confirm the bugs exist.
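The three-stage flow described above (scan, debate, confirm) can be sketched generically. This is not Microsoft's implementation: every name here is hypothetical, the agents are stand-in Python callables rather than AI models, and the "debate" is reduced to a simple majority vote purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    location: str
    description: str
    confirmed: bool = False


# Stand-in agent types; in a real system each would wrap a model call.
Scanner = Callable[[str], List[Finding]]
Reviewer = Callable[[Finding], bool]
Prover = Callable[[Finding], bool]


def run_pipeline(code: str, scanners: List[Scanner],
                 reviewers: List[Reviewer], prover: Prover) -> List[Finding]:
    """Run candidate findings through scan, review, and proof stages."""
    # Stage 1: every scanner proposes candidate vulnerabilities.
    candidates = [f for scan in scanners for f in scan(code)]

    # Stage 2: reviewers judge each candidate; a strict majority must agree.
    # (Assumption: majority voting stands in for the agents' debate.)
    plausible = [
        f for f in candidates
        if sum(r(f) for r in reviewers) * 2 > len(reviewers)
    ]

    # Stage 3: keep only findings for which a proof-of-concept succeeds.
    confirmed = []
    for f in plausible:
        if prover(f):
            f.confirmed = True
            confirmed.append(f)
    return confirmed


def strcpy_scanner(code: str) -> List[Finding]:
    """Toy scanner that flags lines calling strcpy (illustrative only)."""
    return [
        Finding(f"line {i}", "unbounded strcpy")
        for i, line in enumerate(code.splitlines(), 1)
        if "strcpy(" in line
    ]
```

The point of the staging is that each step filters the previous one, so only findings that survive both review and a working proof-of-concept are reported.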

By comparison, Anthropic’s Mythos, which raised concerns over its ability to find and exploit software vulnerabilities when it was previewed earlier this year, is a single AI model running inside an agent framework. Anthropic restricted its release to a handful of companies through a consortium called Project Glasswing, which includes Microsoft.

OpenAI’s GPT-5.5 and others on the leaderboard are also single-model systems.

MDASH scored 88.45% on the CyberGym benchmark, a test developed by UC Berkeley researchers that measures how well AI systems can reproduce real-world vulnerabilities across 1,507 tasks drawn from 188 open-source software projects.

Mythos Preview was second at 83.1%, followed by GPT-5.5 at 81.8%.

The benchmark gives each system a description of a known vulnerability and an unpatched codebase, and measures whether it can produce a working attack that triggers the bug.

The scores on the CyberGym leaderboard are self-reported by the companies, including Anthropic’s Mythos result. The benchmark code is public, but no independent party has verified any of the scores. Also, benchmark results don’t necessarily reflect real-world performance.

The results also highlight growing concerns about AI’s use as an offensive hacking tool. The same capabilities that allow AI to find vulnerabilities in friendly hands can be used to discover them for exploitation by attackers. Microsoft said MDASH is being used internally by its security engineering teams and will be entering a limited private preview with customers.

Microsoft is telling customers to expect bigger Patch Tuesdays going forward as AI accelerates the discovery of vulnerabilities.

Tech

Iranian hackers targeted major South Korean electronics maker

Published

2 hours ago

14 May 2026

NewsAdmin

The Iran-linked hacking group MuddyWater (a.k.a. Seedworm, Static Kitten) launched a broad cyber-espionage campaign targeting at least nine high-profile organizations across multiple sectors and countries.

Among the victims are a major South Korean electronics manufacturer, government agencies, an international airport in the Middle East, industrial manufacturers in Asia, and educational institutions.

Researchers at Symantec say that the threat actor “spent a week inside the network of a major South Korean electronics manufacturer in February 2026.”

Symantec’s Threat Hunter Team believes the attacker was intelligence-driven, focusing on industrial and intellectual property theft, government espionage, and access to downstream customers or corporate networks.

Fortemedia and SentinelOne abuse

Seedworm’s campaign relied heavily on DLL sideloading, a common technique in which legitimate, signed software loads malicious DLLs.

Two of the binaries leveraged in the attack are ‘fmapp.exe,’ a legitimate Fortemedia audio utility, and ‘sentinelmemoryscanner.exe,’ a legitimate SentinelOne component.

The malicious DLLs (fmapp.dll and sentinelagentcore.dll) contained ChromElevator, a commodity post-exploitation tool that steals data stored in Chromium-based browsers.

Symantec also found that PowerShell, used in previous Seedworm attacks, was still heavily used in the recent incidents, although the payloads were controlled through Node.js loaders rather than directly.

PowerShell was used to capture screenshots, conduct reconnaissance, fetch additional payloads, establish persistence, steal credentials, and create SOCKS5 tunnels.

Attack on a Korean firm

According to Symantec’s observations, the attack on the South Korean electronics manufacturer ran from February 20 to February 27. The researchers did not disclose the name of the targeted organization.

In the first stage, Seedworm performed host and domain reconnaissance, followed by antivirus enumeration via WMI, screenshot capture, and the download of additional malware.

Credential theft occurred via fake Windows prompts, registry hive theft (SAM/SECURITY/SYSTEM), and Kerberos ticket abuse tools.

Persistence was established through registry modifications, beaconing occurred at 90-second intervals, and sideloaded binaries were repeatedly relaunched to maintain access.

“The cadence is again consistent with implant-driven activity rather than continuous operator presence,” the researchers said.

The attackers leveraged sendit.sh, a public file-sharing service for data exfiltration, likely to obscure the malicious activity and make it appear as normal traffic.

Overall, Symantec has found the latest Seedworm campaign notable for the threat actors’ geographic expansion, operational maturity, and the abuse of legitimate tools and services, which mark a shift toward quieter attacks.

Tech

Apple May Open Up The App Store To Agentic AI

Published

2 hours ago

14 May 2026

NewsAdmin

Artificial intelligence has posed a multi-layered problem for Apple in recent years. We’re expecting to hear some big news at WWDC this year about how AI will be integrated into the company’s gadgets, but there are other wrinkles still to be ironed out in its broader approach to this influential technology. According to The Information, one of those challenges is the recent surge of interest in, and development of, agentic AI.

To date, Apple has not permitted vibe coding tools on the App Store because they would violate its policies. They could also be used to create original apps for people who would otherwise have gotten software from the App Store, which could threaten Apple’s revenue and create a loophole for spreading malware or taking other malicious actions. But applying that same block more broadly to any agentic AI services, which can take active control over a device and its programs, could keep Apple out of the loop just as those tools are generating a lot of interest among both developers and casual users. Apple is reportedly trying to maintain its control over the App Store while capitalizing on the current buzz around AI agents.

“While details couldn’t be learned, its staffers are designing a system to adhere to its standards of privacy and security and prevent the more freewheeling behavior some users of agentic systems such as OpenClaw have experienced, where agents can go haywire and delete all of a user’s emails, according to the people briefed on the matter,” the article states.

It sounds like a high-wire act for a company that has been struggling to keep pace with AI’s breakneck development. Add this to the long list of topics we’ll be curious to see addressed at next month’s keynote.

Tech

Netflix’s Ad Tier Now Has A Whopping 250 Million Monthly Users

Published

2 hours ago

14 May 2026

NewsAdmin

Netflix has more than 250 million monthly active users on its ad-supported tier. The figure, which was revealed during the company’s Upfront presentation, marks a huge spike for this subscription option. In 2024 the plan with ads had 70 million users and in 2025 it reached 94 million.

Starting next year, Netflix will also launch the ad-supported plan in 15 more countries: Austria, Belgium, Colombia, Denmark, Indonesia, Ireland, the Netherlands, New Zealand, Norway, Peru, Philippines, Poland, Sweden, Switzerland and Thailand.

The Basic with Ads tier started rolling out in 2022. It appears to be an increasingly popular option as Netflix, like most streaming services, has continued to get ever more expensive. The company raised all monthly subscription prices by a dollar earlier this year.

And of course, because this is 2026, the Upfront included plenty of talk about AI. Netflix started using the tech in its ads last year, and one of the new potential applications the company is testing will serve “personalized ad loads and frequency caps that dynamically adjust the ads our members see, based on their viewing behaviors.” Netflix is currently facing a lawsuit from Texas on claims that it illegally sells user data to ad tech companies, although the streaming service said the suit was “based on inaccurate and distorted information.”
