
Tech

Arkansas Tried To Pass An Unconstitutional Social Media Law. Again. It Lost. Again.

from the maybe-read-your-laws-before-passing-them dept

Back in 2023, Arkansas passed a social media age verification law so poorly drafted that the bill’s own sponsor couldn’t accurately describe who it covered. The law appeared to exempt TikTok, Snapchat, and YouTube while the sponsor publicly claimed those were the exact platforms being targeted. When the state’s own expert witness testified that Snapchat was covered, the state’s own attorney disagreed with his own witness in the same hearing. That law was struck down on First Amendment and vagueness grounds, and then permanently enjoined earlier this year in a suit brought by the trade group NetChoice.

So Arkansas went back to the drawing board and passed Act 900, which was supposed to fix all the problems with the original. Judge Timothy Brooks of the Western District of Arkansas has now preliminarily enjoined that law too, in a ruling that reads like a patient teacher explaining to a student why the homework still doesn’t work despite a rewrite.

The legislature did manage to fix the content-based definition problem that sank the first law, but the progress stops there. Act 900 imposes four main new requirements on social media platforms: a prohibition on “addictive practices,” default settings for minors (including a nighttime notification blackout), privacy default settings at the most protective level, and a parental dashboard requirement. Every single one of these provisions fell apart on review, each in its own special way.

The “addictive practices” provision might be the most impressively broken. Here’s what it actually says platforms must do:


Consistent with contemporary understanding of addiction, compulsory behavior, and child cognitive development, ensure that the social media platform does not engage in practices to evoke any addiction or compulsive behaviors in an Arkansas user who is a minor, including without limitation through notifications, recommended content, artificial sense of accomplishment, or engagement with online bots that appear human.

“Contemporary understanding of addiction” is doing a lot of work here, and it’s not up to the job. There is no consensus that social media constitutes addiction in any clinical sense. So it’s entirely unclear what a company would need to do here, which is fatal in a First Amendment context. And yet, the law is designed such that violations are strict liability and ridiculously broad. A plain reading of the law shows that it is not limited to addiction to the platform itself; a platform can apparently be held liable if its practices “evoke” addiction to off-platform activities. And the statute uses the singular “user,” meaning a single child’s response triggers liability.

As the court puts it:

Not only does Act 900 impose liability based on a single child’s response to the platform, it does so on a strict liability basis—a platform is liable for a practice that evokes addiction in a single child even if it could not have known through the exercise of reasonable care that the practice would have such an effect. “Businesses of ordinary intelligence cannot reliably determine what compliance requires.”

The state, realizing belatedly that it had written an unworkable law, asked the court to just sort of ignore the strict liability language and read in a specific intent requirement that doesn’t exist anywhere in the text. As the judge notes, that’s not how any of this works. The courts interpret the law as written and are not there to fix the legislature’s mistakes:

Instead of defending the statute the General Assembly enacted, Defendants ask the Court to rewrite it by ignoring the strict liability provision altogether and inserting a specific intent requirement that appears nowhere in the text. The Court cannot do so.

Then there’s the default provisions. The court was actually somewhat sympathetic to the idea that the state has a legitimate interest in helping kids sleep. The problem is that the law itself undermines that interest by letting parents flip the nighttime notification blackout off. And the government is not there to fix what parents refuse to do:


While Defendants justify the notification default as an aid to parental authority, they ignore their own evidence that parents are part of the problem. If parents wanted to prevent their children’s sleep from being disrupted by late-night notifications, they have a readily available, free, no-tech solution already at their disposal: taking devices away at night. Yet “86% of adolescents sleep with their phone in the bedroom.” … The State has provided no evidence that parents lack the tools to assert their authority in this domain, so it appears unlikely that the State’s deferential approach to restricting nighttime notifications will actually serve its stated interest in ensuring minors get enough sleep. This “is not how one addresses a serious social problem.”

The privacy default is worse. It requires platforms to set privacy controls to their most restrictive level for minors — but says nothing about who can change them. Meaning, as the court notes, the minor can just… change them. The state argued this was necessary to protect children from sexual exploitation online. The court points out the obvious problem:

On the other hand, because the default can be changed by the minor, this provision is also wildly underinclusive. Defendants say children need this law to protect them from sexual exploitation online. But the law, in effect, allows children to decide whether they need protection from sexual exploitation online because they are free to depart from the protective default. As Defendants’ evidence shows, teenagers’ developing brains make them less likely than adults to appreciate the risks associated with, for example, making their profiles public… Like the notification default, while the burdens imposed by the privacy default may be slight, they do not appear likely to serve the State’s asserted interest at all. Imposing small burdens on vast quantities of speech for no appreciable benefit is not consistent with the First Amendment. Arkansas cannot sentence speech on the internet to death by a thousand cuts.

Any law that burdens First Amendment speech has to be tailored precisely to a compelling goal. And if it’s either under or over-inclusive, it’s going to have problems surviving. Making it such that kids could just turn off the privacy controls fails that test.

But the dashboard provision is where things get genuinely hilarious, in that dark way where you wonder if anyone read the bill before voting on it. Act 900 has three separate definitions for people who interact with platforms: “account holders,” “users,” and “Arkansas users.” The problem is that, according to the statute’s own definitions, a “user” is specifically someone who is not an account holder — in other words, just a visitor to the site who doesn’t have an account. Yes, it’s confusing. The court is confused. Everyone is confused.

Act 900 has one particularly noteworthy problem: “users.” Act 900 has three different definitions for relationships a person can have with a platform. First, an “account holder” is “an individual who primarily uses, manages, or otherwise controls an account or a profile to use a social media platform.” Id. sec. 1, § 4-88-1401(1). “Account holder” is not used in any of the Act’s operative provisions. Second, a “user” is “a person who has access to view all or some of the posts and content on a social media platform but is not an account holder.” Id. § 1401(12). Third, an “Arkansas user” is “an individual who is a resident of the State of Arkansas and who accesses or attempts to access a social media platform while present in this state.” Id. § 1401(2). “Arkansas users” include both “account holders” and “users,” but “users” are definitionally not “account holders.” The addictive practices provision and the default provisions therefore apply to all Arkansas minors, whether they have a social media account or are merely a website visitor. Worse, the dashboard provision applies only to minor “users,” not account holders.

Again: the dashboard provision requires platforms to build parental supervision tools for minor “users.”


Not account holders. Users. Which, as the court notes, definitionally does not include “account holders.” Meaning it only applies to… random anonymous visitors to the website. Those who have accounts… apparently aren’t covered?

As the court explains, taking the statute at its word would require platforms to:

(1) collect age information from everyone who visits a covered platform to identify minors; and (2) collect and store identity information for every minor who visits a platform to track their “use habits,” connect them with their parents, and effectuate “tools for a parent to restrict his or her minor child’s access.”

This is a law that claims to be about children’s privacy that accidentally requires mass surveillance and identity collection on every anonymous visitor to a website, just in case one of them turns out to be an Arkansas minor. The court openly “questions whether this was the General Assembly’s intended result” but notes it can’t just rewrite the statute because the legislature picked the wrong word. That’s on them. Just like the earlier provision that the state asked the court to quietly rewrite.

The Arkansas legislature does not appear to be a detail-oriented body.


Oh, and there’s also an audit requirement directing platforms to conduct quarterly audits to ensure their products aren’t “causing minors to engage in compulsory or addiction-driven behavior” — again, including off-platform behavior, apparently. How a platform is supposed to audit for behaviors that happen when users aren’t on the platform is left as an exercise for the reader.

What makes this all so maddening is that none of these problems are subtle. The “user” vs. “account holder” mixup is the kind of thing that any lawyer should catch on a close read. The strict liability plus singular “user” combination in the addictive practices provision is exactly the drafting error that made the 2023 law fail. The defaults that can be changed by the very minor they’re supposed to protect — that’s not a hard problem to spot.

There is a reason this pattern keeps repeating.

Passing an unconstitutional law to “protect the kids” from Big Tech generates headlines, press conferences, and signing ceremonies. Governor Sarah Huckabee Sanders got to tweet about how “social media companies have gotten away with exploiting kids for profit” when she signed the original law. That made the news. The permanent injunction three years later, overturning that same law? Barely a ripple. Act 900 itself got its own round of celebratory press. The injunction we’re discussing here will get a fraction of that coverage.


The political asymmetry is kind of the point. State legislatures have figured out that there is essentially no downside to passing obviously unconstitutional social media laws. The upside is maximal: you get to posture as tough on Big Tech, protective of children, and responsive to moral panics about screens and teens. The downside — losing in federal court, wasting state resources on legal fees, and getting lectured by judges about basic First Amendment doctrine — happens quietly, years later, long after the political benefits have been banked.

Arkansas will almost certainly lose its appeal, and either way the legislature will be back next session with a new hastily drafted law that fixes some of Act 900’s problems while introducing fresh ones. And then that will get struck down. And then they’ll try again. Texas, Florida, California, Ohio, Utah, Mississippi, Tennessee, Georgia, and a growing list of other states are running the same play on roughly the same schedule.

The courts keep doing their jobs. NetChoice keeps winning. Judges keep writing careful opinions explaining, for what feels like the hundredth time, that strict scrutiny means what it means, vagueness doctrine exists for a reason, and you cannot simply compel platforms to do whatever you want because you have invoked The Children.

None of it matters to the incentive structure. The headline from the signing ceremony is worth more than the opinion from the courthouse. Until that changes — until voters start holding legislators accountable for passing laws that can’t survive even the most basic constitutional review — we’re going to keep reading rulings like this one. Arkansas just provided the latest installment. There will be more.


Filed Under: 1st amendment, arkansas, free speech, privacy, protect the children, social media, social media addiction, social media safety act

Companies: netchoice


All the Android phones getting the new Gemini Intelligence

Google has announced plenty of new features and upcoming products, but one of the most intriguing is undoubtedly Gemini Intelligence.

Gemini Intelligence promises to bring the “best of Gemini” to compatible devices by integrating premium hardware and software to help users in everyday life. For an overview of what the new system specifically includes, visit our Gemini Intelligence explainer.

Google has revealed that Gemini Intelligence features will roll out in waves from the summer, but which phones are expected to see the upgrade?

We’ve rounded up the Android phones that should see Gemini Intelligence and, where possible, we detail when handsets are likely to receive the upgrade.


For more on Google’s recent Android 17 announcements, make sure you visit our guides on the new Pause Point feature and the Emoji revamp. Finally, the best Android phones list reveals our current favourite handsets on the market.



Which Android phones are expected to see Gemini Intelligence?

At the time of writing, Google hasn’t revealed the exact dates for when we can expect the Android 17 update to launch. Instead, the company has just stated that it will begin the rollout this summer.

Gemini Intelligence. Image Credit (Google)

We should also note that, at the time of writing, Google hasn’t officially announced the specific phones that will support Gemini Intelligence. Instead, Google states that the first Android devices to see Gemini Intelligence will be the latest Samsung Galaxy and Google Pixel phones. With this in mind, we can assume the entire Pixel 10 series, including the Pixel 10 Pro Fold and potentially even the affordable Pixel 10a, will see the feature.

Pixel 10a in hand. Image Credit (Trusted Reviews)

We don’t currently know if the Pixel 9 series will benefit from Gemini Intelligence, so we’ll have to wait and see. 

Similarly, we can reasonably expect that the Galaxy S26 series, including the Galaxy S26 Ultra, will sport Gemini Intelligence. Plus, considering the upcoming Z Fold 8 and Z Flip 8 are rumoured to launch sometime in the summer, the foldables may also get Gemini Intelligence – though that’s speculation on our part.


Which other Android devices will see Gemini Intelligence?

Google has teased that other Android devices will get Gemini Intelligence features, including “your watch, car, glasses and laptops.” Wear OS will see features like Create my Widget, while Android Auto will soon be able to pair and integrate with Gemini Intelligence-compatible Android phones.

Finally, we also know that Google’s new Googlebook line-up will also benefit from Gemini Intelligence features, including the Magic Pointer and Create my Widgets.



AWS patched Quick auth bypass, says customers weren’t using control

Most users put up with AWS the way you put up with the DMV. I say this with love, but it’s hard to disagree that the UI is awful. The console is a UX time capsule if time capsules weren’t allowed to ever look like other time capsules. The pricing pages were designed by someone who hates you personally, and you accept all of it because the one thing AWS has historically gotten right is the boring, important stuff. The security model. The IAM language no one likes, but everyone trusts. The boundary between your account and someone else’s. Get that wrong, and the whole bargain collapses. 

So when Fog Security disclosed an authorization bypass in Amazon Quick on May 12 (that’s the BI service formerly known as QuickSight, briefly known as Quick Suite, and now apparently just Quick, but check back next week) and AWS responded with a statement claiming “no customer data was at risk,” it’s fair to ask which definition of customer data they’re using. Because it isn’t an obvious one, and it certainly isn’t mine. 

What Fog found 

Fog reports that when an Amazon Quick administrator (which is an absolutely devastating personal insult) uses “custom permissions” to explicitly deny access to AI Chat Agents, the UI correctly hides the feature. Great! Awesome! I sure wish to hell I could do that with S3 buckets to which I do not have access! Notably, there’s no other way for an admin to do this – it’s custom permissions or naught.

The API, however, was perfectly willing to keep answering chat requests for any user in the account who knew how to send them. Fog’s proof-of-concept was a non-admin asking the agent “Tell me about mangoes” from a session that was, on paper, locked out of the agent entirely. The agent told them about mangoes. 
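The failure mode generalizes: hiding a feature in the UI is not an access control, and the deny has to be re-checked in the API handler on every request. A minimal sketch of that server-side check in Python (the permission name, user store, and handler are hypothetical illustrations, not AWS's actual implementation):

```python
# Hypothetical sketch: enforce the deny in the API handler itself, not
# only in the UI. Names below are illustrative, not AWS's actual API.

DENIED = "deny-ai-chat-agents"

# Per-user custom permissions, as an admin might configure them.
CUSTOM_PERMISSIONS = {
    "admin": set(),
    "contractor": {DENIED},  # explicitly locked out of the chat agent
}

def handle_chat_request(user: str, prompt: str) -> tuple[int, str]:
    """Server-side authorization check that runs on every request,
    regardless of whether the UI hid the feature."""
    perms = CUSTOM_PERMISSIONS.get(user, set())
    if DENIED in perms:
        return 403, "AccessDenied: chat agents are disabled for this user"
    return 200, f"Answering: {prompt}"

# Before the fix, the equivalent of this check existed only in the UI,
# so a raw HTTP request from "contractor" still got an answer.
```

In the pre-fix behavior Fog describes, only the UI consulted the deny list, so the contractor's direct request would have returned 200.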


AWS deployed the fix between March 11 and March 12, eight days after Fog reported it via HackerOne. So far, so coordinated. Seriously, for a company of this scale, that’s underpants-outside-the-pants superhero speed. Good for you; gold star. 

What came next 

Where this gets uncomfortable is the response. AWS classified the severity as “none.” It issued no customer notification. It published no advisory. 

After Fog disclosed the HackerOne report and published a blog post, AWS provided a statement to Fog Security reading, in full: “We appreciate Fog Security’s coordinated disclosure. This issue was addressed in March 2026. No customer data was at risk and there is no customer action required. As always, customers can contact AWS Support with any questions or concerns about the security of their account.” 

Take that sentence apart and see how much work “no customer data was at risk” is doing. 


Amazon Quick is described on its own product page as an AI assistant that “connects Slack, Microsoft Teams and Outlook, CRMs, databases, and documents in one place” and “grounds every answer in your real business data.” The default chat agent, which is automatically and annoyingly provisioned the instant Quick is enabled whether the customer wants those AI features or not, is the front end for that data. It is the whole point of the front end for that data. 

Now consider the actual scenario AWS just patched. An administrator at, say, a regulated bank (an unregulated bank is called “a criminal enterprise that hasn’t been caught yet”) configures custom permissions denying chat agent access to a large group of users. Maybe those users are contractors. Maybe they’re in a business unit that isn’t cleared for AI tools. Maybe the bank’s compliance posture flat-out prohibits shadow AI usage on top of internal data. Until two months ago, every one of those users could send an HTTP request directly to the agent endpoint and get a response. 

Fog asked about mangoes because they’re a security firm doing a clean disclosure, not a malicious insider. A malicious insider would not have asked about mangoes. 

The question to AWS, with no rhetoric attached: In what sense was customer data not at risk? Either the chat agent doesn’t actually have access to the data the product page says it does (in which case the marketing department has some serious splainin’ to do) or unauthorized users could query an agent wired into customer data, in which case “customer data was at risk” is the correct English-language description of the situation. 


AWS clarifies, and says the quiet part out loud

After this story started circulating, AWS offered a follow-up comment that I sincerely appreciate, because it’s so much more honest than the first one. Per a hounded-looking AWS spokesperson: “The researcher was using the Admin Control capability that no customers were actively using when the server side validation was not present.” 

Reading that twice doesn’t help. Let me translate. 

AWS is saying: Yes, the server-side authorization check was missing. Yes, an authenticated user in your Quick account could bypass the only access control mechanism the service offers. The reason this is fine, apparently, is that no real customer had bothered to configure that access control during the window when it didn’t work. 

Um … what? 


The defense isn’t “the bug wasn’t real,” which you could be forgiven for hearing in AWS’s first statement. The defense also isn’t “the bug couldn’t have done what Fog says it could have done,” which is the even stronger implication of their first statement. The defense is “the access control didn’t enforce what we said it did, but luckily nobody was relying on it.” This is the corporate-comms equivalent of “the lock on the front door didn’t work, but nobody had locked it anyway, so why are you upset?” 

It’s also a surprisingly specific telemetry claim. AWS is asserting that they know zero customers had configured custom permissions to deny chat agent access during the exposure window. That’s a confident thing to say, and an even more interesting thing to volunteer as a defense, because it doubles as a withering review of Quick’s access management model: the only knob the service provides for this purpose, the one AWS’s own documentation explicitly tells administrators to use, has zero recorded uptake. 

The same follow-up also pointed back to the HackerOne thread to demonstrate that AWS told Fog throughout the disclosure window that “user-based authorization remained enforced.” Translation: you needed authenticated credentials in the same Quick account to exploit this. Yes. That’s intra-account scope, which Fog documented in their writeup, and which is precisely the scope in which custom permissions are supposed to function as a security boundary. AWS saying “user-based authorization was fine” is saying “you couldn’t exploit this anonymously from the internet,” which was never the threat model in question. The threat model is the contractor with valid SSO credentials whose admin tried to lock them out of some datasets.

Why this matters more than it sounds 

Amazon Quick’s access model is already an outlier: IAM policies don’t govern Quick’s AI Chat Agent, SCPs don’t apply, and RCPs don’t apply. Custom permissions are the only knob the service provides. If those don’t enforce, nothing else does. And per AWS’s own follow-up, literally nobody was using them anyway. Both halves of that sentence should be alarming, and AWS is offering them as reassurance. 


AWS’s competitive moat for the last decade hasn’t been pricing. It sure as poop hasn’t been developer experience, documentation, console design, or the inscrutable poetry of service names. It’s been the well-earned belief that AWS gets the foundational things right: boundaries, identity, durability, reliability, and the parts customers can’t easily verify themselves. Customers have paid the AWS premium because they trusted the boring stuff. 

This year that trust is being tested in a way it hasn’t been before. The cadence of AWS security advisories has noticeably picked up across 2025 and 2026, for reasons that are as yet unclear. Coordinated disclosures from independent researchers keep surfacing missing authorization checks in newer, AI-adjacent services.

The fixes are landing fast, which is good. The customer communication isn’t landing at all, which is, charitably, a choice. A “severity: none” rating on a bypass of the only access control a service offers is not an objective security finding so much as it is a communication decision. And the communication decision now reads, with the benefit of AWS’s follow-up: “We’ll fix the bug, we won’t tell you it existed, and if you ask we’ll explain that you weren’t using the feature anyway.”

AWS gets a lot of forgiveness on the small stuff because they own the big stuff. They might want to reconsider how much of the big stuff they keep classifying as “none.” ®


New critical Exim mailer flaw allows remote code execution


A critical vulnerability affecting certain configurations of the Exim open-source mail transfer agent could be exploited by an unauthenticated remote attacker to execute arbitrary code.

Identified as CVE-2026-45185, the security issue impacts some Exim versions before 4.99.3 that use the default GNU Transport Layer Security (GnuTLS) library for secure communication. It is a use-after-free (UAF) flaw triggered during TLS shutdown while handling BDAT chunked SMTP traffic.

Exim frees a TLS transfer buffer but later continues using stale callback references that can write data into the freed memory region, which can lead to unauthenticated remote code execution (RCE).

Exim is a widely deployed open-source mail transfer agent (MTA) used to send, receive, and route email on Linux and Unix servers. It is common in shared hosting environments, enterprise mail systems, and Debian- and Ubuntu-based distributions, where it has historically been the default mail server.


CVE-2026-45185 was discovered and reported by XBOW researcher Federico Kirschbaum. It impacts Exim versions 4.97 through 4.99.2 on builds compiled with GnuTLS that have STARTTLS and CHUNKING advertised. OpenSSL-based builds are not affected.
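The affected range is narrow enough to check mechanically. A small sketch that classifies a build against the advisory's stated conditions (the helper names are ours; the version range, GnuTLS requirement, and STARTTLS/CHUNKING conditions come from the report):

```python
# Sketch: classify an Exim build against CVE-2026-45185 per the advisory:
# affected = versions 4.97 through 4.99.2, built with GnuTLS, with both
# STARTTLS and CHUNKING advertised. Helper names are illustrative.

def parse_version(v: str) -> tuple[int, ...]:
    """'4.99.2' -> (4, 99, 2); tuples compare component-wise."""
    return tuple(int(part) for part in v.split("."))

def is_affected(version: str, tls_library: str,
                starttls: bool = True, chunking: bool = True) -> bool:
    if tls_library.lower() != "gnutls":
        return False  # OpenSSL-based builds are not affected
    if not (starttls and chunking):
        return False  # both must be advertised to reach the bug
    v = parse_version(version)
    return (4, 97) <= v <= (4, 99, 2)
```

For example, a GnuTLS build of 4.99.2 is classified as affected, while 4.99.3 (the fixed release) and any OpenSSL build are not.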

Attackers exploiting the vulnerability could execute commands on the server as well as access Exim data and emails, and potentially pivot further into the environment depending on server permissions and configuration.

XBOW reported the vulnerability to the Exim maintainers on May 1st and received an acknowledgment on May 5th. Impacted Linux distributions were notified three days later.


A fix for CVE-2026-45185 was released in Exim version 4.99.3.

AI-assisted exploit build

XBOW reports that creating the proof-of-concept (PoC) exploit was a seven-day challenge between the company’s autonomous AI-driven development system, XBOW Native, and a human researcher assisted by a large language model.

XBOW Native successfully produced a working exploit, but only for a simplified target: an Exim server with Address Space Layout Randomization (ASLR) disabled and a non-PIE (Position Independent Executable) binary.

In a second attempt, the LLM achieved an exploit on a machine with ASLR, but still a non-PIE binary.


“[…] instead of continuing to attack glibc’s allocator with off-the-shelf mechanisms, XBOW Native had taken on Exim’s own allocator,” XBOW researchers say.

Despite these surprising results, it was the human researcher who won the race, with assistance from the LLM for tasks such as assembling files and testing exploitation avenues.

While the researcher acknowledged the impressive speed of the LLM, they stressed the need to shape the model’s working environment rather than letting it structure the task on its own.


“Honestly, I don’t think LLMs alone are quite ready to write exploits against real-world software yet. After this experience, I think it can solve something CTF-shaped, but I don’t see them reaching the level of real production targets just yet.”

Still, the researcher acknowledged the crucial role of AI tools in helping humans understand unfamiliar code and dig deeper into suspicious areas much faster than without them.

To mitigate the risk, users of Ubuntu and Debian-based Linux distributions should apply the available Exim updates (v4.99.3) through their package managers.




Android Auto’s biggest update in years delivers edge-to-edge Maps, Gemini, and HD video streaming


The updated Android Auto brings a complete Material 3 Expressive design overhaul, including expressive typography, smooth animations, and vibrant wallpapers. It is the biggest update to the platform since the 2023 “Coolwalk” redesign, which introduced a dynamic interface, split-screen multitasking, a revamped dock, and enhanced safety features.
Read Entire Article

AI IQ is here: a new site scores frontier AI models on the human IQ scale. The results are already dividing tech.

For decades, the IQ test has been one of the most familiar — and most contested — yardsticks for human intelligence. Now, a startup project called AI IQ is applying the same metaphor to artificial intelligence, assigning estimated intelligence quotients to more than 50 of the world’s most powerful language models and plotting them on a standard bell curve.

The result is a set of interactive visualizations at aiiq.org that have ricocheted across social media in the past week, drawing praise from enterprise technologists who say the charts make an impossibly complex market legible — and sharp criticism from researchers and commentators who warn the entire framework is misleading.

“This is super useful,” wrote Thibaut Mélen, a technology commentator, on X. “Much easier to understand model progress when it’s mapped like this instead of another giant leaderboard table.”

Brian Vellmure, a business strategist, offered a similar endorsement: “This is helpful. Anecdotally tracks with personal experience.”


But the backlash arrived just as quickly. “It’s nonsense. AI is far too jagged. The map is not the territory,” posted AI Deeply, an artificial intelligence commentary account, crystallizing a worry shared by many researchers: that reducing a language model’s sprawling, uneven capabilities to a single number creates a dangerous illusion of precision.


More than 50 AI language models, plotted on a standard IQ bell curve by the site AI IQ. The most capable models crowd the right tail of the distribution. (Credit: AI IQ)

Twelve benchmarks, four dimensions, and one controversial number: how AI IQ actually works

AI IQ was created by Ryan Shea, an engineer, entrepreneur, and angel investor best known as a co-founder of the blockchain platform Stacks. Shea also co-founded Voterbase and has invested in the early stages of several unicorns, including OpenSea, Lattice, Anchorage, and Mercury. He holds a Bachelor of Science in Mechanical Engineering from Princeton University.

The site’s methodology rests on a deceptively simple formula. AI IQ groups 12 benchmarks into four reasoning dimensions: abstract, mathematical, programmatic, and academic. The composite IQ is a straight average of those four dimension scores: IQ = ¼ (IQ_Abstract + IQ_Math + IQ_Prog + IQ_Acad).

The abstract reasoning dimension draws from ARC-AGI-1 and ARC-AGI-2, the notoriously difficult pattern-recognition benchmarks designed to test general fluid intelligence. Mathematical reasoning includes FrontierMath (Tiers 1–3 and Tier 4), AIME, and ProofBench. Programmatic reasoning uses Terminal-Bench 2.0, SWE-Bench Verified, and SciCode. Academic reasoning pulls from Humanity’s Last Exam, CritPt, and GPQA Diamond.

Each raw benchmark score gets mapped to an implied IQ through what the site describes as “hand-calibrated difficulty curves.” Crucially, the methodology compresses ceilings for benchmarks considered easier or more susceptible to data contamination, preventing them from inflating scores above 100. Harder, less gameable benchmarks retain higher ceilings. The system also handles missing data conservatively: models need scores on at least two of the four dimensions to receive a derived IQ, and when benchmarks are absent, the pipeline deliberately pulls scores down rather than up. The site states that “every derived IQ averages all four dimensions, so missing coverage cannot make a model look better by omission.”
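The exact curves are not published, but the described mechanics (piecewise-linear mapping, compressed ceilings, conservative imputation for missing dimensions) can be sketched. The knot values and the imputation floor below are hypothetical illustrations, not the site's actual numbers:

```python
def implied_iq(raw, curve, ceiling):
    """Map a raw benchmark score to an implied IQ via piecewise-linear
    interpolation over (raw, iq) knots, then cap at the benchmark's ceiling."""
    if raw <= curve[0][0]:
        return min(curve[0][1], ceiling)
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if raw <= x1:
            iq = y0 + (raw - x0) / (x1 - x0) * (y1 - y0)
            return min(iq, ceiling)
    return min(curve[-1][1], ceiling)

def derived_iq(dims, floor=80):
    """Average all four dimensions; dimensions with no benchmark coverage
    are imputed at a conservative floor, so omission can only pull a score
    down. Models with fewer than two covered dimensions get no score."""
    if sum(v is not None for v in dims.values()) < 2:
        return None
    return sum(v if v is not None else floor for v in dims.values()) / len(dims)

# a hypothetical "easier" benchmark with its ceiling compressed to 100:
easy_curve = [(0, 60), (50, 85), (100, 110)]
print(implied_iq(95, easy_curve, ceiling=100))  # capped at the ceiling of 100
print(derived_iq({"abstract": 130, "math": None, "prog": 120, "acad": 110}))
```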

OpenAI leads the bell curve, but the gap between the top AI models has never been smaller

As of mid-May 2026, the AI IQ charts tell a story of rapid convergence at the top of the frontier — and widening diversity in the tiers below.

According to the Frontier IQ Over Time chart, GPT-5.5 from OpenAI currently sits at the peak of the bell curve with an estimated IQ near 136, the highest of any model tracked. It is closely followed by Anthropic’s Opus 4.7 (approximately 132), GPT-5.4 (approximately 131), and Google’s Gemini 3.1 Pro (approximately 131), with Opus 4.6 close behind near 129, making the top cluster extraordinarily tight.

That compression is not unique to AI IQ’s framework. Visual Capitalist, drawing from a separate Mensa-based ranking by TrackingAI, recently observed the same dynamic, noting that “the biggest takeaway is how compressed the top of the leaderboard has become.” On that scale, Grok-4.20 Expert Mode and GPT 5.4 Pro tied at 145, with Gemini 3.1 Pro at 141.

Below the frontier cluster, the AI IQ charts show a crowded midfield. Models from Chinese labs — Kimi K2.6, GLM-5, DeepSeek-V3.2, Qwen3.6, MiniMax-M2.7 — bunch between roughly 112 and 118, making the cost-performance tier increasingly competitive for enterprise buyers who don’t need the absolute best model for every task. One X user, ovsky, noted that the data “confirms experience with sonnet 4.6 being an absolute workhorse as opposed to opus 4.5” — pointing to the way the charts can validate practitioner intuitions that headline rankings often miss.

The trajectory of frontier AI models from October 2023 to mid-2026, as tracked by AI IQ. Provider-colored step-lines connect each lab’s flagship releases, showing roughly 60 points of estimated IQ improvement in 30 months. (Credit: AI IQ)

Why emotional intelligence scores are becoming the new battleground in AI model rankings

What distinguishes AI IQ from most other benchmarking efforts is its inclusion of an “EQ” — emotional intelligence — score. The site maps each model’s EQ-Bench 3 Elo score and Arena Elo score to an estimated EQ using calibrated piecewise-linear scales, then takes a 50/50 weighted composite of the two.

The EQ scores produce a meaningfully different ranking than IQ alone. On the IQ vs. EQ scatter plot, Anthropic’s Opus 4.7 leads on EQ with a score near 132, pushing it into the upper-right quadrant — the most desirable position, signaling both high cognitive and high emotional intelligence. OpenAI’s GPT-5.5 and GPT-5.4 cluster in the high-IQ zone but lag slightly on EQ. Google’s Gemini 3.1 Pro sits in a strong middle position on both axes.

One notable methodological choice has drawn attention: EQ-Bench 3 is judged by Claude, an Anthropic model, which the site acknowledges “creates potential scoring bias in favor of Anthropic models.” To correct for this, AI IQ subtracts a 200-point Elo penalty from the EQ-Bench component for all Anthropic models before mapping to implied EQ. The Arena component is unaffected since it uses human judges. That self-correction is unusual in the benchmarking world, and it suggests Shea is aware of the methodological minefield he has entered. Still, the EQ dimension captures something IQ alone cannot: the growing importance of conversational quality, collaboration, and trust in models deployed for user-facing work.
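The correction itself is simple arithmetic. In the sketch below, the two linear Elo-to-EQ maps are hypothetical stand-ins for the site's calibrated piecewise-linear scales; only the 200-point penalty and the 50/50 weighting come from the stated methodology:

```python
def implied_eq(eqbench_elo, arena_elo, is_anthropic=False):
    """50/50 composite of two Elo-derived EQ estimates."""
    if is_anthropic:
        # EQ-Bench 3 is judged by Claude, so Anthropic models take a
        # 200-point Elo penalty on that component before mapping;
        # the human-judged Arena component is left untouched
        eqbench_elo -= 200
    eq_from_bench = 70 + (eqbench_elo - 1000) * 0.05  # hypothetical scale
    eq_from_arena = 70 + (arena_elo - 1200) * 0.04    # hypothetical scale
    return 0.5 * eq_from_bench + 0.5 * eq_from_arena

print(implied_eq(1400, 1500))                     # → 86.0
print(implied_eq(1400, 1500, is_anthropic=True))  # → 81.0 (penalty applied)
```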

Plotting IQ against EQ reveals that the smartest models aren’t always the most emotionally intelligent. Anthropic’s Opus 4.7 dominates the upper-right quadrant. (Credit: AI IQ)

The AI cost-performance chart that enterprise buyers actually need to see

Perhaps the most practically useful chart on the site is not the bell curve but the IQ vs. Effective Cost scatter plot. It maps each model’s estimated IQ against an “effective cost” metric — defined as the token cost for a task using 2 million input tokens and 1 million output tokens, multiplied by a usage efficiency factor.
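Under that definition, the metric reduces to a per-million-token price calculation. A minimal sketch, with the prices as placeholder inputs:

```python
def effective_cost(input_price_per_m, output_price_per_m, efficiency=1.0,
                   input_tokens=2_000_000, output_tokens=1_000_000):
    """Token cost for the 2M-input / 1M-output reference task described
    by the site, scaled by a usage efficiency factor."""
    token_cost = (input_tokens / 1e6) * input_price_per_m \
               + (output_tokens / 1e6) * output_price_per_m
    return token_cost * efficiency

# hypothetical pricing: $5 per million input tokens, $15 per million output
print(effective_cost(5.0, 15.0))  # → 25.0
```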

The chart reveals a familiar pattern in enterprise technology: the best models are not always the best value. GPT-5.5 and Opus 4.7 sit in the upper-left corner — high IQ, high cost, with effective per-task costs north of $30 and $50 respectively. Meanwhile, models like GPT-5.4-mini, DeepSeek-V3.2, and MiniMax-M2.7 occupy a sweet spot in the middle: respectable IQ scores between 112 and 120, at effective costs ranging from roughly $1 to $5 per task. At the cheapest extreme, GPT-oss-20b (an open-source OpenAI model) appears near $0.20 effective cost with an IQ around 107 — potentially the most economical option for bulk classification or extraction workloads.

The site also offers a 3D visualization mapping IQ, EQ, and effective cost simultaneously. A dashed line running through the cube points toward the ideal: higher IQ, higher EQ, and lower cost. Models near the “green end” of that axis are stronger all-around deals; those near the “red end” sacrifice capability, cost efficiency, or both. For CIOs staring at API invoices, the implication is clear: the intelligence gap between a $50 model and a $3 model has narrowed enough that routing — using expensive models for hard problems and cheap ones for everything else — is no longer optional. It is the dominant architecture for serious AI deployments.
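The routing idea is easy to state in code. This sketch is illustrative only; the thresholds, the model table, and how task difficulty gets estimated are all assumptions, not part of the site's data:

```python
def route(task_difficulty, models, quality_bar=110):
    """Send hard tasks to the highest-IQ model; send everything else to
    the cheapest model that clears a minimum quality bar."""
    if task_difficulty >= 0.8:
        return max(models, key=lambda m: m["iq"])
    adequate = [m for m in models if m["iq"] >= quality_bar]
    return min(adequate, key=lambda m: m["cost"])

catalog = [
    {"name": "frontier", "iq": 136, "cost": 50.0},
    {"name": "mid-tier", "iq": 115, "cost": 3.0},
    {"name": "small",    "iq": 107, "cost": 0.2},
]
print(route(0.9, catalog)["name"])  # → frontier
print(route(0.3, catalog)["name"])  # → mid-tier
```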

Critics say AI’s “jagged” capabilities make a single IQ score dangerously misleading

The loudest objection to AI IQ is philosophical, and it cuts deep. Critics argue that collapsing a model’s uneven capabilities into a single score obscures more than it reveals.

“IQ as a proxy is fading — we’re seeing reasoning density spikes that don’t map to g-factor,” posted Zaya, a technology commentator, on X. “GPT-5.5 already hit saturation on MMLU-Pro, but still fails ClockBench 50% of the time.”

That observation touches on what AI researchers call the “jaggedness” problem: large language models often exhibit wildly uneven capabilities, excelling at graduate-level physics while failing at tasks a child could do. A composite score can paper over those gaps.

Pressureangle, another X user, posted a more granular critique, calling out “complete lack of transparency” and arguing the site never fully discloses how its calibration curves were created or validated. In fairness, AI IQ does list its 12 benchmarks and shows the shape of each calibration curve in its methodology modal. But the raw data and precise mathematical transformations are not published as open datasets — a gap that matters to researchers accustomed to fully reproducible methods.

Others questioned the premise itself. “As useless as human IQ testing,” wrote haashim on X. Shubham Sharma, an AI and technology writer, offered a constructive alternative: “Why not having the Models take an official (MENSA-Grade) test? Wouldn’t this be the most accurate and most ‘human-comparable’ way to benchmark intelligence?” That approach already exists through TrackingAI, which administers the Mensa Norway IQ test to language models. But Mensa-style tests measure only abstract pattern recognition, while AI IQ attempts a broader composite across coding, mathematics, and academic reasoning. As Visual Capitalist noted, “an IQ-style benchmark captures only one slice of capability.” Each approach has tradeoffs — and neither has won the argument yet.

The real race isn’t for the highest score — it’s for the smartest model stack

For all the debate about methodology, the most important signal in AI IQ’s data may not be any single model’s score. It is the shape of the market the charts reveal.

There are now more than 50 frontier-class models available through APIs, from at least 14 major providers spanning the United States, China, and Europe. Each provider publishes its own benchmarks, often cherry-picked to showcase strengths. The result is a Tower of Babel where no two companies measure the same thing in the same way. Academic research has highlighted that “most benchmarks introduce bias by focusing on a particular type of domain,” and the Frontier IQ Over Time chart on AI IQ shows just how fast the targets are moving: in October 2023, GPT-4-turbo sat near an estimated IQ of 75. By early 2026, the top models were brushing 135 — roughly 60 points of improvement in 30 months.

That pace raises a fundamental question about whether any scoring system can keep up. The site compresses ceilings for saturated benchmarks, but as models continue to max out even the hardest tests — ARC-AGI-2, FrontierMath Tier 4, Humanity’s Last Exam — the framework will face the same ceiling effects that have plagued every AI evaluation before it. Connor Forsyth pointed to this dynamic on X: “ARC AGI 3 disagrees,” he wrote, referencing a next-generation benchmark that may already be undermining current scores.

AI IQ is not perfect. Its methodology is partially opaque. Its IQ metaphor can mislead. And its creator acknowledges known biases while likely missing others. But the alternative — wading through dozens of provider-specific benchmark tables, each using different test suites and scoring conventions — is worse. The site offers enterprise buyers something genuinely scarce: a single framework for comparing models across providers, dimensions, and price points, updated regularly, with enough nuance to show that the right answer to “which model is best?” is almost always “it depends on the task.”

As Debdoot Ghosh mused on X after viewing the charts: “Now a human’s role is just to orchestrate?”

Maybe. But if the AI IQ data shows anything clearly, it is that orchestration — knowing which model to deploy, when, and at what price — has become its own form of intelligence. And for that, there is no benchmark yet.

Man Who Stole Beyonce’s Hard Drives Gets Five-Year Sentence

A man accused of stealing hard drives containing unreleased Beyonce music, tour plans, and other materials from a rental car in Atlanta has pleaded guilty and accepted a five-year sentence, including two years in custody. Slashdot reader Bruce66423 shares a report from The Guardian: Kelvin Evans was arrested by the Atlanta police department in September in connection with a July 2025 car break-in in which two suitcases containing Beyonce music and tour plans were stolen from a rental car. […] According to a July police report, Beyonce choreographer Christopher Grant and dancer Diandre Blue called 911 to report a theft from their rental vehicle, a 2024 Jeep Wagoneer, before Beyonce’s Cowboy Carter tour dates in Atlanta. An October indictment stated that Evans entered the car on July 8 “with the intent to commit theft.”

The stolen hard drives contained “watermarked music, some unreleased music, footage plans for the show and past and future set list,” according to a police report. Clothing, designer sunglasses, laptops and AirPods headphones were also stolen, Grant and Blue said. Local law enforcement tracked the location of one of the stolen laptops and the AirPods in an attempt to recover the property. One police officer wrote in the report: “I conducted a suspicious stop in the area, due to the information that was relayed to me. There were several cars in the area also that the AirPods were pinging to in that area also. After further investigation, a silver [redacted], which had traveled into zone 5 was moving at the same time as the tracking on the AirPods.”

Evans was arrested several weeks after Grant and Blue filed a report, and was publicly named as the suspect in September. He was released on a $20,000 bond a month later. At the time of his arrest, Atlanta police said that the stolen property had not been recovered. It is unclear whether it has since been found.

Bruce66423 commented: “Just for stealing a couple of suitcases from a car. Funny how the elite punish those who inconvenience them. Can you imagine an ordinary victim see their offender get that sort of sentence?”

Microsoft’s multi-agent AI system tops Anthropic’s Mythos on cybersecurity benchmark

CyberGym benchmark scores over time, showing the rapid improvement in AI vulnerability discovery capabilities. Microsoft’s multi-model MDASH system (top right) tops the leaderboard at 88.4%. (CyberGym / UC Berkeley)

Mythos has been MDASH’d.

A new AI-powered system from Microsoft surpassed a headline-grabbing rival from Anthropic on a leading cybersecurity benchmark, using more than 100 specialized AI agents working together across multiple AI models to find real-world software vulnerabilities.

Microsoft’s system, codenamed MDASH, was introduced this week alongside the disclosure of 16 new vulnerabilities it found in different versions of Windows, including four “critical” remote code execution flaws fixed in this month’s Patch Tuesday release. 

The company, which has faced persistent criticism over security lapses, is betting that multiple models can discover vulnerabilities at a pace that individual models can’t match. 

MDASH, derived from the term “multi-model agentic scanning harness,” works by running specialized AI agents through a staged pipeline. Different agents scan code for potential vulnerabilities, then a separate set of agents debate whether each finding is real and exploitable, and a final stage constructs proof-of-concept attacks to confirm the bugs exist.
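Microsoft has not published MDASH's internals, but the staged structure described above can be sketched in outline. Everything here (the agent interfaces, the majority-vote debate, the confirmation step) is a hypothetical illustration of the described flow, not Microsoft's implementation:

```python
def staged_scan(codebase, scanners, reviewers, build_poc):
    """Three-stage pipeline: scan, debate, confirm."""
    # stage 1: specialized agents independently flag candidate vulnerabilities
    candidates = [f for scan in scanners for f in scan(codebase)]

    # stage 2: a separate set of agents debates whether each finding is
    # real and exploitable; keep findings that win a majority of yes votes
    vetted = [f for f in candidates
              if sum(r(f) for r in reviewers) > len(reviewers) / 2]

    # stage 3: keep only findings for which a proof-of-concept succeeds
    return [f for f in vetted if build_poc(f)]
```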

By comparison, Anthropic’s Mythos, which raised concerns over its ability to find and exploit software vulnerabilities when it was previewed earlier this year, is a single AI model running inside an agent framework. Anthropic restricted its release to a handful of companies through a consortium called Project Glasswing, which includes Microsoft.

OpenAI’s GPT-5.5 and others on the leaderboard are also single-model systems.

MDASH scored 88.45% on the CyberGym benchmark, a test developed by UC Berkeley researchers that measures how well AI systems can reproduce real-world vulnerabilities across 1,507 tasks drawn from 188 open-source software projects. 

Mythos Preview was second at 83.1%, followed by GPT-5.5 at 81.8%. 

The benchmark gives each system a description of a known vulnerability and an unpatched codebase, and measures whether it can produce a working attack that triggers the bug. 

The scores on the CyberGym leaderboard are self-reported by the companies, including Anthropic’s Mythos result. The benchmark code is public, but no independent party has verified any of the scores. Also, benchmark results don’t necessarily reflect real-world performance.

The results also highlight growing concerns about AI’s use as an offensive hacking tool. The same capabilities that allow AI to find vulnerabilities in friendly hands can be used to discover them for exploitation by attackers. Microsoft said MDASH is being used internally by its security engineering teams and will be entering a limited private preview with customers. 

Microsoft is telling customers to expect bigger Patch Tuesdays going forward as AI accelerates the discovery of vulnerabilities.

Iranian hackers targeted major South Korean electronics maker

The Iran-linked hacking group MuddyWater (a.k.a. Seedworm, Static Kitten) launched a broad cyber-espionage campaign targeting at least nine high-profile organizations across multiple sectors and countries.

Among the victims are a major South Korean electronics manufacturer, government agencies, an international airport in the Middle East, industrial manufacturers in Asia, and educational institutions.

Researchers at Symantec say that the threat actor “spent a week inside the network of a major South Korean electronics manufacturer in February 2026.”

Symantec’s Threat Hunter Team believes the attacker was intelligence-driven, focusing on industrial and intellectual property theft, government espionage, and access to downstream customers or corporate networks.

Fortemedia and SentinelOne abuse

Seedworm’s campaign relied heavily on DLL sideloading, a common technique in which legitimate, signed software loads malicious DLLs.

Two of the binaries leveraged in the attack are ‘fmapp.exe,’ a legitimate Fortemedia audio utility, and ‘sentinelmemoryscanner.exe,’ a legitimate SentinelOne component.

The malicious DLLs (fmapp.dll and sentinelagentcore.dll) contained ChromElevator, a commodity post-exploitation tool that steals data stored in Chrome-based browsers.

Symantec also found that PowerShell, used in previous Seedworm attacks, was still heavily used in the recent incidents, although the payloads were controlled through Node.js loaders rather than directly.

PowerShell was used to capture screenshots, conduct reconnaissance, fetch additional payloads, establish persistence, steal credentials, and create SOCKS5 tunnels.

Attack on a Korean firm

According to Symantec’s observations, the attack on the South Korean electronics manufacturer took place between February 20 and 27. The researchers did not disclose the name of the targeted organization.

In the first stage, Seedworm performed host and domain reconnaissance, followed by antivirus enumeration via WMI, screenshot capture, and the download of additional malware.

Credential theft occurred via fake Windows prompts, registry hive theft (SAM/SECURITY/SYSTEM), and Kerberos ticket abuse tools.

Persistence was established through registry modifications, beaconing occurred at 90-second intervals, and sideloaded binaries were repeatedly relaunched to maintain access.

“The cadence is again consistent with implant-driven activity rather than continuous operator presence,” the researchers said.

The attackers leveraged sendit.sh, a public file-sharing service, for data exfiltration, likely to obscure the malicious activity and make it appear as normal traffic.

Overall, Symantec has found the latest Seedworm campaign notable for the threat actors’ geographic expansion, operational maturity, and the abuse of legitimate tools and services, which mark a shift toward quieter attacks.

Apple May Open Up The App Store To Agentic AI

Artificial intelligence has posed a multi-layered problem for Apple in recent years. We’re expecting to hear some big news at WWDC this year about how AI will be integrated into the company’s gadgets, but there are still other wrinkles to be ironed out in its broader approach to the use of this influential technology. According to The Information, one of those challenges is the recent interest in and development of agentic AI.

To date, Apple has not permitted vibe coding tools on the App Store because they would violate its policies. They could also potentially be used to create original apps for people who would otherwise have gotten software from the App Store, which could threaten Apple’s revenue as well as create a loophole for spreading malware or taking other malicious actions. But applying that same block more broadly to any agentic AI services, which can take active control over a device and its programs, could keep Apple out of the loop as those tools generate enormous interest among both developers and casual users. Apple is reportedly trying to maintain its control over the App Store while capitalizing on the current buzz around AI agents.

“While details couldn’t be learned, its staffers are designing a system to adhere to its standards of privacy and security and prevent the more freewheeling behavior some users of agentic systems such as OpenClaw have experienced, where agents can go haywire and delete all of a user’s emails, according to the people briefed on the matter,” the article states.

It sounds like a high wire act for a company that has been struggling to keep pace with AI’s breakneck development. Add this to the long laundry list of information we’ll be curious to see addressed at next month’s keynote.



Netflix’s Ad Tier Now Has A Whopping 250 Million Monthly Users

Netflix has more than 250 million monthly active users on its ad-supported tier. The figure, which was revealed during the company’s Upfront presentation, marks a huge spike for this subscription option. In 2024 the plan with ads had 70 million users and in 2025 it reached 94 million.

Starting next year, Netflix will also launch the ad-supported plan in 15 more countries: Austria, Belgium, Colombia, Denmark, Indonesia, Ireland, the Netherlands, New Zealand, Norway, Peru, Philippines, Poland, Sweden, Switzerland and Thailand.

The Basic with Ads tier started rolling out in 2022. It appears to be an increasingly popular option as Netflix, like most streaming services, has continued to get ever more expensive. The company upped all monthly subscription costs by a dollar earlier this year.

And of course, because this is 2026, the Upfront included plenty of talk about AI. Netflix started using the tech in its ads last year, and one of the new potential applications the company is testing will serve “personalized ad loads and frequency caps that dynamically adjust the ads our members see, based on their viewing behaviors.” Netflix is currently facing a lawsuit from Texas on claims that it illegally sells user data to ad tech companies, although the streaming service said the suit was “based on inaccurate and distorted information.”


