This tree search framework hits 98.7% on documents where vector search fails

A new open-source framework called PageIndex takes aim at one of the oldest problems in retrieval-augmented generation (RAG): handling very long documents.

The classic RAG workflow (chunk documents, calculate embeddings, store them in a vector database, and retrieve the top matches based on semantic similarity) works well for basic tasks such as Q&A over small documents.

But as enterprises try to move RAG into high-stakes workflows — auditing financial statements, analyzing legal contracts, navigating pharmaceutical protocols — they’re hitting an accuracy barrier that chunk optimization can’t solve.

PageIndex abandons the standard “chunk-and-embed” method entirely and treats document retrieval not as a search problem, but as a navigation problem.

AlphaGo for documents

PageIndex addresses these limitations by borrowing a concept from game-playing AI rather than search engines: tree search.

When humans need to find specific information in a dense textbook or a long annual report, they do not scan every paragraph linearly. They consult the table of contents to identify the relevant chapter, then the section, and finally the specific page. PageIndex forces the LLM to replicate this human behavior.

Instead of pre-calculating vectors, the framework builds a “Global Index” of the document’s structure, creating a tree where nodes represent chapters, sections, and subsections. When a query arrives, the LLM performs a tree search, explicitly classifying each node as relevant or irrelevant based on the full context of the user’s request.
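
The node-by-node classification described above can be sketched in a few lines. This is an illustrative toy, not PageIndex's actual API: the `Node` layout is an assumption, and `is_relevant` is a crude keyword stand-in for the LLM call that would classify each node against the full query context.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    summary: str = ""
    children: list = field(default_factory=list)

def is_relevant(query: str, node: Node) -> bool:
    # Stand-in for the LLM relevance classifier; here a crude keyword
    # check against the node's title and summary.
    text = (node.title + " " + node.summary).lower()
    return any(word in text for word in query.lower().split())

def tree_search(query: str, node: Node) -> list:
    """Expand a node: classify each child, descend only into relevant ones."""
    hits = []
    for child in node.children:
        if not is_relevant(query, child):
            continue                      # prune this whole subtree
        hits.extend(tree_search(query, child) if child.children else [child])
    return hits

toc = Node("Annual Report", children=[
    Node("Financial Statements", "revenue, EBITDA, operating costs", [
        Node("EBITDA Reconciliation"),
        Node("Revenue by Segment"),
    ]),
    Node("Risk Factors", "litigation, market risk"),
])

print([n.title for n in tree_search("EBITDA calculation", toc)])
```

The pruning is the point: irrelevant branches ("Risk Factors") are never expanded, so the model reads only the path from the table of contents down to the matching leaf.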

How PageIndex works (source: PageIndex GitHub)

“In computer science terms, a table of contents is a tree-structured representation of a document, and navigating it corresponds to tree search,” said Mingtian Zhang, co-founder of PageIndex. “PageIndex applies the same core idea — tree search — to document retrieval, and can be thought of as an AlphaGo-style system for retrieval rather than for games.”

This shifts the architectural paradigm from passive retrieval, where the system simply fetches matching text, to active navigation, where an agentic model decides where to look.

The limits of semantic similarity

There is a fundamental flaw in how traditional RAG handles complex data. Vector retrieval assumes that the text most semantically similar to a user’s query is also the most relevant. In professional domains, this assumption frequently breaks down.

Mingtian Zhang, co-founder of PageIndex, points to financial reporting as a prime example of this failure mode. If a financial analyst asks an AI about “EBITDA” (earnings before interest, taxes, depreciation, and amortization), a standard vector database will retrieve every chunk where that acronym or a similar term appears.

“Multiple sections may mention EBITDA with similar wording, yet only one section defines the precise calculation, adjustments, or reporting scope relevant to the question,” Zhang told VentureBeat. “A similarity-based retriever struggles to distinguish these cases because the semantic signals are nearly indistinguishable.”

This is the “intent vs. content” gap. The user does not want to find the word “EBITDA”; they want to understand the “logic” behind it for that specific quarter.

Furthermore, traditional embeddings strip the query of its context. Because embedding models have strict input-length limits, the retrieval system usually only sees the specific question being asked, ignoring the previous turns of the conversation. This detaches the retrieval step from the user’s reasoning process. The system matches documents against a short, decontextualized query rather than the full history of the problem the user is trying to solve.

Solving the multi-hop reasoning problem

The real-world impact of this structural approach is most visible in “multi-hop” queries that require the AI to follow a trail of breadcrumbs across different parts of a document.

In a recent benchmark test known as FinanceBench, a system built on PageIndex called “Mafin 2.5” achieved a state-of-the-art accuracy score of 98.7%. The performance gap between this approach and vector-based systems becomes clear when analyzing how they handle internal references.

Zhang offers the example of a query regarding the total value of deferred assets in a Federal Reserve annual report. The main section of the report describes the “change” in value but does not list the total. However, the text contains a footnote: “See Appendix G of this report … for more detailed information.”

A vector-based system typically fails here. The text in Appendix G looks nothing like the user’s query about deferred assets; it is likely just a table of numbers. Because there is no semantic match, the vector database ignores it.

The reasoning-based retriever, however, reads the cue in the main text, follows the structural link to Appendix G, locates the correct table, and returns the accurate figure.
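
One such "hop" can be made concrete: after reading a retrieved section, scan it for explicit cross-references and follow them into other parts of the document. The section texts and the reference pattern below are invented for the example; a real system would let the LLM spot the cue rather than a regex.

```python
import re

# Matches textual cues like "See Appendix G" or "see Section 4.1".
REF = re.compile(r"[Ss]ee (Appendix [A-Z]|Section [\d.]+)")

sections = {
    "Deferred Assets": "The change in deferred assets was $1.2B. "
                       "See Appendix G of this report for more detail.",
    "Appendix G": "Total deferred assets: $9.4B.",
}

def read_with_hops(start: str, max_hops: int = 3) -> list:
    """Return the chain of sections visited by following cross-references."""
    visited, current = [], start
    while current in sections and current not in visited and len(visited) <= max_hops:
        visited.append(current)
        match = REF.search(sections[current])
        current = match.group(1) if match else None
    return visited

print(read_with_hops("Deferred Assets"))  # ['Deferred Assets', 'Appendix G']
```

A pure similarity search would score "Total deferred assets: $9.4B." poorly against the query; following the structural link reaches it anyway.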

The latency trade-off and infrastructure shift

For enterprise architects, the immediate concern with an LLM-driven search process is latency. Vector lookups occur in milliseconds; having an LLM “read” a table of contents implies a significantly slower user experience.

However, Zhang explains that the perceived latency for the end-user may be negligible due to how the retrieval is integrated into the generation process. In a classic RAG setup, retrieval is a blocking step: the system must search the database before it can begin generating an answer. With PageIndex, retrieval happens inline, during the model’s reasoning process.

“The system can start streaming immediately, and retrieve as it generates,” Zhang said. “That means PageIndex does not add an extra ‘retrieval gate’ before the first token, and Time to First Token (TTFT) is comparable to a normal LLM call.”
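
The difference from a blocking retrieval gate can be shown with a toy generation loop: the model streams tokens immediately and issues a retrieval action mid-generation. The fake model and the `("LOOKUP", …)` action format are assumptions for illustration, standing in for a real tool-calling stream.

```python
def fake_model(prompt: str):
    # Yields plain text tokens; a ("LOOKUP", node) tuple stands in for a
    # tool call the model emits when it decides it needs to look something up.
    yield "The total is "
    yield ("LOOKUP", "Appendix G")
    yield "as reported."

def lookup(node: str) -> str:
    return {"Appendix G": "$9.4B "}.get(node, "")

def generate(query: str) -> str:
    out = []
    for token in fake_model(query):
        if isinstance(token, tuple) and token[0] == "LOOKUP":
            out.append(lookup(token[1]))   # retrieval happens inline
        else:
            out.append(token)              # first token streams with no gate
    return "".join(out)

print(generate("total deferred assets?"))
```

Because the first token is produced before any retrieval occurs, time to first token matches a plain LLM call; the lookup cost is paid mid-stream instead of up front.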

This architectural shift also simplifies the data infrastructure. By removing reliance on embeddings, enterprises no longer need to maintain a dedicated vector database. The tree-structured index is lightweight enough to sit in a traditional relational database like PostgreSQL.
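
A tree index in a relational table is just a parent-pointer schema. The sketch below uses the standard library's SQLite for portability (the article mentions PostgreSQL); the column layout is an assumption for illustration, not PageIndex's actual schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE toc (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES toc(id),
    title TEXT,
    start_page INTEGER,
    end_page INTEGER)""")
db.executemany("INSERT INTO toc VALUES (?, ?, ?, ?, ?)", [
    (1, None, "Annual Report", 1, 120),
    (2, 1, "Financial Statements", 10, 60),
    (3, 2, "EBITDA Reconciliation", 42, 45),
])

# Expanding a node during tree search is one indexed lookup -- no vector store.
children = db.execute(
    "SELECT title FROM toc WHERE parent_id = ?", (2,)).fetchall()
print(children)  # [('EBITDA Reconciliation',)]
```

Each step of the tree search becomes an ordinary indexed query, which is why no dedicated vector infrastructure is needed.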

This addresses a growing pain point in LLM systems with retrieval components: the complexity of keeping vector stores in sync with living documents. PageIndex separates structure indexing from text extraction. If a contract is amended or a policy updated, the system can handle small edits by re-indexing only the affected subtree rather than reprocessing the entire document corpus.
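
Incremental re-indexing of this kind can be sketched as a walk from the edited node up to the root: only that path's summaries are refreshed, and sibling subtrees are untouched. The tree layout and the `summarize` stub are illustrative assumptions.

```python
tree = {
    "root":   {"parent": None},
    "sec4":   {"parent": "root"},
    "sec4.1": {"parent": "sec4"},
    "sec5":   {"parent": "root"},
}

def summarize(name: str) -> str:
    return f"summary({name})"      # placeholder for an LLM summarization call

def reindex(changed: str) -> list:
    """Recompute summaries along the path from the edited node to the root."""
    touched, node = [], changed
    while node is not None:
        tree[node]["summary"] = summarize(node)
        touched.append(node)
        node = tree[node]["parent"]
    return touched

print(reindex("sec4.1"))   # only the affected path; sec5 is never touched
```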

A decision matrix for the enterprise

While the accuracy gains are compelling, tree-search retrieval is not a universal replacement for vector search. The technology is best viewed as a specialized tool for “deep work” rather than a catch-all for every retrieval task.

For short documents, such as emails or chat logs, the entire context often fits within a modern LLM’s context window, making any retrieval system unnecessary. Conversely, for tasks purely based on semantic discovery, such as recommending similar products or finding content with a similar “vibe,” vector embeddings remain the superior choice because the goal is proximity, not reasoning.

PageIndex fits squarely in the middle: long, highly structured documents where the cost of error is high. This includes technical manuals, FDA filings, and merger agreements. In these scenarios, the requirement is auditability. An enterprise system needs to be able to explain not just the answer, but the path it took to find it (e.g., confirming that it checked Section 4.1, followed the reference to Appendix B, and synthesized the data found there).

PageIndex vs RAG (image credit: VentureBeat with Nano Banana Pro)

The future of agentic retrieval

The rise of frameworks like PageIndex signals a broader trend in the AI stack: the move toward “Agentic RAG.” As models become more capable of planning and reasoning, the responsibility for finding data is moving from the database layer to the model layer.

We are already seeing this in the coding space, where agents like Claude Code and Cursor are moving away from simple vector lookups in favor of active codebase exploration. Zhang believes generic document retrieval will follow the same trajectory.

“Vector databases still have suitable use cases,” Zhang said. “But their historical role as the default database for LLMs and AI will become less clear over time.”

These Official ChromeOS Flex USB Sticks Can Give Your Old Mac or Windows PC a Second Life

“People want something that lasts them a long time, that is quality, that is useful,” says Google senior director Alexander Kuscher. “Eventually, when it breaks or when you lose it, you get a new one because you feel taken care of. So I think that builds trust, and the trust is important.”

Flex started as an enterprise service for businesses; Google offered companies worried about security vulnerabilities on aging hardware a way to easily update to a more secure operating system. Or, at least, one that still received updates. After a while, other users started to get ahold of the software, downloading and installing it on their own USB sticks for their personal machines. “We didn’t make it particularly easy at the time,” Kuscher says. “But people did it.”

What led to the more consumer-oriented push of ChromeOS Flex—like this partnership with Back Market—was the end of software support for Microsoft’s Windows 10 operating system last fall. While the OS still technically works, it stopped receiving security updates, and Microsoft has encouraged users to update to Windows 11. But Windows 11 has specific hardware requirements, and it may not be a simple upgrade on certain machines. Google saw this as a moment to provide a cheaper alternative to the “Windows 10 cliff,” as Kuscher puts it. Back Market agreed.

“Ultimately, [Microsoft is] saying that people need to throw away their existing laptop to buy another one,” Hug de Larauze says. “And we say politely, no.”

If you’re tech-savvy, you can forgo Back Market’s $3 stick and download ChromeOS Flex onto a USB drive you have lying around right now.

Buying Refurb

Back Market has done very well for itself despite economic turmoil. As devices become more expensive, people turn to cheaper, refurbished options. Hug de Larauze compares the device market to the auto industry.

“Ninety percent of cars are being sold pre-owned,” Hug de Larauze says. “The new normal is to purchase them pre-owned because it’s almost dumb to buy a new one.”

When US president Donald Trump announced sweeping tariffs last year, Hug de Larauze says, Back Market's sales tripled. Even after the dust settled a little and it became clear that tariffs would not directly affect smartphones or computers, sales stayed around twice what they'd been before. Back Market made $3.8 billion in 2025, making the company profitable for the first time. While Hug de Larauze says these kinds of economic fluctuations may be good for sending more people to Back Market, he hopes they will shift buyer mindsets toward refurbished tech writ large.

“We have one planet, and resources are limited,” Hug de Larauze says. “We need to do more with what we already have in every sector. Fashion is the same, transportation is the same, energy is the same, it’s the same for everything.”

Apple’s new Studio Display XDR monitor has limited functionality on older Silicon Macs

If you’re looking to pre-order Apple’s new Studio Display XDR monitor today but have an older Mac, beware of some potential issues. According to the compatibility list spotted by Apple Insider, the new display will only work at 60Hz, not its full 120Hz refresh rate, on some older and less powerful Apple Silicon models. Moreover, support for older Intel Macs isn’t mentioned at all for either the Studio Display XDR or the cheaper Studio Display.

All Apple Silicon Macs will work with both monitors, including those with the oldest M1 chips, according to the support pages. However, the compatibility list for the Studio Display XDR includes this nugget: “Mac models with M1, M1 Pro, M1 Max, M1 Ultra, M2, and M3 support Studio Display XDR at up to 60Hz. All other Studio Display XDR features are supported.” So even if you have a hotrod M1 Ultra-based Mac, the Studio Display XDR’s refresh rate is capped at 60Hz — despite the fact that the chip can drive third-party monitors at 120Hz.

Similarly, only the iPad Pro M5 supports the Studio Display XDR at 120Hz, with all other compatible models (in the iPad Pro and iPad Air family) limited to 60Hz.

Intel Mac support isn’t mentioned at all in the compatibility list for either display, though they may function in some limited manner when connected. Intel Macs just received their last new OS update with macOS Tahoe (and only three more years of security updates), but it’s still surprising that they’re not compatible with Apple’s latest monitors.

Military Drone Insights for Safer Self-Driving Cars

Self-driving cars often struggle with situations that are commonplace for human drivers. When confronted with construction zones, school buses, power outages, or misbehaving pedestrians, these vehicles often behave unpredictably, leading to crashes or freezing events, causing significant disruption to local traffic and possibly blocking first responders from doing their jobs. Because self-driving cars cannot successfully handle such routine problems, self-driving companies use human babysitters to remotely supervise them and intervene when necessary.

This idea—humans supervising autonomous vehicles from a distance—is not new. The U.S. military has been doing it since the 1980s with unmanned aerial vehicles (UAVs). In those early years, the military experienced numerous accidents due to poorly designed control stations, lack of training, and communication delays.

As a Navy fighter pilot in the 1990s, I was one of the first researchers to examine how to improve the UAV remote supervision interfaces. The thousands of hours I and others have spent working on and observing these systems generated a deep body of knowledge about how to safely manage remote operations. With recent revelations that U.S. commercial self-driving car remote operations are handled by operators in the Philippines, it is clear that self-driving companies have not learned the hard-earned military lessons that would promote safer use of self-driving cars today.

While stationed in the Western Pacific during the Gulf War, I spent a significant amount of time in air operations centers, learning how military strikes were planned, implemented and then replanned when the original plan inevitably fell apart. After obtaining my PhD, I leveraged this experience to begin research on the remote control of UAVs for all three branches of the U.S. military. Sitting shoulder-to-shoulder in tiny trailers with operators flying UAVs in local exercises or from 4000 miles away, my job was to learn about the pain points for the remote operators as well as identify possible improvements as they executed supervisory control over UAVs that might be flying halfway around the world.

Supervisory control refers to situations where humans monitor and support autonomous systems, stepping in when needed. For self-driving cars, this oversight can take several forms. The first is teleoperation, where a human remotely controls the car’s speed and steering from afar. Operators sit at a console with a steering wheel and pedals, similar to a racing simulator. Because this method relies on real-time control, it is extremely sensitive to communication delays.

The second form of supervisory control is remote assistance. Instead of driving the car in real time, a human gives higher-level guidance. For example, an operator might click a path on a map (called laying “breadcrumbs”) to show the car where to go, or interpret information the AI cannot understand, such as hand signals from a construction worker. This method tolerates more delay than teleoperation but is still time-sensitive.

Five Lessons From Military Drone Operations

Over 35 years of UAV operations, the military consistently encountered five major challenges during drone operations which provide valuable lessons for self-driving cars.

Latency

Latency—delays in sending and receiving information due to distance or poor network quality—is the single most important challenge for remote vehicle control. Humans also have their own built-in delay: neuromuscular lag. Even under perfect conditions, people cannot reliably respond to new information in less than 200–500 milliseconds. In remote operations, where communication lag already exists, this makes real-time control even more difficult.

In early drone operations, U.S. Air Force pilots in Las Vegas (the primary U.S. UAV operations center) attempted to take off and land drones in the Middle East using teleoperation. With at least a two-second delay between command and response, the accident rate was 16 times that of fighter jets conducting the same missions. The military switched to local line-of-sight operators and eventually to fully automated takeoffs and landings. When I interviewed the pilots of these UAVs, they all stressed how difficult it was to control the aircraft with significant time lag.

Self-driving car companies typically rely on cellphone networks to deliver commands. These networks are unreliable in cities and prone to delays. This is one reason many companies prefer remote assistance instead of full teleoperation. But even remote assistance can go wrong. In one incident, a Waymo operator instructed a car to turn left when a traffic light appeared yellow in the remote video feed—but the network latency meant that the light had already turned red in the real world. After moving its remote operations center from the U.S. to the Philippines, Waymo’s latency increased even further. It is imperative that control not be so remote, both to resolve the latency issue but also increase oversight for security vulnerabilities.

Workstation Design

Poor interface design has caused many drone accidents. The military learned the hard way that confusing controls, difficult-to-read displays, and unclear autonomy modes can have disastrous consequences. Depending on the specific UAV platform, the FAA attributed between 20% and 100% of Army and Air Force UAV crashes involving human error through 2004 to poor interface design.

UAV crashes (1986-2004) caused by human factors problems, including poor interface and procedure design. These two categories do not sum to 100% because both factors could be present in an accident.

                        Human Factors   Interface Design   Procedure Design
Army Hunter                  47%              20%                20%
Army Shadow                  21%              80%                40%
Air Force Predator           67%              38%                75%
Air Force Global Hawk        33%             100%                 0%

Many UAV crashes have been caused by poor human control systems. In one case, buttons were placed on the controllers such that it was relatively easy to accidentally shut off the engine instead of firing a missile, and remote operators did exactly that, shutting engines down mid-flight.

The self-driving industry reveals hints of comparable issues. Some autonomous shuttles use off-the-shelf gaming controllers, which—while inexpensive—were never designed for vehicle control. The off-label use of such controllers can lead to mode confusion, which was a factor in a recent shuttle crash. Significant human-in-the-loop testing is needed to avoid such problems, not only prior to system deployment, but also after major software upgrades.

Operator Workload

Drone missions typically include long periods of surveillance and information gathering, occasionally ending with a missile strike. These missions can sometimes last for days; for example, while the military waits for the person of interest to emerge from a building. As a result, the remote operators experience extreme swings in workload: sometimes overwhelming intensity, sometimes crushing boredom. Both conditions can lead to errors.

When operators teleoperate drones, workload is high and fatigue can quickly set in. But when onboard autonomy handles most of the work, operators can become bored, complacent, and less alert. This pattern is well documented in UAV research.

Self-driving car operators are likely experiencing similar issues for tasks ranging from interpreting confusing signs to helping cars escape dead ends. In simple scenarios, operators may be bored; in emergencies—like driving into a flood zone or responding during a citywide power outage—they can become quickly overwhelmed.

The military has tried for years to have one person supervise many drones at once, because it is far more cost effective. However, cognitive switching costs (regaining awareness of a situation after switching control between drones) result in workload spikes and high stress. Those costs, coupled with increasingly complex interfaces and communication delays, have made this extremely difficult.

Self-driving car companies likely face the same roadblocks. They will need to model operator workloads and be able to reliably predict what staffing should be and how many vehicles a single person can effectively supervise, especially during emergency operations. If every self-driving car turns out to need a dedicated human to pay close attention, such operations would no longer be cost-effective.

Training

Early drone programs lacked formal training requirements, with training programs designed by pilots, for pilots. Unfortunately, supervising a drone is more akin to air traffic control than actually flying an aircraft, so the military often placed drone operators in critical roles with inadequate preparation. This caused many accidents. Only years later did the military conduct a proper analysis of the knowledge, skills, and abilities needed to conduct safe remote operations, and changed their training program.

Self-driving companies do not publicly share their training standards, and no regulations currently govern the qualifications of remote operators. On-road safety depends heavily on these operators, yet very little is known about how they are selected or taught. Commercial aviation dispatchers, whose duties closely resemble those of self-driving remote operators, are required to complete formal FAA-overseen training; we should hold commercial self-driving companies to similar standards.

Contingency Planning

Aviation has strong protocols for emergencies including predefined procedures for lost communication, backup ground control stations, and highly reliable onboard behaviors when autonomy fails. In the military, drones may fly themselves to safe areas or land autonomously if contact is lost. Systems are designed with cybersecurity threats—like GPS spoofing—in mind.

Self-driving cars appear far less prepared. The 2025 San Francisco power outage left Waymo vehicles frozen in traffic lanes, blocking first responders and creating hazards. These vehicles are supposed to perform “minimum-risk maneuvers” such as pulling to the side—but many of them didn’t. This suggests gaps in contingency planning and basic fail-safe design.

The history of military drone operations offers crucial lessons for the self-driving car industry. Decades of experience show that remote supervision demands extremely low latency, carefully designed control stations, manageable operator workload, rigorous, well-designed training programs, and strong contingency planning.

Self-driving companies appear to be repeating many of the early mistakes made in drone programs. Remote operations are treated as a support feature rather than a mission-critical safety system. But as long as AI struggles with uncertainty, which will be the case for the foreseeable future, remote human supervision will remain essential. The military learned these lessons through painful trial and error, yet the self-driving community appears to be ignoring them. The self-driving industry has the chance—and the responsibility—to learn from our mistakes in combat settings before it harms road users everywhere.

Anthropic sees major Claude outage after ‘unprecedented demand’

As the US administration proceeds to drop Anthropic as a supplier, many are rallying around the AI company’s relatively ethical stance, creating ‘unprecedented demand’ for Claude.

Anthropic’s Claude has fast become the darling of AI enthusiasts for development, research and enterprise work. Now it is facing the might of the US administration, which is threatening to drop it entirely as a supplier after a falling out with the Pentagon over so-called “red lines” the company would not cross.

Many in Silicon Valley support its relatively principled stand, and general users have sent it to the top of the US Apple charts for free downloads in recent days, beating OpenAI’s ChatGPT for the first time. So when its flagship Claude.ai and Claude Code apps went down for around three hours on Monday (2 March), many bemoaned its absence. There are already reports of further outages as we write, although Anthropic’s latest status update says “a fix has been implemented and we are monitoring the results”.

In a nostalgic post on LinkedIn yesterday, regular contributor to Silicon Republic, AI aficionado Jonathan McCrea wrote: “I now feel the same way about Claude being down as I used to about Twitter being down.”

De facto boycott

Last night, treasury secretary Scott Bessent added his voice to the de facto US administration boycott, saying in a post on X that his department would terminate its use of Anthropic products.

It follows a directive from president Donald Trump ordering US agencies to “phase out” their use of the AI company’s products, and his defence department labelling Anthropic a “supply-chain risk”, a label normally reserved for foreign suppliers from non-friendly states. Anthropic has been quick to call this a “legally unsound” designation, and is expected to challenge the move in the courts.

Reuters is also reporting that it has seen memos to employees at the Department of Health and Human Services, asking them to switch to other AI platforms such as ChatGPT and Gemini, and at the State Department saying it was switching the model powering its in-house chatbot – StateChat – to OpenAI from Anthropic.

Financially it will surely deal a serious blow to Anthropic in the short term, but some commentators are arguing that it could be a pivotal moment for Anthropic as it may be seen by many as the relatively ethical choice when it comes to the AI giants.

The recent Grok scandal has put a major question mark over xAI’s credentials and OpenAI’s Sam Altman clearly sees the reputational risk as he has been quick to claim that it is ensuring some guardrails in its contract with the Pentagon.

On X yesterday, Altman claimed that these guardrails would ensure OpenAI would not be “intentionally used for domestic surveillance of US persons and nationals”.

The backstory

If you haven’t been following, Anthropic drew the ire of the US administration after a standoff with the Pentagon, where Anthropic refused to change its safeguards related to using its AI for fully autonomous weapons, or for mass surveillance of US citizens.

On Thursday (February 27), Anthropic’s Dario Amodei released an official statement saying Anthropic believed that in “a narrow set of cases, we believe AI can undermine, rather than defend, democratic values”.

“Some uses are also simply outside the bounds of what today’s technology can safely and reliably do,” he said. “Two such use cases have never been included in our contracts with the Department of War, and we believe they should not be included.

“We support the use of AI for lawful foreign intelligence and counterintelligence missions. But using these systems for mass domestic surveillance is incompatible with democratic values.”

Amodei went on to say that partially autonomous weapons, like those used today in Ukraine, are vital to the defense of democracy. “But today, frontier AI systems are simply not reliable enough to power fully autonomous weapons. We will not knowingly provide a product that puts America’s warfighters and civilians at risk.”

It’s a debacle that is likely to roll on in coming days, and it remains to be seen whether Anthropic can withstand the unprecedented onslaught from its own government and rely on the support of users for its principled stand. In the short term, its challenge appears to be to meet the current demand on its systems.

Don’t miss out on the knowledge you need to succeed. Sign up for the Daily Brief, Silicon Republic’s digest of need-to-know sci-tech news.

iPhone Coruna virus: possible US government hacking toolset spreading via black market

If your iPhone is running an outdated version of iOS, you may be exposed to 23 vulnerabilities that can be exploited by a highly sophisticated toolkit being sold to bad actors.

Update to iOS 26 to avoid a sophisticated hacking toolkit

It is well known that law enforcement agencies and government entities rely on hardware like GrayKey to attempt a bypass of iPhone security. It seems that the United States government may have created a monstrous exploit tool that is now being sold and spread to bad actors.

A Wired report details data shared by Google’s Threat Intelligence Group and iVerify. Google explains how the exploit toolkit, named “Coruna,” spread, while iVerify shared its findings tying its origins to the US government.

Barkbox Promo Codes and Discounts: Up to 50% Off

As my fellow pet parents will know, it’s amazing how quickly even the tiniest of dogs can demolish their toys and treat stash. We love and spoil them nonetheless. When you subscribe to BarkBox, a fresh batch of cleverly themed treats and toys arrives at your doorstep. The costs of pet ownership can stack up quickly, especially if you’re buying your pooch a random gift box that goes well beyond the essentials. That’s why we have Barkbox promo codes and discount options ready to go for you.

Barkbox Promo: Enjoy a Free Toy for a Year at Barkbox

When your monthly Barkbox arrives, it’s like Christmas morning for your dogs. I watch as my two dogs, Rosi and Randy, shake their little Chihuahua mix bodies with barely restrained excitement. They’re never gentle on their toys, but the stimulation that comes from textures and chewing is good for their little brains. With Barkbox you get a steady supply of two unique toys and two bags of all-natural treats every month. If you want to see how your dogs react, this Barkbox coupon is good for new Barkbox subscription customers and adds an additional toy to your box every month for a year.

Save 50% on Your First Barkbox Food Subscription With a Barkbox Coupon Code

Another reason why Barkbox is the best dog subscription box is how easy the company makes it to keep your pantry stocked with your dog’s food. Use this Barkbox coupon to save 50% off your first Barkbox food subscription, so you won’t have to end up running out to the grocery store in the middle of the night when your scooper scrapes across the bottom of an empty kibble bin.

Fly Travel Stress-Free With Your Dog and Get $300 Off BARK Air Flights

If you live in a BARK Air hub destination, please know I am insanely jealous of you. It’s no secret that flying is stressful and can be very dangerous for pets, especially if they have to ride in a cargo hold. Barkbox makes your dog the VIP with BARK Air, letting them ride in the cabin with you and get doted on, so things are a lot less scary. This is another perk of a BarkBox subscription: the opportunity to take $300 off BARK Air flights.


Support Your Dog’s Dental Health and Get $10 Off With a Barkbox Coupon

Dental health is crucial for dogs, as it can prevent disease not just in their mouths but in their vital organs. Don’t forget to schedule your yearly cleaning with your vet, but in the meantime, use this BarkBox discount code to get $10 off a special BarkBox Dental kit.

Get an Extra Premium Toy in Every BarkBox With the Extra Toy Club

For having such tiny mouths, my dogs can gnaw through toys with surprising speed. If you’re also buried in a pile of shredded fluff and squeakers from disemboweled toys, the Extra Toy Club can help. This subscription includes dog toys for aggressive chewers of all ages, breeds, and sizes, offering extra durable toys meant to last longer. So far, so good at my house. To upgrade to this subscription box, it’s an extra $9 per month.

Get Exclusive BarkBox Discounts: Join the Email List

If you assume that the punchy branding and witty lingo extend to Barkbox’s email subscribers and not just the box itself, you’d be correct. As a bonus, you can get exclusive BarkBox discount codes when you sign up for these emails. And who doesn’t love seeing a furry face pop up between work emails and bill payment reminders?


Tech

SoftBank credit outlook hit after betting $30bn more on OpenAI


S&P research finds OpenAI to be one of SoftBank’s investments with the ‘weakest’ credit quality.

OpenAI is making SoftBank’s investment portfolio look bad, said ratings agency S&P Global, which lowered the Japanese investment firm’s outlook from ‘stable’ to ‘negative’, with a long-term issuer credit rating of ‘BB+’.

SoftBank is making massive bets on OpenAI, having already invested $30bn in the world’s largest private company as of last year. It is now gearing up to pour another $30bn into OpenAI over the course of the year.

With the new investment, S&P figures that OpenAI will represent 30pc of SoftBank’s investment assets – the same as its investments in Arm. And after the additional investment, SoftBank’s investment portfolio will likely exceed $320bn, making it one of the largest in the world.


However, S&P’s evaluation found OpenAI to be one of SoftBank’s investments with the “weakest” credit quality. The Japanese firm’s AI investments largely involve start-ups and private companies, including SambaNova, Wayve and ABB Robotics, which S&P said exposes SoftBank to “significant AI innovation risk”.

These kinds of investments could weaken SoftBank’s negotiating strength, S&P found, while the additional investment in OpenAI could also worsen the company’s loan-to-value (LTV) ratio.
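LTV here is simply net debt divided by the value of the investment portfolio. A toy calculation (all figures hypothetical, not SoftBank’s actual balance sheet) illustrates why a debt-funded investment pushes the ratio up even though assets grow too:

```python
def ltv(net_debt: float, portfolio_value: float) -> float:
    """Loan-to-value: net debt as a fraction of portfolio value."""
    return net_debt / portfolio_value

# Hypothetical figures in $bn — not SoftBank's actual balance sheet.
debt, assets = 40.0, 320.0
print(f"before: {ltv(debt, assets):.1%}")                # before: 12.5%
print(f"after:  {ltv(debt + 30.0, assets + 30.0):.1%}")  # after:  20.0%
```

Because the numerator and denominator rise by the same amount, any starting ratio below 100% worsens; cutting net debt by selling assets for cash is the lever S&P points to.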

Last November, SoftBank sold off all of its shares in Nvidia, which came to over $5bn. At the time, company chief financial officer Yoshimitsu Goto reiterated SoftBank’s belief in OpenAI and Arm, commenting: “OpenAI is one of our key growth drivers. Together, Arm and OpenAI are powering SoftBank Group toward our goal of becoming the number one platform provider for the artificial superintelligence era.”

An OpenAI initial public offering would be a much-needed boost for SoftBank’s investment portfolio, according to S&P, which also concluded that SoftBank will need to sell assets and holdings to improve its LTV ratio.


“The negative outlook reflects our view that SoftBank Group’s large follow-on investment in OpenAI means it will take longer than we had assumed for the company to restore the liquidity and quality of its investment assets,” S&P said.

“The company may take measures to ease its financial burden, such as selling assets, but we believe the timing and scale of those measures remain uncertain.”

Don’t miss out on the knowledge you need to succeed. Sign up for the Daily Brief, Silicon Republic’s digest of need-to-know sci-tech news.


Tech

Jolla Sailfish pitches a "European phone" for users wary of Google and Apple



Jolla’s return to the smartphone market follows a turbulent decade during which the company nearly collapsed, pivoted to licensing its Sailfish OS platform, severed business ties with Russia after the invasion of Ukraine, and later reorganized under the new corporate structure Jollyboys. The reset produced a device assembled in Salo,…


Tech

GPT-5.3 Instant cuts hallucinations by 26.8% as OpenAI shifts focus from speed to accuracy


OpenAI’s GPT-5.3 Instant — the company’s most widely used model — reduces hallucinations by up to 26.8% compared to its predecessor, prioritizing accuracy and conversational reliability over raw performance gains, OpenAI says.

GPT-5.3 Instant, the default and most-used model for ChatGPT users, also improves on tone, relevance and conversational flow, with fewer refusals. It is available in ChatGPT and via the API.

For now, only the Instant model is being upgraded to 5.3, but the company said it is working on bringing ChatGPT’s other models, Thinking and Pro, to 5.3 “soon.”

GPT-5.3 Instant cuts hallucinations by up to 26.8%

OpenAI ran two internal evaluations: one across higher-stakes domains including medicine, finance, and law; the other drawing on user feedback.


In these higher-stakes evaluations, GPT-5.3 Instant reduces hallucinations by 26.8% when using web search and improves reliability by 19.7% when relying on its internal knowledge. User feedback showed a 22.5% decrease in hallucinations on queries answered with web search.
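These figures are relative reductions rather than absolute error rates, which OpenAI has not published. A quick sketch (with a purely hypothetical 10% baseline hallucination rate) shows how to read them:

```python
def reduced_rate(baseline: float, relative_reduction: float) -> float:
    """Apply a relative reduction (0.268 for 26.8%) to a baseline error rate."""
    return baseline * (1.0 - relative_reduction)

# Hypothetical 10% baseline — OpenAI reports only the relative change.
baseline = 0.10
print(f"{reduced_rate(baseline, 0.268):.4f}")  # 0.0732 — web-search evaluation
print(f"{reduced_rate(baseline, 0.197):.4f}")  # 0.0803 — internal-knowledge evaluation
```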

The company said GPT-5.3 Instant is more reliable because it improved how it balances information from the internet with its own internal training and reasoning. 

“More broadly, GPT-5.3 Instant is less likely to overindex on web results, which previously could lead to long lists of links or loosely connected information. It does a stronger job of recognizing the subtext of questions and surfacing the most important information, especially upfront, resulting in answers that are more relevant and immediately usable, without sacrificing speed or tone,” the company said. 

In one example OpenAI gave, a user asks about the biggest signing in Major League Baseball and its impact; the previous model, GPT-5.2, often defaulted to summarizing search results.


Accuracy overtakes performance as OpenAI’s selling point

With this new release, which starts with its most-used model, OpenAI wants enterprise customers and other ChatGPT users to understand that the competition is not just about how performant a model is, but about how well it sticks to factual information. Rather than leading with performance metrics such as speed and token savings, the company is leaning into GPT-5.3 Instant’s reliability.

Competitors such as Google and Anthropic also tout greater accuracy in their new models. Anthropic said its new Claude Sonnet 4.6 has fewer hallucinations, while Google was forced to pull its Gemma 3 model after it hallucinated false information about a lawmaker. 

GPT-5.3 Instant dials back refusals and “cringe” tone

“This update focuses on the parts of the ChatGPT experience people feel every day: tone, relevance, and conversational flow. These are nuanced problems that don’t always show up in benchmarks, but shape whether ChatGPT feels helpful or frustrating. GPT-5.3 Instant directly reflects user feedback in these areas,” OpenAI said in a blog post.

GPT-5.3 Instant has a more natural conversation style, moving away from what OpenAI claimed was a “cringe” tone that came across as overbearing and made assumptions about user intent. The company noted that it will ensure the chat platform’s personality is more consistent across updates so users will not experience a tonal shift when conversing with the model.


The new model significantly reduces refusals. OpenAI said the previous model would often refuse to answer questions even when they did not violate any guardrails, and sometimes answered “in ways that feel overly cautious or preachy, particularly around sensitive topics.”

The company promises that GPT-5.3 will not do the same and will tone down “overly defensive or moralizing preambles.” This means the model will answer directly, without caveats, so users do not end conversations without a response to their query. 

Despite these improvements, GPT-5.3 Instant still has limitations, particularly in languages such as Korean and Japanese, where its answers can still sound stilted.

Safety card shows regressions in sexual content and self-harm categories

The new model does not have support for adult content, according to an OpenAI spokesperson in an email to VentureBeat, as the company is still figuring out “how to maximize user freedom while maintaining our high safety bar.” OpenAI does not have a timeline for when it will release that functionality.


OpenAI conducted safety benchmarking on the new model, noting on its safety card that, while it performed well against disallowed content, it still did not match the level of GPT-5.2 Instant. However, OpenAI noted these results could change after launch.

“GPT-5.3 Instant shows regressions relative to GPT-5.2 Instant and GPT-5.1 Instant for disallowed sexual content, and relative to GPT-5.2 Instant for self-harm on both standard and dynamic evaluations,” the company said.

In other categories, OpenAI said the model performs on par with or better than previous releases, and noted the regressions for graphic violence and violent illicit behavior have low statistical significance.

Expect a new model soon?

After announcing GPT-5.3 Instant and noting that updates for Thinking and Pro are coming soon, OpenAI teased that even this new model may not be around for long.


In a post on X, OpenAI said GPT-5.4 is coming “sooner than you think.”

OpenAI did not elaborate on what changes, if any, to expect with GPT-5.4, or which models will get it first.

GPT-5.2 Instant, the predecessor model, will remain available on the ChatGPT model picker until June 3, when it will be retired.


Tech

Facebook accounts unavailable in worldwide outage


This story was updated after the outage was resolved.

Social media giant Facebook suffered a worldwide outage that prevented users from accessing their accounts.

When visiting the site, users were greeted with a message stating their account was temporarily unavailable.

“Your account is currently unavailable due to a site issue. We expect this to be resolved shortly. Please try again in a few minutes,” reads the outage message.

Facebook outage message stating your account is unavailable
Source: BleepingComputer

According to DownDetector, the outage began around 4:15 PM ET and is impacting accounts worldwide.

However, the Meta status page only claims there are “High Disruptions” to the Facebook ad manager, Instagram Boost, and the WhatsApp Business API.

BleepingComputer contacted Facebook with questions about the outage and will update the story if we hear back.

Update 6:21 PM ET: The Facebook outage has now been resolved, with users once again able to access their accounts.

However, Facebook has yet to provide any information as to what caused the outage.

