Technology

DeepMind’s SCoRe shows LLMs can use their internal knowledge to correct their mistakes


While large language models (LLMs) are becoming increasingly effective at complicated tasks, there are many cases where they can’t get the correct answer on the first try. This is why there is growing interest in enabling LLMs to spot and correct their mistakes, also known as “self-correction.” However, current attempts at self-correction are limited and have requirements that often cannot be met in real-world situations.

In a new paper, researchers at Google DeepMind introduce Self-Correction via Reinforcement Learning (SCoRe), a novel technique that significantly improves the self-correction capabilities of LLMs using only self-generated data. SCoRe can be a valuable tool for making LLMs more robust and reliable and opens new possibilities for enhancing their reasoning and problem-solving abilities.

The importance of self-correction in LLMs

“Self-correction is a capability that greatly enhances human thinking,” Aviral Kumar, research scientist at Google DeepMind, told VentureBeat. “Humans often spend more time thinking, trying out multiple ideas, correcting their mistakes, to finally then solve a given challenging question, as opposed to simply in one-shot producing solutions for challenging questions. We would want LLMs to be able to do the same.”

Ideally, an LLM with strong self-correction capabilities should be able to review and refine its own answers until it reaches the correct response. This is especially important because LLMs often possess the knowledge needed to solve a problem internally but fail to use it effectively when generating their initial response.

“From a fundamental ML point of view, no LLM is expected to solve hard problems all within zero-shot using its memory (no human certainly can do this), and hence we want LLMs to spend more thinking computation and correct themselves to succeed on hard problems,” Kumar said.

Previous attempts at enabling self-correction in LLMs have relied on prompt engineering or fine-tuning models specifically for self-correction. These methods usually assume that the model can receive external feedback on the quality of the outputs or has access to an “oracle” that can guide the self-correction process.

These techniques fail to use the intrinsic self-correction capabilities of the model. Supervised fine-tuning (SFT) methods, which involve training a model to fix the mistakes of a base model, have also shown limitations. They often require oracle feedback from human annotators or stronger models and do not rely on the model’s own knowledge. Some SFT methods even require multiple models during inference to verify and refine the answer, which makes it difficult to deploy and use them.

Additionally, DeepMind’s research shows that while SFT methods can improve a model’s initial responses, they do not perform well when the model needs to revise its answers over multiple steps, which is often the case with complicated problems.

“It might very well happen that by the end of training the model will know how to fix the base model’s mistakes but might not have enough capabilities to detect its own mistakes,” Kumar said.

Another challenge with SFT is that it can lead to unintended behavior, such as the model learning to produce the best answer in the first attempt and not changing it in subsequent steps, even if it’s incorrect.

“We found behavior of SFT trained models largely collapses to this ‘direct’ strategy as opposed to learning how to self-correct,” Kumar said.

Self-correction through reinforcement learning

DeepMind SCoRe framework (source: arXiv)

To overcome the limitations of previous approaches, the DeepMind researchers turned to reinforcement learning (RL). 

“LLMs today cannot do [self-correction], as is evident from prior studies that evaluate self-correction. This is a fundamental issue,” Kumar said. “LLMs are not trained to look back and introspect their mistakes, they are trained to produce the best response given a question. Hence, we started building methods for self-correction.”

SCoRe trains a single model to both generate responses and correct its own errors without relying on external feedback. Importantly, SCoRe achieves this by training the model entirely on self-generated data, eliminating the need for external knowledge.

Previous attempts to use RL for self-correction have mostly relied on single-turn interactions, which can lead to undesirable outcomes, such as the model focusing solely on the final answer and ignoring the intermediate steps that guide self-correction.

“We do see… ‘behavior collapse’ in LLMs trained to do self-correction with naive RL. It learned to simply ignore the instruction to self-correct and produce the best response out of its memory, in zero-shot, without learning to correct itself,” Kumar said.

To prevent behavior collapse, SCoRe uses a two-stage training process with regularization techniques. The first stage replaces SFT with a process that optimizes correction performance while ensuring that the model’s initial attempts remain close to the base model’s outputs.
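As a rough illustration of that first stage, the sketch below scores a single training sample: it rewards a correct second attempt and subtracts a penalty when the first-attempt distribution drifts away from the base model. The KL-divergence penalty, the toy answer distributions, and the names `stage_one_objective` and `beta` are assumptions made for illustration; the description above only says the first attempt should stay close to the base model's outputs.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same candidate answers.

    Assumes every entry of q is strictly positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def stage_one_objective(
    second_attempt_reward: float,
    policy_first: Sequence[float],
    base_first: Sequence[float],
    beta: float = 1.0,
) -> float:
    """Hypothetical per-sample objective for the first training stage.

    Higher is better: the second attempt earns the reward, while the first
    attempt is penalized for moving away from the base model's distribution.
    """
    drift = kl_divergence(policy_first, base_first)
    return second_attempt_reward - beta * drift
```

With identical first-attempt distributions the penalty is zero, so the objective equals the second-attempt reward; any drift in the first attempt lowers it.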

The second stage employs multi-turn RL to optimize reward at both the initial and subsequent attempts while incorporating a reward bonus that encourages the model to improve its responses from the first to the second attempt.

“Both the initialization and the reward bonus ensure that the model cannot simply learn to produce the best first-attempt response and only minorly edit it,” the researchers write. “Overall, SCoRe is able to elicit knowledge from the base model to enable positive self-correction.”
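One way to picture the reward bonus is as an extra term added to the second attempt's reward that grows when the answer improves and turns negative when a correct answer is broken. The sketch below assumes binary correctness rewards and a bonus of the form `alpha * (r2 - r1)`; the function name and the `alpha` value are illustrative choices, not details confirmed in this article.

```python
from typing import Tuple


def shaped_rewards(
    first_correct: bool, second_correct: bool, alpha: float = 1.0
) -> Tuple[float, float]:
    """Return (first-attempt reward, shaped second-attempt reward).

    The bonus term is an assumption for illustration: it is positive when the
    second attempt fixes a wrong first attempt and negative when it breaks a
    correct one.
    """
    r1 = 1.0 if first_correct else 0.0
    r2 = 1.0 if second_correct else 0.0
    bonus = alpha * (r2 - r1)
    return r1, r2 + bonus
```

Under this shaping, a model that simply repeats its first answer earns no bonus, so copying the first attempt is never better than genuinely improving on it.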

SCoRe in action

The DeepMind researchers evaluated SCoRe against existing methods that use self-generated data for self-correction training. They focused on math and coding tasks, using benchmarks such as MATH, MBPP, and HumanEval.

DeepMind SCoRe outperforms other self-correction methods in multi-step correction. It also learns to avoid switching correct answers during the correction phase (source: arXiv)

The results showed that SCoRe significantly improved the self-correction capabilities of Gemini 1.0 Pro and 1.5 Flash models. For example, SCoRe achieved a 15.6% absolute gain in self-correction on the MATH benchmark and a 9.1% gain on the HumanEval benchmark in comparison to the base model, beating other self-correction methods by several percentage points.

The most notable improvement was in the model’s ability to correct its mistakes from the first to the second attempt. SCoRe also considerably reduced the instances where the model mistakenly changed a correct answer to an incorrect one, indicating that it learned to apply corrections only when necessary.
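These quantities are easy to compute from per-problem correctness on the two attempts. The following sketch is a minimal example; the function and key names are hypothetical, and the toy data in the usage note is made up rather than taken from the paper.

```python
from typing import Dict, Sequence


def self_correction_metrics(
    first: Sequence[bool], second: Sequence[bool]
) -> Dict[str, float]:
    """Summarize how accuracy changes between the first and second attempts."""
    if not first or len(first) != len(second):
        raise ValueError("need two non-empty, equal-length correctness lists")
    n = len(first)
    pairs = list(zip(first, second))
    acc_first = sum(first) / n
    acc_second = sum(second) / n
    return {
        "accuracy_first": acc_first,
        "accuracy_second": acc_second,
        # Net change in accuracy from attempt one to attempt two.
        "delta": acc_second - acc_first,
        # Fraction of problems fixed on the second attempt.
        "incorrect_to_correct": sum(1 for a, b in pairs if not a and b) / n,
        # Fraction of problems where a correct answer was changed to a wrong one.
        "correct_to_incorrect": sum(1 for a, b in pairs if a and not b) / n,
    }
```

For example, `self_correction_metrics([True, False, False, False], [True, True, True, False])` reports a first-attempt accuracy of 0.25, a second-attempt accuracy of 0.75, and no correct answers broken.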

Furthermore, SCoRe proved to be highly efficient when combined with inference-time scaling strategies such as self-consistency. By splitting the same inference budget across multiple rounds of correction, SCoRe enabled further performance gains.

SCoRe (green line) enables LLMs to make better use of inference-time scaling techniques (source: arXiv)
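The idea of splitting a fixed budget across correction rounds can be sketched as follows: instead of sampling every answer independently, part of the budget goes to revising each sampled answer before the majority vote. The `generate` and `revise` callables stand in for model calls and are placeholders, as are the function names; this is an illustration of the general idea, not the paper's evaluation code.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Self-consistency: return the most frequent final answer."""
    return Counter(answers).most_common(1)[0][0]


def sequential_self_consistency(
    question: str,
    generate: Callable[[str, int], str],
    revise: Callable[[str, str], str],
    budget: int,
    rounds: int = 2,
) -> str:
    """Spend `budget` model calls as budget // rounds chains of `rounds` attempts."""
    if rounds < 1 or budget < rounds:
        raise ValueError("budget must cover at least one full chain")
    chains = budget // rounds
    finals = []
    for seed in range(chains):
        answer = generate(question, seed)       # first attempt
        for _ in range(rounds - 1):
            answer = revise(question, answer)   # self-correction round
        finals.append(answer)
    return majority_vote(finals)
```

With `rounds=1` this reduces to plain self-consistency over `budget` parallel samples, which makes the two strategies easy to compare at an equal number of model calls.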

While the paper primarily focuses on coding and reasoning tasks, the researchers believe that SCoRe can be beneficial for other applications as well.

“You could imagine teaching models to look back at their outputs that might potentially be unsafe and improve them all by themselves, before showing it to the user,” Kumar said.

The researchers believe that their work has broader implications for training LLMs and highlights the importance of teaching models how to reason and correct themselves rather than simply mapping inputs to outputs. 

Servers computers

Server rack recommendation for 2023: INDORACK, of course #indorack #rakserver #rackserver

Technology

Hedosophia leads $7M seed round into retail supply chain AI startup Ameba

Traditional retailers have a pressing problem. Fast-moving rivals like Shein and Temu are eating their lunch by leveraging purpose-built, end-to-end supply chains. Meanwhile, incumbent retailers are still stuck on legacy platforms, juggling myriad data sets, and struggling to respond to a punishingly fast market.

A London-based startup thinks it has the solution to this problem. Ameba claims to be able to take the unstructured data in a retailer’s supply chain systems, sprinkle in some generative AI, and make the whole thing more efficient.

The startup has now raised a $7.1 million seed round led by London-based VC firm Hedosophia, which has gained a reputation for rarely revealing which companies it invests in. TechCrunch reached out to Hedosophia for further comment, but did not receive a response before publication.

Ameba’s platform uses generative AI on top of existing supply chain software to give retailers insights into their global supply chains, extracting data from a wide range of sources in order to predict disruptions and react to bottlenecks. The company claims it can reduce manual data input by 30%.

“In supply chains, particularly in the fashion consumer space, a lot of very important data is currently not being captured,” Ameba’s founder, Cedrik Hoffmann, told TechCrunch. “A lot of times, the things that are in the shops are sold at the wrong cost or they’re out of stock, or whatever.”

He said Ameba captures these unstructured data points that cost systems don’t: “We release that information from the information silos, bring them to a central source and surface the insights that are developed from them to the relevant parties within your organization.”

Co-founder Craig Massie said their underlying AI mixes a range of foundational models, including OpenAI’s: “It changes depending on the task at hand and what performs best in our benchmarks for that task. The underlying constant across our AI usage is our multi-step agents: they can take actions, explore your ontology and its connections, read your supplier emails, WhatsApps and attachments.”

So far, British interior hardware and lighting company Plank has used Ameba to generate 140 alerts highlighting critical production and delivery delays that would have previously been missed or overlooked.

Before Ameba, Hoffmann was supply chain director and co-founder of e-commerce company VALOREO, while Massie is a former Palantir engineer.

Also participating in the funding round were Visionaries Club, which previously led Ameba’s pre-seed round, and Anamcara Capital.

Isabella Yamamoto, principal at Visionaries Club, said in a statement, “After speaking to many supply chain owners, we were convinced that Cedrik and Craig had the experience to build a disruptive business using AI to eliminate fragmentation in supply chains and unlock competitive advantage for brands.”

Technology

Microsoft Office 2024 is now available for Macs and PCs

Microsoft is releasing a new version of Office this week, designed for people who don’t want to subscribe to Microsoft 365. The standalone Microsoft Office 2024 release is now available for both consumers and small businesses, and includes locked-in-time versions of Word, Excel, PowerPoint, OneNote, and Outlook across both Mac and PC.

Office 2024 includes a lot of the updates that Microsoft has been delivering to Microsoft 365 subscribers over the past few years. Microsoft last released a standalone version of Office in 2021, and this new Office 2024 release includes improvements to the core apps, as well as accessibility and UI changes.

Office 2024 has a new default theme, with Microsoft’s latest Fluent Design principles that match the visual changes to Windows 11. Microsoft has also added accessibility-focused improvements to help Office users find potential accessibility issues in documents, slideshows, workbooks, and emails.

Excel 2024 can now reference Dynamic Arrays.
Image: Microsoft

The biggest changes in Office 2024 can be found in Excel, PowerPoint, and Outlook. Microsoft has added new functions in Excel to use text and arrays in worksheets, alongside a new IMAGE function that can pull pictures from the web. Excel 2024 can also now reference Dynamic Arrays in charts, which can automatically update rather than being fixed to set data points. Microsoft claims the overall speed and stability of Excel 2024 should also be improved.

In PowerPoint, Microsoft has added the cameo feature, allowing you to insert a live camera feed into slides. PowerPoint also has a new recording studio that covers narration, animations, transitions, and inking. You can also add closed captions or subtitles to videos and audio files in slides, making presentations a lot more accessible.

Outlook 2024 has improvements to search.
Image: Microsoft

Outlook 2024 includes improvements to search so you get more relevant results for messages, attachments, contacts, and calendar entries. This latest Outlook release also includes more options for meetings, including the ability to automatically shorten them. Mac users can also customize swipe left and right gestures in Outlook.

In Word, Excel, and PowerPoint you can now insert a picture easily from an Android mobile device, and Microsoft is also supporting version 1.4 of the OpenDocument format (ODF) which includes a variety of new improvements. Word and PowerPoint also include the ability to like and react to comments in documents.

Word 2024 has an improved file recovery feature.
Image: Microsoft

You will also be able to recover your Word 2024 session if your PC crashes. Word will automatically open all the documents you had open before your PC crashed, you lost power, or Word simply closed unexpectedly. OneNote 2024 users will also get access to the new inking and drawing experience.

Microsoft says Office 2024 will require a Microsoft account and an internet connection, but if it’s anything like Office 2021 then you’ll only need an internet connection to install the suite, activate it, and get any security updates. Office 2024 will run on Windows 10 and 11 as well as the three most recent releases of macOS.

Office 2024 will be available in two different editions. Office Home 2024, priced at $149.99, includes Word, Excel, PowerPoint, and OneNote for PC or Mac. If you want Outlook, you’ll need to purchase the $249.99 Office Home and Business 2024 version, which also includes the rights to use the apps for commercial purposes.

Servers computers

Fujitsu PRIMERGY BX900 Blade Server Enclosure Forefront Technologies

Technology

Forget AI — most UK firms just want to hire basic IT skills

Despite ongoing interest surrounding artificial intelligence technologies embedded into work environments, UK businesses are still prioritizing hiring workers with basic technical skills.

New research by Indeed found only 2.6% of job postings in the UK mentioned AI skills, with basic skills like Microsoft Office and generic IT expertise coming up more frequently.

Servers computers

Networking Equipment Racks – How Do They Work?

Why do we need Networking Equipment Racks?

How do they work, and what sizes are needed? A look at some of the basics you’ll need to know when you get into the networking industry.

Copyright © 2024 WordupNews.com