Technology

DeepMind’s SCoRe shows LLMs can use their internal knowledge to correct their mistakes

Published

4 hours ago

October 2, 2024

While large language models (LLMs) are becoming increasingly effective at complicated tasks, there are many cases where they can’t get the correct answer on the first try. This is why there is growing interest in enabling LLMs to spot and correct their mistakes, also known as “self-correction.” However, current attempts at self-correction are limited and have requirements that often cannot be met in real-world situations.

In a new paper, researchers at Google DeepMind introduce Self-Correction via Reinforcement Learning (SCoRe), a novel technique that significantly improves the self-correction capabilities of LLMs using only self-generated data. SCoRe can be a valuable tool for making LLMs more robust and reliable and opens new possibilities for enhancing their reasoning and problem-solving abilities.

The importance of self-correction in LLMs

“Self-correction is a capability that greatly enhances human thinking,” Aviral Kumar, research scientist at Google DeepMind, told VentureBeat. “Humans often spend more time thinking, trying out multiple ideas, correcting their mistakes, to finally then solve a given challenging question, as opposed to simply in one-shot producing solutions for challenging questions. We would want LLMs to be able to do the same.”

Ideally, an LLM with strong self-correction capabilities should be able to review and refine its own answers until it reaches the correct response. This is especially important because LLMs often possess the knowledge needed to solve a problem internally but fail to use it effectively when generating their initial response.

“From a fundamental ML point of view, no LLM is expected to solve hard problems all within zero-shot using its memory (no human certainly can do this), and hence we want LLMs to spend more thinking computation and correct themselves to succeed on hard problems,” Kumar said.

Previous attempts at enabling self-correction in LLMs have relied on prompt engineering or fine-tuning models specifically for self-correction. These methods usually assume that the model can receive external feedback on the quality of the outputs or has access to an “oracle” that can guide the self-correction process.

These techniques fail to use the intrinsic self-correction capabilities of the model. Supervised fine-tuning (SFT) methods, which involve training a model to fix the mistakes of a base model, have also shown limitations. They often require oracle feedback from human annotators or stronger models and do not rely on the model’s own knowledge. Some SFT methods even require multiple models during inference to verify and refine the answer, which makes it difficult to deploy and use them.

Additionally, DeepMind’s research shows that while SFT methods can improve a model’s initial responses, they do not perform well when the model needs to revise its answers over multiple steps, which is often the case with complicated problems.

“It might very well happen that by the end of training the model will know how to fix the base model’s mistakes but might not have enough capabilities to detect its own mistakes,” Kumar said.

Another challenge with SFT is that it can lead to unintended behavior, such as the model learning to produce the best answer in the first attempt and not changing it in subsequent steps, even if it’s incorrect.

“We found behavior of SFT trained models largely collapses to this ‘direct’ strategy as opposed to learning how to self-correct,” Kumar said.

Self-correction through reinforcement learning

*DeepMind SCoRe framework (source: arXiv)*

To overcome the limitations of previous approaches, the DeepMind researchers turned to reinforcement learning (RL).

“LLMs today cannot do [self-correction], as is evident from prior studies that evaluate self-correction. This is a fundamental issue,” Kumar said. “LLMs are not trained to look back and introspect their mistakes, they are trained to produce the best response given a question. Hence, we started building methods for self-correction.”

SCoRe trains a single model to both generate responses and correct its own errors without relying on external feedback. Importantly, SCoRe achieves this by training the model entirely on self-generated data, eliminating the need for external knowledge.

Previous attempts to use RL for self-correction have mostly relied on single-turn interactions, which can lead to undesirable outcomes, such as the model focusing solely on the final answer and ignoring the intermediate steps that guide self-correction.

“We do see… ‘behavior collapse’ in LLMs trained to do self-correction with naive RL. It learned to simply ignore the instruction to self-correct and produce the best response out of its memory, in zero-shot, without learning to correct itself,” Kumar said.

To prevent behavior collapse, SCoRe uses a two-stage training process with regularization techniques. The first stage replaces SFT with a process that optimizes correction performance while ensuring that the model’s initial attempts remain close to the base model’s outputs.
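As a rough illustration of that first stage, the sketch below scores a training sample by the reward of its corrected answer minus a penalty for how far the first-attempt distribution has drifted from the base model. The use of a KL divergence as the closeness measure, the weight `beta`, and all function names are assumptions made for this example; the article does not say how closeness is measured.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def stage_one_objective(second_attempt_reward, first_attempt_probs, base_probs, beta=1.0):
    """Reward for the corrected answer, minus a penalty for first-attempt drift.

    Hypothetical objective: `beta` and the KL penalty are illustrative choices.
    """
    drift = kl_divergence(first_attempt_probs, base_probs)
    return second_attempt_reward - beta * drift


# Toy first-attempt distributions over three candidate answers.
base = [0.7, 0.2, 0.1]
unchanged = stage_one_objective(1.0, base, base)
drifted = stage_one_objective(1.0, [0.1, 0.2, 0.7], base)
```

With the same second-attempt reward, the sample whose first attempt stays at the base distribution scores higher than the one that drifts, which is the pressure this stage is described as applying.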

The second stage employs multi-turn RL to optimize reward at both the initial and subsequent attempts while incorporating a reward bonus that encourages the model to improve its responses from the first to the second attempt.

“Both the initialization and the reward bonus ensure that the model cannot simply learn to produce the best first-attempt response and only minorly edit it,” the researchers write. “Overall, SCoRe is able to elicit knowledge from the base model to enable positive self-correction.”
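One way to picture the reward bonus is as an extra term on the second attempt that is proportional to the change in correctness between the two attempts. The sketch below is a minimal illustration under that assumption. The binary rewards, the `alpha` weight and the function name are hypothetical and are not taken from the paper.

```python
def shaped_rewards(first_correct, second_correct, alpha=0.5):
    """Return (first_attempt_reward, second_attempt_reward_with_bonus).

    Assumes binary correctness rewards and a bonus of alpha * (r2 - r1);
    both are illustrative choices, not the published formulation.
    """
    r1 = 1.0 if first_correct else 0.0
    r2 = 1.0 if second_correct else 0.0
    bonus = alpha * (r2 - r1)  # positive when the revision fixes a mistake
    return r1, r2 + bonus


fixed = shaped_rewards(first_correct=False, second_correct=True)   # (0.0, 1.5)
kept = shaped_rewards(first_correct=True, second_correct=True)     # (1.0, 1.0)
broken = shaped_rewards(first_correct=True, second_correct=False)  # (1.0, -0.5)
```

Under this shaping, fixing a wrong answer earns more than repeating a correct one, and breaking a correct answer is actively penalized, so a policy that leaves its first attempt untouched collects no bonus.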

SCoRe in action

The DeepMind researchers evaluated SCoRe against existing methods that use self-generated data for self-correction training. They focused on math and coding tasks, using benchmarks such as MATH, MBPP, and HumanEval.

*DeepMind SCoRe outperforms other self-correction methods in multi-step correction. It also learns to avoid switching correct answers during the correction phase (source: arXiv)*

The results showed that SCoRe significantly improved the self-correction capabilities of Gemini 1.0 Pro and 1.5 Flash models. For example, SCoRe achieved a 15.6% absolute gain in self-correction on the MATH benchmark and a 9.1% gain on the HumanEval benchmark in comparison to the base model, beating other self-correction methods by several percentage points.

The most notable improvement was in the model’s ability to correct its mistakes from the first to the second attempt. SCoRe also considerably reduced the instances where the model mistakenly changed a correct answer to an incorrect one, indicating that it learned to apply corrections only when necessary.
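The quantities described here can be computed from per-problem correctness flags for the two attempts. The following sketch shows one way to do that. The function and key names are hypothetical, and the sample data is invented purely for illustration.

```python
def self_correction_metrics(first_correct, second_correct):
    """Summarize how accuracy changes between the first and second attempt."""
    n = len(first_correct)
    if n == 0 or n != len(second_correct):
        raise ValueError("need two non-empty lists of equal length")
    pairs = list(zip(first_correct, second_correct))
    right_first = sum(1 for a, _ in pairs if a)
    right_second = sum(1 for _, b in pairs if b)
    return {
        "accuracy_first": right_first / n,
        "accuracy_second": right_second / n,
        # Net gain from the first to the second attempt.
        "delta": (right_second - right_first) / n,
        # Share of problems fixed by the revision.
        "incorrect_to_correct": sum(1 for a, b in pairs if not a and b) / n,
        # Share of problems where a correct answer was changed to a wrong one.
        "correct_to_incorrect": sum(1 for a, b in pairs if a and not b) / n,
    }


# Invented example: five problems, two attempts each.
metrics = self_correction_metrics(
    [True, False, False, True, False],
    [True, True, False, False, True],
)
```

In this toy data the revision fixes two problems and breaks one, so the net gain is one problem out of five.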

Furthermore, SCoRe proved to be highly efficient when combined with inference-time scaling strategies such as self-consistency. By splitting the same inference budget across multiple rounds of correction, SCoRe enabled further performance gains.
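To make the budget-splitting idea concrete, the sketch below compares spending a fixed number of model calls on independent samples with spending half on drafts and half on one revision of each draft, followed by a majority vote in both cases. The half-and-half split, the function names and the toy stand-ins for the model are assumptions for illustration only.

```python
import itertools
from collections import Counter


def majority_vote(answers):
    """Self-consistency: return the most frequent answer."""
    return Counter(answers).most_common(1)[0][0]


def parallel_only(question, sample, budget):
    """Spend the whole budget on independent first attempts."""
    return majority_vote([sample(question) for _ in range(budget)])


def sample_then_correct(question, sample, revise, budget):
    """Spend half the budget on drafts and half on one revision per draft."""
    drafts = [sample(question) for _ in range(budget // 2)]
    return majority_vote([revise(question, d) for d in drafts])


def make_toy_sampler():
    """Hypothetical stand-in for a model: cycles through fixed drafts."""
    drafts = itertools.cycle(["41", "42", "41", "40"])
    return lambda question: next(drafts)


def toy_revise(question, draft):
    """Hypothetical stand-in for a correction turn: fixes one kind of error."""
    return "42" if draft == "41" else draft


vote_only = parallel_only("q", make_toy_sampler(), budget=8)
vote_after_fix = sample_then_correct("q", make_toy_sampler(), toy_revise, budget=8)
```

With these toy stand-ins, voting over eight raw samples settles on the common wrong draft, while voting over four revised drafts recovers the intended answer. Real gains depend on how reliably the model actually corrects itself.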

*SCoRe (green line) enables LLMs to make better use of inference-time scaling techniques (source: arXiv)*

While the paper primarily focuses on coding and reasoning tasks, the researchers believe that SCoRe can be beneficial for other applications as well.

“You could imagine teaching models to look back at their outputs that might potentially be unsafe and improve them all by themselves, before showing it to the user,” Kumar said.

The researchers believe that their work has broader implications for training LLMs and highlights the importance of teaching models how to reason and correct themselves rather than simply mapping inputs to outputs.

Servers computers

Server rack recommendation for 2023: Indorack, of course #indorack #rakserver #rackserver

Published

9 mins ago

October 2, 2024

NewsAdmin

Technology

Hedosophia leads $7M seed round into retail supply chain AI startup Ameba

Published

9 mins ago

October 2, 2024

NewsAdmin

Traditional retailers have a pressing problem. Fast-moving rivals like Shein and Temu are eating their lunch by leveraging purpose-built, end-to-end supply chains. Meanwhile, incumbent retailers are still stuck on legacy platforms, juggling myriad data sets, and struggling to respond to a punishingly fast market.

A London-based startup thinks it has the solution to this problem. Ameba claims to be able to take the unstructured data in a retailer’s supply chain systems, sprinkle in some generative AI, and make the whole thing more efficient.

The startup has now raised a $7.1 million seed round led by London-based VC firm Hedosophia, which has gained a reputation for rarely revealing which companies it invests in. TechCrunch reached out to the latter for further comment, but did not receive a response before publication.

Ameba’s platform uses generative AI on top of existing supply chain software to give retailers insights into their global supply chains, extracting data from a wide range of sources in order to predict disruptions and react to bottlenecks. The company claims it can reduce manual data input by 30%.

“In supply chains, particularly in the fashion consumer space, a lot of very important data is currently not being captured,” Ameba’s founder, Cedrik Hoffmann, told TechCrunch. “A lot of times, the things that are in the shops are sold at the wrong cost or they’re out of stock, or whatever.”

He said Ameba captures these unstructured data points that cost systems don’t: “We release that information from the information silos, bring them to a central source and surface the insights that are developed from them to the relevant parties within your organization.”

Co-founder Craig Massie said their underlying AI mixes a range of foundational models, including OpenAI’s: “It changes depending on the task at hand and what performs best in our benchmarks for that task. The underlying constant across our AI usage is our multi-step agents — they can take actions, explore your ontology and its connections, read your supplier emails, WhatsApps and attachments.”

So far, British interior hardware and lighting company Plank has used Ameba to generate 140 alerts highlighting critical production and delivery delays that would have previously been missed or overlooked.

Before Ameba, Hoffmann was supply chain director and co-founder of e-commerce company VALOREO, while Massie is a former Palantir engineer.

Also participating in the funding round were Visionaries Club, which previously led Ameba’s pre-seed round, and Anamcara Capital.

Isabella Yamamoto, principal at Visionaries Club, said in a statement, “After speaking to many supply chain owners, we were convinced that Cedrik and Craig had the experience to build a disruptive business using AI to eliminate fragmentation in supply chains and unlock competitive advantage for brands.”

Technology

Microsoft Office 2024 is now available for Macs and PCs

Published

32 mins ago

October 2, 2024

NewsAdmin

Microsoft is releasing a new version of Office this week, designed for people who don’t want to subscribe to Microsoft 365. The standalone Microsoft Office 2024 release is now available for both consumers and small businesses, and includes locked-in-time versions of Word, Excel, PowerPoint, OneNote, and Outlook across both Mac and PC.

Office 2024 includes a lot of the updates that Microsoft has been delivering to Microsoft 365 subscribers over the past few years. Microsoft last released a standalone version of Office in 2021, and this new Office 2024 release includes improvements to the core apps, as well as accessibility and UI changes.

Office 2024 has a new default theme, with Microsoft’s latest Fluent Design principles that match the visual changes to Windows 11. Microsoft has also added accessibility-focused improvements to help Office users find potential accessibility issues in documents, slideshows, workbooks, and emails.

Excel 2024 can now reference Dynamic Arrays.

Image: Microsoft

The biggest changes in Office 2024 can be found in Excel, PowerPoint, and Outlook. Microsoft has added new functions in Excel to use text and arrays in worksheets, alongside a new IMAGE function that can pull pictures from the web. Excel 2024 can also now reference Dynamic Arrays in charts, which can automatically update rather than being fixed to set data points. Microsoft claims the overall speed and stability of Excel 2024 should also be improved.

In PowerPoint Microsoft has added the cameo feature, allowing you to insert a live camera feed into slides. PowerPoint also has a new recording studio feature that includes recording features for narration, animations, transitions, and inking. You can also add closed captions or subtitles to videos and audio files in slides, making presentations a lot more accessible.

Outlook 2024 has improvements to search.

Image: Microsoft

Outlook 2024 includes improvements to search so you get more relevant results for messages, attachments, contacts, and calendar entries. This latest Outlook release also includes more options for meetings, including the ability to automatically shorten them. Mac users can also customize swipe left and right gestures in Outlook.

In Word, Excel, and PowerPoint you can now insert a picture easily from an Android mobile device, and Microsoft is also supporting version 1.4 of the OpenDocument format (ODF) which includes a variety of new improvements. Word and PowerPoint also include the ability to like and react to comments in documents.

Word 2024 has an improved file recovery feature.

Image: Microsoft

Word 2024 can also recover your session if your PC crashes. Word will automatically reopen all the documents you had open before your PC crashed, you lost power, or Word simply closed unexpectedly. OneNote 2024 users will also get access to the new inking and drawing experience.

Microsoft says Office 2024 will require a Microsoft account and an internet connection, but if it’s anything like Office 2021 then you’ll only need an internet connection to install the suite, activate it, and get any security updates. Office 2024 will run on Windows 10 and 11 as well as the three most recent releases of macOS.

Office 2024 will be available in two different editions. Office Home 2024, priced at $149.99, includes Word, Excel, PowerPoint, and OneNote for PC or Mac. If you want Outlook, you’ll need to purchase the $249.99 Office Home and Business 2024 version, which also includes the rights to use the apps for commercial purposes.

Servers computers

Fujitsu PRIMERGY BX900 Blade Server Enclosure Forefront Technologies

Published

41 mins ago

October 2, 2024

NewsAdmin

Technology

Forget AI — most UK firms just want to hire basic IT skills

Published

53 mins ago

October 2, 2024

NewsAdmin

Despite ongoing interest surrounding artificial intelligence technologies embedded into work environments, UK businesses are still prioritizing hiring workers with basic technical skills.

New research by Indeed found only 2.6% of job postings in the UK mentioned AI skills, with basic skills like Microsoft Office and generic IT expertise coming up more frequently.

According to the report, the most common technical skills sought by UK employers include generic IT skills (10%), Microsoft Office (6%) and Microsoft Excel (5%). Moreover, demand for basic IT skills has remained pretty consistent over the past five years, both in the UK and in other markets like the US.

UK businesses need basic IT skills more than AI

Besides tech skills, Indeed found that UK employers are also prioritizing human skills like communication (30%), leadership (9%) and organization (7%).

Moreover, Indeed’s research into the current state of the UK jobs market tackles ongoing concern that AI could replace human workers. The analysis of over 2,800 work skills found that two-thirds (68.7%) are ‘very unlikely’ or ‘unlikely’ to be replaced by generative AI.

“While AI and other advanced technologies are likely to shape the future labour market, the current reality is that many employers are simply seeking workers with basic computer skills,” commented Indeed Senior Economist Jack Kennedy.

“While AI may eventually necessitate a broad upskilling across the workforce to embrace advanced technologies, there remains a more pressing concern around closing basic digital skills gaps and allowing everyone to fully engage with work in the digital age.”

This is despite the ambitions of Britain’s Prime Minister, Sir Keir Starmer, to make the UK a global AI hub.

Despite the Prime Minister’s efforts, the current jobs market suggests businesses are not yet aligned with the vision, with employers still seeking fundamental tech skills and human competencies.