
Technology

Meta unveils AI tools to give robots a human touch in physical world




Meta made several major announcements for robotics and embodied AI systems this week, including the release of benchmarks and artifacts for better understanding and interacting with the physical world. Sparsh, Digit 360 and Digit Plexus, the three research artifacts released by Meta, focus on touch perception, robot dexterity and human-robot interaction. Meta is also releasing PARTNR, a new benchmark for evaluating planning and reasoning in human-robot collaboration.

The release comes as advances in foundation models have renewed interest in robotics, and AI companies are gradually expanding their race from the digital realm to the physical world.

There is renewed hope in the industry that with the help of foundation models such as large language models (LLMs) and vision-language models (VLMs), robots can accomplish more complex tasks that require reasoning and planning.


Tactile perception

Sparsh, which was created in collaboration with the University of Washington and Carnegie Mellon University, is a family of encoder models for vision-based tactile sensing. It is meant to provide robots with touch perception capabilities. Touch perception is crucial for robotics tasks, such as determining how much pressure can be applied to a certain object to avoid damaging it. 

The classic approach to incorporating vision-based tactile sensors in robot tasks is to use labeled data to train custom models that can predict useful states. This approach does not generalize across different sensors and tasks.

Meta Sparsh architecture Credit: Meta

Meta describes Sparsh as a general-purpose model that can be applied to different types of vision-based tactile sensors and various tasks. To overcome the challenges faced by previous generations of touch perception models, the researchers trained Sparsh models through self-supervised learning (SSL), which obviates the need for labeled data. The model has been trained on more than 460,000 tactile images, consolidated from different datasets. According to the researchers’ experiments, Sparsh gains an average 95.1% improvement over task- and sensor-specific end-to-end models under a limited labeled data budget. The researchers have created different versions of Sparsh based on various architectures, including Meta’s I-JEPA and DINO models.

Touch sensors

In addition to leveraging existing data, Meta is also releasing hardware to collect rich tactile information from the physical world. Digit 360 is an artificial finger-shaped tactile sensor with more than 18 sensing features. The sensor has over 8 million taxels for capturing omnidirectional and granular deformations on the fingertip surface. Digit 360 captures various sensing modalities to provide a richer understanding of the environment and object interactions.

Digit 360 also has on-device AI models to reduce reliance on cloud-based servers. This enables it to process information locally and respond to touch with minimal latency, similar to the reflex arc in humans and animals.

Meta Digit 360 Credit: Meta

“Beyond advancing robot dexterity, this breakthrough sensor has significant potential applications from medicine and prosthetics to virtual reality and telepresence,” Meta researchers write.

Meta is publicly releasing the code and designs for Digit 360 to stimulate community-driven research and innovation in touch perception. But as in the release of open-source models, it has much to gain from the potential adoption of its hardware and models. The researchers believe that the information captured by Digit 360 can help in the development of more realistic virtual environments, which can be big for Meta’s metaverse projects in the future.

Meta is also releasing Digit Plexus, a hardware-software platform that aims to facilitate the development of robotic applications. Digit Plexus can integrate various fingertip and skin tactile sensors onto a single robot hand, encode the tactile data collected from the sensors, and transmit it to a host computer through a single cable. Meta is releasing the code and design of Digit Plexus to enable researchers to build on the platform and advance robot dexterity research.
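To make the idea of carrying several sensors over one link concrete, here is a minimal sketch of multiplexing readings into a single byte stream and recovering them on the host. The frame layout, field sizes and function names are assumptions made for illustration; they are not the Digit Plexus wire format, which the article does not describe.

```python
import struct

# Hypothetical frame layout (NOT the Digit Plexus wire format):
#   header : sensor_id (uint8), sample_count (uint16), little-endian
#   payload: sample_count unsigned 16-bit readings
HEADER = struct.Struct("<BH")


def encode_frame(sensor_id: int, samples: list[int]) -> bytes:
    """Pack one sensor's readings into a self-describing frame."""
    payload = struct.pack(f"<{len(samples)}H", *samples)
    return HEADER.pack(sensor_id, len(samples)) + payload


def multiplex(readings: dict[int, list[int]]) -> bytes:
    """Concatenate frames from several sensors into one byte stream."""
    return b"".join(encode_frame(sid, s) for sid, s in sorted(readings.items()))


def demultiplex(stream: bytes) -> dict[int, list[int]]:
    """Host side: walk the stream and recover each sensor's readings."""
    out: dict[int, list[int]] = {}
    offset = 0
    while offset < len(stream):
        sensor_id, count = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        out[sensor_id] = list(struct.unpack_from(f"<{count}H", stream, offset))
        offset += 2 * count  # each reading occupies two bytes
    return out


# Three hypothetical sensors (two fingertips, one palm patch) share one stream.
readings = {0: [10, 20, 30], 1: [7], 2: []}
assert demultiplex(multiplex(readings)) == readings
```

Because each frame carries its own sensor ID and length, the host can split the stream without knowing in advance how many sensors are attached, which is one plausible reason to prefer a framed format on a shared cable.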

Meta will be manufacturing Digit 360 in partnership with tactile sensor manufacturer GelSight Inc. They will also partner with South Korean robotics company Wonik Robotics to develop a fully integrated robotic hand with tactile sensors on the Digit Plexus platform.

Evaluating human-robot collaboration

Meta is also releasing Planning And Reasoning Tasks in humaN-Robot collaboration (PARTNR), a benchmark for evaluating the effectiveness of AI models when collaborating with humans on household tasks. 


PARTNR is built on top of Habitat, Meta’s simulated environment. It includes 100,000 natural language tasks in 60 houses and involves more than 5,800 unique objects. The benchmark is designed to evaluate the performance of LLMs and VLMs in following instructions from humans. 

Meta’s new benchmark joins a growing number of projects that are exploring the use of LLMs and VLMs in robotics and embodied AI settings. In the past year, these models have shown great promise to serve as planning and reasoning modules for robots in complex tasks. Startups such as Figure and Covariant have developed prototypes that use foundation models for planning. At the same time, AI labs are working on creating better foundation models for robotics. An example is Google DeepMind’s RT-X project, which brings together datasets from various robots to train a vision-language-action (VLA) model that generalizes to various robotics morphologies and tasks.



Technology

AI on your smartphone? Hugging Face’s SmolLM2 brings powerful models to the palm of your hand



Hugging Face today released SmolLM2, a new family of compact language models that achieve impressive performance while requiring far fewer computational resources than their larger counterparts.

The new models, released under the Apache 2.0 license, come in three sizes — 135M, 360M and 1.7B parameters — making them suitable for deployment on smartphones and other edge devices where processing power and memory are limited. Most notably, the 1.7B parameter version outperforms Meta’s Llama 1B model on several key benchmarks.
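A rough back-of-the-envelope calculation shows why these parameter counts matter on memory-limited devices. The sketch below estimates weight storage only (it ignores activations and any runtime overhead), using the parameter counts from the article. The list of numeric formats is an assumption for illustration, not a statement about which formats SmolLM2 ships in.

```python
# Parameter counts are taken from the article. The precisions listed are
# common storage formats chosen for illustration (an assumption, not a
# statement about how SmolLM2 checkpoints are distributed).
PARAMS = {"SmolLM2-135M": 135e6, "SmolLM2-360M": 360e6, "SmolLM2-1.7B": 1.7e9}
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(n_params: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


for name, n in PARAMS.items():
    row = ", ".join(f"{p}: {weight_gb(n, p):.2f} GB" for p in BYTES_PER_PARAM)
    print(f"{name} -> {row}")
```

Under these assumptions the 1.7B model needs roughly 3.4 GB for 16-bit weights, while the 135M model needs about 0.27 GB, which gives a sense of how differently the three sizes fit on a phone.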

Performance comparison shows SmolLM2-1B outperforming larger rival models on most cognitive benchmarks, with particularly strong results in science reasoning and commonsense tasks. Credit: Hugging Face

Small models pack a powerful punch in AI performance tests

“SmolLM2 demonstrates significant advances over its predecessor, particularly in instruction following, knowledge, reasoning and mathematics,” according to Hugging Face’s model documentation. The largest variant was trained on 11 trillion tokens using a diverse dataset combination including FineWeb-Edu and specialized mathematics and coding datasets.

This development comes at a crucial time when the AI industry is grappling with the computational demands of running large language models (LLMs). While companies like OpenAI and Anthropic push the boundaries with increasingly massive models, there’s growing recognition of the need for efficient, lightweight AI that can run locally on devices.


The push for bigger AI models has left many potential users behind. Running these models requires expensive cloud computing services, which come with their own problems: slow response times, data privacy risks and high costs that small companies and independent developers simply can’t afford. SmolLM2 offers a different approach by bringing powerful AI capabilities directly to personal devices, pointing toward a future where advanced AI tools are within reach of more users and companies, not just tech giants with massive data centers.

A comparison of AI language models shows SmolLM2’s superior efficiency, achieving higher performance scores with fewer parameters than larger rivals like Llama3.2 and Gemma, where the horizontal axis represents the model size and the vertical axis shows accuracy on benchmark tests. Credit: Hugging Face

Edge computing gets a boost as AI moves to mobile devices

SmolLM2’s performance is particularly noteworthy given its size. On the MT-Bench evaluation, which measures chat capabilities, the 1.7B model achieves a score of 6.13, competitive with much larger models. It also shows strong performance on mathematical reasoning tasks, scoring 48.2 on the GSM8K benchmark. These results challenge the conventional wisdom that bigger models are always better, suggesting that careful architecture design and training data curation may be more important than raw parameter count.

The models support a range of applications including text rewriting, summarization and function calling. Their compact size enables deployment in scenarios where privacy, latency or connectivity constraints make cloud-based AI solutions impractical. This could prove particularly valuable in healthcare, financial services and other industries where data privacy is non-negotiable.

Industry experts see this as part of a broader trend toward more efficient AI models. The ability to run sophisticated language models locally on devices could enable new applications in areas like mobile app development, IoT devices, and enterprise solutions where data privacy is paramount.

The race for efficient AI: Smaller models challenge industry giants

However, these smaller models still have limitations. According to Hugging Face’s documentation, they “primarily understand and generate content in English” and may not always produce factually accurate or logically consistent output.


The release of SmolLM2 suggests that the future of AI may not solely belong to increasingly large models, but rather to more efficient architectures that can deliver strong performance with fewer resources. This could have significant implications for democratizing AI access and reducing the environmental impact of AI deployment.

The models are available immediately through Hugging Face’s model hub, with both base and instruction-tuned versions offered for each size variant.



Technology

An Okta login bug bypassed checking passwords on some long usernames

Illustration of a password above an open combination lock, implying a data breach.
Illustration by Cath Virginia / The Verge | Photo from Getty Images

On Friday evening, Okta posted an odd update to its list of security advisories. The latest entry reveals that under specific circumstances, someone could’ve logged in by entering anything for a password, but only if the account’s username had over 52 characters.

According to the note people reported receiving, other requirements to exploit the vulnerability included Okta checking the cache from a previous successful login, and that an organization’s authentication policy didn’t add extra conditions like requiring multi-factor authentication (MFA).

Here are the details that are currently available:

On October 30, 2024, a vulnerability was internally identified in generating the cache key for AD/LDAP DelAuth. The Bcrypt algorithm was…
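The advisory excerpt above is cut off, so the exact construction of the cache key is not shown here. As a hedged illustration of how this class of bug can arise: bcrypt uses only the first 72 bytes of its input, so if a key is derived by hashing several concatenated fields, a long enough prefix can push the password outside the hashed range. The sketch below simulates that truncation with SHA-256 over the first 72 bytes. The field order, the 20-character ID and the helper names are hypothetical and are not Okta's implementation.

```python
import hashlib

BCRYPT_MAX_INPUT = 72  # bcrypt only uses the first 72 bytes of its input


def truncating_hash(data: bytes) -> str:
    """Stand-in for bcrypt: hashes only the first 72 bytes (not real bcrypt)."""
    return hashlib.sha256(data[:BCRYPT_MAX_INPUT]).hexdigest()


def cache_key(user_id: str, username: str, password: str) -> str:
    """Hypothetical cache key: hash of the concatenated fields."""
    return truncating_hash((user_id + username + password).encode("utf-8"))


user_id = "0" * 20                        # hypothetical 20-character internal ID
long_name = "a" * 41 + "@example.com"     # 53 characters, i.e. over 52
short_name = "alice@example.com"

# Long username: ID + username already exceed 72 bytes, so the password
# never reaches the hash and any password yields the same key.
assert cache_key(user_id, long_name, "correct-horse") == cache_key(user_id, long_name, "anything")

# Short username: the password still falls inside the hashed prefix.
assert cache_key(user_id, short_name, "correct-horse") != cache_key(user_id, short_name, "anything")
```

A common defence against this kind of truncation is to hash each field with a non-truncating function first, or to reject inputs longer than the limit, rather than concatenating raw fields.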


Technology

NYT Strands today — hints, answers and spangram for Saturday, November 2 (game #244)


Strands is the NYT’s latest word game after the likes of Wordle, Spelling Bee and Connections – and it’s great fun. It can be difficult, though, so read on for my Strands hints.

Want more word-based fun? Then check out my Wordle today, NYT Connections today and Quordle today pages for hints and answers for those games.


Technology

What’s new on Apple TV+ this month (November 2024)


Due to its unique model that includes only original content, Apple TV+ tends to have a very slim new release slate. However, just about every Apple TV+ release features A-list talent, and it has set a high bar for quality. Just look at Best Picture winner CODA and Emmy-winning drama Severance (returning in January).

This month is no exception, as there are only four new additions to the library in November. We’ve highlighted the two most anticipated, but don’t overlook Season 2 of the critically acclaimed comedy Bad Sisters or the Malala Yousafzai and Jennifer Lawrence documentary Bread & Roses.

There are only a few new arrivals each month to Apple TV+, but they’re usually all worth at least a glance. Read on for everything coming to Apple TV+ in November 2024.

Looking for more content? Check out our guides on the best new shows to stream, the best shows on Apple TV+, the best shows on Netflix, and the best shows on Hulu.








Technology

Google could add album art to ‘Now Playing’ on Pixel phones


Google may upgrade the “Now Playing” feature by adding much-needed album art to the history page. Now Playing identifies songs with a high degree of accuracy, but its history list includes only the name of the song and the artist.

Now Playing is constantly operating in the background, but only for music

Introduced way back in 2017 along with the Pixel 2, the Now Playing feature has remained exclusive to the Google Pixel phones. It essentially identifies songs that are playing nearby and works well even on the latest Pixel 9 devices.

Apps like Shazam have been recognizing music and songs for quite some time. However, Now Playing has some tricks for the Pixel phones. Now Playing works entirely in the background. Pixel users don’t even need to pull out their phones.

While working in the background, Now Playing relies on the low-power efficiency cores to continuously analyze audio through the microphone. If it picks up audio that seems like music or a song, Now Playing requests the performance cores to record a few seconds of the audio.


Now Playing then matches the recorded audio against a database containing tens of thousands of fingerprints of the most popular songs in a particular region. After processing and matching, Now Playing displays the name and artist of the song on the lock screen as well as in a notification.
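The two-stage flow described above (a cheap always-on check, then a more expensive fingerprint lookup against a local database) can be sketched roughly as follows. This is a toy illustration, not Google's implementation: the amplitude gate, the hash-based fingerprint, the threshold and the database entry are all assumptions chosen to keep the example small.

```python
import hashlib
from typing import Optional


def looks_like_music(samples: list[int], threshold: float = 50.0) -> bool:
    """Cheap always-on gate: mean absolute amplitude above a threshold."""
    return sum(abs(s) for s in samples) / len(samples) > threshold


def fingerprint(samples: list[int]) -> str:
    """Toy fingerprint: hash of coarsely quantized samples.

    Real audio fingerprints are far more robust; this only tolerates
    small changes that stay within the same quantization bucket.
    """
    coarse = bytes((s // 16) % 256 for s in samples)
    return hashlib.sha256(coarse).hexdigest()[:16]


# Toy on-device database: fingerprint -> (title, artist). The entry is made up.
SONG = [100, 220, 90, 310, 150, 275, 60, 190]
DATABASE = {fingerprint(SONG): ("Example Song", "Example Artist")}


def identify(samples: list[int]) -> Optional[tuple[str, str]]:
    """Run the cheap gate first; fingerprint and look up only if it passes."""
    if not samples or not looks_like_music(samples):
        return None  # stay on the low-power path
    return DATABASE.get(fingerprint(samples))


print(identify(SONG))           # matches the stored entry
print(identify([1, 2, 0, 3]))   # too quiet: rejected by the gate
```

Keeping the database and matching on the device means no audio has to leave the phone, and gating the expensive step keeps the always-on cost low.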

Needless to say, Now Playing is fairly accurate. However, the list of songs it recognizes contains only the name of the song, the artist, and a timestamp.

Google’s Now Playing feature for Pixel devices may get album art

The songs that Now Playing recognized are visible under Settings > Sound & vibration > Now Playing. The page lists the history of identified songs in reverse chronological order.

Although there’s an icon next to each song, Google has not yet attached album art to the songs Now Playing recognizes. According to Android Authority, this might change in the future.


The hidden system app that downloads the Now Playing database may soon also grab album art. The code change is titled “#AlbumArt Add Now Playing album art downloads to the network usage log”.

Google has yet to assign a dedicated online repository from where Now Playing will download album art for the songs it recognizes. However, Ambient Music Mod, an open-source port of Now Playing by developer Kieron Quinn, already has the feature. The reverse-engineered version essentially replaces the generic music note icon with album art.


Technology

Disney forms dedicated AI and XR group to coordinate company-wide use and adoption


Disney is adding another layer to its AI and extended reality strategies. As first reported by Reuters, the company recently formed a dedicated emerging technologies unit. Dubbed the Office of Technology Enablement, the group will coordinate the company’s exploration, adoption and use of artificial intelligence, AR and VR tech.

It has tapped Jamie Voris, previously the CTO of its Studios Technology division, to oversee the effort. Before joining Disney in 2010, Voris was the chief technology officer at the National Football League. More recently, he led the development of the company’s Apple Vision Pro app. Voris will report to Alan Bergman, the co-chairman of Disney Entertainment. Reuters reports the company eventually plans to grow the group to about 100 employees.

“The pace and scope of advances in AI and XR are profound and will continue to impact consumer experiences, creative endeavors, and our business for years to come — making it critical that Disney explore the exciting opportunities and navigate the potential risks,” Bergman wrote in an email Disney shared with Engadget. “The creation of this new group underscores our dedication to doing that and to being a positive force in shaping responsible use and best practices.”

A Disney spokesperson told Engadget the Office of Technology Enablement won’t take over any existing AI and XR projects at the company. Instead, it will support Disney’s other teams, many of which are already working on products that involve those technologies, to ensure their work fits into the company’s broader strategic goals.


“It is about bringing added focus, alignment, and velocity to those efforts, and about reinforcing our commitment being a positive force in shaping responsible use and best practices,” the spokesperson said.

It’s safe to say Disney has probably navigated the last two decades of technological change better than most of Hollywood. For instance, the company’s use of the Unreal Engine in conjunction with a digital set known as The Volume has streamlined the production of VFX-heavy shows like The Mandalorian. With extended reality and AI in particular promising tidal changes to how humans work and play, it makes sense to add some additional oversight to how those technologies are used at the company.


Copyright © 2024 WordupNews.com