Meta introduces Spirit LM, an open-source model that combines text and speech inputs and outputs

Just in time for Halloween 2024, Meta has unveiled Meta Spirit LM, the company’s first open-source multimodal language model capable of seamlessly integrating text and speech inputs and outputs.

As such, it competes directly with OpenAI’s GPT-4o (also natively multimodal) and other multimodal models such as Hume’s EVI 2, as well as dedicated text-to-speech and speech-to-text offerings such as ElevenLabs.

Designed by Meta’s Fundamental AI Research (FAIR) team, Spirit LM aims to address the limitations of existing AI voice experiences by offering more expressive and natural-sounding speech generation, while learning tasks across modalities such as automatic speech recognition (ASR), text-to-speech (TTS), and speech classification.

Unfortunately for entrepreneurs and business leaders, the model is currently available only for noncommercial use under Meta’s FAIR Noncommercial Research License, which grants users the right to use, reproduce, modify, and create derivative works of the Meta Spirit LM models, but only for noncommercial purposes. Any distribution of these models or derivatives must also comply with the noncommercial restriction.

A new approach to text and speech

Traditional AI voice systems rely on automatic speech recognition to transcribe spoken input, feed the transcript to a language model for processing, and then convert the model's output back into speech using text-to-speech techniques.

While effective, this process often sacrifices the expressive qualities inherent to human speech, such as tone and emotion. Meta Spirit LM introduces a more advanced solution by incorporating phonetic, pitch, and tone tokens to overcome these limitations.

Meta has released two versions of Spirit LM:

Spirit LM Base: Uses phonetic tokens to process and generate speech.

Spirit LM Expressive: Includes additional tokens for pitch and tone, allowing the model to capture more nuanced emotional states, such as excitement or sadness, and reflect those in the generated speech.

Both models are trained on a combination of text and speech datasets, allowing Spirit LM to perform cross-modal tasks like speech-to-text and text-to-speech, while maintaining the natural expressiveness of speech in its outputs.
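
To make the token-mixing idea concrete, here is a minimal Python sketch of word-level interleaving between text tokens and discrete speech units, the kind of combined training sequence described above. The modality markers and unit IDs are illustrative placeholders, not Spirit LM's actual vocabulary.

```python
def interleave(words, units_per_word, switch_points):
    """Build one training sequence that alternates between text tokens and
    discrete speech units at word boundaries, marking each modality switch."""
    sequence, in_speech = [], False
    for i, word in enumerate(words):
        if i in switch_points:                  # flip modality at this word boundary
            in_speech = not in_speech
            sequence.append("[SPEECH]" if in_speech else "[TEXT]")
        if in_speech:
            sequence.extend(units_per_word[i])  # speech side: discrete units
        else:
            sequence.append(word)               # text side: the word itself
    return sequence

words = ["the", "cat", "sat", "down"]
units = [["u42", "u7"], ["u13"], ["u99", "u3"], ["u5"]]  # made-up unit IDs
print(interleave(words, units, switch_points={2}))
# ['the', 'cat', '[SPEECH]', 'u99', 'u3', 'u5']
```

Training on sequences like these is what lets a single model move between modalities mid-stream; the Expressive variant would simply carry extra pitch and style units alongside the phonetic ones.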

Open source but noncommercial: available for research only

In line with Meta’s commitment to open science, the company has made Spirit LM fully open-source, providing researchers and developers with the model weights, code, and supporting documentation to build upon.

Meta hopes that the open nature of Spirit LM will encourage the AI research community to explore new methods for integrating speech and text in AI systems.

The release also includes a research paper detailing the model’s architecture and capabilities.

Mark Zuckerberg, Meta’s CEO, has been a strong advocate for open-source AI, stating in a recent open letter that AI has the potential to “increase human productivity, creativity, and quality of life” while accelerating advancements in areas like medical research and scientific discovery.

Applications and future potential

Meta Spirit LM is designed to learn new tasks across various modalities, such as the following (a hypothetical usage sketch appears after the list):

Automatic Speech Recognition (ASR): Converting spoken language into written text.

Text-to-Speech (TTS): Generating spoken language from written text.

Speech Classification: Identifying and categorizing speech based on its content or emotional tone.
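
As a rough illustration of how one checkpoint can cover all three tasks through prompting alone, here is a hypothetical sketch. Every name in it (MultimodalLM, encode_audio, the modality flags) is an assumption invented for exposition; none of it is the API of Meta's released code.

```python
# Hypothetical sketch only: names and signatures below are illustrative
# assumptions, NOT the API of Meta's released Spirit LM code.

def encode_audio(path: str) -> list[str]:
    """Stand-in for a speech tokenizer mapping audio to discrete units."""
    return ["u12", "u87", "u3"]  # dummy unit IDs

class MultimodalLM:
    """Stand-in for a Spirit LM-style model with one interface for both modalities."""
    def generate(self, prompt: list[str], output_modality: str) -> list[str]:
        # Real inference would autoregressively decode tokens in the
        # requested modality; here we just return a placeholder.
        return [f"<generated {output_modality}>"]

model = MultimodalLM()

# ASR: condition on speech units, decode text.
transcript = model.generate(encode_audio("clip.wav"), output_modality="text")

# TTS: condition on text, decode speech units (a vocoder turns units into audio).
units = model.generate(["Read", "this", "aloud"], output_modality="speech")

# Speech classification: pair speech units with a label prompt, decode the label.
label = model.generate(encode_audio("sample.wav") + ["emotion:"], output_modality="text")

print(transcript, units, label)
```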

The Spirit LM Expressive model goes a step further by incorporating emotional cues into its speech generation.

For instance, it can detect and reflect emotional states like anger, surprise, or joy in its output, making the interaction with AI more human-like and engaging.

This has significant implications for applications like virtual assistants, customer service bots, and other interactive AI systems where more nuanced and expressive communication is essential.

A broader effort

Meta Spirit LM is part of a broader set of research tools and models that Meta FAIR is releasing to the public. This includes Segment Anything Model 2.1 (SAM 2.1), an update to Meta's image and video segmentation model that has been used across disciplines like medical imaging and meteorology, as well as research on enhancing the efficiency of large language models.

Meta’s overarching goal is to achieve advanced machine intelligence (AMI), with an emphasis on developing AI systems that are both powerful and accessible.

The FAIR team has been sharing its research for more than a decade, aiming to advance AI in a way that benefits not just the tech community, but society as a whole. Spirit LM is a key component of this effort, supporting open science and reproducibility while pushing the boundaries of what AI can achieve in natural language processing.

What’s next for Spirit LM?

With the release of Meta Spirit LM, Meta is taking a significant step forward in the integration of speech and text in AI systems.

By offering a more natural and expressive approach to AI-generated speech, and making the model open-source, Meta is enabling the broader research community to explore new possibilities for multimodal AI applications.

Whether in ASR, TTS, or beyond, Spirit LM represents a promising advance in the field of machine learning, with the potential to power a new generation of more human-like AI interactions.

ChatGPT’s Canvas now shows tracked changes

ChatGPT’s Canvas feature allows users to edit the chatbot’s responses on the app rather than copying and pasting them to a separate document. 

However, when Canvas launched in early October for its paid tiers, it didn’t let people see what changes GPT-4o made to its responses. OpenAI’s latest update to the feature corrects that. 

The show changes button displays the most recent changes to either the generated text or code on Canvas, highlighting added material in green and deleted sections in red.
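
OpenAI hasn't said how Canvas computes these diffs, but the underlying idea is standard sequence comparison. As a rough sketch of the concept (not OpenAI's implementation), Python's built-in difflib can recover the added and deleted spans between two revisions:

```python
import difflib

old = "The quick brown fox jumps over the lazy dog".split()
new = "The quick red fox leaps over the dog".split()

# get_opcodes() describes how to turn `old` into `new`; a UI like Canvas
# would paint inserted words green and deleted words red.
for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=old, b=new).get_opcodes():
    if tag in ("delete", "replace"):
        print("removed:", " ".join(old[i1:i2]))
    if tag in ("insert", "replace"):
        print("added:  ", " ".join(new[j1:j2]))
```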

Tracked changes are a staple of editing platforms; Google Docs and Microsoft Word both offer a toggle that lets users review what's been changed. OpenAI, however, has been rolling out updates to Canvas slowly as ChatGPT subscribers get used to the feature.

Canvas already offers familiar features like comments, where users can add suggestions or give more instructions for the AI model to follow when editing responses. 

Canvas is still only available on the web version of ChatGPT for ChatGPT Plus, Teams, Enterprise and Edu users. Mac app users and anyone downloading the recently released Windows version of ChatGPT will have to wait until Canvas is rolled out to these standalone apps. 

Currently, people can access Canvas in the regular ChatGPT window, but not in custom GPTs.

A much-requested feature

OpenAI’s developer-focused X account acknowledged that developer customers have been requesting a track changes or show changes feature since Canvas launched.

But while many developers said this was a step in the right direction, Canvas still doesn’t immediately connect to code repositories like GitHub or let users visually see how the edited code works. 

This is one area where Anthropic's Claude, a ChatGPT competitor, excels with its Artifacts feature. Artifacts function much like Canvas: users begin with a prompt in the Claude chat interface.

When users launch Artifacts, a dedicated window opens where they can manipulate the model's responses. Artifacts let users render websites from the code Claude has just generated and edited, so developers can see not only which lines of code changed but also whether the changes work. Artifacts are now available to all Claude users, including those on mobile devices.

Canvas and Artifacts represent what could be the next phase in the evolution of AI chat platforms and assistants. This interface war could see other platforms exploring how to keep users inside one window instead of sending them to separate tools for different tasks.

In Shinichirō Watanabe’s Lazarus, parkour is the only way to go

Along with giving New York Comic Con attendees a chance to see Lazarus’ first episode, Adult Swim revealed today that Koichi Yamadera and Megumi Hayashibara — the Japanese actors who voiced Cowboy Bebop’s Spike Spiegel and Faye Valentine — are part of the show’s cast. Yamadera and Hayashibara are joined by Mamoru Miyano (Death Note), Maaya Uchida (Chainsaw Man), Yuma Uchida (Jujutsu Kaisen), Makoto Furukawa (Kaiju No. 8), Manaka Iwami (Phantom of the Idol), and Akio Otsuka (Spy x Family).

Adult Swim also posted a short clip from Lazarus introducing Axel Gilberto (Miyano), a young fugitive whose knack for breaking out of prison and evading police makes him the perfect candidate to join team Lazarus. With the entire world plunged into panic over a popular wonder drug that's about to start killing people, Axel and Lazarus' other operatives are the only people who might be able to find a solution.

The clip doesn't really clue you in on just what Axel will bring to the team when Lazarus premieres sometime in 2025, but it does show off how useful his parkour skills will be as he sets out to save the world.

Yes! You can finally buy 26TB hard disk drives, two years after launch, but only in packs of 20 for $9100, and you will probably need a data center to run them

Back in May 2022, Western Digital unveiled its 22TB CMR and 26TB UltraSMR hard disk drives, the latter of which achieved its high capacity through the use of large block encoding and an advanced error correction algorithm to increase track-per-inch (TPI) density.

The 26TB Ultrastar DC HC670 UltraSMR HDD is a 3.5-inch hard drive featuring a SATA or SAS interface with a transfer rate of up to 261MB/s. It operates at 7200 RPM using SMR (Shingled Magnetic Recording) technology and includes a 512MB cache for improved performance. Built with Western Digital’s EAMR, TSA, and HelioSeal technologies, it’s optimized for sequential write applications and is ideal for use in data centers.
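
For a sense of scale, here is a quick back-of-envelope calculation from the spec-sheet figures quoted above, showing how long a full sequential write of one drive would take:

```python
# Back-of-envelope fill time for one 26TB drive, from the quoted specs.
capacity_bytes = 26e12        # 26 TB (decimal, as marketed)
throughput_bps = 261e6        # 261 MB/s peak sequential transfer rate
hours = capacity_bytes / throughput_bps / 3600
print(f"~{hours:.1f} hours to write the drive end to end")  # ≈ 27.7 hours
```

Real workloads would take longer still, since 261MB/s is a peak figure and SMR drives slow down on non-sequential writes.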

NYT Crossword: answers for Friday, October 18

The New York Times crossword puzzle can be tough! If you’re stuck, we’re here to help with a list of today’s clues and answers.

You can stream Assassin’s Creed Mirage on GeForce NOW today

GeForce NOW is one of several gaming subscription services you can throw your money at every month, and additions like Assassin's Creed Mirage are what make doing so worthwhile. This week is sort of a big one for the service, with two or three major games arriving. In total, ten new games are joining the lineup, some of them new releases, some of them not.

Although if you really think about it, any game is new if it’s new to you, right? Whether you’re a new subscriber or you’ve been around a while, there’s nothing quite like picking up a new game and falling in love with it. You can do just that this week with a multitude of options across a handful of genres. Do you like mech games? Good news. GeForce NOW has a new mech game for you to play in the cloud this week.

Are action-adventure titles more your speed? Well, you have options there too. Whatever you like when it comes to games, GeForce NOW has something for you.

Assassin’s Creed Mirage is now playable via GeForce NOW

Assassin's Creed Mirage is probably one of the best games in the franchise in a while, taking some elements of its recent predecessors and mixing them with gameplay mechanics from the beloved older titles in the series. It's also considerably shorter than, say, Assassin's Creed Valhalla. So if you want a more condensed experience with an AC game, you should definitely give Mirage a shot.

And now you can do that more easily than ever, because you can stream it in the cloud on GeForce NOW. This applies to the Steam version of the game; it was already available to stream if you owned it on Ubisoft Connect. So the good thing about this week's addition of Mirage is that Steam loyalists who prefer not to stray from Gabe Newell's digital game haven can now play it in the cloud too.

Alongside Assassin’s Creed Mirage, GeForce NOW subscribers can also now stream Neva, MechWarrior 5: Clans, A Quiet Place: The Road Ahead, Artisan TD, ASKA, Dungeon Tycoon, South Park: The Fractured But Whole, Spirit City: Lofi Sessions, and Star Truck.

Dragon Age: The Veilguard will be streamable on October 31

Dragon Age fans can look forward to streaming the latest game in the series when it launches later this month. It officially releases on October 31, so there are only a couple more weeks before you can dive in.

If you haven't already pre-ordered the game, NVIDIA is offering a pretty sweet deal for members: subscribe to a six-month GeForce NOW Ultimate membership and you'll get a free copy of Dragon Age: The Veilguard. A six-month Ultimate membership will set you back $99.99, while the standard edition of The Veilguard costs $70 on its own. So for roughly $30 more than the game alone, you get six months of GeForce NOW's best tier with the game included.

This is an even better value for anyone who is planning to pick up the Deluxe Edition or above.

What will space exploration be like in 50 years?

Day trips to the Moon, living on Mars, space elevators… when it comes to the future of space exploration, some possibilities might be closer than we think!

Made by BBC Ideas in partnership with the Royal Society

Animated by Jess Mountfield, narrated by Dr Becky Smethurst at the University of Oxford
