Amazon races to transplant Alexa’s ‘brain’ with generative AI


Amazon is gearing up to relaunch its Alexa voice-powered digital assistant as an artificial intelligence “agent” that can complete practical tasks, as the tech group races to resolve the challenges that have dogged the system’s AI overhaul.

The $2.4tn company has for the past two years sought to redesign Alexa, its conversational system embedded within 500mn consumer devices worldwide, so the software’s “brain” is transplanted with generative AI. 

Rohit Prasad, who leads the artificial general intelligence (AGI) team at Amazon, told the Financial Times the voice assistant still needed to surmount several technical hurdles before the rollout.

These include the problem of “hallucinations”, or fabricated answers, as well as response speed, or “latency”, and reliability. “Hallucinations have to be close to zero,” said Prasad. “It’s still an open problem in the industry, but we are working extremely hard on it.”

The vision of Amazon’s leaders is to transform Alexa, which is currently still used for a narrow set of simple tasks such as playing music and setting alarms, into an “agentic” product that acts as a personalised concierge. This could include anything from suggesting restaurants to configuring the lights in the bedroom based on a person’s sleep cycles.

Alexa’s redesign has been in train since the launch of OpenAI’s ChatGPT, backed by Microsoft, in late 2022. While Microsoft, Google, Meta and others have quickly embedded generative AI into their computing platforms and enhanced their software services, critics have questioned whether Amazon can resolve its technical and organisational struggles in time to compete with its rivals.

According to multiple staffers who have worked on Amazon’s voice assistant teams in recent years, its effort has been beset with complications and follows years of AI research and development.

Several former workers said the long wait for a rollout was largely due to the unexpected difficulties of switching from the simpler, predefined algorithms Alexa was built on and combining them with more powerful but unpredictable large language models.

In response, Amazon said it was “working hard to enable even more proactive and capable assistance” from its voice assistant. It added that a technical implementation of this scale, into a live service and suite of devices used by customers around the world, was unprecedented, and not as simple as overlaying an LLM on to the Alexa service.

Prasad, the former chief architect of Alexa, said last month’s release of the company’s in-house Amazon Nova models — led by his AGI team — was in part motivated by the specific needs for optimum speed, cost and reliability, in order to help AI applications such as Alexa “get to that last mile, which is really hard”. 

To operate as an agent, Alexa’s “brain” has to be able to call hundreds of third-party software programs and services, Prasad said.

“Sometimes we underestimate how many services are integrated into Alexa, and it’s a massive number. These applications get billions of requests a week, so when you’re trying to make reliable actions happen at speed . . . you have to be able to do it in a very cost-effective way,” he added. 

The complexity comes from Alexa users expecting quick responses as well as extremely high levels of accuracy. Such qualities are at odds with the inherently probabilistic nature of today’s generative AI, statistical software that predicts words based on speech and language patterns.

Some former staff also point to struggles to preserve the assistant’s original attributes, including its consistency and functionality, while imbuing it with new generative features such as creativity and free-flowing dialogue. 

Because of the more personalised, chatty nature of LLMs, the company also plans to hire experts to shape the AI’s personality, voice and diction so it remains familiar to Alexa users, according to one person familiar with the matter.

One former senior member of the Alexa team said that while LLMs were very sophisticated, they came with risks, such as producing answers that are “completely invented some of the time”.

“At the scale that Amazon operates, that could happen large numbers of times per day,” they said, damaging its brand and reputation.

In June, Mihail Eric, a former machine learning scientist at Alexa and founding member of its “conversational modelling team”, said publicly that Amazon had “dropped the ball” on becoming “the unequivocal market leader in conversational AI” with Alexa.

Eric said despite having strong scientific talent and “huge” financial resources, the company had been “riddled with technical and bureaucratic problems”, suggesting “data was poorly annotated” and “documentation was either non-existent or stale”. 

According to two former employees working on Alexa-related AI, the historic technology underpinning the voice assistant had been inflexible and difficult to change quickly, weighed down by a clunky and disorganised code base and an engineering team “spread too thin”.

The original Alexa software, built on top of technology acquired from British start-up Evi in 2012, was a question-answering machine that worked by searching within a defined universe of facts to find the right response, such as the day’s weather or a specific song in your music library.

The new Alexa uses a bouquet of different AI models to recognise and translate voice queries and generate responses, as well as to identify policy violations, such as picking up inappropriate responses and hallucinations. Building software to translate between the legacy systems and the new AI models has been a major obstacle in the Alexa-LLM integration.

The models include Amazon’s own in-house software, including the latest Nova models, as well as Claude, the AI model from start-up Anthropic, in which Amazon has invested $8bn over the course of the past 18 months. 

“[T]he most challenging thing about AI agents is making sure they’re safe, reliable and predictable,” Anthropic’s chief executive Dario Amodei told the FT last year.

Agent-like AI software needs to get to the point “where . . . people can actually have trust in the system”, he added. “Once we get to that point, then we’ll release these systems.”

One current employee said more steps were still needed, such as overlaying child safety filters and testing custom integrations with Alexa such as smart lights and the Ring doorbell.

“The reliability is the issue — getting it to be working close to 100 per cent of the time,” the employee added. “That’s why you see us . . . or Apple or Google shipping slowly and incrementally.” 

Numerous third parties developing “skills” or features for Alexa said they were unsure when the new generative AI-enabled device would be rolled out and how to create new functions for it.

“We’re waiting for the details and understanding,” said Thomas Lindgren, co-founder of Swedish content developer Wanderword. “When we started working with them they were a lot more open . . . then with time, they’ve changed.”

Another partner said that after an initial period of “pressure” from Amazon on developers to start getting ready for the next generation of Alexa, things had gone quiet.

An enduring challenge for Amazon’s Alexa team — which was hit by major lay-offs in 2023 — is how to make money. Figuring out how to make the assistants “cheap enough to run at scale” will be a major task, said Jared Roesch, co-founder of generative AI group OctoAI.

Options being discussed include creating a new Alexa subscription service or taking a cut of sales of goods and services, said a former Alexa employee.

Prasad said Amazon’s goal was to create a variety of AI models that could act as the “building blocks” for a variety of applications beyond Alexa. 

“What we are always grounded on is customers and practical AI, we are not doing science for the sake of science,” Prasad said. “We are doing this . . . to deliver customer value and impact, which in this era of generative AI is becoming more important than ever because customers want to see a return on investment.” 
