- OpenAI has officially launched its first AI Agent: Operator
- It works within a web browser to complete tasks for you, and is out now as a limited research preview
- Operator can make a dinner reservation, fill out a form, and complete other web tasks
OpenAI is always looking for the next big thing to add to ChatGPT, and after months of rumors, including a report from earlier this week that teased a launch, the technology giant’s first AI Agent is here. Operator is designed to complete web tasks for you at the touch of a button.
Essentially, Operator is a Computer Using Agent (CUA) that uses GPT-4o’s visual skills to browse and search the web. This means that it can understand the context of what to search for, and thanks to its multi-modality, it understands what it sees as it searches. It’s available now as a research preview for ChatGPT Pro subscribers in the United States.
Operator is described as “an agent that can use its own browser to perform tasks for you.” OpenAI released a demo showing Operator browsing the web as we (that is, we humans) do. You might ask Operator to book a dinner reservation for you, fill out an arduously long form, order groceries from a service, or even book a flight. It can use OpenTable to find and book a reservation at a restaurant, as shown in the demo. Operator will even walk you through its steps.
Operator is a ‘research preview,’ so know that it’s in its early days. OpenAI does impose some limitations. We haven’t had the chance to go hands-on yet, but it certainly looks impressive. This is OpenAI’s first entry into the world of AI agents, which will likely be the theme of the year in the realm of artificial intelligence.
OpenAI writes in a blog post announcing Operator that it “is one of our first agents, which are AIs capable of doing work for you independently—you give it a task and it will execute it.” This hints that not only are there other agents in the pipeline – Altman confirmed this during the live demo – but that they’re all based around the notion of doing things for you – a big step in the quest to make AI even more helpful, giving us some time back.
Operator is powered by the new Computer Using Agent (CUA) model, which pairs GPT-4o’s vision skills with advanced reasoning. This all comes together to let Operator understand and use elements within a browser – the search bar, various buttons, and on-screen content.
OpenAI explains that “Operator can ‘see’ (through screenshots) and ‘interact’ (using all the actions a mouse and keyboard allow) with a browser,” allowing it to functionally use a browser to complete a task. That’s pretty neat, especially if it works at a high rate of success, and according to the blog post, it can self-correct.
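To make that "see, then interact" loop a little more concrete, here is a minimal Python sketch of the general pattern. It is not OpenAI's code: `FakeBrowser`, `Action`, `choose_action`, and `run_agent` are all hypothetical names invented for illustration, the "screenshot" is a plain dictionary rather than an image, and a few hard-coded rules stand in for the model. The point is only that re-observing the page after every action is what would let an agent notice a mistake and fix it.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # which on-screen element the action applies to
    text: str = ""     # text to enter, for "type" actions


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real browser showing a two-field form."""
    fields: dict = field(default_factory=dict)
    required: tuple = ("name", "party_size")
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would get pixels; this sketch returns visible state.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            # The form only submits once every required field is filled in.
            self.submitted = all(self.fields.get(k) for k in self.required)


def choose_action(observation: dict, goal: dict) -> Action:
    """Hypothetical policy standing in for the model's reasoning."""
    if observation["submitted"]:
        return Action("done")
    for name, value in goal.items():
        # Any field that does not match the goal gets (re)typed.
        if observation["fields"].get(name) != value:
            return Action("type", name, value)
    return Action("click", "submit")


def run_agent(browser: FakeBrowser, goal: dict, max_steps: int = 10) -> list:
    """Observe, decide, act, and repeat until done or out of steps."""
    history = []
    for _ in range(max_steps):
        observation = browser.screenshot()          # "see"
        action = choose_action(observation, goal)   # decide
        history.append(action)
        if action.kind == "done":
            break
        browser.apply(action)                       # "interact"
    return history


if __name__ == "__main__":
    # The form starts with a wrong name, so the agent has to correct it.
    browser = FakeBrowser(fields={"name": "wrong"})
    for step in run_agent(browser, {"name": "Ada", "party_size": "2"}):
        print(step)
```

In this toy version, starting the form with a wrong value in the name field forces the loop to overwrite it before submitting, which is a very rough analogue of the self-correction OpenAI describes. How Operator actually decides what to do next is not detailed in the announcement.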
However, as with most new AI tools and skills, it will likely take some time for this to become truly useful in the real world. That will also require OpenAI to open it up to more folks, though as an early research preview it’s still certainly an impressive demo.
For now, if you’re in the United States and subscribed to ChatGPT Pro, you can try it out on OpenAI’s website. OpenAI CEO Sam Altman teased that it would eventually arrive in other countries and be added to the ChatGPT Plus subscription. As we remember from some of the announcements from 12 Days of OpenAI, Europe will likely take a bit longer.