Technology

Microsoft’s agentic AI OmniParser rockets up open source charts

Microsoft’s OmniParser is on to something.

The new open source model that converts screenshots into a format that’s easier for AI agents to understand was released by Redmond earlier this month, but just this week became the number one trending model (as determined by recent downloads) on AI code repository Hugging Face.

It’s also the first agent-related model to do so, according to a post on X by Hugging Face’s co-founder and CEO Clem Delangue.

But what exactly is OmniParser, and why is it suddenly receiving so much attention?

At its core, OmniParser is an open-source generative AI model designed to help large language models (LLMs), particularly vision-enabled ones like GPT-4V, better understand and interact with graphical user interfaces (GUIs).

Released relatively quietly by Microsoft, OmniParser could be a crucial step toward enabling generative tools to navigate and understand screen-based environments. Let’s break down how this technology works and why it’s gaining traction so quickly.

What is OmniParser?

OmniParser is essentially a powerful new tool designed to parse screenshots into structured elements that a vision-language model (VLM) can understand and act upon. As LLMs become more integrated into daily workflows, Microsoft recognized the need for AI to operate seamlessly across varied GUIs. The OmniParser project aims to empower AI agents to see and understand screen layouts, extracting vital information such as text, buttons, and icons, and transforming it into structured data.

This enables models like GPT-4V to make sense of these interfaces and act autonomously on the user’s behalf, for tasks that range from filling out online forms to clicking on certain parts of the screen.

While the concept of GUI interaction for AI isn’t entirely new, the efficiency and depth of OmniParser’s capabilities stand out. Previous models often struggled with screen navigation, particularly in identifying specific clickable elements, as well as understanding their semantic value within a broader task. Microsoft’s approach uses a combination of advanced object detection and OCR (optical character recognition) to overcome these hurdles, resulting in a more reliable and effective parsing system.

The technology behind OmniParser

OmniParser’s strength lies in its use of different AI models, each with a specific role:

  • YOLOv8: Detects interactable elements like buttons and links by providing bounding boxes and coordinates. It essentially identifies what parts of the screen can be interacted with.
  • BLIP-2: Analyzes the detected elements to determine their purpose. For instance, it can identify whether an icon is a “submit” button or a “navigation” link, providing crucial context.
  • GPT-4V: Uses the data from YOLOv8 and BLIP-2 to make decisions and perform tasks like clicking on buttons or filling out forms. GPT-4V handles the reasoning and decision-making needed to interact effectively.

Additionally, an OCR module extracts text from the screen, which helps in understanding labels and other context around GUI elements. By combining detection, text extraction, and semantic analysis, OmniParser offers a plug-and-play solution that works not only with GPT-4V but also with other vision models, increasing its versatility.
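To make the division of labor above concrete, here is a minimal Python sketch of how detection boxes, captions, and OCR text might be merged into one structured list for a vision model. It is an illustration only, not Microsoft's implementation: the `ScreenElement` class, the `merge_outputs` and `to_prompt` functions, and the sample boxes and captions are all hypothetical, and the detector, captioner, and OCR outputs are hard-coded stand-ins rather than calls to real models.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class ScreenElement:
    element_id: int
    kind: str         # "interactable" (from the detector) or "text" (OCR only)
    box: Box
    description: str  # caption, optionally followed by overlapping OCR text


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_outputs(
    detections: List[Tuple[Box, str]],   # (box, caption) per detected element
    ocr_results: List[Tuple[Box, str]],  # (box, text) per OCR hit
) -> List[ScreenElement]:
    """Combine detector/captioner output with OCR text into one element list."""
    elements: List[ScreenElement] = []
    used_ocr = set()
    for box, caption in detections:
        labels = []
        for i, (ocr_box, text) in enumerate(ocr_results):
            if overlaps(box, ocr_box):
                labels.append(text)
                used_ocr.add(i)
        description = caption
        if labels:
            description = f"{caption} (text: {' '.join(labels)})"
        elements.append(
            ScreenElement(len(elements), "interactable", box, description)
        )
    # OCR text outside every detected element is kept as plain context.
    for i, (ocr_box, text) in enumerate(ocr_results):
        if i not in used_ocr:
            elements.append(ScreenElement(len(elements), "text", ocr_box, text))
    return elements


def to_prompt(elements: List[ScreenElement]) -> str:
    """Render the element list as text that could accompany a screenshot."""
    return "\n".join(
        f"[{e.element_id}] {e.kind} at {e.box}: {e.description}" for e in elements
    )


# Hard-coded stand-ins for what a detector, captioner, and OCR module might emit.
detections = [((10, 10, 110, 40), "submit button"), ((10, 60, 40, 90), "settings icon")]
ocr_results = [((20, 15, 100, 35), "Submit"), ((200, 10, 400, 30), "Create your account")]

elements = merge_outputs(detections, ocr_results)
print(to_prompt(elements))
```

In this sketch the reasoning model would receive the numbered list alongside the screenshot and could answer with an element ID instead of raw pixel coordinates. That is one plausible way to use structured output of this kind, not a description of OmniParser's actual prompt format.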

Open-source flexibility

OmniParser’s open-source approach is a key factor in its popularity. It works with a range of vision-language models, including GPT-4V, Phi-3.5-V, and Llama-3.2-V, which gives developers flexibility regardless of which advanced foundation models they have access to.

OmniParser’s presence on Hugging Face has also made it accessible to a wide audience, inviting experimentation and improvement. This community-driven development is helping OmniParser evolve rapidly. Microsoft Partner Research Manager Ahmed Awadallah noted that open collaboration is key to building capable AI agents, and OmniParser is part of that vision.

The race to dominate AI screen interaction

The release of OmniParser is part of a broader competition among tech giants to dominate the space of AI screen interaction. Recently, Anthropic released a similar, but closed-source, capability called “Computer Use” as part of its Claude 3.5 update, which allows AI to control computers by interpreting screen content. Apple has also jumped into the fray with its Ferret-UI, aimed at mobile UIs, enabling its AI to understand and interact with elements like widgets and icons.

What differentiates OmniParser from these alternatives is its commitment to generalizability and adaptability across different platforms and GUIs. OmniParser isn’t limited to specific environments, such as only web browsers or mobile apps—it aims to become a tool for any vision-enabled LLM to interact with a wide range of digital interfaces, from desktops to embedded screens. 

Challenges and the road ahead

Despite its strengths, OmniParser is not without limitations. One ongoing challenge is the accurate detection of repeated icons, which often appear in similar contexts but serve different purposes—for instance, multiple “Submit” buttons on different forms within the same page. According to Microsoft’s documentation, current models still struggle to differentiate between these repeated elements effectively, leading to potential missteps in action prediction.

Moreover, the OCR component’s bounding box precision can sometimes be off, particularly with overlapping text, which can result in incorrect click predictions. These challenges highlight the complexities inherent in designing AI agents capable of accurately interacting with diverse and intricate screen environments. 

However, the AI community is optimistic that these issues can be resolved with ongoing improvements, particularly given OmniParser’s open-source availability. With more developers contributing to fine-tuning these components and sharing their insights, the model’s capabilities are likely to evolve rapidly. 



Technology

Why is Nintendo targeting this YouTuber?

Russ Crandall knows how to reinvent himself. At 24, he relearned how to walk and write after a stroke impacted his brain. When open-heart surgery wasn’t enough to address a rare autoimmune disease, he adopted a paleo diet — and became a New York Times bestselling cookbook author and food blogger following his seemingly miraculous recovery. Last year, he retired from a 22-year career as a US Navy translator to become a full-time YouTuber instead.

Now, he’s wondering if Nintendo will force him to change yet again.

Crandall runs Retro Game Corps, a YouTube channel with half a million subscribers that shows hundreds of ways to play classic games using modern hardware and emulation. If there’s a handheld gaming device released in the past four years, odds are Crandall has made a 20-minute video about it. He started the channel as a hobby in 2020 during the covid-19 pandemic but soon realized it could become his day job.

So, last year, he shut down his food blog — “I was kind of done telling people what to eat,” he says — and left the military with the rank of master chief petty officer.

Yes, Retro Game Corps was a master chief, just like in Halo. (I saw his DD-214.)
Selfie by Russ Crandall

But four years into his YouTube career, on September 28th, Crandall saw how easily his new life as a content creator could disintegrate. Walking back from his studio after pulling an all-nighter, he checked his phone to see if a just-edited video was uploading properly. It was — but another one of his videos vanished before his eyes. Days earlier, he’d published a 14-minute video about how well Nintendo Wii U games can run on Android handhelds, and now it had been wiped from YouTube.

“This can’t be happening,” he recalls saying out loud. A few minutes later, a YouTube email confirmed it wasn’t a glitch: Nintendo had issued a DMCA takedown notice, YouTube had removed his video, and his entire 500,000-subscriber channel was now at risk of permanent deletion. 

“We’ll have to terminate your channel” after one more strike, YouTube warned

It was his second YouTube copyright strike from Nintendo, and Crandall says that’s when it truly sank in. YouTube maintains a strict “three strikes, you’re out” rule, and he realized his family’s livelihood depended on preventing strike number three. “It all sort of came crashing down in that moment,” he tells The Verge.

In a panic, he rushed back to the studio, canceled his upload, and publicly declared that Nintendo was targeting him. He would begin self-censoring all his videos to hopefully escape the Japanese company’s wrath. “I will no longer show any Nintendo games on-screen,” he told his fans and related communities on Reddit, YouTube, and social networks.

Nintendo was well within its rights to ask for a takedown, of course: Crandall had shown the company’s copyrighted content onscreen. And yet that doesn’t explain the copyright strike at all since countless Twitch streamers, YouTubers, TikTokers, and Instagrammers show Nintendo content every single day. Clearly, Nintendo was using copyright as a pretext to get these videos taken down.

Crandall says he received this YouTube notice on September 28th.

Most institutions have historically taken Nintendo’s legal threats seriously. Countless fan projects, including unofficial remakes and sequels, have been voluntarily terminated by their creators after receiving cease and desist orders from Nintendo. While the technology behind video game emulators is generally considered legal, even the lead developers of the Nintendo Switch emulators Yuzu and Ryujinx folded when Nintendo came knocking on their doors.

But unlike many of those developers, Crandall isn’t some pseudonymous person who could slink back into the internet’s shadows. Nor is he someone Nintendo can readily accuse of “facilitating piracy at a colossal scale,” like Yuzu, for distributing software tools. 

Even among content creators, Crandall doesn’t seem like the kind of person Nintendo usually threatens — he’s known for advocating that people should buy Nintendo products before they use emulators and often shows off physical cartridges in his videos to drive that message home. 

“If I’m playing a Switch game on my Steam Deck, the cartridge will be there or the box will be there to indicate that I have purchased the game,” he says. While he admits he hasn’t done that 100 percent of the time, he’s been careful with Nintendo Switch games in particular. In one of the videos that YouTube removed, he flips through a wallet full of 80 genuine cartridges. He also produces guides on how to create personal backups of your own genuine classic games.

Here’s his wallet of 80 genuine Switch cartridges, from one of the videos that Nintendo asked YouTube to remove.

That’s why the community was so surprised when Nintendo targeted him, of all YouTubers — and it’s why Crandall might possibly take the unusual step of challenging Nintendo’s takedowns. 

Crandall says he’s been a Nintendo fan for nearly 40 years, ever since his family bought an NES for Christmas in 1985. The copyright strikes hit hard. “This is the first actual interaction I’ve had with Nintendo, and it’s crazy. I feature most of their games not because I’m trying to, like, stick it to them, but just sharing the love of those games,” he says. 

But he does have a guess as to why Nintendo targeted him. The first copyright strike landed on his video about the MIG Dumper and the MIG Flash, a pair of devices that let you turn genuine Nintendo Switch cartridges into digital files and then carry around an entire library of those ROMs in a special microSD-equipped flash cartridge for your console. I’ve watched the video, and while Crandall does explicitly take an anti-piracy stance, it’s easy to imagine these gadgets being used by bad actors, too. 

“I think the first strike was simply due to the fact that they wanted to minimize attention around the MIG Flash cartridge and dumper, and they had an opportunity,” Crandall says. That opportunity was a relatively tiny mistake: unlike, say, fellow YouTuber Taki Udon’s video on the MIG products, Retro Game Corps showed off four seconds of the title screen of Mario to prove the MIG hardware could legitimately dump and run games, potentially infringing Nintendo’s exclusive right to distribute and / or perform its audiovisual intellectual property.

In one of the videos YouTube removed, Crandall never shows more than the title screen of this Nintendo game.

Isn’t that fair use? Crandall thinks so. It seems like his uses could be brief, limited, and educational enough to satisfy the four-factor fair use test, and arguing that could genuinely get him out of YouTube purgatory. I could easily find dozens of similar examples in our journalism here at The Verge. But in order to submit what’s called a “copyright counter notification” with YouTube, which argues that he’s been inaccurately targeted and isn’t infringing on someone’s copyright, Crandall would have to open himself up to a potential Nintendo lawsuit. 

“It’s a dangerous game,” says Richard Hoeg, a business attorney who hosts the Virtual Legality podcast. “You really don’t want to get into federal court over something that even if you win, will be an expensive and time-consuming burden.”

But Crandall knows this — he seems quite read up on both the DMCA and YouTube processes — and yet he’s considered at least trying his luck. Crandall says he’s conflicted; he doesn’t want to “poke the bear.” He has his family to think about. But it’s possible Nintendo could continue to come after him, he admits, even if he lies low.

While he’s already eliminated Nintendo games from his testing suite for all future videos, he says he simply doesn’t have time to go back through the hundreds of videos he’s created that already contain Mario footage and blur or delete every last scrap. And yet, the way things stand, Nintendo could pick any of those videos to immediately designate his channel for deletion. 

Companies can freely pick and choose who they target with copyright infringement complaints and lawsuits, several legal experts tell me. Unlike with trademarks, they don’t need to actively or consistently defend their works in order to maintain their rights.

Crandall says that even YouTube initially thought that perhaps Nintendo made a mistake when targeting him. He’s part of the YouTube Partner Program, and his designated partner manager told him to sit tight while YouTube asked Nintendo if it might retract its own takedown requests. But Nintendo wouldn’t, and YouTube has now told him he’s on his own. 

On November 23rd, one of the copyright strikes should simply expire — unless Nintendo makes a move before then.
Image via Russ Crandall

As of late October, he’s waffling. He could simply wait two more months until YouTube’s 90-day copyright strikes expire because, as soon as they do, his channel will no longer be in danger of immediate termination. Nintendo’s takedown requests already succeeded in removing those videos, and he can hope Nintendo feels it’s made enough of an example out of him not to do anything more. 

Or he can submit a document that shows he’s not willing to be that example, not willing to be pushed around by Nintendo — and hope it doesn’t land him in a world of legal hurt. 

It’s painful for Crandall, who has been a lifelong fan of Nintendo’s work. Even after a long day of making videos about games, he likes to relax by playing through a couple of classic Mario or Donkey Kong levels, purely to admire the artistry and design. “Since the second strike I haven’t been doing that much at all, because even just seeing the box art leaves a bit of a sour taste in my mouth,” he says.

Nintendo didn’t respond to repeated requests for comment. 

Technology

Amazon finally adds MFA to its enterprise email service

Eight years on from its initial launch, Amazon has introduced multi-factor authentication (MFA) to its business cloud-hosted email service, WorkMail.

Better late than never appears to be the justification behind the near-decade delay, especially for one of the most basic forms of identity verification that has been standard practice for several years now.

Technology

More than winter is coming: Warner Bros. is developing a Game of Thrones movie

Emilia Clarke and Kit Harington in Game of Thrones.
HBO

In the half decade since Game of Thrones finished its eight-season run on HBO, the premium cable network has put together numerous spinoff projects — only two of which have come to fruition: House of the Dragon and the upcoming prequel A Knight of the Seven Kingdoms. Now, HBO’s parent company, Warner Bros. Discovery, is making plans to bring George R.R. Martin’s fantasy world to the big screen.

According to The Hollywood Reporter, Warner Bros. is “quietly developing” a Game of Thrones movie, but it will certainly be a lot less quiet now that the word is out. The story notes that the project is still very early in development, with no director or screenwriters attached. It’s also unclear if the studio has a concept in mind for the film, or if Martin will be directly involved with crafting the story.

Ironically, Game of Thrones showrunners David Benioff and Dan Weiss pitched HBO a trilogy of movies to wrap up the series. Martin was reportedly in favor of that plan as well, but HBO was adamant that its most popular intellectual property remain exclusive to the network. This was before Max was launched as a standalone streamer, and new ownership took over. So there doesn’t seem to be a roadblock in the way of getting the film made now.

Thus far, almost all of the proposed Game of Thrones spinoffs have been prequels. While Martin’s backstory for his A Song of Ice and Fire novels has a lot of territory to explore, there was a single Game of Thrones sequel series that would have featured Kit Harington reprising his role as Jon Snow. That project fell through, but a sequel story would be the only way to bring Harington and other cast members from the show back for a new story. For now, we’ll just have to wait and see what develops.






Science & Environment

Exxon (XOM) earnings Q3 2024

An Exxon gas station is seen in the Brooklyn borough of New York City on Oct. 6, 2023.

Michael M. Santiago | Getty Images

Exxon Mobil beat third-quarter earnings expectations, as the oil major reached its highest production level in more than four decades.

Here is what Exxon reported for the third quarter compared with what Wall Street was expecting, based on a survey of analysts by LSEG: 

  • Earnings per share: $1.92 adjusted, vs. $1.88 per share expected.
  • Revenues: $90 billion, vs. $93.94 billion expected.

The oil major booked net income of $8.61 billion in the quarter, or $1.92 per share, down about 5% compared to $9.1 billion, or $2.25 per share, in the year-ago period. Exxon’s profits have declined as refining margins and natural gas prices have pulled back from historically high levels in 2023.
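For readers who want to check the comparisons above, the short Python sketch below recomputes them from the figures quoted in this story. The `pct_change` helper and the variable names are illustrative choices, and the revenue figure is the rounded $90 billion given here, so the revenue result is approximate.

```python
def pct_change(new: float, old: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


# Figures as quoted in the story (dollars per share, or billions of dollars).
eps_actual, eps_expected = 1.92, 1.88
revenue_actual, revenue_expected = 90.0, 93.94
net_income_now, net_income_year_ago = 8.61, 9.10
eps_now, eps_year_ago = 1.92, 2.25

eps_surprise = pct_change(eps_actual, eps_expected)
revenue_surprise = pct_change(revenue_actual, revenue_expected)
net_income_change = pct_change(net_income_now, net_income_year_ago)
eps_change = pct_change(eps_now, eps_year_ago)

print(f"EPS vs. estimate:         {eps_surprise:+.1f}%")
print(f"Revenue vs. estimate:     {revenue_surprise:+.1f}%")
print(f"Net income, year on year: {net_income_change:+.1f}%")
print(f"EPS, year on year:        {eps_change:+.1f}%")
```

On these inputs, net income is down roughly 5.4% year on year, in line with the "about 5%" cited above, while earnings per share fell by a larger percentage.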

The company returned $9.8 billion to shareholders in the quarter and increased its fourth-quarter dividend to $0.99 per share.

Exxon said it has reached its highest production level in more than 40 years at 3.2 million barrels per day.

The oil major’s stock rose about 1% in pre-market trading.

This is a developing story. Please check back for updates.

Technology

The US forced Huawei to build its own technology, founder says

Ren Zhengfei, the founder of Huawei Technologies, said that the US forced Huawei to build its own technology. After a series of US bans, the company did not really have many other options.

Huawei founder believes the US forced Huawei to build its own technology

The company’s founder made the remark at the ICPC (International Collegiate Programming Contest), a coding competition for university students. He also talked with students about Huawei, technology in general, future goals, and a number of other topics.

At one point during the event, Ren said that he believes Huawei can learn from the receptive culture of the US. He believes that it can help both Huawei and China advance in science and technology.

These were his exact words: “The US has set an example for all countries and companies worldwide on being open. If a country is closed off, it will fall behind.”

The US sanctions landed in 2019, with security concerns cited as the reason. A number of additional roadblocks were set after that. Huawei is still blocked from accessing chipmaking tools and various other equipment.

Those bans forced Huawei to be self-sufficient

Huawei was forced to be self-sufficient, and that seemingly benefited the company in a way. Ren said the following: “American technologies and tools are very good… [but] Huawei cannot use them; we had no choice but to create our tools. Open innovation and utilizing the advanced achievements of others is the true way forward for an enterprise.”

Huawei’s founder also highlighted the importance of AI. He said that artificial intelligence is becoming unstoppable and that, if Huawei uses it in the right way, the company could achieve a lot of success moving forward.

Huawei is expected to announce its 5nm processor made in collaboration with SMIC soon. That Kirin chip will be first used in the upcoming Huawei Mate 70 series.

Science & Environment

Cloud-inspired material can bend light around corners

A new material can bend light

University of Glasgow

Scientists have discovered a technique whereby light can be bent around corners, inspired by the way clouds scatter sunlight. This type of light-bending could lead to advances in medical imaging, electronics cooling and even nuclear reactor design.

Daniele Faccio at the University of Glasgow, UK, and his colleagues say they are shocked this type of light scattering wasn’t noticed before. It works on the same basis as clouds, snow and other white materials that scatter light: once photons hit the surface of such a material, they are scattered in all directions, barely penetrating at all and getting reflected out the way they came. For instance, when sunlight hits a tall cumulonimbus cloud, it bounces off the top, making this part of the cloud appear bright white. But so little light reaches the bottom of the cloud that this part appears grey – despite being made up of the same water droplets.

“The light bounces around and sort of tries to get in, and it’s bouncing off all the molecules and the defects,” says Faccio. “And eventually what happens is it just gets reflected back because it can’t get in. This is this scattering.”

To replicate this process, the team 3D printed objects from opaque white material while leaving thin tunnels of clear resin within. When light is shone at the material, it travels into these tunnels and is scattered – just as light is on snow or clouds. However, instead of scattering randomly in every direction until they are evenly dispersed, the photons are directed to return to the resin tunnel by the opaque material. The team put this to use, creating a range of objects that steer light in an organised way.

3D-printed white blocks with curved channels guide scattering light

University of Glasgow

These 3D-printed objects are functionally similar to fibre optic cables, which route light along their length, but they operate on fundamentally different principles. Fibre optic cables steer light by total internal reflection. When photons attempt to leave a cable’s inner core of plastic or glass, they hit another material with a lower refractive index and are reflected back inside. In this way, light can be carried for kilometres at a time, even around bends.
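The reflection at the core boundary described above is known as total internal reflection, and it only happens when light strikes the boundary at more than a critical angle from the perpendicular. As a rough illustration, the Python sketch below computes that angle from the standard relation sin(θc) = n_cladding / n_core. The two refractive indices are assumed, typical-looking values for a glass fibre and do not come from the Glasgow work; the function name is likewise just an illustrative choice.

```python
import math


def critical_angle_deg(n_core: float, n_cladding: float) -> float:
    """Critical angle, in degrees from the normal, for total internal reflection.

    Light travelling in the core is fully reflected at the boundary when it
    strikes at an angle larger than this value.
    """
    if n_cladding >= n_core:
        raise ValueError(
            "total internal reflection needs a cladding index below the core index"
        )
    return math.degrees(math.asin(n_cladding / n_core))


# Assumed example values, not measurements from the article.
N_CORE = 1.48
N_CLADDING = 1.46

angle = critical_angle_deg(N_CORE, N_CLADDING)
print(f"Critical angle: {angle:.1f} degrees from the normal")
```

With these assumed indices the critical angle is a little over 80 degrees, which means only light travelling nearly parallel to the fibre stays trapped in the core.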

The researchers say their material boosts light transmission by more than two orders of magnitude compared with solid blocks without the same clear tunnels, and also allows it to be directed around curves. The approach is much less efficient than fibre optics, so it will struggle to carry light over the great distances that fibre can, but it is also very simple and cheap.

This method of light-bending could make use of existing tunnels of translucent material, such as tendons and fluid in the spinal column, to provide new ways to carry out medical imaging. Faccio says the exact same principle also works to direct heat and neutrons, and could therefore also find use in a range of engineering applications such as cooling systems and nuclear reactors.

“It wasn’t obvious that this would work at all. We were shocked,” says Faccio, who believes the phenomenon could easily have been discovered decades or even centuries ago. “It’s not like we’ve created or found some really niche, weird equation with some weird properties.”

Copyright © 2024 WordupNews.com