
Technology

Ai2’s new Molmo open source AI models beat GPT-4o, Claude


The Allen Institute for AI (Ai2) today unveiled Molmo, an open-source family of state-of-the-art multimodal AI models that outperform top proprietary rivals, including OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5, on several third-party benchmarks.

Because they are multimodal, the models can accept and analyze images uploaded by users, much like the leading proprietary foundation models.

Ai2 also noted in a post on X that Molmo uses “1000x less data” than its proprietary rivals, thanks to some clever new training techniques described in greater detail below and in a technical report published by the Paul Allen-founded, Ali Farhadi-led institute.


Ai2 says the release underscores its commitment to open research by offering high-performing models, complete with open weights and data, to the broader community — and of course, companies looking for solutions they can completely own, control, and customize.

It comes on the heels of Ai2’s release, two weeks ago, of another open model, OLMoE, a “mixture of experts” model that combines smaller expert sub-networks and activates only a few of them for each input to keep costs down.
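The routing idea behind a mixture-of-experts model can be sketched in a few lines. The example below is a minimal plain-Python illustration, not OLMoE’s code: the linear gate, the toy “experts”, the choice of k=2, and the function name `moe_forward` are all hypothetical. A router scores every expert for the input, only the top-k experts actually run, and their outputs are blended using softmax weights over the selected scores.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_weights, experts, k=2):
    """Send input vector x to its top-k experts and blend their outputs.

    gate_weights: one hypothetical gate vector per expert (a linear router).
    experts: list of callables, each mapping a vector to a same-sized vector.
    Returns the blended output and the indices of the experts that ran.
    """
    # Router score per expert: dot product of x with that expert's gate vector.
    scores = [sum(w * xi for w, xi in zip(wv, x)) for wv in gate_weights]
    # Keep the k best-scoring experts; the others are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Turn the chosen experts' scores into mixing weights that sum to 1.
    mix = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for weight, i in zip(mix, chosen):
        y = experts[i](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, chosen


# Toy example: four "experts", only two of which run for this input.
experts = [
    lambda v: [2 * a for a in v],  # expert 0: doubles
    lambda v: [a + 1 for a in v],  # expert 1: adds one
    lambda v: [-a for a in v],     # expert 2: negates
    lambda v: [0.0 for _ in v],    # expert 3: outputs zeros
]
gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.5, 0.5]]
out, chosen = moe_forward([1.0, 3.0], gate, experts, k=2)
print(chosen, out)
```

Because unselected experts are never called, compute per input scales with k rather than with the total number of experts, which is where the cost savings come from.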

Closing the Gap Between Open and Proprietary AI

Molmo consists of four main models of different parameter sizes and capabilities:

  1. Molmo-72B (72 billion parameters, or settings; the flagship model, based on Alibaba Cloud’s Qwen2-72B open source model)
  2. Molmo-7B-D (“demo model” based on Alibaba’s Qwen2-7B model)
  3. Molmo-7B-O (based on Ai2’s OLMo-7B model)
  4. MolmoE-1B (based on OLMoE-1B-7B mixture-of-experts LLM, and which Ai2 says “nearly matches the performance of GPT-4V on both academic benchmarks and user preference.”)

These models achieve high performance across a range of third-party benchmarks, outpacing many proprietary alternatives. And they’re all available under the permissive Apache 2.0 license, which allows virtually any kind of use, from research to commercial deployment, including in the enterprise.

Notably, Molmo-72B leads the pack in academic evaluations, achieving the highest average score across 11 key benchmarks and ranking second in user preference, closely behind GPT-4o.


Vaibhav Srivastav, a machine learning developer advocate engineer at AI code repository company Hugging Face, commented on the release on X, highlighting that Molmo offers a formidable alternative to closed systems and sets a new standard for open multimodal AI.

In addition, Google DeepMind robotics researcher Ted Xiao took to X to praise the inclusion of pointing data in Molmo, which he sees as a game-changer for visual grounding in robotics.


Pointing lets Molmo answer by indicating specific locations in an image, which can serve as a visual explanation and could help robots and other systems interact more effectively with physical environments, a capability that most other multimodal models currently lack.
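To use pointing output in, say, a robotics pipeline, the model’s text answer has to be turned into pixel coordinates. The sketch below assumes, for illustration only, that points arrive as XML-like tags such as `<point x="61.5" y="40.2" alt="cup">cup</point>`, with x and y given as percentages of the image width and height; verify the exact format against Ai2’s model documentation before relying on it. The `parse_points` helper is hypothetical.

```python
import re

# Assumed format (illustrative): <point x="..." y="..." alt="...">label</point>,
# where x and y are percentages (0-100) of the image width and height.
POINT_RE = re.compile(
    r'<point\s+x="(?P<x>[\d.]+)"\s+y="(?P<y>[\d.]+)"[^>]*>(?P<label>.*?)</point>'
)


def parse_points(text, width, height):
    """Return (label, x_px, y_px) for every single-point tag found in text."""
    points = []
    for m in POINT_RE.finditer(text):
        # Scale percentage coordinates to pixel coordinates.
        x_px = float(m.group("x")) / 100.0 * width
        y_px = float(m.group("y")) / 100.0 * height
        points.append((m.group("label"), x_px, y_px))
    return points


reply = 'The cup is here: <point x="50.0" y="25.0" alt="cup">cup</point>'
pts = parse_points(reply, width=640, height=480)
print(pts)  # [('cup', 320.0, 120.0)]
```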

The models are not only high-performing but also entirely open, allowing researchers and developers to access and build upon cutting-edge technology.

Advanced Model Architecture and Training Approach

Molmo’s architecture is designed to maximize efficiency and performance. All models use OpenAI’s ViT-L/14 336px CLIP model as the vision encoder, which processes multi-scale, multi-crop images into vision tokens.
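Multi-crop processing roughly means the encoder sees a downscaled view of the whole image plus a grid of full-resolution crops, each sized for the encoder’s input. The sketch below only computes those views as pixel boxes. The 336-pixel crop size follows from the ViT-L/14 336px encoder named above; the grid layout, the optional overlap, and the `crop_boxes` name are assumptions, not Molmo’s actual preprocessing.

```python
import math

CROP = 336  # side length, in pixels, of the ViT-L/14 336px encoder's input


def crop_boxes(width, height, crop=CROP, overlap=0):
    """List the views a multi-crop encoder would see for a width x height image.

    The first entry is a global view (the whole image, to be resized to crop x crop);
    the rest are (left, top, right, bottom) boxes of full-resolution crops on a grid.
    Edge crops are shifted inward so they stay inside the image. Images smaller than
    one crop would need padding, which this sketch does not handle.
    """
    stride = crop - overlap
    cols = max(1, math.ceil((width - overlap) / stride))
    rows = max(1, math.ceil((height - overlap) / stride))
    views = [("global", (0, 0, width, height))]
    for r in range(rows):
        for c in range(cols):
            left = min(c * stride, max(0, width - crop))
            top = min(r * stride, max(0, height - crop))
            views.append(("crop", (left, top, left + crop, top + crop)))
    return views


views = crop_boxes(1000, 700)
print(len(views))  # 10: one global view plus a 3 x 3 grid of crops
```

With the default settings, a 1000x700 image yields one global view and a 3x3 grid of crops; because edge crops are shifted inward instead of padded, neighboring crops may overlap slightly.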

The resulting vision tokens are then projected into the language model’s input space through a multi-layer perceptron (MLP) connector and pooled to reduce the number of tokens the language model has to process.
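The connector step can be illustrated with a toy feature grid. In the sketch below, neighboring 2x2 patches are averaged to cut the token count by four, and each pooled vector is passed through a small two-layer MLP into the language model’s embedding size. Everything here is a placeholder assumption: plain averaging stands in for whatever pooling Molmo actually uses, the GELU activation and the 3/8/5 dimensions are arbitrary, and the weights are random rather than learned. The article also does not fix the order of pooling and projection; this sketch pools first.

```python
import math
import random


def pool_2x2(grid):
    """Average non-overlapping 2x2 blocks of a rows x cols grid of feature vectors.

    A trailing odd row or column is dropped in this sketch.
    """
    rows, cols, dim = len(grid), len(grid[0]), len(grid[0][0])
    pooled = []
    for r in range(0, rows - rows % 2, 2):
        for c in range(0, cols - cols % 2, 2):
            block = [grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1]]
            pooled.append([sum(v[d] for v in block) / 4.0 for d in range(dim)])
    return pooled


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def linear(v, weights, bias):
    # weights has one row per output unit
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def mlp_connector(token, w1, b1, w2, b2):
    """Project one pooled vision vector into the (toy) language-model embedding size."""
    hidden = [gelu(h) for h in linear(token, w1, b1)]
    return linear(hidden, w2, b2)


# Toy setup: a 4x4 patch grid with 3-dim features, projected to 5 dims.
D_VIS, D_HID, D_LLM = 3, 8, 5
rng = random.Random(0)
w1 = [[rng.gauss(0.0, 0.1) for _ in range(D_VIS)] for _ in range(D_HID)]
b1 = [0.0] * D_HID
w2 = [[rng.gauss(0.0, 0.1) for _ in range(D_HID)] for _ in range(D_LLM)]
b2 = [0.0] * D_LLM

grid = [[[float(r), float(c), 1.0] for c in range(4)] for r in range(4)]
pooled = pool_2x2(grid)  # 16 patch vectors -> 4 pooled vectors
llm_tokens = [mlp_connector(t, w1, b1, w2, b2) for t in pooled]
print(len(grid) * len(grid[0]), "patches ->", len(llm_tokens), "tokens of size", len(llm_tokens[0]))
```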


The language model component is a decoder-only Transformer, with options ranging from the OLMo series to the Qwen2 and Mistral series, each offering different capacities and openness levels.

The training strategy for Molmo involves two key stages:

  1. Multimodal Pre-training: During this stage, the models are trained to generate captions using newly collected, detailed image descriptions provided by human annotators. This high-quality dataset, named PixMo, is a critical factor in Molmo’s strong performance.
  2. Supervised Fine-Tuning: The models are then fine-tuned on a diverse dataset mixture, including standard academic benchmarks and newly created datasets that enable the models to handle complex real-world tasks like document reading, visual reasoning, and even pointing.

Unlike many contemporary models, Molmo does not rely on reinforcement learning from human feedback (RLHF). Instead, it uses a meticulously tuned training pipeline that updates all model parameters, starting from their pre-trained weights.

Outperforming on Key Benchmarks

The Molmo models have shown impressive results across multiple benchmarks, particularly in comparison to proprietary models.

For instance, Molmo-72B scores 96.3 on DocVQA and 85.5 on TextVQA, outperforming both Gemini 1.5 Pro and Claude 3.5 Sonnet in these categories. It further outperforms GPT-4o on AI2D (Ai2’s own benchmark, short for “A Diagram Is Worth A Dozen Images,” a dataset of 5000+ grade school science diagrams and 150,000+ rich annotations).


The models also excel in visual grounding tasks, with Molmo-72B achieving top performance on RealWorldQA, making it especially promising for applications in robotics and complex multimodal reasoning.

Open Access and Future Releases

Ai2 has made these models and datasets available on its Hugging Face space, and they are fully compatible with popular AI frameworks such as Transformers.

This open access is part of Ai2’s broader vision to foster innovation and collaboration in the AI community.

Over the next few months, Ai2 plans to release additional models, training code, and an expanded version of its technical report, further enriching the resources available to researchers.

For those interested in exploring Molmo’s capabilities, a public demo and several model checkpoints are available now via Molmo’s official page.


Technology

NYT Strands today — hints, answers and spangram for Wednesday, October 2 (game #213)


Strands is the NYT’s latest word game after the likes of Wordle, Spelling Bee and Connections – and it’s great fun. It can be difficult, though, so read on for my Strands hints.

Want more word-based fun? Then check out my Wordle today, NYT Connections today and Quordle today pages for hints and answers for those games.


Servers computers

Dell Blade Server Cost (PowerEdge M420, M520, M620, M820, M910, M915)




http://bit.ly/newDellCoupon
Find the latest Dell PowerEdge M420, M520, M620, M820, M910, and M915 blade server costs and a discount coupon code.


Technology

Watch how astronauts drink coffee in space



Like many folks, astronauts enjoy a cup of joe from time to time, but the lack of gravity means that preparing and drinking it is a little different to how you do it back on terra firma.

With that in mind, NASA has just released a short video (above) revealing how astronauts aboard the International Space Station (ISS) get their daily coffee fix.

To get the water for their brew, the astronauts use a specially designed water dispensing unit that takes recycled liquids and moisture drawn from the air. Once the water has been heated, the astronaut grabs a plastic pouch filled with freeze-dried coffee grounds, connects it to the unit, and fills it with the hot water. After that, they can go off to enjoy their coffee, sipping it through a straw. Or from a cup … let us explain.

Zero Gravity coffee cup

Back in 2008, one astronaut, Don Pettit (who happens to be aboard the station right now, too), decided that he wanted to enjoy his coffee in the more traditional way, by drinking it from a mug. So he invented what eventually became known as the Zero Gravity coffee cup, and you can see it in the video. To make a prototype, Pettit tore a piece of plastic from his Flight Data File mission book to create a teardrop-shaped drinking vessel. The design relies on surface tension and the laws of physics to keep the liquid from floating away in the microgravity conditions.


Further development and refinement of the design led to the Zero Gravity coffee cup becoming the first patented product invented in space.

Now that you know how astronauts drink coffee in space, you may be wondering how they go to the bathroom — apparently this is the question that astronauts get asked most. Well, this video explains all.

For more insight into how astronauts live and work aboard the space station, take a look at this collection of videos made over the years by visitors to the orbital outpost.







Technology

AT&T claims T-Mobile Priority is ‘false and confusing marketing’


AT&T has called out T-Mobile for its marketing campaign that promotes “T-Mobile Priority”. A direct competitor to AT&T’s FirstNet, T-Mobile Priority will cater to the public safety community.

AT&T claims T-Mobile Priority marketing campaign is misleading or confusing

Telecommunications and data networks for first responders and emergency workers are held to a different standard and are not lumped in with ordinary commercial cellular service.

To provide fast access to the internet and communications during a crisis, AT&T offers its FirstNet network. Verizon offers a similar service called Frontline.

T-Mobile recently announced T-Mobile Priority, or T-Priority, which could be considered a competitor to AT&T’s FirstNet and Verizon’s Frontline. However, the technologies used to provide internet and communications during a crisis differ significantly.


The Mobile Report has access to an internal AT&T document, wherein the telecom company has criticized T-Mobile. AT&T has written to its employees claiming T-Mobile “falsely claims it is the world’s first network slice for First Responders”.

The document stresses how FirstNet is different and better than T-Priority. The internal memo even implies T-Mobile is testing unproven technology on the “wrong people”. The company has called T-Mobile “irresponsible” for doing so.

How is AT&T’s FirstNet different from T-Mobile Priority?

In the internal document, AT&T has stressed its FirstNet service offers “a dedicated communications platform for public safety”. The company has called T-Mobile Priority a “commercial offering”.

Technically speaking, AT&T’s FirstNet operates on a dedicated cellular frequency band (Band 14) reserved for public safety use, while Verizon Frontline uses Band 13.


T-Mobile Priority will reportedly operate on T-Mobile’s existing 5G bands. However, the company plans to segment that traffic so that emergency workers have a reliable communication pathway.

Moreover, T-Mobile has indicated it will deploy 24/7 Emergency Management trucks. These vehicles could act as mobile communication towers to help fix problems affecting the network. They will also offer support during disasters, public safety incidents, and more.

Although T-Mobile’s solution could work, AT&T has slammed the company for testing its technology on a sector with critical communications needs. AT&T has suggested T-Mobile should have tested its network slicing on ordinary commercial subscribers first.

Incidentally, AT&T has admitted it plans to deploy 5G network slicing too. However, the company pointed out that it will use it for specific mission needs only.


Servers computers

What's the difference between a server and cloud hosting?




Want to learn more? Check out our free course on tech management for startups: https://myctofriend.co/htbasaccess

If you have a specific question for your project, just go ahead and ask on: https://myctofriend.co/ask
Get all the links mentioned and the print-ready notes here: https://myctofriend.co/videos/

Sacha and his cofounder have already developed the first version of their product. They are now working on version 2 and are considering moving to Amazon Web Services (AWS) as their cloud hosting solution.

Moving an application to the cloud is usually a very good option because you are not renting a server anymore but buying a delivered service instead.

Let’s see how cloud services work.

If you want to know more about us and our program, visit: https://myctofriend.co
—————————-
If you want to swing into action and build a startup without a CTO, check out our FREE course “How to build a Startup without a CTO”: https://myctofriend.co/htbasaccess


Technology

Google allegedly got the Juno YouTube app removed from the Vision Pro App Store


Juno, a widely praised (unofficial) YouTube app for Vision Pro, has been removed from Apple’s App Store after complaints from Google, according to Juno’s developer, Christian Selig. Google, Selig says, suggested that his app violates its trademark.

It’s the latest setback for Selig, who shut down his popular Reddit client, Apollo, last year after Reddit changed its developer policies to charge for use of its API. The shutdown of Apollo and other apps like it ignited a protest from Reddit users and moderators.

This time, Selig says he doesn’t want drama, noting the $5 app was a “hobby project” for him to tinker with developing for visionOS. “I really enjoyed building Juno, but it was always something I saw as fundamentally a little app I built for fun,” Selig wrote on his website. “Because of that, I have zero desire to spin this into a massive fight akin to what happened with Reddit years ago.”

It’s unclear what aspect of Juno may have been the issue. Selig says that Google referenced its “trademarks and iconography” in a message to Apple, “stating that Juno does not adhere to YouTube guidelines and modifies the website” in a way that’s not permitted. “I don’t personally agree with this, as Juno is just a web view, and acts as little more than a browser extension that modifies CSS to make the website and video player look more ‘visionOS’ like,” Selig explains. “No logos are placed other than those already on the website, and the ‘for YouTube’ suffix is permitted in their branding guidelines.”


Google hasn’t made its own YouTube app for Vision Pro, though the company said such an app was “on our roadmap.” The company didn’t immediately respond to a request for comment.

Selig says that people who have already paid for the app should be able to keep using it for the time being, though there’s a chance a future YouTube update could end up bricking it.


Copyright © 2024 WordupNews.com