The Allen Institute for AI (Ai2) today unveiled Molmo, an open-source family of state-of-the-art multimodal AI models that outperform top proprietary rivals, including OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5, on several third-party benchmarks.
Because they are multimodal, the models can accept and analyze imagery uploaded by users, much like the leading proprietary foundation models.
Yet, Ai2 also noted in a post on X that Molmo uses “1000x less data” than the proprietary rivals — thanks to some clever new training techniques described in greater detail below and in a technical report paper published by the Paul Allen-founded and Ali Farhadi-led company.
Ai2 says the release underscores its commitment to open research by offering high-performing models, complete with open weights and data, to the broader community — and of course, companies looking for solutions they can completely own, control, and customize.
It comes on the heels of Ai2’s release two weeks ago of another open model, OLMoE, which is a “mixture of experts” or combination of smaller models designed for cost effectiveness.
Closing the Gap Between Open and Proprietary AI
Molmo consists of four main models of different parameter sizes and capabilities:
Molmo-72B (the flagship, with 72 billion parameters, based on Alibaba Cloud’s Qwen2-72B open-source model)
Molmo-7B-D (“demo model” based on Alibaba’s Qwen2-7B model)
Molmo-7B-O (based on Ai2’s OLMo-7B model)
MolmoE-1B (based on OLMoE-1B-7B mixture-of-experts LLM, and which Ai2 says “nearly matches the performance of GPT-4V on both academic benchmarks and user preference.”)
These models achieve high performance across a range of third-party benchmarks, outpacing many proprietary alternatives. And they’re all available under the permissive Apache 2.0 license, enabling virtually any use, from research to commercial (including enterprise-grade) deployment.
Notably, Molmo-72B leads the pack in academic evaluations, achieving the highest score on 11 key benchmarks and ranking second in user preference, closely following GPT-4o.
Vaibhav Srivastav, a machine learning developer advocate at AI code repository company Hugging Face, commented on the release on X, highlighting that Molmo offers a formidable alternative to closed systems, setting a new standard for open multimodal AI.
In addition, Google DeepMind robotics researcher Ted Xiao took to X to praise the inclusion of pointing data in Molmo, which he sees as a game-changer for visual grounding in robotics.
This capability allows Molmo to provide visual explanations and interact more effectively with physical environments, a feature that is currently lacking in most other multimodal models.
The models are not only high-performing but also entirely open, allowing researchers and developers to access and build upon cutting-edge technology.
Advanced Model Architecture and Training Approach
Molmo’s architecture is designed to maximize efficiency and performance. All models use OpenAI’s ViT-L/14 336px CLIP model as the vision encoder, which processes multi-scale, multi-crop images into vision tokens.
These tokens are then projected into the language model’s input space through a multi-layer perceptron (MLP) connector and pooled for dimensionality reduction.
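As a rough illustration of the token arithmetic this pipeline implies, the sketch below estimates how many vision tokens reach the language model. The crop layout and 2x2 pooling factor are assumptions for this example, not figures from Ai2’s report:

```python
def vision_token_count(image_px=336, patch_px=14, num_crops=1, pool=2):
    """Estimate vision tokens from a ViT-L/14 encoder at 336px resolution.

    A 336px image split into 14px patches yields a 24x24 grid (576 patches).
    Pooling 2x2 windows before the MLP connector reduces that to 144 tokens
    per crop; multi-crop inputs multiply the total.
    """
    grid = image_px // patch_px                 # 24 patches per side
    tokens_per_crop = grid * grid               # 576 patches per crop
    pooled = tokens_per_crop // (pool * pool)   # 144 tokens after pooling
    return pooled * num_crops

# One global view plus four local crops (hypothetical layout):
print(vision_token_count(num_crops=5))  # 720 tokens reach the LLM
```

The pooling step matters: without it, five crops would consume 2,880 positions of the language model’s context instead of 720.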
The language model component is a decoder-only Transformer, with options ranging from the OLMo series to the Qwen2 and Mistral series, each offering different capacities and openness levels.
The training strategy for Molmo involves two key stages:
Multimodal Pre-training: During this stage, the models are trained to generate captions using newly collected, detailed image descriptions provided by human annotators. This high-quality dataset, named PixMo, is a critical factor in Molmo’s strong performance.
Supervised Fine-Tuning: The models are then fine-tuned on a diverse dataset mixture, including standard academic benchmarks and newly created datasets that enable the models to handle complex real-world tasks like document reading, visual reasoning, and even pointing.
Unlike many contemporary models, Molmo does not rely on reinforcement learning from human feedback (RLHF), focusing instead on a meticulously tuned training pipeline that updates all model parameters based on their pre-training status.
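The two-stage recipe above can be sketched as a simple pipeline. This is a conceptual outline only; the function names and data shapes are hypothetical, not Ai2’s actual training code:

```python
def train(model, captioning_data, sft_mixture, step_fn):
    """Sketch of Molmo's two-stage recipe: caption pre-training, then SFT.

    There is no RLHF stage; both stages update all model parameters
    through plain supervised steps (step_fn). The data arguments stand
    in for the PixMo captions and the fine-tuning mixture described above.
    """
    # Stage 1: multimodal pre-training on dense human-written captions.
    for image, caption in captioning_data:
        step_fn(model, image, target=caption)

    # Stage 2: supervised fine-tuning on academic benchmarks plus new
    # task datasets (document reading, visual reasoning, pointing).
    for image, prompt, answer in sft_mixture:
        step_fn(model, image, prompt=prompt, target=answer)

    return model
```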
Outperforming on Key Benchmarks
The Molmo models have shown impressive results across multiple benchmarks, particularly in comparison to proprietary models.
For instance, Molmo-72B scores 96.3 on DocVQA and 85.5 on TextVQA, outperforming both Gemini 1.5 Pro and Claude 3.5 Sonnet in these categories. It also outperforms GPT-4o on AI2D (Ai2’s own benchmark, short for “A Diagram Is Worth A Dozen Images,” a dataset of more than 5,000 grade-school science diagrams and more than 150,000 rich annotations).
The models also excel in visual grounding tasks, with Molmo-72B achieving top performance on RealWorldQA, making it especially promising for applications in robotics and complex multimodal reasoning.
Open Access and Future Releases
Ai2 has made these models and datasets accessible on its Hugging Face space, with full compatibility with popular AI frameworks like Transformers.
This open access is part of Ai2’s broader vision to foster innovation and collaboration in the AI community.
Over the next few months, Ai2 plans to release additional models, training code, and an expanded version of their technical report, further enriching the resources available to researchers.
For those interested in exploring Molmo’s capabilities, a public demo and several model checkpoints are available now via Molmo’s official page.
Strands is the NYT’s latest word game after the likes of Wordle, Spelling Bee and Connections – and it’s great fun. It can be difficult, though, so read on for my Strands hints.
SPOILER WARNING: Information about NYT Strands today is below, so don’t read on if you don’t want to know the answers.
Your Strands expert
Marc McLaren
NYT Strands today (game #213) – hint #1 – today’s theme
What is the theme of today’s NYT Strands?
• Today’s NYT Strands theme is… Fresh out of the oven
NYT Strands today (game #213) – hint #2 – clue words
Play any of these words to unlock the in-game hints system.
TRAIT
STONE
SOAR
TRUMP
DINE
LINE
NYT Strands today (game #213) – hint #3 – spangram
What is a hint for today’s spangram?
• Dough but not nuts
NYT Strands today (game #213) – hint #4 – spangram position
What are two sides of the board that today’s spangram touches?
First: top, 3rd column
Last: bottom, 4th column
Right, the answers are below, so DO NOT SCROLL ANY FURTHER IF YOU DON’T WANT TO SEE THEM.
NYT Strands today (game #213) – the answers
The answers to today’s Strands, game #213, are…
SCONE
DANISH
CROISSANT
MUFFIN
STRUDEL
GALETTE
SPANGRAM: PASTRIES
My rating: Moderate
My score: Perfect
Hello, Mr/Mrs/Ms NYT, I have a question for you: in what way is a MUFFIN a PASTRY? Or a SCONE? STRUDEL, yes. CROISSANT, definitely. DANISH, sure. GALETTE… well, we’ll get to that. But not MUFFIN or SCONE, which are both quick breads or potentially cakes. Maybe I’m missing some crucial detail here – I’m not a chef – but I just don’t get why they were included.
Fortunately, it wasn’t too difficult for me to find them anyway, because I’m well aware that sometimes the NYT makes odd decisions with categorization, so I always watch out for curveballs. That didn’t help me with GALETTE, though, chiefly because I’ve never heard that word before. Still, they look lovely, so I shall be trying one next time I see one.
Yesterday’s NYT Strands answers (Tuesday 1 October, game #212)
PARAMOUNT
DISCOVERY
HISTORY
HALLMARK
LIFETIME
SPANGRAM: NETWORK
What is NYT Strands?
Strands is the NYT’s new word game, following Wordle and Connections. It’s now out of beta, making it a fully fledged member of the NYT’s games stable, and can be played on the NYT Games site on desktop or mobile.
I’ve got a full guide to how to play NYT Strands, complete with tips for solving it, so check that out if you’re struggling to beat it each day.
Like many folks, astronauts enjoy a cup of joe from time to time, but the lack of gravity means that preparing and drinking it is a little different to how you do it back on terra firma.
With that in mind, NASA has just released a short video (above) revealing how astronauts aboard the International Space Station (ISS) get their daily coffee fix.
To get the water for their brew, the astronauts use a specially designed water dispensing unit that takes recycled liquids and moisture drawn from the air. Once the water has been heated, the astronaut grabs a plastic pouch filled with freeze-dried coffee grounds, connects it to the unit, and fills it with the hot water. After that, they can go off to enjoy their coffee, sipping it through a straw. Or from a cup … let us explain.
Back in 2008, one astronaut, Don Pettit (who happens to be aboard the station right now, too), decided that he wanted to enjoy his coffee in the more traditional way, by drinking it from a mug. So he invented what eventually became known as the Zero Gravity coffee cup, and you can see it in the video. To make a prototype, Pettit tore a piece of plastic from his Flight Data File mission book to create a teardrop-shaped drinking vessel. The design relies on surface tension and the laws of physics to keep the liquid from floating away in the microgravity conditions.
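The physics the cup exploits can be made concrete with the Bond number, which compares gravity to surface tension for a liquid of a given size. This back-of-envelope sketch uses typical room-temperature water properties (assumed values, not figures from NASA’s video):

```python
def bond_number(length_m, g=9.81, gamma=0.072, rho=1000.0):
    """Bond number Bo = rho * g * L^2 / gamma, SI units, water by default.

    Bo >> 1: gravity dominates, so liquid pools at the bottom of a mug.
    Bo << 1: surface tension dominates and liquid clings to the walls,
    which is the regime the Zero Gravity cup exploits in orbit.
    """
    return rho * g * length_m**2 / gamma

print(round(bond_number(0.04), 1))    # a 4 cm mug on Earth: Bo ~ 218
print(bond_number(0.04, g=1e-6) < 1)  # the same mug in microgravity: True
```

In orbit, effective gravity is roughly a millionth of Earth’s, so surface tension wins at any cup size; the teardrop shape simply gives that tension a continuous edge to pull the coffee along toward the astronaut’s lips.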
Further development and refinement of the design led to the Zero Gravity coffee cup becoming the first patented product invented in space.
Now that you know how astronauts drink coffee in space, you may be wondering how they go to the bathroom — apparently this is the question that astronauts get asked most. Well, this video explains all.
For more insight into how astronauts live and work aboard the space station, take a look at this collection of videos made over the years by visitors to the orbital outpost.
AT&T has called out T-Mobile for its marketing campaign that promotes “T-Mobile Priority”. A direct competitor to AT&T’s FirstNet, T-Mobile Priority will cater to the public safety community.
AT&T claims the T-Mobile Priority marketing campaign is misleading or confusing
Telecommunications and data networks for first responders and emergency workers operate on a different level. They are kept separate from commercial cellular traffic.
To offer immediate and quick access to the internet and communications during a crisis, AT&T offers its FirstNet network. Similarly, Verizon has its Frontline service.
T-Mobile recently announced T-Mobile Priority or T-Priority, which could be considered a competitor to AT&T’s FirstNet and Verizon’s Frontline. However, there’s a big difference in the technologies employed to offer internet and communications during a crisis.
The Mobile Report has access to an internal AT&T document, wherein the telecom company has criticized T-Mobile. AT&T has written to its employees claiming T-Mobile “falsely claims it is the world’s first network slice for First Responders”.
The document stresses how FirstNet is different and better than T-Priority. The internal memo even implies T-Mobile is testing unproven technology on the “wrong people”. The company has called T-Mobile “irresponsible” for doing so.
How is AT&T’s FirstNet different from T-Mobile Priority?
In the internal document, AT&T has stressed its FirstNet service offers “a dedicated communications platform for public safety”. The company has called T-Mobile Priority a “commercial offering”.
Technically speaking, AT&T’s FirstNet operates on a dedicated cellular frequency (band 14). Similarly, Verizon Frontline uses band 13. Needless to say, these frequency bands are reserved for first responders.
T-Mobile Priority will reportedly operate on T-Mobile’s existing 5G bands. However, the company plans to segment the traffic, ensuring emergency workers have a reliable communication pathway.
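Conceptually, “segmenting the traffic” means priority scheduling: under congestion, first-responder packets are served before commercial ones. The toy sketch below illustrates the idea only; a real 5G network slice reserves radio and core resources rather than just reordering a queue, and this is not T-Mobile’s actual implementation:

```python
import heapq

# Lower number = served first (hypothetical two-class scheme).
PRIORITY = {"first_responder": 0, "commercial": 1}

def schedule(packets):
    """Serve packets strictly by class priority, then by arrival order."""
    queue = [(PRIORITY[cls], order, payload)
             for order, (cls, payload) in enumerate(packets)]
    heapq.heapify(queue)
    return [heapq.heappop(queue)[2] for _ in range(len(queue))]

print(schedule([("commercial", "video"), ("first_responder", "dispatch"),
                ("commercial", "web"), ("first_responder", "telemetry")]))
# → ['dispatch', 'telemetry', 'video', 'web']
```

AT&T’s objection is essentially that FirstNet achieves this with dedicated spectrum (band 14), while slicing shares the same radio resources and prioritizes within them.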
Moreover, T-Mobile has indicated it will deploy 24/7 Emergency Management trucks. These vehicles could act as mobile communication towers to help fix problems affecting the network. They will also offer support during disasters, public safety incidents, and more.
Although T-Mobile’s solution could work, AT&T has slammed the company for testing its technology on a sector that has critical communications needs. AT&T has suggested T-Mobile should have first tested its network slicing on commercial customers or subscribers.
Incidentally, AT&T has admitted it plans to deploy 5G network slicing of its own. However, the company pointed out it will use slicing for specific mission needs only.
Want to learn more? Check out our free course on tech management for startups: link: https://myctofriend.co/htbasaccess
If you have a specific question for your project, just go ahead and ask on: https://myctofriend.co/ask
Get all the links mentioned and the print-ready notes here: https://myctofriend.co/videos/
Sacha and his cofounder have already developed the first version of their product. They are now building Version 2 and are considering moving to Amazon Web Services (AWS) as their cloud hosting solution.
Moving an application to the cloud is usually a very good option because you are no longer renting a server but buying a managed service instead.
Let’s see how cloud services work.
If you want to know more about us and our program, visit: https://myctofriend.co
—————————-
If you want to swing into action and build a startup without a CTO, then check out our FREE course, “How to build a Startup without a CTO”: https://myctofriend.co/htbasaccess
Juno, a widely praised (unofficial) YouTube app for Vision Pro, has been removed from Apple’s App Store after complaints from Google, according to Juno’s developer Christian Selig. Google, Selig says, suggested that his app violates its trademarks.
It’s the latest setback for Selig, who shut down his popular Reddit app Apollo last year after Reddit changed its developer policies to charge for use of its API. The shutdown of Apollo and other apps like it ignited a backlash from Reddit users and moderators.
This time, Selig says he doesn’t want drama, noting the $5 app was a “hobby project” for him to tinker with developing for visionOS. “I really enjoyed building Juno, but it was always something I saw as fundamentally a little app I built for fun,” Selig wrote on his website. “Because of that, I have zero desire to spin this into a massive fight akin to what happened with Reddit years ago.”
It’s unclear what aspect of Juno may have been the issue. Selig says that Google referenced its “trademarks and iconography” in a message to Apple, “stating that Juno does not adhere to YouTube guidelines and modifies the website” in a way that’s not permitted. “I don’t personally agree with this, as Juno is just a web view, and acts as little more than a browser extension that modifies CSS to make the website and video player look more ‘visionOS’ like,” Selig explains. “No logos are placed other than those already on the website, and the ‘for YouTube’ suffix is permitted in their branding guidelines.”
Google hasn’t made its own YouTube app for Vision Pro, though the company said such an app was “on our roadmap.” The company didn’t immediately respond to a request for comment.
Selig says that people who have already paid for the app should be able to keep using it for the time being, though there’s a chance a future YouTube update could end up bricking it.