**Molmo Open Source AI Models Outperform GPT-4o and Claude**
The Allen Institute for AI (Ai2) has unveiled Molmo, an open-source family of state-of-the-art multimodal AI models that outperform top proprietary rivals, including OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5. These models can accept and analyze imagery uploaded by users, similar to leading proprietary foundation models.
Molmo was trained on “1000x less data” than its proprietary rivals, according to Ai2, thanks to new training techniques. The achievement underscores Ai2’s commitment to open research: it offers high-performing models, complete with open weights and data, to the broader community.
Molmo’s architecture is designed to maximize efficiency and performance. The models use OpenAI’s ViT-L/14 336px CLIP model as the vision encoder, processing multi-scale, multi-crop images into vision tokens. These tokens are then projected into the language model’s input space through a multi-layer perceptron (MLP) connector and pooled for dimensionality reduction.
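The multi-crop pipeline above implies a concrete vision-token budget. A back-of-envelope sketch (the crop count and the 2x2 pooling factor here are illustrative assumptions, not figures from Ai2’s report):

```python
# Rough vision-token arithmetic for a ViT-L/14 encoder at 336px:
# each square crop is split into 14px patches, and pooling each 2x2
# patch neighborhood (assumed factor) shrinks the token count 4x
# before the MLP connector projects tokens into the LLM's input space.

def vision_tokens_per_crop(image_px: int = 336, patch_px: int = 14) -> int:
    """Patch tokens a ViT produces for one square crop."""
    side = image_px // patch_px       # 336 / 14 = 24 patches per side
    return side * side                # 24 * 24 = 576 tokens

def pooled_tokens(tokens: int, pool: int = 2) -> int:
    """Tokens remaining after pool x pool spatial pooling."""
    return tokens // (pool * pool)

per_crop = vision_tokens_per_crop()   # 576 patch tokens per crop
after_pool = pooled_tokens(per_crop)  # 144 tokens per crop after 2x2 pooling
# Hypothetical layout: one full-image view plus 4 high-res crops.
total = 5 * after_pool                # 720 vision tokens handed to the LLM
print(per_crop, after_pool, total)
```

The pooling step is why multi-crop inputs stay affordable: without it, five crops would consume 2,880 of the language model’s input positions instead of 720.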
The Molmo models have shown impressive results across multiple benchmarks, particularly in comparison to proprietary models. Molmo-72B has achieved top performance on several benchmarks, including DocVQA, TextVQA, and AI2D.
Ai2 has made these models and datasets accessible on Hugging Face, with full compatibility with popular AI frameworks like Transformers. This open access is part of Ai2’s broader vision to foster innovation and collaboration in the AI community. Over the next few months, Ai2 plans to release additional models, training code, and an expanded version of its technical report.
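Because the checkpoints ship in Transformers-compatible form, loading one follows the standard remote-code pattern. A minimal sketch, assuming the `allenai/Molmo-7B-D-0924` checkpoint and the processor/generation helpers described on Ai2’s model card (treat the exact IDs and call signatures as assumptions):

```python
# Sketch: querying a Molmo checkpoint with an image via Hugging Face
# Transformers. Model ID and the process()/generate_from_batch() helpers
# are assumptions based on Ai2's model card; Molmo ships custom modeling
# code, hence trust_remote_code=True.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

MODEL_ID = "allenai/Molmo-7B-D-0924"  # assumed checkpoint name

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, trust_remote_code=True, device_map="auto"
)

# Pair an uploaded image with a text prompt, then add a batch dimension.
inputs = processor.process(
    images=[Image.open("photo.jpg")], text="Describe this image."
)
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
# Decode only the newly generated tokens, skipping the prompt.
generated = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(generated, skip_special_tokens=True))
```

The sketch requires downloading several gigabytes of weights on first run, so it is illustrative rather than something to paste blindly into a constrained environment.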
For those interested in exploring Molmo’s capabilities, a public demo and several model checkpoints are available now via Molmo’s official page.