The Allen Institute for AI (Ai2) has released the Multimodal Open Language Model, or Molmo, a powerful open source AI model with visual abilities. This model can interpret images as well as converse through a chat interface, enabling AI agents to perform tasks such as browsing the web, navigating through file directories, and drafting documents.
“With this release, many more people can deploy a multimodal model,” says Ali Farhadi, CEO of Ai2. “It should be an enabler for next-generation apps.” Molmo can be used to power AI agents, widely touted as the next big thing in AI. Such agents promise to go beyond chatting and take complex actions on a computer when given a command.
Because Molmo is open source, developers can fine-tune it for the specific tasks their agents need to perform by supplying additional training data. Commercial models, by contrast, can typically be fine-tuned only to a limited degree through their APIs. “Having an open source, multimodal model means that any startup or researcher that has an idea can try to do it,” says Ofir Press, a postdoc at Princeton University who works on AI agents.
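To make that concrete, here is a minimal sketch of how a developer might run Molmo locally with Hugging Face Transformers before adapting it further. It assumes the publicly released allenai/Molmo-7B-D-0924 checkpoint and the custom processor.process and generate_from_batch helpers that ship with its remote code; exact names and arguments may differ across Molmo variants.

```python
# Minimal sketch: querying a Molmo checkpoint with an image plus a text prompt.
# Assumes the allenai/Molmo-7B-D-0924 checkpoint and the custom processing and
# generation helpers (processor.process, model.generate_from_batch) it ships with.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

MODEL_ID = "allenai/Molmo-7B-D-0924"  # one of the released Molmo variants

# trust_remote_code is required because Molmo bundles its own modeling code
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

# Pair an image with a prompt, the basic interaction an agent would build on
image = Image.open(requests.get("https://picsum.photos/id/237/536/354", stream=True).raw)
inputs = processor.process(images=[image], text="Describe this image.")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

# Generate a response; an agent framework would parse output like this into actions
output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
answer = processor.tokenizer.decode(
    output[0, inputs["input_ids"].size(1):], skip_special_tokens=True
)
print(answer)
```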
The release of Molmo brings capable AI agents a step closer to practical reality: an openly available multimodal model gives developers the raw material to build agents that are both more powerful and more useful.