To everyone’s surprise, Apple is turning weakness into strength with its evolving approach to artificial intelligence (AI), as it becomes one of the biggest open-source research contributors in the field.
Apple last week released its DCLM (DataComp for Language Models) models on Hugging Face. The project aims to improve the curation of training data, enabling new models to be trained and tested with relatively few “training tokens.”
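For the curious, the release is meant to be usable with standard open-source tooling. What follows is a minimal sketch rather than Apple’s documented quickstart: the model identifier is my assumption, so verify it on the company’s Hugging Face page, whose model card also references an open_lm package dependency that may be required before loading.

```python
# A minimal sketch, not Apple's documented quickstart: loading a DCLM
# checkpoint from Hugging Face with the standard transformers API.
# The model id is an assumption; check Apple's Hugging Face page for
# the exact name and any extra dependencies (the DCLM model card
# references the open_lm package).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/DCLM-7B"  # assumed identifier; verify before use

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Open-source language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```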
‘Best performing truly open-source models’
“Our baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% & 66%) and performs similarly on an average of 53 natural language understanding tasks while being trained with 6.6x less compute than Llama 3 8B,” the team wrote. “Our results highlight the importance of dataset design for training language models and offer a starting point for further research on data curation.”
Apple’s researchers worked with peers from the University of Washington, the Toyota Research Institute, Stanford, and others on the project.
“To our knowledge these are by far the best performing truly open-source models (open data, open weight models, open training code),” Apple machine learning researcher Vaishaal Shankar wrote when announcing the news.
Reaction seems positive. “The data curation process is a must-study for anyone looking to train a model from scratch or fine-tune an existing one,” said applied AI scientist Akash Shetty.
Opening up, strategically
The model seems to compete strongly with other models of its type, including Mistral-7B, and approaches the performance of models from Meta and Google — even though it is trained on smaller quantities of content.
The idea is that research teams can use the tech to create their own small AI models, which can themselves be embedded in (say) apps and deployed at low cost.
While it is unwise to read too much into things, Apple’s AI teams do seem to have embraced a more open approach to research in the field. That makes sense for a company allegedly racing to catch up to competitors, of course; it also makes sense in another way, because a company that contributes and maintains open-source code puts itself in a strong position for future research, both through contact with peers and by fostering goodwill.
Collaboration counts
That alone is a small, but remarkable, step for Apple, which has a reputation for prizing secrecy above all else. That secrecy has, we’ve been intermittently told in recent years, been a big problem for Apple’s research teams, who have wanted to collaborate more closely with others at the cutting edge of the industry.
Apple seems to have listened, which is why I think it is now turning what was once a disadvantage into an advantage. In the short term, the company wants to promote effective innovation in AI while it develops its own solutions under the Apple Intelligence brand.
It might also hope to position itself as a provider of source technology powering many open-source projects. Ensuring good technologies are widely available to the open-source community could help prevent other entities from owning too much of the core technology.
Coders on the edge of time
The release of the small-size model also reflects Apple’s core focus on edge-based AI solutions, supplemented by its own secure server-based AI services and by third-party offerings, such as those from OpenAI and, in the future, Google Gemini.
The Apple release is just the latest in a string of such releases to have emerged since the company intensified its focus on AI research. The company has now published dozens of models, including most recently its OpenELM and Core ML models; the latter are optimized to run generative AI (genAI) and machine learning applications on device.
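To illustrate the on-device pipeline those Core ML releases serve, here is a toy sketch using Apple’s coremltools to convert a small PyTorch network into a Core ML package an app can ship; the tiny network is a placeholder for whatever small model a team actually trains, not one of Apple’s releases.

```python
# A toy sketch of the on-device path: converting a small PyTorch model
# to Core ML with Apple's coremltools so it can be bundled into an app.
# The tiny network below is a placeholder, not one of Apple's models.
import torch
import coremltools as ct

net = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 4),
)
net.eval()

example = torch.rand(1, 16)
traced = torch.jit.trace(net, example)  # TorchScript trace for conversion

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="features", shape=example.shape)],
    convert_to="mlprogram",
)
mlmodel.save("TinyModel.mlpackage")  # ready to drop into an Xcode project
```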
While nothing has been stated to this effect, the cards Apple is showing indicate it is working more closely with researchers outside the company. And it’s investing in the development of edge AI — ironically, the direction the industry will inevitably head toward as the real-life problems of power and water consumption, copyright, and privacy present existential challenges to the future evolution of the server-led AI space.
Please follow me on Mastodon, or join me in the AppleHolic’s bar & grill and Apple Discussions groups on MeWe.