Small is big: Meta bets on AI models for mobile devices

Facebook-parent Meta has been developing a new small language model (SLM) designed for mobile devices, with the aim of running on-device applications while reducing the energy consumed during model inference, according to a paper published by company researchers.

For context, large language models (LLMs) have far more parameters. Mistral-22B, for instance, has 22 billion parameters, while GPT-4 reportedly has about 1.76 trillion. Small language models, by contrast, have far fewer: Microsoft’s Phi-3 family of SLMs comes in several versions, the smallest with 3.8 billion parameters.

A parameter is a value the model learns during training and uses to weigh the possible answers it can give to a query; the more parameters a model has, the larger the computing infrastructure it needs to run.
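To make the term concrete, here is a minimal sketch, assuming PyTorch, that counts the trainable parameters of a toy model. The layer sizes are invented for illustration and are not drawn from any model mentioned in this article.

```python
# Illustrative only: count the trainable parameters of a toy model.
# Every parameter is a learned weight that must be held in memory
# (e.g., 2 bytes each at 16-bit precision), which is why parameter
# count translates directly into hardware requirements.
import torch.nn as nn

toy_model = nn.Sequential(
    nn.Embedding(32_000, 512),   # token vocabulary -> hidden vectors
    nn.Linear(512, 2048),        # feed-forward expansion
    nn.Linear(2048, 512),        # feed-forward projection
)

n_params = sum(p.numel() for p in toy_model.parameters())
print(f"{n_params:,} trainable parameters")  # ~18.5 million here
```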

However, Meta's researchers believe that effective SLMs with fewer than a billion parameters can be built, and that such models would unlock generative AI use cases on mobile devices, which offer far less compute than a server or a rack.

According to the paper, the researchers ran experiments on differently architected models with 125 million and 350 million parameters, and found that at these sizes, architectures that prioritize depth over width perform better.
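To see why a fixed parameter budget forces that trade-off, here is a back-of-envelope sketch. The roughly 12 × dim² parameters-per-block figure is a common rule of thumb for a standard transformer block, not a number from the Meta paper, and the widths below are invented for the example.

```python
# Back-of-envelope only: a fixed parameter budget can be spent on a
# "wide and shallow" model or a "deep and thin" one.
budget = 125_000_000  # roughly the smaller MobileLLM size

for dim in (768, 512):  # wider vs. thinner hidden size
    per_block = 12 * dim * dim       # rough cost of one transformer block
    depth = budget // per_block
    print(f"width {dim}: about {depth} layers fit in the budget")
# width 768: about 17 layers fit in the budget
# width 512: about 39 layers fit in the budget
```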

“Contrary to prevailing belief emphasizing the pivotal role of data and parameter quantity in determining model quality, our investigation underscores the significance of model architecture for sub-billion scale LLMs,” the researchers wrote.

“Leveraging deep and thin architectures, coupled with embedding sharing and grouped-query attention mechanisms, we establish a strong baseline network denoted as MobileLLM, which attains a remarkable 2.7%/4.3% accuracy boost over preceding 125M/350M state-of-the-art models,” they added.
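The ingredients the researchers name can be illustrated with a short, hypothetical PyTorch sketch. This is not Meta's MobileLLM code (which the researchers released separately); it omits the feed-forward layers and normalization details of a real transformer block, and every dimension and class name below is invented for the example.

```python
# Minimal sketch, assuming PyTorch >= 2.0. All sizes are invented.
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Grouped-query attention: several query heads share each key/value
    head, shrinking the K/V projections (and the inference-time KV cache)
    relative to full multi-head attention."""
    def __init__(self, dim=512, n_q_heads=8, n_kv_heads=2):
        super().__init__()
        assert n_q_heads % n_kv_heads == 0
        self.head_dim = dim // n_q_heads
        self.n_q, self.n_kv = n_q_heads, n_kv_heads
        self.q_proj = nn.Linear(dim, n_q_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_q, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv, self.head_dim).transpose(1, 2)
        # Repeat each K/V head so every group of query heads can attend to it.
        group = self.n_q // self.n_kv
        k = k.repeat_interleave(group, dim=1)
        v = v.repeat_interleave(group, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))

class TinyLM(nn.Module):
    """'Deep and thin': many layers, small hidden size. The output
    projection reuses the input embedding matrix (embedding sharing),
    saving vocab * dim parameters."""
    def __init__(self, vocab=32_000, dim=512, n_layers=30):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList(
            GroupedQueryAttention(dim) for _ in range(n_layers)
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens):
        h = self.embed(tokens)
        for block in self.blocks:
            h = h + block(h)  # residual connection
        # Embedding sharing: logits come from the same matrix that
        # embedded the tokens.
        return self.norm(h) @ self.embed.weight.T
```

In a sub-billion-parameter model, the embedding table and the K/V projections are a large fraction of the total, which is why sharing and grouping them frees budget for the extra layers the quoted result favors.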

The 125-million- and 350-million-parameter models, dubbed MobileLLM, were, according to the researchers, as effective as large language models such as Llama 2 at handling chat and several API-calling tasks, highlighting the potential of small models for common on-device use cases. While MobileLLM is not available in any of Meta's products for public use, the researchers have released the code and data for the experiments alongside the paper.

More Meta news:

Meta’s privacy policy lets it use your posts to train its AI

Meta signals the end of the road for Workplace

Meta opens its mixed-reality Horizon OS to other headset makers