Small language models are the new rage, researchers say
The original version of this story appeared in Quanta Magazine.
Large language models work well because they are so large. The latest models from OpenAI, Meta, and DeepSeek use hundreds of billions of "parameters" – the adjustable knobs that determine connections among data and that get tuned during the training process. With more parameters, the models are better able to identify patterns and connections, which in turn makes them more capable and accurate.
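For a rough sense of how parameter counts scale, here is a minimal sketch in Python with PyTorch; the layer sizes are illustrative assumptions, not any vendor's actual architecture. Each linear layer holds one weight per input-output pair, and those weights are the "knobs" that training adjusts.

```python
# Illustrative only: parameter counts grow rapidly with layer width.
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    # Total number of trainable values ("knobs") in the model.
    return sum(p.numel() for p in model.parameters())

small = nn.Linear(1_024, 1_024)    # roughly 1 million weights
large = nn.Linear(16_384, 16_384)  # roughly 268 million weights in a single layer

print(f"small layer: {count_parameters(small):,} parameters")
print(f"large layer: {count_parameters(large):,} parameters")
```

Full models stack many such layers, which is how totals reach into the hundreds of billions.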
But this power comes at a price. Training a model with hundreds of billions of parameters takes enormous computing resources. To train its Gemini 1.0 Ultra model, for example, Google reportedly spent $191 million. Large language models (LLMs) also require considerable computing power every time they answer a request, which makes them notorious energy hogs. A single ChatGPT request consumes about 10 times as much energy as a single Google search, according to the Electric Power Research Institute.
In response, some researchers are now thinking small. IBM, Google, Microsoft, and OpenAI have all recently released small language models (SLMs) that use just a few billion parameters – a fraction of their LLM counterparts.
Small models are not used as general-purpose tools like their larger cousins. But they can excel at specific, more narrowly defined tasks, such as summarizing conversations, answering patient questions as a health care chatbot, and gathering data in smart devices. "For a lot of tasks, an 8-billion-parameter model is actually pretty good," said Zico Kolter, a computer scientist at Carnegie Mellon University. They can also run on a laptop or cell phone instead of in a huge data center. (There is no consensus on the exact definition of "small," but the new models all max out at around 10 billion parameters.)
To optimize the training process for these small models, researchers use a few tricks. Large models often scrape raw training data from the internet, and this data can be disorganized, messy, and hard to process. But these large models can then generate a high-quality data set that can be used to train a small model. The approach, called knowledge distillation, gets the larger model to effectively pass on its training, like a teacher giving lessons to a student. "The reason [SLMs] get so good with such small models and such little data is that they use high-quality data instead of the messy stuff," Kolter said.
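A common concrete form of this idea, sketched below in PyTorch under assumed toy model sizes and placeholder data, trains the small "student" to match the soft probability outputs of the large "teacher." This is a minimal illustration of distillation in general, not the specific pipeline any of the companies mentioned above use.

```python
# Minimal knowledge-distillation sketch: a larger "teacher" network produces
# soft targets, and a much smaller "student" is trained to match them.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins: the teacher is much wider (more parameters) than the student.
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0          # softens the teacher's distribution
x = torch.randn(64, 32)    # placeholder "training data"

for step in range(100):
    with torch.no_grad():
        teacher_logits = teacher(x)   # teacher is fixed, only the student learns
    student_logits = student(x)

    # KL divergence between the softened teacher and student distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final distillation loss: {loss.item():.4f}")
```

In practice, LLM distillation often works at the data level, with the teacher generating high-quality text that the student then trains on, rather than matching outputs token by token as in this sketch.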
Researchers have also explored ways to create small models by starting with large ones and trimming them down. One method, known as pruning, entails removing unnecessary or inefficient parts of a neural network – the sprawling web of connected data points that underlies a large model.
Pruning was inspired by a real-life neural network, the human brain, which gains efficiency by snipping connections between synapses as a person ages. Today's pruning approaches trace back to a 1989 paper in which the computer scientist Yann LeCun, now at Meta, argued that up to 90 percent of the parameters in a trained neural network could be removed without sacrificing efficiency. He called the method "optimal brain damage." Pruning can help researchers fine-tune a small language model for a particular task or environment.
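The sketch below shows one simple flavor of pruning using PyTorch's torch.nn.utils.prune utilities on a toy network: weights with the smallest magnitudes are zeroed out. This magnitude-based criterion is an assumed simplification; LeCun's "optimal brain damage" method instead uses second-order information to decide which weights matter least.

```python
# Minimal magnitude-pruning sketch: zero out the smallest 90 percent of weights
# in each linear layer of a toy network, then measure overall sparsity.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

for module in model.modules():
    if isinstance(module, nn.Linear):
        # Remove the 90 percent of weights with the smallest absolute value.
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")   # bake the zeros into the weight tensor

zeros = sum((p == 0).sum().item() for p in model.parameters())
total = sum(p.numel() for p in model.parameters())
print(f"{zeros / total:.0%} of parameters are now zero")
```

After a step like this, the pruned model is typically fine-tuned briefly so the remaining weights can compensate for what was removed.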
For researchers interested in how language models do the things they do, smaller models offer an inexpensive way to test new ideas. And because they have fewer parameters than large models, their reasoning may be more transparent. "If you want to make a new model, you need to try things," said Leshem Choshen, a research scientist at the MIT-IBM Watson AI Lab. "Small models allow researchers to experiment with lower stakes."
Large, expensive models, with their ever-increasing parameters, will remain useful for applications such as generalized chatbots, image generators, and drug discovery. But for many users, a small, targeted model will work just as well, while being easier for researchers to train and build. "These efficient models can save money, time, and compute," Choshen said.
Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance public understanding of science by covering research developments and trends in mathematics and the physical and life sciences.