Discover Google CALM: Up to 3x Faster Language Model Inference

In a recent blog post, Search Engine Journal discusses Confident Adaptive Language Modeling (CALM), a new language model technology from Google. According to the article, Google researchers have found a way to speed up large language model (LLM) inference by up to three times with minimal modifications to the model.

By analogy, the idea rests on the difference between answering an easy question and solving a harder one. An easy question, like "What color is the sky?", can be answered with little thought. A harder question might require more analysis and consideration before reaching a conclusion.

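CALM applies the same idea to text generation. Rather than running every layer of the model for every token it generates, it estimates partway through how confident the model is in its prediction and stops early when that confidence is high enough. Easy continuations therefore use only a fraction of the model's computation, while harder ones still get its full capacity. The savings come at inference time, when the model is generating text, not during training.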
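To make the mechanism concrete, here is a minimal Python sketch of confidence-based early exiting for a single token. It is a toy illustration, not Google's implementation. The function names, the toy logit values, and the 0.9 threshold are all assumptions made for this example, and the confidence score (the gap between the two most likely tokens) is just one simple choice.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence(probs):
    """Gap between the two most likely tokens; larger means more confident."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]

def decode_token(per_layer_logits, threshold):
    """Return (token_id, layers_used), exiting at the first confident layer.

    per_layer_logits is a hypothetical, non-empty list with one logit vector
    per decoder layer. It stands in for projecting each layer's hidden state
    onto the vocabulary. If no layer is confident, all layers are used.
    """
    for depth, logits in enumerate(per_layer_logits, start=1):
        probs = softmax(logits)
        if confidence(probs) >= threshold:
            break  # confident enough: skip the remaining layers
    token_id = max(range(len(probs)), key=probs.__getitem__)
    return token_id, depth

# Toy inputs (assumed values): an "easy" token is obvious from the first
# layer, a "hard" one never clears the threshold and needs the full stack.
easy = [[4.0, 0.0, 0.0], [5.0, 0.0, 0.0], [6.0, 0.0, 0.0], [6.0, 0.0, 0.0]]
hard = [[0.2, 0.1, 0.0], [0.5, 0.3, 0.0], [1.0, 0.2, 0.0], [3.0, 0.5, 0.0]]

easy_result = decode_token(easy, threshold=0.9)
hard_result = decode_token(hard, threshold=0.9)
print(easy_result, hard_result)
```

In this toy run the easy token exits after one of four layers, while the hard token uses all four. A real system has more to handle than this sketch shows, such as how to choose the threshold so that output quality stays acceptable.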

The research paper on CALM states the problem and solution like this:

“Recent advances in Transformer-based large language models (LLMs) have led to significant performance improvements across many tasks. These gains come with a drastic increase in the models’ size, potentially leading to slow and costly use at inference time. In practice, however, the series of generations made by LLMs is composed of varying levels of difficulty. While certain predictions truly benefit from the models’ full capacity, other continuations are more trivial and can be solved with reduced compute. …While large models do better in general, the same amount of computation may not be required for every input to achieve similar performance (e.g., depending on if the input is easy or hard).”

