Announcing the first series of Liquid Foundation Models (LFMs) – a new generation of generative AI models that achieve state-of-the-art performance at every scale, while maintaining a smaller memory footprint and more efficient inference.
We announce the first series of Liquid Foundation Models (LFMs), a new generation of generative AI models built from first principles.
Our 1B, 3B, and 40B LFMs achieve state-of-the-art performance in terms of quality at each scale, while maintaining a smaller memory footprint and more efficient inference.
Try LFMs today on Liquid Playground, Lambda (Chat UI and API), Perplexity Labs, and soon on Cerebras Inference. The LFM stack is being optimized for NVIDIA, AMD, Qualcomm, Cerebras, and Apple hardware.
We build private, edge, and on-premise AI solutions for enterprises of any size.
We are scaling LFMs and expect to introduce new and better capabilities across various industries, such as financial services, biotechnology, and consumer electronics.
At Liquid AI, we build new methods for designing powerful AI systems over which we have significant control. We design them the same way engineers built engines, cars, and airplanes: from first principles. Our mission is to create best-in-class, intelligent, and efficient systems at every scale – systems designed to process large amounts of sequential multimodal data, to enable advanced reasoning, and to achieve reliable decision-making.
Today, we introduce the first generation of Liquid Foundation Models (LFMs). LFMs are large neural networks built with computational units deeply rooted in the theory of dynamical systems, signal processing, and numerical linear algebra. This unique blend allows us to leverage decades of theoretical advances in these fields in our quest to enable intelligence at every scale. LFMs are general-purpose AI models that can be used to model any kind of sequential data, including video, audio, text, time series, and signals.
Our name “Liquid” pays homage to our roots in dynamic and adaptive learning systems.
We are proud to release our first series of language models:
A dense 1.3B model, ideal for highly resource-constrained environments.
A dense 3.1B model, optimized for edge deployment.
A 40.3B Mixture of Experts (MoE) model, designed for tackling more complex tasks.
Architecture work cannot happen in a vacuum – our goal is to develop useful models that are competitive with the current best-in-class LLMs. In doing so, we hope to show that model performance isn’t just about scale – it’s also about innovation.
We report the results of our fine-tuned LFMs and compare them with similar-sized language models using EleutherAI’s lm-evaluation-harness v0.4. Unless specified otherwise, we compare against other fine-tuned models.
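For readers who want to reproduce this style of comparison on publicly available baselines, here is a minimal sketch using the harness’s Python entry point, assuming lm-eval v0.4 is installed; the checkpoint name and task list are illustrative placeholders for whichever open model you want to score, not a pointer to LFM weights.

```python
# Minimal sketch: scoring an open baseline with EleutherAI's
# lm-evaluation-harness v0.4 (pip install lm-eval).
# The checkpoint name and task list are illustrative placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # Hugging Face backend
    model_args="pretrained=microsoft/Phi-3.5-mini-instruct",
    tasks=["arc_challenge", "hellaswag"],          # pick the benchmarks you care about
    batch_size=8,
)

# Per-task metrics (accuracy, normalized accuracy, etc.) live under "results".
for task, metrics in results["results"].items():
    print(task, metrics)
```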
LFM-1B achieves the highest scores across various benchmarks in the 1B category, making it the new state-of-the-art model at this size. This is the first time a non-GPT architecture significantly outperforms transformer-based models.
LFM-3B delivers incredible performance for its size. It not only ranks first among 3B-parameter transformers, hybrids, and RNN models, but also outperforms the previous generation of 7B and 13B models. It is also on par with Phi-3.5-mini on multiple benchmarks, while being 18.4% smaller. LFM-3B is the ideal choice for mobile and other edge text-based applications.
LFM-40B offers a new balance between model size and output quality. It activates 12B parameters at inference time. Its performance is comparable to models larger than itself, while its MoE architecture enables higher throughput and deployment on more cost-effective hardware.
LFMs have a reduced memory footprint compared to transformer architectures. This is particularly true for long inputs, where the KV cache in transformer-based LLMs grows linearly with sequence length. By efficiently compressing inputs, LFMs can process longer sequences on the same hardware. For example, compared to other 3B-class models, LFMs maintain a minimal memory footprint.
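To make the memory argument concrete, the back-of-the-envelope sketch below estimates how the KV cache alone grows with sequence length for a generic transformer decoder; the layer count, head count, and head dimension are illustrative assumptions for a roughly 3B-parameter model, not the configuration of any specific checkpoint.

```python
# Back-of-the-envelope KV-cache size for a generic transformer decoder.
# Hyperparameters are illustrative assumptions for a ~3B model.

def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128,
                   bytes_per_elem=2):  # fp16/bf16
    # 2x for keys and values, cached at every layer for every token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

for tokens in (1_024, 8_192, 32_768):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>6} tokens -> {gib:.2f} GiB of KV cache")

# The cache grows linearly with sequence length, whereas a model that
# compresses context into a fixed-size state keeps memory roughly constant.
```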
In this preview release, we have optimized our models to deliver a best-in-class 32k token context length, pushing the boundary of efficiency for models of this size. This was confirmed by the RULER benchmark, where a context length is considered “effective” when its corresponding score is higher than 85.6 [Hsieh et al. 2024 - RULER]. The following table compares several models at different context lengths.
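As a rough illustration of how that effective-length criterion is applied, the sketch below selects the longest evaluated context whose RULER score clears the 85.6 threshold; the scores in the example are placeholder values for demonstration only, not real benchmark results.

```python
# Illustration of the RULER "effective length" criterion (threshold 85.6).
THRESHOLD = 85.6

def effective_context_length(scores_by_length):
    """Longest evaluated context whose score exceeds the threshold."""
    passing = [length for length, score in scores_by_length.items()
               if score > THRESHOLD]
    return max(passing) if passing else None

# Placeholder scores for demonstration only, not real benchmark results.
hypothetical_scores = {4_096: 95.0, 8_192: 92.0, 16_384: 89.0,
                       32_768: 87.0, 65_536: 70.0}
print(effective_context_length(hypothetical_scores))  # -> 32768
```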
This highly efficient context window enables long-context tasks on edge devices for the first time. For developers, it unlocks new applications, including document analysis and summarization, more meaningful interactions with context-aware chatbots, and improved Retrieval-Augmented Generation (RAG) performance.
Our goal is to keep scaling LFMs across model size, train/test-time compute, and context length. Beyond our language LFMs, we have designed models for various data modalities, domains, and applications that we plan to release over the coming months.
To achieve these results, we optimized our pre- and post-training pipelines and infrastructure to ensure our models excel across five criteria:
Breadth and depth of information across various domains and tasks at any given size. We achieve this using a comprehensive pre-training set, advances in model architectures, and new pre-training, mid-training, and post-training strategies. This allows LFMs to be competitive with larger models on knowledge-based tasks.
The ability to break down a problem and apply logical and rigorous thinking. We distilled system 2 tasks during the core phases of training, enabling robust analytical capabilities in compact model architectures.
A model's maximum input size is not the same as its effective context length. We specifically trained LFMs to maximize recall performance and in-context learning capabilities across the entire input range.
Memory usage of transformer-based models explodes for long inputs, which makes them ill-suited for edge deployment. LFMs have near-constant inference time and memory complexity – as the input context length grows, it does not significantly affect generation speed or increase the amount of memory required.
Training GPT-like foundation models demands significant computational resources. LFMs are efficient for training on long-context data.
Building on a long line of research in designing expressive and efficient learning systems, we have developed a new design space for foundation models, focusing on different modalities and hardware requirements. Our goal is to explore ways to build foundation models beyond Generative Pre-trained Transformers (GPTs).
With LFMs, we put into practice new principles and methods guiding model design, developed by our team over the past months.
Our models are derived from a set of computational units – the building blocks of an architecture – belonging to a new design space. Liquid systems and their composition maximize knowledge capacity and reasoning, while unlocking improved training efficiency, reduced memory cost at inference time, and increased performance in modeling data such as video, audio, text, time series, and signals.
The design of our models reciprocally informs our scaling, inference, alignment, and model analysis strategy. We can analyze the dynamics of LFMs via classical signal processing analysis methods and probe their behavior, from model outputs to model internals.
We can automatically optimize architectures for a specific platform (e.g., Apple, Qualcomm, Cerebras, and AMD) or match given parameter requirements and inference cache size.
Liquid’s design space is primarily defined by the featurization and footprint of architectures and their core operators. Featurization refers to the process of converting input data (e.g., text, audio, images, video) into a structured set of features or vectors that are used to modulate computation inside the model in an adaptive manner. For example, audio and time series data generally require less featurization in operators due to lower information density, compared to language and multi-modal data. The other key dimension is the computational complexity of the operators. Being able to traverse and complete the design space of structured adaptive operators allows us to maximize performance with controlled computational requirements.
At their core, LFMs are built with computational units that can be expressed as adaptive linear operators whose actions are determined by inputs. The LFM design framework unifies and subsumes a wide range of existing computational units in deep learning, providing a systematic approach to exploring the space of architectures. Specifically, our analysis informs model building by improving three key aspects: token-mixing structure (how the operator mixes embeddings in the input sequence), channel-mixing structure (how it mixes channel dimensions), and featurization, responsible for modulating computation based on the input context.
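To give a general sense of what an input-adaptive linear operator can look like, the toy sketch below uses a small featurizer to produce gates that modulate a token-mixing recurrence and a channel-mixing projection; every shape, scale, and parameter name here is an illustrative assumption, not the actual LFM computational unit.

```python
import numpy as np

# Toy input-adaptive linear operator: the operator's action on the sequence
# is modulated by features computed from the input itself.
# All shapes and parameters are illustrative; this is not the LFM unit.
rng = np.random.default_rng(0)
d_model, d_state, seq_len = 512, 64, 256

W_feat = rng.standard_normal((d_state, d_model)) * 0.02   # featurization
A      = rng.standard_normal((d_state, d_state)) * 0.02   # token-mixing base
B      = rng.standard_normal((d_state, d_model)) * 0.02   # input projection
C      = rng.standard_normal((d_model, d_state)) * 0.02   # readout
W_chan = rng.standard_normal((d_model, d_model)) * 0.02   # channel mixing

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = rng.standard_normal((seq_len, d_model))   # dummy input sequence
state = np.zeros(d_state)
outputs = []
for x_t in x:
    gate = sigmoid(W_feat @ x_t)              # input-dependent modulation
    state = gate * (A @ state) + B @ x_t      # token mixing: adaptive linear recurrence
    y_t = W_chan @ (C @ state)                # channel mixing across embedding dims
    outputs.append(y_t)

y = np.stack(outputs)   # (seq_len, d_model); the state stays fixed-size throughout
print(y.shape)
```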
As we are still in the early stages of this journey, we welcome the opportunity to collaborate and discover the strengths and weaknesses of these systems together.
What Language LFMs are good at today:
What Language LFMs are not good at today:
At Liquid AI, we take an open-science approach. We have contributed, and will continue to contribute, to the advancement of the AI field by openly publishing our findings and methods through scientific and technical reports. As part of this commitment, we will release relevant data and models produced by our research efforts to the wider AI community. We have dedicated a lot of time and resources to developing these architectures, so we're not open-sourcing our models at the moment. This allows us to continue building on our progress and maintain our edge in the competitive AI landscape.
If your enterprise is looking to experience the forefront of AI, we invite you to get in touch with us. If this aligns with your personal goals and ambitions, we invite you to join our team and drive this vision forward. We are very early on this journey and actively innovating across various aspects of foundation model development and deployment. We invite enthusiastic users to share their experience as well as criticism, and join our red-teaming efforts to improve the capabilities of our models.