Liquid Foundation Models: Our First Series of Generative AI Models

Announcing the first series of Liquid Foundation Models (LFMs) – a new generation of generative AI models that achieve state-of-the-art performance at every scale, while maintaining a smaller memory footprint and more efficient inference.

Try Liquid
Fig. 1. LFMs offer a new best performance/size tradeoff in the 1B, 3B, and 12B (active parameters) categories.

Takeaways

We announce the first series of Liquid Foundation Models (LFMs), a new generation of generative AI models built from first principles.

Our 1B, 3B, and 40B LFMs achieve state-of-the-art performance in terms of quality at each scale, while maintaining a smaller memory footprint and more efficient inference.

Try LFMs today on Liquid Playground, Lambda (Chat UI and API), Perplexity Labs, and soon on Cerebras Inference. The LFM stack is being optimized for NVIDIA, AMD, Qualcomm, Cerebras, and Apple hardware.

We build private, edge, and on-premise AI solutions for enterprises of any size.

We are scaling LFMs and expect to introduce new and better capabilities across various industries, such as financial services, biotechnology, and consumer electronics.

Try Liquid

At Liquid AI, we build new methods for designing powerful AI systems over which we have significant control. We design them the same way engineers built engines, cars, and airplanes: from first principles. Our mission is to create best-in-class, intelligent, and efficient systems at every scale – systems designed to process large amounts of sequential multimodal data, to enable advanced reasoning, and to achieve reliable decision-making.

Today, we introduce the first generation of Liquid Foundation Models (LFMs). LFMs are large neural networks built with computational units deeply rooted in the theory of dynamical systems, signal processing, and numerical linear algebra. This unique blend allows us to leverage decades of theoretical advances in these fields in our quest to enable intelligence at every scale. LFMs are general-purpose AI models that can be used to model any kind of sequential data, including video, audio, text, time series, and signals. 

Our name “Liquid” pays homage to our roots in dynamic and adaptive learning systems. 

Introducing the First Generation of Language LFMs

We are proud to release our first series of language models:

A dense 1.3B model, ideal for highly resource-constrained environments.

A dense 3.1B model, optimized for edge deployment.

A 40.3B Mixture of Experts (MoE) model, designed for tackling more complex tasks.

Architecture work cannot happen in a vacuum – our goal is to develop useful models that are competitive with the current best-in-class LLMs. In doing so, we hope to show that model performance isn’t just about scale – it’s also about innovation.

State-of-the-Art Performance

We report the results of our fine-tuned LFMs and compare them with similar-sized language models using EleutherAI’s lm-evaluation-harness v0.4. Unless specified otherwise, we compare against other fine-tuned models.
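For readers who want to run this kind of comparison on their own checkpoints, the harness exposes a simple Python entry point. The sketch below is illustrative only: the checkpoint path is a placeholder (our preview models are served through the platforms listed above rather than distributed as weights), and each benchmark in the tables uses its own shot count, so reproducing the exact settings requires separate runs.

```python
# Illustrative only: scoring a locally available, Hugging Face-format checkpoint
# with lm-evaluation-harness v0.4. The path is a placeholder, and the task list
# is a subset of the benchmarks reported below.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                   # Hugging Face transformers backend
    model_args="pretrained=/path/to/checkpoint",  # placeholder, substitute your own model
    tasks=["mmlu", "arc_challenge", "gsm8k"],     # each table benchmark has its own shot count,
    num_fewshot=5,                                # so reproduce them with separate runs
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```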

LFM-1B achieves the highest scores across various benchmarks in the 1B category, making it the new state-of-the-art model at this size. This is the first time a non-GPT architecture has significantly outperformed transformer-based models.

| Benchmark | LFM-1B Preview (1.3B) | OpenELM (Apple, 1.1B) | Llama 3.2 (Meta, 1.2B) | Phi 1.5 (Microsoft, 1.4B) | Stable LM 2 (Stability, 1.6B) | RWKV 6 (RWKV, 1.6B) | Smol LM (Hugging Face, 1.7B) | Danube 2 (H2O, 1.8B) | Rene (Cartesia, Base 1.3B) | R Gemma 2 (Google, Base 2.7B) |
|---|---|---|---|---|---|---|---|---|---|---|
| Context length (tokens) | 32k | 1k | 128k | 2k | 4k | 1k | 2k | 8k | - | 256k |
| MMLU (5-shot) | 58.55 | 25.65 | 45.46 | 42.26 | 41.06 | 26.02 | 28.46 | 37.63 | 32.61 | 34.38 |
| MMLU-Pro (5-shot) | 30.65 | 11.19 | 19.41 | 16.80 | 16.73 | 11.61 | 10.94 | 14.00 | 12.27 | 11.78 |
| Hellaswag (10-shot) | 67.28 | 71.8 | 59.72 | 64.03 | 69.33 | 61.46 | 62.52 | 73.99 | 69.93 | 72.24 |
| ARC-C (25-shot) | 54.95 | 41.64 | 41.3 | 53.75 | 44.11 | 36.95 | 45.48 | 43.77 | 38.91 | 46.76 |
| GSM8K (5-shot) | 55.34 | 0.38 | 33.36 | 31.61 | 41.55 | 5.76 | 0.38 | 31.92 | 2.58 | 17.74 |

LFM-3B delivers incredible performance for its size. It not only ranks first among 3B-parameter transformers, hybrids, and RNN models, but also outperforms the previous generation of 7B and 13B models. It is also on par with Phi-3.5-mini on multiple benchmarks, while being 18.4% smaller. LFM-3B is the ideal choice for mobile and other edge text-based applications.

| Benchmark | LFM-3B Preview (3.1B) | Gemma 2 (Google, 2.6B) | Zamba 2 (Zyphra, 2.7B) | AFM Edge (Apple, 3B) | Llama 3.2 (Meta, 3.2B) | Phi-3.5 (Microsoft, 3.8B) | Mistral-7b v0.3 (Mistral AI, 7B) | Llama 3.1 (Meta, 8B) | Mistral Nemo (Mistral AI, 12.2B) |
|---|---|---|---|---|---|---|---|---|---|
| Context length (tokens) | 32k | 8k | - | 32k | 128k | 128k | 4k | 128k | 128k |
| MMLU (5-shot) | 66.16 | 56.96 | 56* | 60.64* | 59.65 | 68.91 | 62.04 | 67.92 | 68.47 |
| MMLU-Pro (5-shot) | 38.41 | 27.32 | - | - | 30.07 | 38.31 | 30.35 | 37.72 | 35.56 |
| Hellaswag (10-shot) | 78.48 | 71.31 | 76* | 55.24* | 73.36 | 78.84 | 84.62 | 80.00 | 84.31 |
| ARC-C (25-shot) | 63.99 | 57.94 | 56* | 45.39* | 52.65 | 64.51 | 64.16 | 60.58 | 65.70 |
| GSM8K (5-shot) | 70.28 | 44.28 | - | - | 64.9 | 79.15 | 49.05 | 75.44 | 73.54 |
*Scores reported by the developers. All the other scores were calculated with the same evaluation harness we used for our own models.

LFM-40B offers a new balance between model size and output quality. It activates only 12B of its 40.3B parameters at inference time. Its performance is comparable to models larger than itself, while its MoE architecture enables higher throughput and deployment on more cost-effective hardware (see the generic routing sketch below the table).

| Benchmark | LFM-40B Preview (40B A12B) | Jamba 1.5 (AI21, 52B A12B) | Mixtral (Mistral, 47B A13B) | Qwen 2 (Alibaba, 57B A14B) | Gemma 2 (Google, 27B) | Yi 1.5 (01.AI, 34B) | AFM Server (Apple) | Llama 3.1 (Meta, 70B) |
|---|---|---|---|---|---|---|---|---|
| Context length (tokens) | 32k | 256k | 8k | 32k | 128k | 32k | 32k | 128k |
| MMLU (5-shot) | 78.76 | 59.57 | 73.42 | 75.75 | 76.20 | 76.19 | 75.3* | 82.25 |
| MMLU-Pro (5-shot) | 55.63 | 28.69 | 38.12 | 47.47 | 45.69 | 45.13 | - | 52.89 |
| Hellaswag (10-shot) | 82.07 | 77.16 | 87.54 | 85.96 | 85.79 | 85.37 | 86.9* | 86.40 |
| ARC-C (25-shot) | 67.24 | 60.90 | 71.33 | 66.89 | 74.83 | 69.11 | 69.7* | 70.39 |
| GSM8K (5-shot) | 76.04 | 46.47 | 64.22 | 77.79 | 84.53 | 79.68 | 72.4* | 88.10 |
*Scores reported by the developers. All the other scores were calculated with the same evaluation harness we used for our own models.
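We have not published the internals of LFM-40B's MoE layers, so the following is only a generic top-k expert-routing sketch in PyTorch, with arbitrary expert counts and dimensions, meant to illustrate why only a fraction of the parameters is active for any given token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k mixture-of-experts layer (an illustration, not LFM-40B's design).

    Each token is routed to k of n_experts feed-forward experts, so only a
    fraction of the layer's parameters participates in any single forward pass.
    """
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (batch, seq, d_model)
        scores = self.router(x)                  # (batch, seq, n_experts)
        topk_w, topk_i = scores.topk(self.k, dim=-1)
        topk_w = F.softmax(topk_w, dim=-1)       # normalize over the k selected experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Combined routing weight of expert e per token (zero if not selected).
            w_e = (topk_w * (topk_i == e)).sum(dim=-1, keepdim=True)
            # For clarity we run each expert densely and weight the result; real MoE
            # kernels dispatch only the routed tokens to each expert.
            out = out + w_e * expert(x)
        return out

layer = TopKMoE()
tokens = torch.randn(2, 16, 512)
print(layer(tokens).shape)  # torch.Size([2, 16, 512])
```

With 8 experts and k = 2 in this toy configuration, each token touches roughly a quarter of the expert parameters, which mirrors in spirit the 12B-active-of-40B ratio above.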

LFMs are Memory-Efficient

LFMs have a reduced memory footprint compared to transformer architectures. This is particularly true for long inputs, where the KV cache in transformer-based LLMs grows linearly with sequence length. By efficiently compressing inputs, LFMs can process longer sequences on the same hardware. For example, compared to other 3B-class models, LFMs maintain a minimal memory footprint.
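The scaling argument can be made concrete with a back-of-the-envelope estimate. The configuration below is an assumed 3B-class transformer shape (layer count, KV heads, and head dimension are illustrative, not the models plotted in Fig. 2); the point is simply that cache size grows linearly with sequence length, whereas a fixed-size recurrent state does not.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    """Approximate transformer KV-cache size: K and V tensors per layer,
    each of shape (seq_len, n_kv_heads, head_dim), in 16-bit precision by default."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# Assumed 3B-class transformer shape (28 layers, 8 KV heads, head_dim 128);
# not the configuration of any specific model in Fig. 2.
for seq_len in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(seq_len, n_layers=28, n_kv_heads=8, head_dim=128) / 2**30
    print(f"{seq_len:>7} tokens -> ~{gib:.2f} GiB of KV cache")

# A fixed-size recurrent or convolutional state, by contrast, does not grow with
# seq_len, so total inference memory stays close to the model weights alone.
```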

Fig. 2. Total inference memory footprint of different language models vs. the input+generation length.

LFMs Truly Exploit Their Context Length

In this preview release, we have optimized our models to deliver a best-in-class 32k-token context length, pushing the boundaries of efficiency for models of this size. This is confirmed by the RULER benchmark, where a context length is considered “effective” when its corresponding score is higher than 85.6 [Hsieh et al. 2024 - RULER]. The following table compares several models at different context lengths.

| Model | Claimed length | Effective length | 4k | 8k | 16k | 32k | 64k |
|---|---|---|---|---|---|---|---|
| Gemma 2 2B (Google) | 8k | 4k | 88.5 | 0.60 | - | - | - |
| Llama 3.2 3B (Meta) | 128k | 4k | 88.7 | 82.4 | 78.3 | 74.1 | - |
| Phi-3.5 3.8B (Microsoft) | 128k | 32k | 94.3 | 91.7 | 90.9 | 87.3 | 78.0 |
| Llama 3.1 8B (Meta) | 128k | 32k | 95.5 | 93.8 | 91.6 | 87.4 | 84.7 |
| LFM-3B | 32k | 32k | 94.4 | 93.5 | 91.8 | 89.5 | - |
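To make the threshold rule explicit, the short sketch below recomputes LFM-3B's effective length from its row in the table, using the 85.6 cutoff from Hsieh et al.; the score dictionary simply re-encodes the table values.

```python
# RULER's rule of thumb: a context length is "effective" while the score stays
# above 85.6. The dictionary below re-encodes the LFM-3B row of the table.
RULER_THRESHOLD = 85.6

def effective_length(scores):
    """Largest evaluated length such that it, and every shorter length, clears the threshold."""
    effective = None
    for length in sorted(scores):
        if scores[length] > RULER_THRESHOLD:
            effective = length
        else:
            break
    return effective

lfm_3b = {4_096: 94.4, 8_192: 93.5, 16_384: 91.8, 32_768: 89.5}
print(effective_length(lfm_3b))  # 32768, i.e. an effective length of 32k
```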

This highly efficient context window enables long-context tasks on edge devices for the first time. For developers, it unlocks new applications, including document analysis and summarization, more meaningful interactions with context-aware chatbots, and improved Retrieval-Augmented Generation (RAG) performance.

Our goal is to keep scaling LFMs across model size, train/test-time compute, and context length. Beyond our language LFMs, we have designed models for various data modalities, domains, and applications that we plan to release in the coming months.

Advancing the Pareto Frontier of Large AI Models

To achieve these results, we optimized our pre- and post-training pipelines and infrastructure to ensure our models excel across five criteria:

Knowledge capacity
Multi-step reasoning
Long context recall
Inference efficiency
Training efficiency

Reimagining Model Architectures

Building on a long line of research in designing expressive and efficient learning systems, we have developed a new design space for foundation models, focusing on different modalities and hardware requirements. Our goal is to explore ways to build foundation models beyond Generative Pre-trained Transformers (GPTs).

With LFMs, we put into practice new principles and methods for model design that our team has developed over the past months:

  • LFMs are composed of structured operators.
  • LFM architectures are under control.
  • LFMs are adaptive and can serve as the substrate for AI at every scale.
Fig. 3. Our architectures feature custom computational units arranged in depth groups (targeted weight sharing), with additional featurizer interconnections (feature sharing).

Liquid’s design space is primarily defined by the featurization and the computational footprint of its architectures and their core operators. Featurization refers to the process of converting input data (e.g., text, audio, images, video) into a structured set of features or vectors that modulate computation inside the model in an adaptive manner. For example, audio and time-series data generally require less featurization in operators than language and multimodal data, due to their lower information density. The other key dimension is the computational complexity of the operators. Being able to traverse and complete this design space of structured adaptive operators allows us to maximize performance under controlled computational requirements.

Fig. 4. We built the foundations of a new design space for computational units, enabling customization to different modalities and hardware requirements.

At their core, LFMs are built with computational units that can be expressed as adaptive linear operators whose actions are determined by inputs. The LFM design framework unifies and subsumes a wide range of existing computational units in deep learning, providing a systematic approach to exploring the space of architectures. Specifically, our analysis informs model building by improving three key aspects: token-mixing structure (how the operator mixes embeddings in the input sequence), channel-mixing structure (how it mixes channel dimensions), and featurization, responsible for modulating computation based on the input context.
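Liquid's actual operators are not specified in this post, so the block below is only a toy illustration of the vocabulary used above: a featurizer produces input-dependent gates that modulate a linear token-mixing operator (here a causal depthwise convolution) and a linear channel-mixing projection. All sizes and the choice of convolution are assumptions made for the sketch, not Liquid's design.

```python
import torch
import torch.nn as nn

class AdaptiveLinearBlock(nn.Module):
    """Toy input-adaptive linear operator (not Liquid's actual unit).

    A featurizer maps the input to gates that modulate (i) a token-mixing
    operator acting along the sequence and (ii) a channel-mixing projection
    acting along the embedding dimension.
    """
    def __init__(self, d_model=256, kernel_size=4):
        super().__init__()
        self.featurizer = nn.Linear(d_model, 2 * d_model)          # produces the two gates
        self.token_mix = nn.Conv1d(d_model, d_model, kernel_size,  # mixes along the sequence
                                   padding=kernel_size - 1, groups=d_model)
        self.channel_mix = nn.Linear(d_model, d_model)             # mixes along channels

    def forward(self, x):                                  # x: (batch, seq, d_model)
        g_tok, g_chan = self.featurizer(x).sigmoid().chunk(2, dim=-1)
        seq_len = x.size(1)
        # Token mixing: causal depthwise convolution, modulated by an input-dependent gate.
        t = self.token_mix(x.transpose(1, 2))[..., :seq_len].transpose(1, 2)
        t = g_tok * t
        # Channel mixing: linear projection, also modulated by the featurizer's gate.
        return g_chan * self.channel_mix(t)

block = AdaptiveLinearBlock()
print(block(torch.randn(2, 32, 256)).shape)  # torch.Size([2, 32, 256])
```

Even in this toy form, the three aspects are visible: the convolution mixes tokens along the sequence, the final projection mixes channels, and the featurizer adapts both to the input.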

Join us as an early adopter of LFMs

As we are still in the early stages of this journey, we welcome the opportunity to collaborate and discover the strengths and weaknesses of these systems together.

What Language LFMs are good at today:

  • General and expert knowledge
  • Mathematics and logical reasoning
  • Efficient and effective long-context tasks
  • Their primary language is English, with secondary multilingual capabilities in Spanish, French, German, Chinese, Arabic, Japanese, and Korean

What Language LFMs are not good at today:

  • Zero-shot code tasks
  • Precise numerical calculations
  • Time-sensitive information
  • Counting r's in the word "Strawberry"!
  • Human preference optimization techniques have not been applied extensively to our models yet.

At Liquid AI, we take an open-science approach. We have contributed, and will continue to contribute, to the advancement of the AI field by openly publishing our findings and methods through scientific and technical reports. As part of this commitment, we will release relevant data and models produced by our research efforts to the wider AI community. We have dedicated a lot of time and resources to developing these architectures, so we are not open-sourcing our models at the moment. This allows us to continue building on our progress and maintain our edge in the competitive AI landscape.

If your enterprise is looking to experience the forefront of AI, we invite you to get in touch with us. If this vision aligns with your personal goals and ambitions, we invite you to join our team and drive it forward. We are still early in this journey and are actively innovating across many aspects of foundation model development and deployment. We also invite enthusiastic users to share their experiences and criticism, and to join our red-teaming efforts to improve the capabilities of our models.

Share your feedback

Liquid Product Launch Event

October 23, 2024  |  Cambridge, MA 

Come join us at MIT Kresge, Cambridge, MA on October 23, 2024, to learn more about Liquid as we unveil more products and progress on LFMs and their applications in consumer electronics, finance, healthcare, biotechnology, and more!

RSVP Here
