Takeaways

  • We unveil LFM-7B, the best-performing model in its size class on the market.
  • LFM-7B uses a non-transformer Liquid Foundation Model architecture, delivering high throughput with the lowest memory footprint in its class.
  • LFM-7B is the natural choice of a language model for local deployment and for latency-bound or cost-constrained tasks.
  • LFM-7B offers best-in-class multilingual performance in English, Arabic, and Japanese.
  • Try LFM-7B today on Liquid Playground, and soon on OpenRouter, Perplexity Playground, Lambda API, and AWS Marketplace.
  • LFM-7B comes with inference and customization stacks for enterprises. Get in touch with us to learn more.


Chat Capabilities

LFM-7B is specifically optimized for response quality, accuracy, and usefulness. To assess its chat capabilities, we use a diverse jury of frontier LLMs to compare responses generated by LFM-7B against those of other models in the 7B-8B parameter category. Relying on a jury rather than a single judge reduces individual biases and produces more reliable comparisons.
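
As a rough illustration of the protocol, the toy sketch below aggregates pairwise preferences by majority vote. The judge here is a deliberately naive stand-in, not our actual jury of frontier LLMs, which scores answers against a rubric via API calls.

```python
# Toy sketch of an LLM-as-a-jury pairwise evaluation. Real judges would be
# frontier-LLM API calls scoring accuracy, helpfulness, and response quality;
# here each judge is just a callable returning "A" or "B".
from collections import Counter
from typing import Callable

Judge = Callable[[str, str, str], str]  # (prompt, answer_a, answer_b) -> "A" | "B"

def jury_vote(judges: list[Judge], prompt: str, a: str, b: str) -> str:
    """Majority vote across several judges, reducing any single judge's bias."""
    votes = Counter(judge(prompt, a, b) for judge in judges)
    return votes.most_common(1)[0][0]

def win_rate(judges: list[Judge], rows: list[tuple[str, str, str]]) -> float:
    """Share of prompts where the jury preferred answer A (head-to-head style)."""
    wins = sum(jury_vote(judges, p, a, b) == "A" for p, a, b in rows)
    return wins / len(rows)

# Demo with a deliberately naive stand-in judge (prefers the longer answer).
toy_judge: Judge = lambda p, a, b: "A" if len(a) >= len(b) else "B"
rows = [("What is 2+2?", "4. Because 2+2=4.", "4")]
print(win_rate([toy_judge] * 3, rows))  # -> 1.0
```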

We compared answers to English prompts drawn from curated business use cases (such as instruction following), questions from Arena-Hard-Auto (Li et al.), and real-world conversations (Zheng et al.). Thanks to our comprehensive preference alignment process, LFM-7B outperforms every LLM in the same size category.

Fig. 1. LLM-as-a-jury evaluation of chat capabilities in English.

The following head-to-head evaluation shows the proportion of times the LLM jury preferred answers generated by LFM-7B over those from other models, using the exact same set of English prompts.

Fig. 2. Head-to-head evaluation of chat capabilities in English.

Automated Benchmarks

LFM-7B retains the expansive knowledge and reasoning capabilities of our other models. In addition to its enhanced conversational skills, it also showcases improved coding and instruction-following abilities.

Fig. 3. Average score across thirteen automated benchmarks (MMLU, HellaSwag, ARC-C, TruthfulQA, IFEval, MMLU-Pro, MATH Lvl 5, GPQA, MuSR, HumanEval, HumanEval+, MBPP, MBPP+).

The following scores were obtained on standard automated benchmarks using EleutherAI's Language Model Evaluation Harness v0.4.5. We only compare post-trained models.
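
A run of this kind can be reproduced with the harness's Python entry point. The sketch below shows the MMLU row as an example; the Hugging Face model identifier is a placeholder, not an official repository name.

```python
# Sketch: reproducing the MMLU row of Table 1 with EleutherAI's
# lm-evaluation-harness (v0.4.5). The model identifier is a placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                  # Hugging Face transformers backend
    model_args="pretrained=your-org/your-7b-model,dtype=bfloat16",  # placeholder id
    tasks=["mmlu"],                              # one task from Table 1
    num_fewshot=5,                               # matches the 5-shot setting
)
print(results["results"]["mmlu"])                # aggregated MMLU metrics
```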

| Benchmark | LFM-7B (Liquid AI) 7.7B | Ministral (Mistral AI) 8.0B | Llama 3.1 (Meta) 8.0B | Command R7B (Cohere) 8.0B | Qwen 2.5 (Alibaba) 7.6B | OLMo 2 (AI2) 7.3B |
|---|---|---|---|---|---|---|
| Context length (tokens) | 32k | 128k | 128k | 128k | 128k | 4k |
| MMLU (5-shot) | 69.34 | 64.66 | 67.92 | 70.44 | 74.31 | 62.18 |
| HellaSwag (10-shot) | 83.07 | 80.58 | 80.00 | 80.53 | 81.37 | 85.77 |
| ARC-C (25-shot) | 70.56 | 61.77 | 60.58 | 66.55 | 67.24 | 68.09 |
| TruthfulQA (0-shot) | 63.89 | 48.65 | 54.02 | 55.38 | 64.76 | 54.50 |
| IFEval (0-shot) | 60.72 | 29.17 | 50.70 | 34.56 | 63.71 | 59.26 |
| MMLU-Pro (5-shot) | 42.42 | 35.04 | 37.72 | 36.55 | 44.65 | 29.66 |
| MATH Lvl 5 (4-shot) | 21.42 | 13.62 | 11.77 | 19.07 | 23.77 | 9.82 |
| GPQA (0-shot) | 32.29 | 31.01 | 33.26 | 29.55 | 32.45 | 28.53 |
| MuSR (0-shot) | 40.79 | 42.75 | 39.72 | 43.33 | 42.90 | 39.44 |
| HumanEval (pass@1) | 63.41 | 25.61 | 64.02 | 55.49 | 26.83 | 41.46 |
| HumanEval+ (pass@1) | 56.71 | 24.39 | 59.15 | 48.78 | 23.17 | 37.80 |
| MBPP (pass@1) | 51.60 | 31.60 | 52.20 | 51.20 | 50.80 | 26.00 |
| MBPP+ (pass@1) | 55.56 | 45.24 | 57.41 | 61.64 | 52.91 | 36.51 |
Table 1. Performance of LLMs across automated benchmarks.

Multilingual Capabilities

LFM-7B supports English, Spanish, French, German, Chinese, Arabic, Japanese, and Korean. While evaluating our models, we observed that automated benchmarks like MMMLU add confounding factors (e.g., world knowledge) and do not require any writing skills in the target language. On the other hand, arena evaluations specifically focus on producing grammatically correct and relevant answers. This is why we built language-specific arenas in Arabic and Japanese to assess the quality of models in a fair and relevant manner.

For the Arabic arena, we use a curated subset of real-world conversations (Zheng et al.) in Arabic. LFM-7B is fluent in Arabic and significantly preferred over other models in the same size category.

Fig. 4. LLM-as-a-jury evaluation of chat capabilities in Arabic.

For the Japanese arena, we use a combination of ELYZA-tasks-100 (Sasaki et al.) and real-world prompts curated by our partner ITOCHU-CTC. This creates a diverse set of prompts representative of business use cases. LFM-7B also leads our Japanese arena by a significant margin.

Fig. 5. LLM-as-a-jury evaluation of chat capabilities in Japanese.

Memory Efficiency

Like our previous models, LFM-7B has a minimal memory footprint compared to models built on other architectures.

Fig. 6. Memory requirements for language model inference for different models as a function of combined input and generation sequence length. All models use bfloat16 precision without quantization. LFM-7B offers significant memory savings over other models. Memory usage can be reduced further through quantization techniques.

The memory efficiency of LFM-7B allows for several key features, including long-context understanding, energy-efficient inference, and high-throughput deployments on local devices. LFM-7B can also be efficiently customized to any knowledge or task using our on-premise fine-tuning stack. Consequently, LFM-7B significantly increases value for end users in applications such as private enterprise chat, secure code generation, fast instruction following, long document analysis, energy-efficient on-device AI assistants, and multi-step agentic workflows.
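
To see where the gap in Fig. 6 comes from, consider the KV cache that a comparable attention-based model must keep per token of context. The back-of-the-envelope sketch below uses Llama 3.1 8B's published shapes (32 layers, 8 KV heads, head dimension 128) as the reference point.

```python
# Back-of-the-envelope KV-cache growth for a grouped-query-attention
# transformer, using Llama 3.1 8B's published shapes as the reference point.
layers, kv_heads, head_dim, bytes_per_value = 32, 8, 128, 2  # bfloat16 = 2 bytes

# Each token stores one key and one value vector per layer and per KV head.
bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
print(bytes_per_token / 1024)  # -> 128.0 KiB of cache per token of context

for seq_len in (4_096, 32_768, 131_072):
    gib = seq_len * bytes_per_token / 2**30
    print(f"{seq_len:>7} tokens -> {gib:4.1f} GiB of KV cache")
# -> 0.5 GiB at 4k, 4.0 GiB at 32k, 16.0 GiB at 128k, on top of ~15 GiB of
# bfloat16 weights. A fixed-size recurrent state avoids this per-token growth,
# which is the gap Fig. 6 illustrates.
```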

Beyond processing long inputs efficiently, LFM-7B can also retrieve from and reason over long contexts effectively. We validated this across all stages of development via our specialized internal long-context evals, and additionally via two public long-context evals: RULER (Hsieh et al.) and LongBench v2 (Bai et al.). In RULER, a context length is considered "effective" when its corresponding score is higher than 85.6; by this criterion, LFM-7B has an effective context length of 32k.

| Model | LongBench v2 | Claimed length | Effective length | RULER 4k | RULER 8k | RULER 16k | RULER 32k | RULER 64k |
|---|---|---|---|---|---|---|---|---|
| Ministral (Mistral AI) 8.0B | 26.1 | 128k | 32k | 96.0 | 93.5 | 90.6 | 86.4 | 37.0 |
| Llama 3.1 (Meta) 8.0B | 35.0 | 128k | 32k | 95.5 | 93.8 | 91.6 | 87.4 | 84.7 |
| Qwen 2.5 (Alibaba) 7.6B | 36.1 | 128k | 32k | 95.3 | 93.0 | 92.2 | 90.2 | 74.5 |
| LFM-7B (Liquid AI) 7.7B | 36.1 | 32k | 32k | 91.3 | 89.2 | 87.7 | 88.5 | - |
Table 2. Long-context performance measured by LongBench v2 and RULER.
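
Applying the 85.6 threshold mechanically to the scores in Table 2 reproduces the "Effective length" column; the sketch below simply transcribes the table's numbers.

```python
# Effective context length per RULER: the longest tested length at which the
# score still exceeds the 85.6 threshold (scores transcribed from Table 2).
THRESHOLD = 85.6

ruler_scores = {
    "Ministral 8.0B": {4_096: 96.0, 8_192: 93.5, 16_384: 90.6, 32_768: 86.4, 65_536: 37.0},
    "Llama 3.1 8.0B": {4_096: 95.5, 8_192: 93.8, 16_384: 91.6, 32_768: 87.4, 65_536: 84.7},
    "Qwen 2.5 7.6B":  {4_096: 95.3, 8_192: 93.0, 16_384: 92.2, 32_768: 90.2, 65_536: 74.5},
    "LFM-7B 7.7B":    {4_096: 91.3, 8_192: 89.2, 16_384: 87.7, 32_768: 88.5},  # 64k untested
}

def effective_length(scores: dict[int, float]) -> int:
    """Largest length such that every tested length up to it clears the threshold."""
    effective = 0
    for length in sorted(scores):
        if scores[length] > THRESHOLD:
            effective = length
        else:
            break
    return effective

for model, scores in ruler_scores.items():
    print(f"{model}: effective length {effective_length(scores) // 1024}k")
# All four models come out at 32k, matching the "Effective length" column.
```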

Partner With Liquid

To chat with LFMs, go to Playground.liquid.ai.

Coming soon:

  • For testing our models via API, get in touch with us or try them on Lambda API.
  • To build with our models via API, go to OpenRouter (see the sketch after this list).
  • For enterprise usage via API, go to AWS Marketplace.
  • If you like our model and want to license or purchase it for on-device or on-prem applications, contact us.
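
Once the OpenRouter listing is live, calling LFM-7B could look like the sketch below. It relies on OpenRouter's OpenAI-compatible endpoint; the model identifier is our assumption, not a confirmed listing.

```python
# Sketch: querying LFM-7B through OpenRouter's OpenAI-compatible API.
# The model id "liquid/lfm-7b" is an assumption pending the official listing.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="liquid/lfm-7b",  # assumed identifier, check the live listing
    messages=[{"role": "user", "content": "Summarize the key risks in this clause: ..."}],
)
print(response.choices[0].message.content)
```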

Sales

If your enterprise has use cases that need the efficient and high-throughput performance of our LFMs in order to do more with less, get in touch with us to discuss licensing or purchasing our models.

Talent

If our mission aligns with your personal goals and ambitions, we invite you to join our team and drive this vision forward. We are very early on this journey and actively innovating across various aspects of foundation model development and deployment.

Feedback

We invite enthusiastic users to share their experiences and criticism, and to join our red-teaming efforts to continuously refine the capabilities of our models. Send your feedback here.

FAQ

  • As an enterprise, can we purchase full local access to LFMs?
  • Can we fine-tune LFMs?
  • What languages does LFM-7B support?
  • Where can I learn more about Liquid Foundation Models?

