LLMs at Their Limit – Time for New Paths
Introduction
For years, Large Language Models (LLMs) such as GPT-4/5, Gemini, and Claude grew at an almost exponential pace. Recently, however, it has become clear that the big leaps have levelled off. We have hit a performance plateau: new record-setting models bring more parameters but barely more substance, as the recently released GPT-5 shows.
The Problems
To meaningfully advance an LLM, you need first-class, highly diverse training data. But the web has already been mined: high-quality, useful datasets are scarce and sometimes legally difficult to access. Synthetic data brings progress in some areas, but it can amplify biases or introduce new problems.
Meanwhile, costs are exploding. Training GPT-4 reportedly cost over $100 million, and Google Gemini's training expense is estimated at between $30 million and $191 million, depending on the source, model version, and method, since there are no official figures. On top of that come ongoing costs for infrastructure, energy, and staff.
Technically, the transformer principle underlying most LLMs is largely maxed out. It has revolutionized AI since 2017, but it is now hitting its limits:
LLMs only recognize statistical patterns—they don’t really “understand.”
Hallucinations remain problematic: Models generate convincing but completely false responses.
Context windows keep expanding, but true long-term memory is still missing.
Biases and prejudices inherited from the training data remain a persistent challenge.
Algorithmic and Technical Limitations
The transformer architecture scales to models with millions or even billions of parameters, but fundamental capabilities such as logical reasoning, a grasp of causal relationships, and genuine "understanding" are still lacking. In many applications, further scaling rarely delivers real advances.
Advanced methods like “Retrieval Augmented Generation” (RAG) or “Mixture of Experts” (MoE) offer temporary help, but don’t tackle the core issue: LLMs aren’t intelligent agents. They cannot perceive their environment or learn independently—they just statistically replicate what they have seen.
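To make that concrete, here is a minimal retrieval-augmented generation sketch in Python. It assumes scikit-learn is installed; the tiny document set, the function names, and the TF-IDF retriever are illustrative choices, and the final call to a language model is deliberately left as a placeholder rather than tied to any specific API.

# Minimal RAG sketch: retrieve the most relevant document for a query
# and prepend it to the prompt, so the model answers from supplied
# context instead of relying solely on what is stored in its weights.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The transformer architecture was introduced in 2017.",
    "Training GPT-4 reportedly cost over $100 million.",
    "World models learn to predict in an abstract latent space.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (TF-IDF + cosine)."""
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(documents)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

def build_prompt(query: str) -> str:
    """Prepend the retrieved context; the LLM call itself is out of scope here."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("When was the transformer architecture introduced?"))

The retrieval step grounds the answer in fresh, external text, which is why RAG helps against hallucinations in practice; it does not, however, give the model any deeper understanding of that text, which is the core issue named above.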
Is Further Investment Justified?
Looking at cost-benefit ratios, it’s questionable whether pouring billions more into even larger LLMs makes sense. Current ambitions in the US to inject over $10 trillion (!) into LLM development seem, in my view, misguided.
Societal and scientific returns are stagnating
Resources for hardware, energy, and data are limited and ecologically concerning
Rather than scaling blindly, it would be wiser to distribute funding more broadly: into fundamental research, alternative AI approaches, and domain-specific models with less bias and more targeted benefits.
AGI/ASI – Can LLMs Achieve This?
The pursuit of Artificial General Intelligence (AGI), or even Artificial Superintelligence (ASI), is the holy grail for many. But LLMs have fundamental limits:
They simulate intelligence, but lack real awareness or contextual understanding.
Humans learn from physical experience—LLMs (still) cannot do this.
Leading models outperform humans in specific tasks (e.g. coding, text generation), but true AGI means solving any problem at least as well as a human—which is not achievable with LLMs.
New Paths: World Models and Convergence as the Key
This is where new approaches such as world models (e.g. JEPA2 from Meta) come into play: instead of mastering language, these systems build an abstract model of the physical world. The model learns to predict relationships, plan actions, and adapt to unfamiliar environments, much like a person who sees an object for the first time and immediately grasps how it might work. Training is done on video data in a self-supervised way: the AI no longer needs everything labelled; it observes and generalizes on its own.
JEPA2 does not aim for pixel-perfect predictions; it focuses on physical and conceptual relationships. That is the game-changer: instead of grinding through ever more data, context and causality become the learning target.
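To illustrate what predicting in an abstract space rather than in pixel space looks like, here is a conceptual PyTorch sketch; it is not Meta's actual JEPA2 code. The encoder and predictor layouts, the latent size, and the random tensors standing in for video frames are all illustrative assumptions, and in practice the target encoder is typically a slowly updated copy of the context encoder rather than a separate network.

# Conceptual joint-embedding sketch: the loss lives in latent space,
# so the model is trained to predict what happens, not how every pixel looks.
import torch
import torch.nn as nn

latent_dim = 128
flat = 3 * 32 * 32  # toy "frames" of 3x32x32, flattened

context_encoder = nn.Sequential(nn.Flatten(), nn.Linear(flat, latent_dim))
target_encoder = nn.Sequential(nn.Flatten(), nn.Linear(flat, latent_dim))
predictor = nn.Sequential(nn.Linear(latent_dim, latent_dim), nn.ReLU(),
                          nn.Linear(latent_dim, latent_dim))

context_frames = torch.randn(8, 3, 32, 32)  # observed part of a clip
target_frames = torch.randn(8, 3, 32, 32)   # masked / future part to predict

pred_latent = predictor(context_encoder(context_frames))
with torch.no_grad():                        # the target branch gives no gradients
    target_latent = target_encoder(target_frames)

# The objective compares embeddings, not pixels.
loss = nn.functional.mse_loss(pred_latent, target_latent)
loss.backward()
print(f"latent prediction loss: {loss.item():.4f}")

Because the error is measured between embeddings, the network is rewarded for capturing the relationships that matter, such as what the hidden part of the scene is doing, rather than for reconstructing irrelevant visual detail.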
The next real leap may come from the convergence of diverse approaches:
Multimodal models that integrate text, image, audio, sensors and actions
AI agents that link knowledge from multiple sources and independently test new hypotheses
Self-learning systems that are robust even with limited data
Method combinations: World models fused with classical symbolic AI, probabilistic graphical models, and deep learning, together yielding more than the sum of their parts
Conclusion
The future of AI is not just a matter of adding parameters; it requires a paradigm shift. The era of ever-larger LLMs is entering a decisive phase, and what we need now are new avenues, greater diversity and, above all, intelligent convergence. Only then can AI take the next big step: from statistical parrot to a genuinely learning machine.
Sources