A rich vein. Mine your giants.
- Bundle
- №007
- Source
- Unsupervised Learning · Jacob Effron
- Aired
- Runtime
- 1:21:56
- Curated
A conversation with
Yann LeCun on What Comes After LLMs
A four-year argument, sharpened by a quiet year for its opponent.
§02
Snippets
-
A four-year-old has been exposed to vastly more data than the largest language model — through vision, through hands that grasp and drop and try again. A cat has been exposed to more than the four-year-old. Neither the cat nor the child has read a single word. The argument is not that scaling text models does not help. It is that text was a happy accident of where the data lived, not the substrate intelligence is supposed to grow on.
The familiar opening, sharpened. The shift since 2023 is the rhetorical permission. He no longer hedges the claim.
-
What current systems do not have, in order: persistent memory between sessions, a model of the world they can run forward without producing text, plans that span more than one horizon, and a way to update beliefs from the consequences of their own actions. They predict the next token. They do not simulate the next state. The gap is structural, not a matter of more parameters.
A taxonomy of absences. Useful as a checklist for anyone trying to evaluate the next architecture, theirs or anyone else’s.
-
Prediction in pixel space is the wrong objective. Prediction in token space is the wrong objective for anything other than text. The right objective is to predict in a learned representation — a latent that throws away the irrelevant detail and keeps the structure. Train hierarchically. Predict at multiple time scales. The architecture has a name. It has had one for four years.
JEPA, again. The line worth keeping is the last one — the patience of a researcher whose horizon is longer than the funding cycle.
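The objective in the snippet above can be sketched in a few lines. This is an illustrative toy, not Meta's implementation: linear maps stand in for the encoder and predictor, and the anti-collapse machinery a real JEPA needs (a target encoder updated by EMA, stop-gradients) is omitted. The point it shows is the one LeCun makes — the prediction error lives in a learned representation space, not in pixel space.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):
    """Toy encoder: a linear map into a low-dimensional latent."""
    return x @ W

def jepa_style_loss(x_context, x_target, W_enc, W_pred):
    """Predict the target's latent from the context's latent.
    The error is measured in representation space, not pixel space."""
    z_context = encoder(x_context, W_enc)  # latent of what we see now
    z_target = encoder(x_target, W_enc)    # latent of what comes next
    z_predicted = z_context @ W_pred       # predictor (here: one linear map)
    return np.mean((z_predicted - z_target) ** 2)

# Two "frames": 64-dim observations mapped to 8-dim latents.
x_t = rng.normal(size=(1, 64))
x_t1 = x_t + 0.01 * rng.normal(size=(1, 64))  # next state, slightly changed
W_enc = rng.normal(size=(64, 8)) / 8.0
W_pred = np.eye(8)  # identity predictor as a baseline
loss = jepa_style_loss(x_t, x_t1, W_enc, W_pred)
```

Because the encoder throws away most of the 64 input dimensions, a small pixel-space change produces a small latent-space error — which is the whole argument for predicting in the latent rather than reconstructing every pixel.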
-
The honest answer is that autonomous machine intelligence is not next year. It is not this decade in its full form. Several ideas have to land that have not yet landed. We are missing the equivalent of backpropagation for what comes next. We will recognize it when we see it. Anyone offering a different timeline is selling something.
The bravest line in the interview. Said calmly, as if it cost nothing.
§03
Synthesis
LeCun has been making this argument since at least 2022. The slides change, the audience changes, the interlocutor changes; the shape does not. What is different in May 2026 — fifteen months after the field bet the house on agentic post-training, eighteen months into the most expensive frontier-lab race in computing history — is the texture of the argument’s reception.
There is no triumphalism in the interview. He is not crowing. The position he held when it was contrarian he holds now, when the empirical evidence has accumulated and the room is quieter. The vein worth mining here is not LeCun’s content — which is on his Twitter and in his 2022 paper — but the cadence of someone whose long bet is paying off without making him louder.
For a reader trying to assemble a private canon on what comes after LLMs, the question is not whether to add LeCun. It is which earlier interview to keep next to this one. The 2023 talk where he first put numbers to the LLM ceiling. The 2024 Lex Fridman conversation where he sketched V-JEPA. This 2026 conversation, where the proposition is no longer “world models will matter” but “world models are the only thing that will matter.” Each is a layer of the same lode. The curator’s job is to pick which layer.
The fan-out below is the work that begins after the bundle closes. Read the 2022 paper. Read what V-JEPA actually predicts. Read Pearl. Watch what the animals can do that the models cannot.
§04
Fan-out
Questions raised
- 01 If LLM scaling is hitting a wall, what is the empirical signal — eval saturation, cost-per-unit-of-capability, or something subtler in the loss curves?
- 02 Are LeCun’s and Karpathy’s positions on world models converging, or do they still disagree about what “model” is supposed to mean?
- 03 What is the experimental risk on JEPA that has not yet been tried at frontier scale — compute, data, architectural detail, or just patience?
Concepts to learn
- 01 Joint Embedding Predictive Architecture (JEPA, V-JEPA)
- 02 Energy-based models for unsupervised representation learning
- 03 Hierarchical planning in model-based reinforcement learning
- 04 The cost of pixel-space prediction (generative versus predictive objectives)
References invoked
- 01 LeCun, “A Path Towards Autonomous Machine Intelligence,” 2022
- 02 Judea Pearl, on the ladder of causation
- 03 V-JEPA and DINOv2 (Meta FAIR)
- 04 The animal-cognition literature — rats, cats, corvids — as benchmark
Mine your own.
Lode is a workbench, not a feed. Paste a YouTube URL. The model proposes a transcript, a set of quote-grounded snippets, a synthesis essay, and the fan-out. You decide what stays. You edit, tag, approve. The bundle that lands in your library is yours.
The work is in the curation, not in the consumption. Most summary tools erase the part of the work that produces understanding. Lode keeps you in the loop on the part that matters, and removes the part that does not.