The Platonic Representation Hypothesis
Minyoung Huh* | Brian Cheung* | Tongzhou Wang* | Phillip Isola*
MIT
Position Paper in ICML 2024
Paper | Code
The world (Z) can be viewed in many different ways: in images (X), in text (Y), etc. We conjecture that representations learned on each
modality on its own will converge to similar representations of Z.
Conventionally, different AI systems represent the world in different ways: a vision system might represent shapes and colors, while a language model might focus on syntax and semantics. In recent years, however, the architectures and objectives for modeling images, text, and many other signals have become remarkably alike. Are the internal representations in these systems also converging?
We argue that they are, and put forth the following hypothesis:
Neural networks, trained with different objectives on different data and modalities, are converging to a shared statistical model of reality in their representation spaces.
The intuition behind our hypothesis is that all the data we consume -- images, text, sounds, etc. -- are projections of some underlying reality. A concept like "apple" 🍎 can be viewed in many different ways, but the meaning, what is represented, is roughly* the same. Representation learning algorithms might recover this shared meaning.
* Not exactly the same. The text "apple" does not tell whether the fruit is red or green, but an image can. Sufficiently descriptive text is necessary. See the limitations section of our paper for discussion of this point.
How to measure if representations are converging?
We characterize representations in terms of their kernels, i.e. how they measure distance/similarity between inputs. Two representations are considered the same if their kernels are the same for corresponding inputs; we then say the representations are aligned. For example, if a text encoder \(f_{\text{text}}\) is aligned with an image encoder \(f_{\text{img}}\), then we would have relationships like:
\[
\text{sim}(f_{\text{text}}(\text{"apple"}), f_{\text{text}}(\text{"orange"})) \quad\approx\quad \text{sim}(f_{\text{img}}(\text{🍎}), f_{\text{img}}(\text{🍊}))
\]
Kernel alignment metrics quantify the degree to which statements like the above are true, and we use these metrics to analyze if representations in different models are converging. Check out our code for implementations of such metrics, including several new ones we introduce.
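As an illustration of the general idea (a minimal sketch, not necessarily one of the metrics implemented in our code), here is how one standard kernel-alignment score, linear CKA, can be computed with NumPy; the encoder features below are hypothetical.

```python
import numpy as np

def center_gram(K):
    """Double-center a Gram (kernel) matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def linear_cka(X, Y):
    """Linear CKA between two representations of the same n inputs.

    X: (n, d1) features from one model; Y: (n, d2) features from another.
    Returns a score in [0, 1]; higher means the two kernels are more aligned.
    """
    Kx = center_gram(X @ X.T)            # kernel of model 1
    Ky = center_gram(Y @ Y.T)            # kernel of model 2
    hsic = np.sum(Kx * Ky)               # unnormalized HSIC estimate
    return hsic / (np.linalg.norm(Kx) * np.linalg.norm(Ky))

# Toy check: rotating a representation changes the features but not the kernel,
# so alignment should come out (numerically) 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))                    # hypothetical encoder features
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))    # random rotation
Y = X @ Q
print(linear_cka(X, Y))                           # ~1.0
```

Metrics like this one (and nearest-neighbor-based variants) all compare models through their kernels rather than through the raw feature coordinates, which is exactly the sense of "same representation" used above.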
Evidence of convergence
We survey many examples of convergence in the literature: over time and across multiple domains, the ways in which different neural networks represent data are becoming more aligned. Then, we demonstrate convergence across data modalities: as vision models and language models get larger, they measure distances between datapoints in increasingly similar ways:
As LLMs get better at language modeling, they learn representations that are more and more aligned with vision models (and conversely, bigger vision models are also better aligned with LLM embeddings).
Plotted using voronoi.
What is driving convergence?
We argue that task and data pressures, combined with increasing model capacity, can lead to convergence. One such pressure is visualized below: As we train models on more tasks, there are fewer representations that can satisfy our demands. As models become more general-purpose, they become more alike:
The more tasks we must solve, the fewer functions satisfy them all. Cao & Yamins term this the "Contravariance principle."
What representation are we converging to?
In a particular idealized world, we show that a certain family of learners will converge to a representation whose kernel is equal to the pointwise mutual information (PMI) function over the underlying events (Z) that cause our observations, regardless of modality. For example, in a world of colors, where events \(z_{\text{red}}\) and \(z_{\text{orange}}\) generate visual and textual observations, we would have:
\[
\text{sim}(f_{\text{text}}(\text{"red"}), f_{\text{text}}(\text{"orange"})) \quad=\quad \text{PMI}(z_{\text{red}}, z_{\text{orange}}) + \text{const}
\]
\[
\text{sim}(f(\color{red}{\blacksquare}\color{black}), f(\color{orange}{\blacksquare}\color{black})) \quad=\quad \text{PMI}(z_{\text{red}}, z_{\text{orange}}) + \text{const}
\]
This analysis makes various assumptions and should be read as a starting point for a fuller theory. Nonetheless, empirically, we do find that PMI over pixel colors recovers a similar kernel to human perception of colors, and this is also similar to the kernel that LLMs recover:
This analysis suggests that certain representation learning algorithms may boil down to a simple rule: find an embedding in which similarity equals pointwise mutual information.
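To make the PMI kernel concrete, here is a minimal sketch (our illustration under simplified assumptions, not the analysis or code from the paper) that estimates PMI over discrete events from a co-occurrence count matrix; the event names and counts are made up.

```python
import numpy as np

def pmi_kernel(counts, eps=1e-12):
    """PMI kernel from a symmetric co-occurrence count matrix.

    counts[i, j] = number of times events z_i and z_j are observed together.
    Returns K with K[i, j] = log p(z_i, z_j) - log p(z_i) - log p(z_j).
    """
    C = np.asarray(counts, dtype=float)
    p_joint = C / C.sum()                  # joint distribution over event pairs
    p_marginal = p_joint.sum(axis=1)       # marginal p(z_i)
    return np.log(p_joint + eps) - np.log(np.outer(p_marginal, p_marginal) + eps)

# Hypothetical events: red, orange, blue. Red and orange co-occur often;
# blue rarely co-occurs with either, so PMI places it further away.
counts = np.array([[50, 30,  2],
                   [30, 40,  3],
                   [ 2,  3, 60]])
K = pmi_kernel(counts)
print(K[0, 1] > K[0, 2])   # True: sim(red, orange) > sim(red, blue)
```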
Kernels visualized with multidimensional scaling (i.e. a visualization where nearby points are similar according to the kernel, and far apart points are dissimilar).
The language experiment here is a replication of Abdou et al. 2021.
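For readers who want to reproduce this kind of plot, here is a minimal sketch (assuming scikit-learn is available; not our exact visualization pipeline) that converts a similarity kernel to distances and embeds it in 2D with metric MDS; the 3x3 kernel is hypothetical.

```python
import numpy as np
from sklearn.manifold import MDS

def kernel_to_mds(K, n_components=2, seed=0):
    """Embed a similarity kernel in 2D with metric MDS.

    Similarities are turned into distances via d_ij^2 = K_ii + K_jj - 2*K_ij,
    clamped at zero (a PMI kernel need not be exactly positive semi-definite).
    """
    diag = np.diag(K)
    d2 = diag[:, None] + diag[None, :] - 2.0 * K
    D = np.sqrt(np.clip(d2, 0.0, None))
    mds = MDS(n_components=n_components, dissimilarity="precomputed", random_state=seed)
    return mds.fit_transform(D)            # (n, 2) coordinates; nearby = similar under K

# Hypothetical 3x3 kernel over {red, orange, blue}: red and orange are similar.
K = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
coords = kernel_to_mds(K)
print(coords)   # red and orange land near each other; blue ends up far from both
```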
Implications and limitations
The final sections of our paper discuss implications and limitations of the hypothesis. Perhaps the primary implication is this: if there is indeed a platonic representation, then finding it, and fully characterizing it, is a research program worth pursuing.
However, like any good hypothesis, there are also numerous counterarguments one can make: what about the knowledge that is unique to each model and modality? What about specialist systems that don't require general-purpose world representations? We hope this work sparks vigorous debate.
Other works that have made similar arguments:
[1] Allegory of the Cave, Plato, c. 375 BC
[2] Three Kinds of Scientific Realism, Putnam, The Philosophical Quarterly, 1982
[3] Contrastive Learning Inverts the Data Generating Process, Zimmermann, Sharma, Schneider, Bethge, Brendel, ICML 2021
[4] Revisiting Model Stitching to Compare Neural Representations, Bansal, Nakkiran, Barak, NeurIPS 2021
[5] Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color, Abdou, Kulmizev, Hershcovich, Frank, Pavlick, Søgaard, CoNLL 2021
[6] Explanatory models in neuroscience: Part 2 -- Constraint-based intelligibility, Cao, Yamins, Cognitive Systems Research, 2024
[7] Robust agents learn causal world models, Richens, Everitt, ICLR 2024
Plato imagined an "ideal" reality of which our observations are mere shadows. Putnam and others developed the idea of "convergent realism": scientists,
via observation, converge on truth; our position is that deep nets work similarly. Zimmermann et al., Richens and Everitt, and many others have argued
that certain representation learners recover statistical models of the latent causes of our observations.
Bansal et al. hypothesized an "Anna Karenina scenario," in which all well-performing neural nets are alike. Abdou et al. showed that LLMs learn
visual similarities from text alone (an experiment we have replicated). Cao and Yamins argue for a "Contravariance Principle," by which
models and minds become aligned when tasked to solve hard problems. This is a curated list of closely related work; please see our paper for more.


