The Platonic Representation Hypothesis
Minyoung Huh* | Brian Cheung* | Tongzhou Wang* | Phillip Isola*
MIT
Position Paper in ICML 2024
Paper | Code
The world (Z) can be viewed in many different ways: in images (X), in text (Y), etc. We conjecture that representations learned on each
modality on its own will converge to similar representations of Z.
Conventionally, different AI systems represent the world in different ways: a vision system might represent shapes and colors, while a language model might focus on syntax and semantics. In recent years, however, the architectures and objectives for modeling images, text, and many other signals have become remarkably alike. Are the internal representations in these systems also converging?
We argue that they are, and put forth the following hypothesis:
Neural networks, trained with different objectives on different data and modalities, are converging to a shared statistical model of reality in their representation spaces.
The intuition behind our hypothesis is that all the data we consume -- images, text, sounds, etc. -- are projections of some underlying reality. A concept like "apple" 🍎 can be viewed in many different ways, but the meaning, what is represented, is roughly* the same. Representation learning algorithms might recover this shared meaning.
* Not exactly the same. The text "apple" does not tell whether the fruit is red or green, but an image can. Sufficiently descriptive text is necessary. See the limitations section of our paper for discussion of this point.
How to measure if representations are converging?
We characterize representations in terms of their kernels, i.e. how they measure distance/similarity between inputs. Two representations are considered the same if their kernels are the same for corresponding inputs; we then say the representations are aligned. For example, if a text encoder \(f_{\text{text}}\) is aligned with an image encoder \(f_{\text{img}}\), then we would have relationships like:
\[
\text{sim}(f_{\text{text}}(\text{"apple"}), f_{\text{text}}(\text{"orange"})) \quad\approx\quad \text{sim}(f_{\text{img}}(\text{🍎}), f_{\text{img}}(\text{🍊}))
\]
Kernel alignment metrics quantify the degree to which statements like the above are true, and we use these metrics to analyze if representations in different models are converging. Check out our code for implementations of such metrics, including several new ones we introduce.
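As an illustration of the general idea (a minimal sketch, not necessarily one of the metrics implemented in our code), here is how one standard kernel-alignment score, linear CKA, can be computed with NumPy; the encoder features below are hypothetical.

```python
import numpy as np

def center_gram(K):
    """Double-center a Gram (kernel) matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def linear_cka(X, Y):
    """Linear CKA between two representations of the same n inputs.

    X: (n, d1) features from one model; Y: (n, d2) features from another.
    Returns a score in [0, 1]; higher means the two kernels are more aligned.
    """
    Kx = center_gram(X @ X.T)            # kernel of model 1
    Ky = center_gram(Y @ Y.T)            # kernel of model 2
    hsic = np.sum(Kx * Ky)               # unnormalized HSIC estimate
    return hsic / (np.linalg.norm(Kx) * np.linalg.norm(Ky))

# Toy check: rotating a representation changes the features but not the kernel,
# so alignment should come out (numerically) 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))                    # hypothetical encoder features
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))    # random rotation
Y = X @ Q
print(linear_cka(X, Y))                           # ~1.0
```

Metrics like this one (and nearest-neighbor-based variants) all compare models through their kernels rather than through the raw feature coordinates, which is exactly the sense of "same representation" used above.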
Evidence of convergence
We survey many examples of convergence in the literature: over time and across multiple domains, the ways in which different neural networks represent data are becoming more aligned. Then, we demonstrate convergence across data modalities: as vision models and language models get larger, they measure distances between datapoints in increasingly similar ways:
As LLMs get better at language modeling, they learn representations that are more and more aligned with vision models (and conversely, bigger vision models are also better aligned with LLM embeddings).
Plotted using voronoi.
What is driving convergence?
We argue that task and data pressures, combined with increasing model capacity, can lead to convergence. One such pressure is visualized below: As we train models on more tasks, there are fewer representations that can satisfy our demands. As models become more general-purpose, they become more alike:
The more tasks we must solve, the fewer functions satisfy them all. Cao & Yamins term this the "Contravariance principle."
What representation are we converging to?
In a particular idealized world, we show that a certain family of learners will converge to a representation whose kernel is equal to the pointwise mutual information (PMI) function over the underlying events (Z) that cause our observations, regardless of modality. For example, in a world of colors, where events \(z_{\text{red}}\) and \(z_{\text{orange}}\) generate visual and textual observations, we would have:
\[
\text{sim}(f_{\text{text}}(\text{"red"}), f_{\text{text}}(\text{"orange"})) \quad=\quad \text{PMI}(z_{\text{red}}, z_{\text{orange}}) + \text{const}
\]
\[
\text{sim}(f(\color{red}{\blacksquare}\color{black}), f(\color{orange}{\blacksquare}\color{black})) \quad=\quad \text{PMI}(z_{\text{red}}, z_{\text{orange}}) + \text{const}
\]
This analysis makes various assumptions and should be read as a starting point for a fuller theory. Nonetheless, empirically, we do find that PMI over pixel colors recovers a similar kernel to human perception of colors, and this is also similar to the kernel that LLMs recover:
This analysis suggests that certain representation learning algorithms may boil down to a simple rule: find an embedding in which similarity equals pointwise mutual information.
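To make the PMI kernel concrete, here is a minimal sketch (our illustration under simplified assumptions, not the analysis or code from the paper) that estimates PMI over discrete events from a co-occurrence count matrix; the event names and counts are made up.

```python
import numpy as np

def pmi_kernel(counts, eps=1e-12):
    """PMI kernel from a symmetric co-occurrence count matrix.

    counts[i, j] = number of times events z_i and z_j are observed together.
    Returns K with K[i, j] = log p(z_i, z_j) - log p(z_i) - log p(z_j).
    """
    C = np.asarray(counts, dtype=float)
    p_joint = C / C.sum()                  # joint distribution over event pairs
    p_marginal = p_joint.sum(axis=1)       # marginal p(z_i)
    return np.log(p_joint + eps) - np.log(np.outer(p_marginal, p_marginal) + eps)

# Hypothetical events: red, orange, blue. Red and orange co-occur often;
# blue rarely co-occurs with either, so PMI places it further away.
counts = np.array([[50, 30,  2],
                   [30, 40,  3],
                   [ 2,  3, 60]])
K = pmi_kernel(counts)
print(K[0, 1] > K[0, 2])   # True: sim(red, orange) > sim(red, blue)
```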
Kernels visualized with multidimensional scaling (i.e. a visualization where nearby points are similar according to the kernel, and far apart points are dissimilar).
The language experiment here is a replication of Abdou et al. 2021.
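For readers who want to reproduce this kind of plot, here is a minimal sketch (assuming scikit-learn is available; not our exact visualization pipeline) that converts a similarity kernel to distances and embeds it in 2D with metric MDS; the 3x3 kernel is hypothetical.

```python
import numpy as np
from sklearn.manifold import MDS

def kernel_to_mds(K, n_components=2, seed=0):
    """Embed a similarity kernel in 2D with metric MDS.

    Similarities are turned into distances via d_ij^2 = K_ii + K_jj - 2*K_ij,
    clamped at zero (a PMI kernel need not be exactly positive semi-definite).
    """
    diag = np.diag(K)
    d2 = diag[:, None] + diag[None, :] - 2.0 * K
    D = np.sqrt(np.clip(d2, 0.0, None))
    mds = MDS(n_components=n_components, dissimilarity="precomputed", random_state=seed)
    return mds.fit_transform(D)            # (n, 2) coordinates; nearby = similar under K

# Hypothetical 3x3 kernel over {red, orange, blue}: red and orange are similar.
K = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
coords = kernel_to_mds(K)
print(coords)   # red and orange land near each other; blue ends up far from both
```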
Implications and limitations
The final sections of our paper discuss implications and limitations of the hypothesis. Perhaps the primary implication is this: if there is indeed a platonic representation, then finding it, and fully characterizing it, is a research program worth pursuing.
However, like any good hypothesis, there are also numerous counterarguments one can make: what about the knowledge that is unique to each model and modality? What about specialist systems that don't require general-purpose world representations? We hope this work sparks vigorous debate.
Other works that have made similar arguments:
[1] Allegory of the Cave, Plato, c. 375 BC
[2] Three Kinds of Scientific Realism, Putnam, The Philosophical Quarterly, 1982
[3] Contrastive Learning Inverts the Data Generating Process, Zimmermann, Sharma, Schneider, Bethge, Brendel, ICML 2021
[4] Revisiting Model Stitching to Compare Neural Representations, Bansal, Nakkiran, Barak, NeurIPS 2021
[5] Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color, Abdou, Kulmizev, Hershcovich, Frank, Pavlick, Søgaard, CoNLL 2021
[6] Explanatory models in neuroscience: Part 2 -- Constraint-based intelligibility, Cao, Yamins, Cognitive Systems Research, 2024
[7] Robust agents learn causal world models, Richens, Everitt, ICLR 2024
Plato imagined an "ideal" reality of which our observations are mere shadows. Putnam and others developed the idea of "convergent realism": scientists,
via observation, converge on truth; our position is that deep nets work similarly. Zimmermann et al., Richens and Everitt, and many others have argued
that certain representation learners recover statistical models of the latent causes of our observations.
Bansal et al. hypothesized an "Anna Karenina scenario," in which all well-performing neural nets are alike. Abdou et al. showed that LLMs learn
visual similarities from text alone (an experiment we have replicated). Cao and Yamins argue for a "Contravariance Principle," by which
models and minds become aligned when tasked to solve hard problems. This is a curated list of closely related work; please see our paper for more.


