Gluing Local Contexts into Global Meaning: A Sheaf-Theoretic Decomposition of Transformer Representations

Bryce Grant, Peng Wang
Case Western Reserve University
ICLR 2026 Workshop on Unifying Concept Representation Learning (UCRL)

Abstract

We decompose transformer activations into content-stable (H0) and context-dependent (H1) subspaces using sheaf cohomology. A cellular sheaf built over paraphrase graphs yields a Laplacian whose spectral structure separates phrasing-invariant directions from maximally varying ones; the construction requires no concept labels or supervised training. Across five models (124M–13B parameters), H1 dimensions exert 3.5–26.5× greater causal influence on model output than variance-matched controls (Cohen's d = 2.3–14.3), H0 retrieves facts at 60–68% accuracy using only 20 dimensions, and the two subspaces produce opposite effects under ablation.

The decomposition also reveals architecture-dependent fragility: Llama-2-7B collapses under random perturbation (4.2% fact preservation) while all directed methods preserve facts at 12–14% (p < 10⁻¹⁰, n = 1000); with architecture-specific restriction maps this gap widens to 31.0% vs. 4.2% (p < 10⁻⁵⁰). Robust models tolerate both perturbation types. Sheaf H0 outperforms LEACE concept erasure by nearly 2× on fact retrieval, and persistent homology shows that topological complexity in transformer activations emerges layer-dependently, with deeper layers of larger models exhibiting more persistent H1 structure than random baselines.
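
The following is a minimal sketch of the spectral decomposition described above, assuming a toy paraphrase graph with identity restriction maps (a constant sheaf); the function name sheaf_laplacian, the example graph, and the subspace sizes are illustrative assumptions, not the authors' implementation.

import numpy as np

def sheaf_laplacian(edges, restrictions, n_nodes, d):
    """Assemble the cellular sheaf Laplacian for d-dimensional vertex stalks.

    edges: list of (u, v) vertex pairs in the paraphrase graph
    restrictions: dict mapping (edge_index, vertex) -> (d, d) restriction map
    """
    L = np.zeros((n_nodes * d, n_nodes * d))
    for e, (u, v) in enumerate(edges):
        Fu = restrictions[(e, u)]  # restriction map from vertex u onto edge e
        Fv = restrictions[(e, v)]  # restriction map from vertex v onto edge e
        # Each edge contributes the standard sheaf-Laplacian block structure.
        L[u*d:(u+1)*d, u*d:(u+1)*d] += Fu.T @ Fu
        L[v*d:(v+1)*d, v*d:(v+1)*d] += Fv.T @ Fv
        L[u*d:(u+1)*d, v*d:(v+1)*d] -= Fu.T @ Fv
        L[v*d:(v+1)*d, u*d:(u+1)*d] -= Fv.T @ Fu
    return L

# Toy setup: three prompts that paraphrase each other, 8-dimensional activations.
d, n_nodes = 8, 3
edges = [(0, 1), (1, 2), (0, 2)]
# Identity restriction maps give a constant sheaf; the paper's maps may differ.
restrictions = {(e, v): np.eye(d) for e, pair in enumerate(edges) for v in pair}

L = sheaf_laplacian(edges, restrictions, n_nodes, d)
eigvals, eigvecs = np.linalg.eigh(L)

# Near-zero eigenvalues span the harmonic space (global sections): the
# phrasing-invariant, content-stable directions ("H0" in the abstract).
h0_basis = eigvecs[:, eigvals < 1e-8]
# The largest eigenvalues mark maximally varying, context-dependent directions,
# used here as an illustrative "H1-like" subspace (k = 4 is arbitrary).
h1_basis = eigvecs[:, -4:]

With identity restriction maps the sheaf Laplacian reduces to the graph Laplacian tensored with the identity, so the harmonic space recovers exactly the directions on which all paraphrases agree; non-identity (e.g., architecture-specific) restriction maps change which directions can be glued into global sections.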

BibTeX

@inproceedings{grant2026gluing,
  title     = {Gluing Local Contexts into Global Meaning: A Sheaf-Theoretic Decomposition of Transformer Representations},
  author    = {Bryce Grant and Peng Wang},
  booktitle = {ICLR 2026 Workshop on Unifying Concept Representation Learning},
  year      = {2026},
  url       = {https://openreview.net/forum?id=eub5YrhExo}
}