
arXiv:2601.03424v1 Announce Type: new
Abstract: Behavioral benchmarks tell us \textit{what} a model does, but not \textit{how}. We introduce a training-free mechanistic probe using attention-graph spectra. Treating each layer as a token graph, we compute algebraic connectivity ($\lambda_2$), smoothness, and spectral entropy. Across 12 models and 10 languages, these measures yield stable ``spectral fingerprints'' that expose discontinuities missed by standard evaluation.
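As a minimal sketch of how such per-layer spectral measures might be computed (assuming the attention map is symmetrized into an undirected token graph and token hidden states serve as the graph signal; function and variable names here are illustrative, not the authors' implementation):

\begin{verbatim}
# Illustrative sketch, not the paper's code: spectral fingerprint of one
# attention map, assuming a symmetrized adjacency over tokens.
import numpy as np

def spectral_fingerprint(attn: np.ndarray, hidden: np.ndarray):
    """attn: (T, T) attention weights; hidden: (T, d) token representations."""
    # Symmetrize attention into an undirected weighted adjacency (one common choice).
    W = 0.5 * (attn + attn.T)
    np.fill_diagonal(W, 0.0)

    # Unnormalized graph Laplacian L = D - W.
    deg = W.sum(axis=1)
    L = np.diag(deg) - W

    # Eigenvalues of the symmetric Laplacian, in ascending order.
    evals = np.clip(np.linalg.eigvalsh(L), 0.0, None)

    # Algebraic connectivity: second-smallest Laplacian eigenvalue.
    lambda_2 = float(evals[1])

    # Smoothness: Dirichlet energy tr(H^T L H), normalized by signal energy.
    smoothness = float(np.trace(hidden.T @ L @ hidden)) / (np.linalg.norm(hidden) ** 2 + 1e-12)

    # Spectral entropy of the normalized eigenvalue distribution.
    p = evals / (evals.sum() + 1e-12)
    spectral_entropy = float(-(p * np.log(p + 1e-12)).sum())

    return lambda_2, smoothness, spectral_entropy
\end{verbatim}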
We report four results. (1) Models undergoing specific curriculum transitions (e.g., code-to-chat) show an English-only, syntax-triggered connectivity failure on non-canonical constructions, reaching $\Delta\lambda_2 \approx -0.76$. We term this scar \textit{Passive-Triggered Connectivity Collapse} (PTCC). Analysis of the Phi lineage reveals that PTCC appears and resolves across developmental stages, implicating brittle curriculum shifts rather than synthetic data per se. (2) PTCC reflects a specialization trade-off: strengthened formal routing at the expense of stylistic flexibility. (3) We identify four recurrent processing strategies; simple frozen-threshold rules enable perfect forensic identification across lineages. (4) Mechanistically, PTCC localizes to a sparse Layer 2 ``compensatory patch'' of heads that fails under syntactic stress; activation steering can partially restore connectivity, recovering $\approx 38\%$ of lost information flow.
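The abstract does not specify how the steering intervention is implemented; as a generic, hypothetical illustration of activation steering at a fixed layer (adding a scaled steering vector to a layer's output via a PyTorch forward hook, then re-measuring connectivity), one might write:

\begin{verbatim}
# Hypothetical sketch of activation steering at one layer (PyTorch-style);
# the actual layer index, steering-vector construction, and scaling used
# in the paper are not specified here.
import torch

def add_steering_hook(layer_module, steering_vec: torch.Tensor, alpha: float = 1.0):
    """Register a forward hook that adds alpha * steering_vec to the layer's output."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * steering_vec.to(hidden.dtype).to(hidden.device)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return layer_module.register_forward_hook(hook)

# Usage (illustrative): steer an early layer, run passive-voice inputs,
# then recompute the spectral fingerprint to check how much lambda_2 recovers.
# handle = add_steering_hook(model.model.layers[2], steering_vec, alpha=0.5)
# ... forward pass, recompute lambda_2 ...
# handle.remove()
\end{verbatim}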
Finally, dominant topological regimes track tokenization density more than language identity, suggesting ``healthy'' geometry varies systematically across scripts. Overall, attention-graph spectra provide a practical tool for auditing and training-regime verification.