211
Bridging Training and Merging Through Momentum-Aware Optimization
arXiv:2512.17109v1 Announce Type: new
Abstract: Training large neural networks and merging task-specific models both exploit low-rank structure and require parameter importance estimation, yet these challenges have been pursued in isolation. Current workflows compute curvature information during tr…
Abstract: Training large neural networks and merging task-specific models both exploit low-rank structure and require parameter importance estimation, yet these challenges have been pursued in isolation. Current workflows compute curvature information during tr…