Thread 1: Singular Learning Theory

| Title | Author(s) | Year | Summary | Source |
| --- | --- | --- | --- | --- |
| Algebraic Geometry and Statistical Learning Theory | S. Watanabe | 2009 | Foundational monograph: RLCT, resolution of singularities, free energy asymptotics | Cambridge Univ. Press |
| Mathematical Theory of Bayesian Statistics | S. Watanabe | 2018 | Extended monograph: WAIC/WBIC derivations, phase transitions in Bayesian inference | CRC Press |
| Equations of States in Singular Statistical Estimation | S. Watanabe | 2010 | Fundamental asymptotic relations linking Bayes generalization error, training error, and free energy | arXiv:0712.0653 |
| A Widely Applicable Bayesian Information Criterion | S. Watanabe | 2013 | WBIC: singular-model-aware BIC replacement via a tempered posterior average | arXiv:1208.6338 |
| Recent Advances in Algebraic Geometry and Bayesian Statistics | S. Watanabe | 2022 | 20-year review of SLT: birational methods, renormalized posteriors, universal formula | arXiv:2211.10049 |
| Review: Stat Mech–ML Equivalence | S. Watanabe | 2024 | Algebraic research program; phase transitions; AI alignment via free energy | arXiv:2406.10234 |
| Deep Learning is Singular, and That’s Good | Murfet, Wei et al. | 2021 | Neural nets as singular models; SLT for deep learning; RLCT experiments | arXiv:2010.11560 |
| The Local Learning Coefficient | Lau et al. | 2023 | Scalable SGLD-based RLCT estimator; detects phase transitions in transformers | arXiv:2308.12108 |
| Classification of Real Hyperplane Singularities by RLCT | Lau, Wiesmann | 2024 | Combinatorial SageMath algorithm for RLCTs of hyperplane-arrangement polynomials | arXiv:2411.13392 |
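
The thread's central quantity can be seen in a toy one-parameter model. Watanabe's asymptotic free energy is \(F_n \approx n L_n(w_0) + \lambda \log n\), and WBIC-style estimators read \(\lambda\) off an average over the posterior tempered to \(\beta = 1/\log n\). A minimal sketch (quadrature standing in for the SGLD sampling of Lau et al., and a hand-picked potential rather than anything from the papers): for \(L(w) = w^4\) the learning coefficient is \(\lambda = 1/4\).

```python
import numpy as np

# Toy singular model: population loss L(w) = w^4 has learning
# coefficient (RLCT) lambda = 1/4.  The WBIC-style estimate
#   lambda_hat = n * beta * (E_beta[L] - L(w0)),  beta = 1/log(n),
# averages L over the tempered posterior p(w) ~ exp(-n*beta*L(w)).
n = 10_000
beta = 1.0 / np.log(n)

w = np.linspace(-2.0, 2.0, 200_001)
dw = w[1] - w[0]
L = w**4

p = np.exp(-n * beta * L)
p /= p.sum() * dw                         # normalize on the grid

lam_hat = n * beta * np.sum(L * p) * dw   # L(w0) = 0 at the optimum w0 = 0
print(lam_hat)                            # ~ 0.25
```

A regular (quadratic) model in one parameter would give \(\hat\lambda \approx d/2 = 1/2\) here; the gap between 1/2 and 1/4 is exactly the singularity effect this thread studies.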

Thread 2: Expressivity and Algebraic Complexity

| Title | Author(s) | Year | Summary | Source |
| --- | --- | --- | --- | --- |
| Neuroalgebraic Geometry | TBD | 2025 | Expository overview: neuromanifolds, dimension/degree/singularities/fibers/EDD | arXiv:2501.18915 |
| Geometry of Polynomial Neural Networks | Kubjas et al. | 2024 | Neurovariety dimension/degree; learning degree as training complexity | arXiv:2402.00949 |
| Algebraic Complexity and Neurovariety of Linear Convolutional Networks | Shahverdi | 2024 | Neuromanifold of 1-D linear CNNs is a semialgebraic set; its EDD equals that of a Segre variety | arXiv:2401.16613 |
| Activation Degree Thresholds and Expressiveness | Finkel et al. | 2024 | Activation threshold: minimal degree at which the neurovariety achieves maximum dimension | arXiv:2408.04569 |
| On the Expressive Power of Deep Learning: A Tensor Analysis | Cohen et al. | 2016 | Deep CNNs ↔︎ hierarchical Tucker decompositions; exponential depth separation via tensor rank | arXiv:1509.05009 |
| On the Number of Linear Regions of Deep Neural Networks | Montúfar et al. | 2014 | Deep ReLU networks carve exponentially more linear regions than shallow ones; piecewise-linear complexity | arXiv:1402.1869 |
| Benefits of Depth in Neural Networks | Telgarsky | 2016 | Sawtooth functions: expressible at depth \(O(k^3)\) but requiring width \(\Omega(2^k)\) at depth \(O(k)\) | arXiv:1602.04485 |
| The Euclidean Distance Degree of an Algebraic Variety | Draisma et al. | 2016 | Defines EDD; counts complex critical points of squared distance; bounds real optima | arXiv:1309.0049 |
| Introduction to Tropical Geometry | Maclagan, Sturmfels | 2015 | Tropical varieties and max-plus algebra; applied to ReLU linearization | AMS GSM 161 |
| Algebraic Complexity Theory | Bürgisser, Clausen, Shokrollahi | 1997 | Tensor rank, circuit complexity, Strassen’s theorem | Springer |
| Why Does Deep and Cheap Learning Work So Well? | Lin et al. | 2017 | Hierarchical polynomial structure in physics maps to efficient deep networks; renormalization-group analogy | arXiv:1608.08225 |
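
Telgarsky's depth separation can be checked numerically: composing a width-2 ReLU "tent" layer \(k\) times yields a sawtooth with \(2^k\) linear pieces, whereas the region counts of Montúfar et al. cap a one-hidden-layer, width-\(m\) net at \(m+1\) pieces on a 1-D input. A minimal sketch, counting pieces on a dyadic grid so all arithmetic is exact:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def tent(x):
    # Triangle map on [0, 1] as a width-2 ReLU layer:
    # tent(x) = 2*relu(x) - 4*relu(x - 1/2)
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

k = 4
x = np.linspace(0.0, 1.0, 2**12 + 1)   # dyadic grid: breakpoints land on grid points
y = x.copy()
for _ in range(k):                      # depth-k composition, total width 2 per layer
    y = tent(y)

# Count linear pieces: the slope is constant between breakpoints,
# so pieces = 1 + number of slope changes along the grid.
slopes = np.diff(y) / np.diff(x)
pieces = 1 + int(np.sum(slopes[1:] != slopes[:-1]))
print(pieces)   # 2**k = 16 linear pieces from only 2k ReLU units
```

Matching this piece count with a single hidden layer would need width 15, and the gap grows exponentially in \(k\).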

Thread 3: Loss Landscape Geometry

| Title | Author(s) | Year | Summary | Source |
| --- | --- | --- | --- | --- |
| Loss Surface of Deep Linear Networks via Algebraic Geometry | Mehta et al. | 2018 | Numerical algebraic geometry (homotopy continuation) enumerates all stationary points; algebraic degree bounds | arXiv:1810.07716 |
| Geometry of the Loss Landscape: Symmetries and Invariances | Simsek et al. | 2021 | Permutation symmetry generates a structured manifold of minima; one extra neuron connects all minima | arXiv:2105.12221 |
| Connectedness of Loss Landscapes via Morse Theory | Akhtiamov, Thomson | 2023 | Morse theory for mode connectivity; saddle-point index structure governs path-connectivity | PMLR v197 |
| Symmetries of Neural Networks | Brea, Gerstner, Urbanczik | 2019 | Permutation and scaling symmetries; fiber structure of MLP parameter space | arXiv:2106.10255 |
| Flat Minima | Hochreiter, Schmidhuber | 1997 | Flat-minima hypothesis with an MDL/Bayesian argument; precursor to sharpness-aware geometry | Neural Computation |
| Morse Theory | Milnor | 1963 | Canonical reference: deformation retracts, index theory, CW-complex reconstruction | Princeton Univ. Press |
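
The symmetry-induced valleys running through this thread (Mehta et al., Simsek et al., flat minima) already appear in the smallest deep linear network \(f(x) = abx\). A hypothetical two-parameter toy, not an example taken from the papers: the minima of \(L(a,b) = (ab-1)^2\) form a hyperbola, and the Hessian at any minimum has a zero eigenvalue along it.

```python
import numpy as np

# L(a, b) = (a*b - 1)^2: the global minima form the hyperbola {a*b = 1},
# a continuum generated by the rescaling symmetry (a, b) -> (t*a, b/t).
def grad(a, b):
    return np.array([2 * (a * b - 1) * b, 2 * (a * b - 1) * a])

for t in (0.5, 1.0, 3.0):              # the gradient vanishes along the valley
    assert np.allclose(grad(t, 1.0 / t), 0.0)

# Hessian of L at the minimum (a, b) = (1, 1):
# [[2*b^2, 4*a*b - 2], [4*a*b - 2, 2*a^2]] = [[2, 2], [2, 2]].
H = np.array([[2.0, 2.0], [2.0, 2.0]])
evals = np.linalg.eigvalsh(H)
print(evals)   # one zero eigenvalue: the flat direction tangent to the valley
```

The zero eigenvector is tangent to the hyperbola, so every minimum is "flat" in exactly the symmetry direction: the fiber structure the thread's entries describe.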

Thread 4: Algebraic Statistics

| Title | Author(s) | Year | Summary | Source |
| --- | --- | --- | --- | --- |
| Algebraic Statistics | Sullivant | 2018 | Comprehensive AMS textbook: algebraic exponential families, MLE degree, identifiability | AMS GSM 194 |
| Lectures on Algebraic Statistics | Drton, Sturmfels, Sullivant | 2009 | Compact introduction to MLE on varieties, likelihood geometry, graphical models | Birkhäuser |
| Likelihood Geometry | Huh, Sturmfels | 2014 | ML degree as Euler characteristic of a very affine variety; toric and determinantal models | arXiv:1305.7462 |
| Algebraic Statistics for Computational Biology | Pachter, Sturmfels | 2005 | Gröbner bases, toric models, tropical geometry for genomics; graphical models and identifiability | Cambridge Univ. Press |
| Learning Algebraic Varieties from Samples | Breiding et al. | 2018 | Persistent-homology pipeline to recover dimension/degree/equations of a variety from a point cloud | arXiv:1802.09436 |
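
The equation-recovery step of the Breiding et al. pipeline in miniature, assuming only NumPy: sample a circle, build the matrix of all monomials of degree at most 2, and read the defining equation off its numerical kernel.

```python
import numpy as np

# Recover the defining quadric of a circle from samples alone.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2 * np.pi, 30)
xs, ys = np.cos(theta), np.sin(theta)       # samples on x^2 + y^2 = 1

# Monomial basis [1, x, y, x^2, x*y, y^2] evaluated at the samples.
M = np.column_stack([np.ones_like(xs), xs, ys, xs**2, xs * ys, ys**2])

# One polynomial relation holds on all samples, so M has a 1-D kernel.
U, S, Vt = np.linalg.svd(M)
coeffs = Vt[-1] / Vt[-1][-1]                # normalize the y^2 coefficient to 1
print(S[-1])                                # ~ 0: the kernel detects the relation
print(np.round(coeffs, 6))                  # proportional to [-1, 0, 0, 1, 0, 1]
```

The recovered coefficient vector is \(x^2 + y^2 - 1\) in the chosen monomial basis; the full pipeline additionally estimates dimension and degree before choosing which monomials to try.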

Background / Cross-Cutting

| Title | Author(s) | Year | Summary | Source |
| --- | --- | --- | --- | --- |
| Ideals, Varieties, and Algorithms | Cox, Little, O’Shea | 2015 | Standard introduction: Gröbner bases, Nullstellensatz, primary decomposition, elimination theory | Springer |
| Solving Systems of Polynomial Equations | Sturmfels | 2002 | Resultants, Bernstein’s theorem, Nash equilibria, algebraic geometry of statistical models | AMS CBMS 97 |
| Information, Physics, and Computation | Mézard, Montanari | 2009 | Statistical mechanics of disordered systems; belief propagation, cavity method; phase transitions in large NNs | Oxford Univ. Press |
| Tensor Decompositions and Applications | Kolda, Bader | 2009 | Tucker/CP decompositions; tensor rank; algebraic complexity | SIAM Review |
| IPAM Workshop: AG — A Window to ML | IPAM | 2024 | Community overview: grokking, neural collapse, LoRA, network verification via algebraic geometry | IPAM |
| Understanding Deep Learning Requires Rethinking Generalization | Zhang et al. | 2017 | Empirical motivation for SLT: classical bounds fail for neural nets | arXiv:1611.03530 |
| Grokking: Generalization Beyond Overfitting | Power et al. | 2022 | Delayed generalization; one of the IPAM open problems | arXiv:2201.02177 |
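
Tensor rank, which links Cohen et al.'s separation results, the Kolda–Bader survey, and the complexity theory of Bürgisser et al., can be probed in a few lines. A small sketch with generic random factors (an illustration, not a result from those references): the CP rank upper-bounds the rank of every matricization, and for generic factors the bound is tight.

```python
import numpy as np

# Build a rank-2 tensor T = a1⊗b1⊗c1 + a2⊗b2⊗c2 in R^{3x4x5}.
rng = np.random.default_rng(0)
a = rng.standard_normal((2, 3))
b = rng.standard_normal((2, 4))
c = rng.standard_normal((2, 5))
T = np.einsum('ri,rj,rk->ijk', a, b, c)

# CP rank bounds every matricization rank from above; for generic
# factors the mode-1 unfolding (3 x 20) has matrix rank exactly 2.
unfold1 = T.reshape(3, -1)
print(np.linalg.matrix_rank(unfold1))   # 2
```

Matricization ranks are the cheap, computable lower bounds on CP rank; the hard (NP-hard in general) direction is certifying the rank itself, which is where the algebraic machinery of these references enters.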