# Information Theory: Overview
This file is the index for the concepts/information-theory/ folder. It lists planned and written subtopic notes, organizes them by theme, and collects the canonical references for the field. Use it to decide what to write next without needing to re-survey the landscape.
## Notes in This Folder

| File | Status | Topic |
|---|---|---|
| entropy-and-divergences.md | ✅ Written | Entropy, KL divergence, f-divergences, mutual information, and core inequalities |
| aep-and-typicality.md | ✅ Written | Asymptotic equipartition property, typical sets, lossless source coding |
| channel-capacity.md | ✅ Written | Channel capacity, Fano’s inequality, Shannon’s noisy channel coding theorem |
| rate-distortion.md | ✅ Written | Rate-distortion function, Blahut–Arimoto algorithm, connection to learned quantization |
| quantization.md | ✅ Written | Scalar/vector quantization, Panter–Dite, product quantization, random rotation, JL lemma, TurboQuant |
| maximum-entropy.md | 🔲 Planned | Jaynes’ maximum-entropy principle, exponential families, statistical mechanics bridge |
| information-geometry.md | 🔲 Planned | Fisher–Rao metric, dual affine connections, α-connections, e/m-projections |
| information-cohomology.md | 🔲 Planned | Baudot–Bennequin construction, information structures as ringed sites, higher-order mutual information |
## Subtopic Map

### Classical Shannon Theory
| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Entropy and divergences | H(X) as average surprise; KL as relative information; data-processing inequality | Shannon (1948); Cover & Thomas ch. 2–3 |
| AEP and typicality | Almost all long sequences are typical; typical set has probability → 1 and size ≈ 2^{nH} | Cover & Thomas ch. 3; Polyanskiy–Wu ch. 5 |
| Channel capacity | C = max_{p(x)} I(X;Y); operational meaning via coding theorem | Shannon (1948); Gallager (1968); Csiszár–Körner |
| Rate-distortion | R(D) = min_{p(x̂\|x): E[d]≤D} I(X;X̂); connection to quantization | Berger (1971); Cover & Thomas ch. 10 |
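The capacity row above has a direct computational counterpart: C = max_{p(x)} I(X;Y) can be found for any discrete memoryless channel by Blahut–Arimoto alternating maximization (the same scheme the rate-distortion note uses for R(D)). A minimal Python/NumPy sketch; function and variable names are illustrative, not from the notes:

```python
import numpy as np

def blahut_arimoto_capacity(W, n_iter=500, tol=1e-12):
    """Capacity of a discrete memoryless channel via Blahut-Arimoto.

    W[x, y] = P(Y = y | X = x) is row-stochastic. Returns (capacity in
    bits, capacity-achieving input distribution). Assumes every output
    has positive probability under the uniform input.
    """
    nx, _ = W.shape
    p = np.full(nx, 1.0 / nx)                        # start uniform
    logW = np.log(W, out=np.zeros_like(W), where=W > 0)
    for _ in range(n_iter):
        q = p @ W                                    # output marginal q(y)
        d = np.sum(W * (logW - np.log(q)), axis=1)   # D(W(.|x) || q), nats
        p_new = p * np.exp(d)
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p)) < tol:
            p = p_new
            break
        p = p_new
    q = p @ W
    d = np.sum(W * (logW - np.log(q)), axis=1)
    return float(p @ d) / np.log(2), p               # nats -> bits

# Binary symmetric channel, crossover 0.1: C = 1 - H2(0.1) ≈ 0.531 bits
W = np.array([[0.9, 0.1],
              [0.1, 0.9]])
C, p_star = blahut_arimoto_capacity(W)
```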
### Maximum-Entropy Methods
| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Jaynes’ MaxEnt principle | Given moment constraints, choose the distribution maximizing entropy | Jaynes (1957 I, II) |
| Exponential families | MaxEnt subject to linear constraints yields exponential families; natural parameters | Wainwright & Jordan (2008) |
| Variational inference | Mean-field, belief propagation, and variational free energies via dually-flat geometry | Wainwright & Jordan (2008) |
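The first two rows connect directly: maximizing entropy on a finite support subject to a mean constraint yields an exponential family p(x) ∝ exp(λx), with λ fixed by a one-dimensional dual problem. A sketch of Jaynes' Brandeis dice example (a die whose long-run mean is 4.5 rather than the fair 3.5); SciPy's brentq is assumed available, helper names are mine:

```python
import numpy as np
from scipy.optimize import brentq

def maxent_pmf(xs, target_mean):
    """Max-entropy pmf on support xs subject to E[X] = target_mean.

    By Lagrange duality the optimum is the exponential family
    p(x) = exp(lam * x) / Z(lam); we solve the 1-D dual for lam.
    """
    xs = np.asarray(xs, dtype=float)
    def pmf(lam):
        w = np.exp(lam * (xs - xs.mean()))   # centred for stability
        return w / w.sum()
    lam = brentq(lambda l: pmf(l) @ xs - target_mean, -50.0, 50.0)
    return pmf(lam)

# Jaynes' Brandeis dice: tilt the uniform die toward high faces.
p = maxent_pmf(np.arange(1, 7), 4.5)
entropy_bits = -(p * np.log2(p)).sum()       # strictly below log2(6)
```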
### Information Geometry
| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Fisher–Rao metric | Unique (up to scale) Riemannian metric on statistical manifolds, invariant under sufficient statistics | Rao (1945); Chentsov (1982) |
| Dual connections | Statistical manifolds carry a pair of flat dual connections (e- and m-connections) | Amari & Nagaoka (2000) |
| α-connections | One-parameter family interpolating between e- and m-connections; α = ±1 recovers the dually flat e/m pair | Amari (2016) |
| Divergences and projections | Bregman divergences generalise KL; e/m-projections are orthogonal in dual senses | Amari & Nagaoka (2000) ch. 3 |
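One concrete way to see the Fisher–Rao metric: it is the quadratic coefficient in the local expansion KL(p_θ ‖ p_{θ+ε}) ≈ ½ g(θ) ε². A minimal numerical check for the Bernoulli family, where g(θ) = 1/(θ(1−θ)); names are illustrative:

```python
import numpy as np

def kl_bernoulli(a, b):
    """KL divergence D(Bernoulli(a) || Bernoulli(b)) in nats."""
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

theta, eps = 0.3, 1e-4
# Fisher information recovered from the quadratic expansion of KL
g_numeric = 2 * kl_bernoulli(theta, theta + eps) / eps**2
g_exact = 1 / (theta * (1 - theta))     # Bernoulli Fisher information
assert abs(g_numeric - g_exact) / g_exact < 1e-3
```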
### Information Cohomology
| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Homological nature of entropy | Shannon entropy is, up to scale, the unique degree-1 cocycle of information cohomology | Baudot & Bennequin (2015) |
| Information structures | Probability assignments form a presheaf; entropy is a natural transformation | Vigneaux (2017); Vigneaux thesis (2019) |
| Higher-order mutual information | I_k landscapes detect synergy/redundancy beyond pairwise mutual information | Baudot et al. (2019) |
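The I_k in the last row can be computed by inclusion-exclusion over joint entropies: I(X;Y;Z) = H(X) + H(Y) + H(Z) − H(X,Y) − H(X,Z) − H(Y,Z) + H(X,Y,Z). Sign conventions vary across the literature; with this one, the XOR of two fair bits gives −1 bit, the textbook example of pure synergy. A small Python sketch, names illustrative:

```python
import itertools
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a pmf given as an array."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def interaction_information(pxyz):
    """I(X;Y;Z) via inclusion-exclusion over marginal entropies.
    pxyz is the joint pmf as a 3-D array indexed by (x, y, z)."""
    H = lambda summed_axes: entropy_bits(pxyz.sum(axis=summed_axes).ravel())
    Hx, Hy, Hz = H((1, 2)), H((0, 2)), H((0, 1))
    Hxy, Hxz, Hyz = H((2,)), H((1,)), H((0,))
    Hxyz = entropy_bits(pxyz.ravel())
    return Hx + Hy + Hz - Hxy - Hxz - Hyz + Hxyz

# XOR: X, Y independent fair bits, Z = X ^ Y.
# Every pair is independent, yet jointly determined: I_3 = -1 bit.
p = np.zeros((2, 2, 2))
for x, y in itertools.product((0, 1), repeat=2):
    p[x, y, x ^ y] = 0.25
assert abs(interaction_information(p) + 1.0) < 1e-12
```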
### Applications to Compression and Quantization
| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Learned compression | Rate-distortion theory grounds VAE and flow-based image codecs | Yang, Mandt & Theis (2023) |
| Vector quantization | Near-optimal VQ via random rotation + scalar quantization; TurboQuant | Zandieh et al. (2025) |
| Axiomatic entropy diversity | Entropy as magnitude of enriched category; category-theoretic unification | Leinster (2021) |
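As a toy illustration of the random-rotation idea in the vector-quantization row (the flavour of TurboQuant, not the published algorithm): rotate by a random orthogonal matrix so no single coordinate carries an outsized share of the energy, then apply a uniform scalar quantizer per coordinate. Python sketch under those assumptions, names mine:

```python
import numpy as np

def random_rotation(d, rng):
    """Haar-random orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))       # sign fix for Haar uniformity

def quantize(x, R, bits=4):
    """Rotate, then uniform scalar quantization to 2**bits levels."""
    y = R @ x
    levels = 2 ** bits
    step = 2 * np.abs(y).max() / levels
    codes = np.clip(np.round(y / step), -levels // 2, levels // 2 - 1)
    return codes.astype(np.int32), step

def dequantize(codes, step, R):
    """Scale codes back and undo the rotation (R is orthogonal)."""
    return R.T @ (codes * step)

rng = np.random.default_rng(0)
d = 128
R = random_rotation(d, rng)
x = rng.standard_normal(d)
codes, step = quantize(x, R, bits=4)
x_hat = dequantize(codes, step, R)
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)  # small at 4 bits
```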
## Dependency Graph

```mermaid
flowchart TD
    A["Entropy and Divergences<br/>entropy-and-divergences.md"]
    B["AEP and Typicality<br/>aep-and-typicality.md"]
    C["Channel Capacity<br/>channel-capacity.md"]
    D["Rate-Distortion<br/>rate-distortion.md"]
    E["Maximum Entropy<br/>maximum-entropy.md"]
    F["Information Geometry<br/>information-geometry.md"]
    G["Information Cohomology<br/>information-cohomology.md"]
    A --> B
    A --> C
    A --> D
    A --> E
    B --> C
    B --> D
    E --> F
    A --> F
    A --> G
    E --> G
```
## Master References
| Reference | Authors | Year | What It Covers | Link |
|---|---|---|---|---|
| A Mathematical Theory of Communication | Shannon | 1948 | Foundational: entropy, mutual information, source and channel coding theorems | |
| Information Theory and Statistical Mechanics I | Jaynes | 1957 | MaxEnt as inference principle; statistical mechanics as information problem | |
| Information Theory and Statistical Mechanics II | Jaynes | 1957 | MaxEnt extended to quantum density matrices; canonical and grand canonical ensembles | DOI |
| Elements of Information Theory | Cover & Thomas | 2006 | Standard graduate textbook — entropy, AEP, channel capacity, rate-distortion | Wiley |
| Information Theory: From Coding to Learning | Polyanskiy & Wu | 2025 | Modern graduate textbook; sharp analytical style; includes learning-theoretic connections | |
| Information Theory: Coding Theorems for Discrete Memoryless Systems | Csiszár & Körner | 2011 | Rigorous channel capacity and rate-distortion; method of types | Cambridge |
| Information Theory and Reliable Communication | Gallager | 1968 | Rigorous foundations; channel coding; error exponents | MIT |
| Rate Distortion Theory | Berger | 1971 | Classic monograph on the rate-distortion theorem; Blahut–Arimoto algorithm | Archive.org |
| Methods of Information Geometry | Amari & Nagaoka | 2000 | Definitive reference: dual connections, Fisher–Rao metric, exponential/mixture geodesics | AMS |
| Information Geometry and Its Applications | Amari | 2016 | Self-contained introduction from divergences; applications to statistics and neural networks | Springer |
| An Elementary Introduction to Information Geometry | Nielsen | 2020 | Accessible 56-page survey of information manifolds and dual connections | arXiv |
| Statistical Decision Rules and Optimal Inference | Chentsov | 1982 | Uniqueness of the Fisher metric under Markov morphisms (Chentsov–Campbell theorem) | AMS |
| The Homological Nature of Entropy | Baudot & Bennequin | 2015 | Shannon entropy as degree-1 cohomology class; topos-theoretic framework | MDPI |
| Information Structures and Their Cohomology | Vigneaux | 2017 | Information structures as ringed sites; Shannon and Tsallis as degree-1 cocycles | arXiv |
| Topology of Statistical Systems (PhD thesis) | Vigneaux | 2019 | Full topos-theoretic treatment of information cohomology; discrete and quantum settings | HAL |
| Topological Information Data Analysis | Baudot et al. | 2019 | Computational I_k landscapes; higher-order mutual information applied to real data | MDPI |
| Graphical Models, Exponential Families, and Variational Inference | Wainwright & Jordan | 2008 | Unified variational framework for exponential families; dually-flat geometry and belief propagation | |
| Entropy and Diversity: The Axiomatic Approach | Leinster | 2021 | Entropy from enriched category theory; axiomatic unification of diversity measures | Cambridge |
| An Introduction to Neural Data Compression | Yang, Mandt & Theis | 2023 | Rate-distortion theory of learned compression; VAE codecs; diffusion compressors | arXiv |
| TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate | Zandieh et al. | 2025 | Random-rotation VQ achieves near-optimal MSE; 6× KV-cache compression on H100 | arXiv |
| MIT 6.441 Lecture Notes | Polyanskiy & Wu | 2016 | Concise rigorous lecture notes; precursor to the 2025 textbook | OCW |