# Information Theory: Overview

This file is the index for the concepts/information-theory/ folder. It lists planned and written subtopic notes, organizes them by theme, and collects the canonical references for the field. Use it to decide what to write next without needing to re-survey the landscape.


## Notes in This Folder

| File | Status | Topic |
|---|---|---|
| `entropy-and-divergences.md` | ✅ Written | Entropy, KL divergence, f-divergences, mutual information, and core inequalities |
| `aep-and-typicality.md` | ✅ Written | Asymptotic equipartition property, typical sets, lossless source coding |
| `channel-capacity.md` | ✅ Written | Channel capacity, Fano's inequality, Shannon's noisy channel coding theorem |
| `rate-distortion.md` | ✅ Written | Rate-distortion function, Blahut–Arimoto algorithm, connection to learned quantization |
| `quantization.md` | ✅ Written | Scalar/vector quantization, Panter–Dite, product quantization, random rotation, JL lemma, TurboQuant |
| `maximum-entropy.md` | 🔲 Planned | Jaynes' maximum-entropy principle, exponential families, statistical mechanics bridge |
| `information-geometry.md` | 🔲 Planned | Fisher–Rao metric, dual affine connections, α-connections, e/m-projections |
| `information-cohomology.md` | 🔲 Planned | Baudot–Bennequin construction, information structures as ringed sites, higher-order mutual information |

## Subtopic Map

### Classical Shannon Theory

| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Entropy and divergences | H(X) as average surprise; KL as relative information; data-processing inequality | Shannon (1948); Cover & Thomas ch. 2–3 |
| AEP and typicality | Almost all long sequences are typical; the typical set has probability → 1 and size ≈ 2^{nH} | Cover & Thomas ch. 3; Polyanskiy–Wu ch. 5 |
| Channel capacity | C = max_{p(x)} I(X;Y); operational meaning via the noisy channel coding theorem | Shannon (1948); Gallager (1968); Csiszár–Körner |
| Rate-distortion | R(D) = min_{p(x̂\|x): E[d(X,X̂)]≤D} I(X;X̂); connection to quantization | Berger (1971); Cover & Thomas ch. 10 |
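A minimal numerical sketch of these quantities in plain NumPy (illustrative only; the binary symmetric channel appears here because its capacity has the closed form C = 1 − H₂(ε)):

```python
import numpy as np

def entropy(p, base=2):
    """Shannon entropy H(X) = -sum_x p(x) log p(x), skipping zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p)) / np.log(base)

def kl_divergence(p, q, base=2):
    """Relative entropy D(p || q); assumes supp(p) is contained in supp(q)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) / np.log(base)

# Binary symmetric channel with crossover probability eps:
# C = max_{p(x)} I(X;Y) = 1 - H_2(eps), achieved by a uniform input distribution.
eps = 0.11
print(f"H(fair coin)          = {entropy([0.5, 0.5]):.3f} bits")
print(f"D([.7,.3] || [.5,.5]) = {kl_divergence([0.7, 0.3], [0.5, 0.5]):.3f} bits")
print(f"C of BSC(0.11)        = {1 - entropy([eps, 1 - eps]):.3f} bits/channel use")
```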

### Maximum-Entropy Methods

| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Jaynes' MaxEnt principle | Given moment constraints, choose the distribution maximizing entropy | Jaynes (1957 I, II) |
| Exponential families | MaxEnt subject to linear constraints yields exponential families; natural parameters | Wainwright & Jordan (2008) |
| Variational inference | Mean-field, belief propagation, and the variational free energy via dually-flat geometry | Wainwright & Jordan (2008) |
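A concrete instance is Jaynes' dice problem: among all distributions on {1, …, 6} with a prescribed mean, the entropy maximizer is the exponential family p_k ∝ exp(λk), with the natural parameter λ fixed by the moment constraint. A short SciPy sketch (the target mean of 4.5 is an arbitrary illustrative choice):

```python
import numpy as np
from scipy.optimize import brentq

# MaxEnt on {1,...,6} subject to E[X] = 4.5 yields p_k proportional to exp(lam * k);
# solve the one-dimensional moment equation for the natural parameter lam.
x = np.arange(1, 7)
target_mean = 4.5

def mean_given(lam):
    w = np.exp(lam * x)
    p = w / w.sum()
    return p @ x

lam = brentq(lambda l: mean_given(l) - target_mean, -5.0, 5.0)
p = np.exp(lam * x)
p /= p.sum()
print("natural parameter lambda =", round(lam, 4))
print("MaxEnt distribution      =", np.round(p, 4))
```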

### Information Geometry

| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Fisher–Rao metric | Unique (up to scale) Riemannian metric on statistical manifolds, invariant under sufficient statistics | Rao (1945); Chentsov (1982) |
| Dual connections | Statistical manifolds carry a pair of dual affine connections (e- and m-connections), flat for exponential and mixture families | Amari & Nagaoka (2000) |
| α-connections | One-parameter family interpolating between the m-connection (α = −1) and the e-connection (α = +1), which form a dually flat pair | Amari (2016) |
| Divergences and projections | Bregman divergences generalize KL; e- and m-projections are orthogonal in dual senses | Amari & Nagaoka (2000) ch. 3 |
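For a one-parameter family the Fisher–Rao metric reduces to the scalar Fisher information. A quick sanity check for the Bernoulli(θ) family, where the closed form is g(θ) = 1/(θ(1 − θ)) (illustrative sketch, no library dependencies):

```python
# Fisher information of the Bernoulli(theta) family, computed two ways:
# as E[(d/dtheta log p(X; theta))^2] and via the closed form 1/(theta*(1-theta)).
theta = 0.3
score_1 = 1.0 / theta          # d/dtheta log p(x=1; theta) = d/dtheta log(theta)
score_0 = -1.0 / (1 - theta)   # d/dtheta log p(x=0; theta) = d/dtheta log(1-theta)
fisher_from_score = theta * score_1**2 + (1 - theta) * score_0**2
fisher_closed_form = 1.0 / (theta * (1 - theta))
print(fisher_from_score, fisher_closed_form)   # both ~ 4.7619

# The induced Fisher-Rao geodesic distance between Bernoulli(p) and Bernoulli(q)
# is 2 * |arcsin(sqrt(p)) - arcsin(sqrt(q))|.
```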

### Information Cohomology

| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Homological nature of entropy | Shannon entropy = unique 1-cocycle in H^1 of a simplicial probability space | Baudot & Bennequin (2015) |
| Information structures | Probability assignments form a presheaf; entropy is a natural transformation | Vigneaux (2017); Vigneaux thesis (2019) |
| Higher-order mutual information | I_k landscapes detect synergy/redundancy beyond pairwise mutual information | Baudot et al. (2019) |
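As a toy illustration of what the higher-order terms capture (a plain sanity-check computation, not the Baudot–Bennequin formalism): for Z = X ⊕ Y with X, Y independent fair bits, both pairwise mutual informations with Z vanish while (X, Y) jointly determine Z, and the co-information registers this as synergy. Sign conventions for I_3 differ across the literature.

```python
import numpy as np
from itertools import product

def H(p):
    """Shannon entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# Joint distribution of (X, Y, Z) with Z = X XOR Y and X, Y independent fair bits.
joint = {(x, y, x ^ y): 0.25 for x, y in product([0, 1], repeat=2)}

def marginal(idx):
    """Marginal distribution over the variables at positions idx."""
    m = {}
    for outcome, pr in joint.items():
        key = tuple(outcome[i] for i in idx)
        m[key] = m.get(key, 0.0) + pr
    return np.array(list(m.values()))

Hx, Hy, Hz = (H(marginal([i])) for i in range(3))
Hxy, Hxz, Hyz = H(marginal([0, 1])), H(marginal([0, 2])), H(marginal([1, 2]))
Hxyz = H(marginal([0, 1, 2]))

I_xz = Hx + Hz - Hxz                          # = 0: X alone says nothing about Z
I_xy_z = Hxy + Hz - Hxyz                      # = 1: X and Y together determine Z
coI = Hx + Hy + Hz - Hxy - Hxz - Hyz + Hxyz   # = -1: pure synergy
print(I_xz, I_xy_z, coI)
```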

### Applications to Compression and Quantization

| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Learned compression | Rate-distortion theory grounds VAE and flow-based image codecs | Yang, Mandt & Theis (2023) |
| Vector quantization | Near-optimal VQ via random rotation + scalar quantization; TurboQuant | Zandieh et al. (2025) |
| Axiomatic entropy and diversity | Entropy as the magnitude of an enriched category; category-theoretic unification | Leinster (2021) |
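A minimal sketch of the rotate-then-scalar-quantize idea behind random-rotation VQ (a toy illustration under arbitrary choices of dimension, bit width, and a single uniform quantizer; it is not the TurboQuant algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
d, bits = 64, 4
x = rng.standard_normal(d) * rng.uniform(0.1, 3.0, size=d)  # uneven per-coordinate scales

# A random orthogonal rotation spreads the energy roughly evenly across coordinates,
# so one uniform scalar quantizer in the rotated basis serves every coordinate.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
y = Q @ x

levels = 2 ** bits
lo, hi = y.min(), y.max()
step = (hi - lo) / (levels - 1)
codes = np.round((y - lo) / step).astype(int)   # per-coordinate integer codes in [0, levels-1]
x_hat = Q.T @ (lo + codes * step)               # dequantize and rotate back

rel_mse = np.mean((x - x_hat) ** 2) / np.mean(x ** 2)
print(f"{bits} bits/coordinate, relative MSE = {rel_mse:.4f}")
```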

## Dependency Graph

```mermaid
flowchart TD
    A["Entropy and Divergences<br/>entropy-and-divergences.md"]
    B["AEP and Typicality<br/>aep-and-typicality.md"]
    C["Channel Capacity<br/>channel-capacity.md"]
    D["Rate-Distortion<br/>rate-distortion.md"]
    E["Maximum Entropy<br/>maximum-entropy.md"]
    F["Information Geometry<br/>information-geometry.md"]
    G["Information Cohomology<br/>information-cohomology.md"]
    A --> B
    A --> C
    A --> D
    A --> E
    B --> C
    B --> D
    E --> F
    A --> F
    A --> G
    E --> G
```

## Master References

| Reference | Authors | Year | What It Covers | Link |
|---|---|---|---|---|
| A Mathematical Theory of Communication | Shannon | 1948 | Foundational: entropy, mutual information, source and channel coding theorems | PDF |
| Information Theory and Statistical Mechanics I | Jaynes | 1957 | MaxEnt as an inference principle; statistical mechanics as an information problem | PDF |
| Information Theory and Statistical Mechanics II | Jaynes | 1957 | MaxEnt extended to quantum density matrices; canonical and grand canonical ensembles | DOI |
| Elements of Information Theory | Cover & Thomas | 2006 | Standard graduate textbook: entropy, AEP, channel capacity, rate-distortion | Wiley |
| Information Theory: From Coding to Learning | Polyanskiy & Wu | 2025 | Modern graduate textbook; sharp analytical style; includes learning-theoretic connections | PDF |
| Information Theory: Coding Theorems for Discrete Memoryless Systems | Csiszár & Körner | 2011 | Rigorous channel capacity and rate-distortion; method of types | Cambridge |
| Information Theory and Reliable Communication | Gallager | 1968 | Rigorous foundations; channel coding; error exponents | MIT |
| Rate Distortion Theory | Berger | 1971 | Classic monograph on the rate-distortion theorem; Blahut–Arimoto algorithm | Archive.org |
| Methods of Information Geometry | Amari & Nagaoka | 2000 | Definitive reference: dual connections, Fisher–Rao metric, exponential/mixture geodesics | AMS |
| Information Geometry and Its Applications | Amari | 2016 | Self-contained introduction from divergences; applications to statistics and neural networks | Springer |
| An Elementary Introduction to Information Geometry | Nielsen | 2020 | Accessible 56-page survey of information manifolds and dual connections | arXiv |
| Statistical Decision Rules and Optimal Inference | Chentsov | 1982 | Uniqueness of the Fisher metric under Markov morphisms (Chentsov–Campbell theorem) | AMS |
| The Homological Nature of Entropy | Baudot & Bennequin | 2015 | Shannon entropy as a degree-1 cohomology class; topos-theoretic framework | MDPI |
| Information Structures and Their Cohomology | Vigneaux | 2017 | Information structures as ringed sites; Shannon and Tsallis entropies as degree-1 cocycles | arXiv |
| Topology of Statistical Systems (PhD thesis) | Vigneaux | 2019 | Full topos-theoretic treatment of information cohomology; discrete and quantum settings | HAL |
| Topological Information Data Analysis | Baudot et al. | 2019 | Computational I_k landscapes; higher-order mutual information applied to real data | MDPI |
| Graphical Models, Exponential Families, and Variational Inference | Wainwright & Jordan | 2008 | Unified variational framework for exponential families; dually-flat geometry and belief propagation | PDF |
| Entropy and Diversity: The Axiomatic Approach | Leinster | 2021 | Entropy from enriched category theory; axiomatic unification of diversity measures | Cambridge |
| An Introduction to Neural Data Compression | Yang, Mandt & Theis | 2023 | Rate-distortion theory of learned compression; VAE codecs; diffusion compressors | arXiv |
| TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate | Zandieh et al. | 2025 | Random-rotation VQ achieves near-optimal MSE; 6× KV-cache compression on H100 | arXiv |
| MIT 6.441 Lecture Notes | Polyanskiy & Wu | 2016 | Concise, rigorous lecture notes; precursor to the 2025 textbook | OCW |