# Information Theory: Overview
This file is the index for the concepts/information-theory/ folder. It lists planned and written subtopic notes, organizes them by theme, and collects the canonical references for the field. Use it to decide what to write next without needing to re-survey the landscape.
## Notes in This Folder

| File | Status | Topic |
|---|---|---|
| entropy-and-divergences.md | ✅ Written | Entropy, KL divergence, f-divergences, mutual information, and core inequalities |
| aep-and-typicality.md | ✅ Written | Asymptotic equipartition property, typical sets, lossless source coding |
| channel-capacity.md | ✅ Written | Channel capacity, Fano’s inequality, Shannon’s noisy channel coding theorem |
| rate-distortion.md | ✅ Written | Rate-distortion function, Blahut–Arimoto algorithm, connection to learned quantization |
| quantization.md | ✅ Written | Scalar/vector quantization, Panter–Dite, product quantization, random rotation, JL lemma, TurboQuant |
| maximum-entropy.md | 🔲 Planned | Jaynes’ maximum-entropy principle, exponential families, statistical mechanics bridge |
| information-geometry.md | 🔲 Planned | Fisher–Rao metric, dual affine connections, α-connections, e/m-projections |
| information-cohomology.md | 🔲 Planned | Baudot–Bennequin construction, information structures as ringed sites, higher-order mutual information |
## Subtopic Map

### Classical Shannon Theory
| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Entropy and divergences | H(X) as average surprise; KL as relative information; data-processing inequality | Shannon (1948); Cover & Thomas ch. 2–3 |
| AEP and typicality | Almost all long sequences are typical; typical set has probability → 1 and size ≈ 2^{nH} | Cover & Thomas ch. 3; Polyanskiy–Wu ch. 5 |
| Channel capacity | C = max_{p(x)} I(X;Y); operational meaning via coding theorem | Shannon (1948); Gallager (1968); Csiszár–Körner |
| Rate-distortion | R(D) = min_{p(x̂\|x): E[d]≤D} I(X;X̂); connection to quantization | Berger (1971); Cover & Thomas ch. 10 |
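The capacity row above has a direct computational counterpart: C = max_{p(x)} I(X;Y) can be found for any discrete memoryless channel by Blahut–Arimoto alternating maximization (the same scheme the rate-distortion note uses for R(D)). A minimal Python/NumPy sketch; function and variable names are illustrative, not from the notes:

```python
import numpy as np

def blahut_arimoto_capacity(W, n_iter=500, tol=1e-12):
    """Capacity of a discrete memoryless channel via Blahut-Arimoto.

    W[x, y] = P(Y = y | X = x) is row-stochastic. Returns (capacity in
    bits, capacity-achieving input distribution). Assumes every output
    has positive probability under the uniform input.
    """
    nx, _ = W.shape
    p = np.full(nx, 1.0 / nx)                        # start uniform
    logW = np.log(W, out=np.zeros_like(W), where=W > 0)
    for _ in range(n_iter):
        q = p @ W                                    # output marginal q(y)
        d = np.sum(W * (logW - np.log(q)), axis=1)   # D(W(.|x) || q), nats
        p_new = p * np.exp(d)
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p)) < tol:
            p = p_new
            break
        p = p_new
    q = p @ W
    d = np.sum(W * (logW - np.log(q)), axis=1)
    return float(p @ d) / np.log(2), p               # nats -> bits

# Binary symmetric channel, crossover 0.1: C = 1 - H2(0.1) ≈ 0.531 bits
W = np.array([[0.9, 0.1],
              [0.1, 0.9]])
C, p_star = blahut_arimoto_capacity(W)
```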
### Maximum-Entropy Methods
| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Jaynes’ MaxEnt principle | Given moment constraints, choose the distribution maximizing entropy | Jaynes (1957 I, II) |
| Exponential families | MaxEnt subject to linear constraints yields exponential families; natural parameters | Wainwright & Jordan (2008) |
| Variational inference | Mean-field, belief propagation, and variational free energies via dually-flat geometry | Wainwright & Jordan (2008) |
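The first two rows connect directly: maximizing entropy on a finite support subject to a mean constraint yields an exponential family p(x) ∝ exp(λx), with λ fixed by a one-dimensional dual problem. A sketch of Jaynes' Brandeis dice example (a die whose long-run mean is 4.5 rather than the fair 3.5); SciPy's brentq is assumed available, helper names are mine:

```python
import numpy as np
from scipy.optimize import brentq

def maxent_pmf(xs, target_mean):
    """Max-entropy pmf on support xs subject to E[X] = target_mean.

    By Lagrange duality the optimum is the exponential family
    p(x) = exp(lam * x) / Z(lam); we solve the 1-D dual for lam.
    """
    xs = np.asarray(xs, dtype=float)
    def pmf(lam):
        w = np.exp(lam * (xs - xs.mean()))   # centred for stability
        return w / w.sum()
    lam = brentq(lambda l: pmf(l) @ xs - target_mean, -50.0, 50.0)
    return pmf(lam)

# Jaynes' Brandeis dice: tilt the uniform die toward high faces.
p = maxent_pmf(np.arange(1, 7), 4.5)
entropy_bits = -(p * np.log2(p)).sum()       # strictly below log2(6)
```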
### Information Geometry
| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Fisher–Rao metric | Unique (up to scale) Riemannian metric on statistical manifolds, invariant under sufficient statistics | Rao (1945); Chentsov (1982) |
| Dual connections | Statistical manifolds carry a pair of flat dual connections (e- and m-connections) | Amari & Nagaoka (2000) |
| α-connections | One-parameter family interpolating between e- and m-connections; α = ±1 recovers the dually flat e/m pair | Amari (2016) |
| Divergences and projections | Bregman divergences generalise KL; e/m-projections are orthogonal in dual senses | Amari & Nagaoka (2000) ch. 3 |
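One concrete way to see the Fisher–Rao metric: it is the quadratic coefficient in the local expansion KL(p_θ ‖ p_{θ+ε}) ≈ ½ g(θ) ε². A minimal numerical check for the Bernoulli family, where g(θ) = 1/(θ(1−θ)); names are illustrative:

```python
import numpy as np

def kl_bernoulli(a, b):
    """KL divergence D(Bernoulli(a) || Bernoulli(b)) in nats."""
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

theta, eps = 0.3, 1e-4
# Fisher information recovered from the quadratic expansion of KL
g_numeric = 2 * kl_bernoulli(theta, theta + eps) / eps**2
g_exact = 1 / (theta * (1 - theta))     # Bernoulli Fisher information
assert abs(g_numeric - g_exact) / g_exact < 1e-3
```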
### Information Cohomology
| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Homological nature of entropy | Shannon entropy is, up to scale, the unique degree-1 cocycle of information cohomology | Baudot & Bennequin (2015) |
| Information structures | Probability assignments form a presheaf; entropy is a natural transformation | Vigneaux (2017); Vigneaux thesis (2019) |
| Higher-order mutual information | I_k landscapes detect synergy/redundancy beyond pairwise mutual information | Baudot et al. (2019) |
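The I_k in the last row can be computed by inclusion-exclusion over joint entropies: I(X;Y;Z) = H(X) + H(Y) + H(Z) − H(X,Y) − H(X,Z) − H(Y,Z) + H(X,Y,Z). Sign conventions vary across the literature; with this one, the XOR of two fair bits gives −1 bit, the textbook example of pure synergy. A small Python sketch, names illustrative:

```python
import itertools
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a pmf given as an array."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def interaction_information(pxyz):
    """I(X;Y;Z) via inclusion-exclusion over marginal entropies.
    pxyz is the joint pmf as a 3-D array indexed by (x, y, z)."""
    H = lambda summed_axes: entropy_bits(pxyz.sum(axis=summed_axes).ravel())
    Hx, Hy, Hz = H((1, 2)), H((0, 2)), H((0, 1))
    Hxy, Hxz, Hyz = H((2,)), H((1,)), H((0,))
    Hxyz = entropy_bits(pxyz.ravel())
    return Hx + Hy + Hz - Hxy - Hxz - Hyz + Hxyz

# XOR: X, Y independent fair bits, Z = X ^ Y.
# Every pair is independent, yet jointly determined: I_3 = -1 bit.
p = np.zeros((2, 2, 2))
for x, y in itertools.product((0, 1), repeat=2):
    p[x, y, x ^ y] = 0.25
assert abs(interaction_information(p) + 1.0) < 1e-12
```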
### Applications to Compression and Quantization
| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Learned compression | Rate-distortion theory grounds VAE and flow-based image codecs | Yang, Mandt & Theis (2023) |
| Vector quantization | Near-optimal VQ via random rotation + scalar quantization; TurboQuant | Zandieh et al. (2025) |
| Axiomatic entropy diversity | Entropy as magnitude of enriched category; category-theoretic unification | Leinster (2021) |
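As a toy illustration of the random-rotation idea in the vector-quantization row (the flavour of TurboQuant, not the published algorithm): rotate by a random orthogonal matrix so no single coordinate carries an outsized share of the energy, then apply a uniform scalar quantizer per coordinate. Python sketch under those assumptions, names mine:

```python
import numpy as np

def random_rotation(d, rng):
    """Haar-random orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))       # sign fix for Haar uniformity

def quantize(x, R, bits=4):
    """Rotate, then uniform scalar quantization to 2**bits levels."""
    y = R @ x
    levels = 2 ** bits
    step = 2 * np.abs(y).max() / levels
    codes = np.clip(np.round(y / step), -levels // 2, levels // 2 - 1)
    return codes.astype(np.int32), step

def dequantize(codes, step, R):
    """Scale codes back and undo the rotation (R is orthogonal)."""
    return R.T @ (codes * step)

rng = np.random.default_rng(0)
d = 128
R = random_rotation(d, rng)
x = rng.standard_normal(d)
codes, step = quantize(x, R, bits=4)
x_hat = dequantize(codes, step, R)
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)  # small at 4 bits
```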
## Dependency Graph

```mermaid
flowchart TD
    A["Entropy and Divergences<br/>entropy-and-divergences.md"]
    B["AEP and Typicality<br/>aep-and-typicality.md"]
    C["Channel Capacity<br/>channel-capacity.md"]
    D["Rate-Distortion<br/>rate-distortion.md"]
    E["Maximum Entropy<br/>maximum-entropy.md"]
    F["Information Geometry<br/>information-geometry.md"]
    G["Information Cohomology<br/>information-cohomology.md"]
    A --> B
    A --> C
    A --> D
    A --> E
    B --> C
    B --> D
    E --> F
    A --> F
    A --> G
    E --> G
```
## Master References
| Reference | Authors | Year | What It Covers | Link |
|---|---|---|---|---|
| A Mathematical Theory of Communication | Shannon | 1948 | Foundational: entropy, mutual information, source and channel coding theorems | |
| Information Theory and Statistical Mechanics I | Jaynes | 1957 | MaxEnt as inference principle; statistical mechanics as information problem | |
| Information Theory and Statistical Mechanics II | Jaynes | 1957 | MaxEnt extended to quantum density matrices; canonical and grand canonical ensembles | DOI |
| Elements of Information Theory | Cover & Thomas | 2006 | Standard graduate textbook — entropy, AEP, channel capacity, rate-distortion | Wiley |
| Information Theory: From Coding to Learning | Polyanskiy & Wu | 2025 | Modern graduate textbook; sharp analytical style; includes learning-theoretic connections | |
| Information Theory: Coding Theorems for Discrete Memoryless Systems | Csiszár & Körner | 2011 | Rigorous channel capacity and rate-distortion; method of types | Cambridge |
| Information Theory and Reliable Communication | Gallager | 1968 | Rigorous foundations; channel coding; error exponents | MIT |
| Rate Distortion Theory | Berger | 1971 | Classic monograph on the rate-distortion theorem; Blahut–Arimoto algorithm | Archive.org |
| Methods of Information Geometry | Amari & Nagaoka | 2000 | Definitive reference: dual connections, Fisher–Rao metric, exponential/mixture geodesics | AMS |
| Information Geometry and Its Applications | Amari | 2016 | Self-contained introduction from divergences; applications to statistics and neural networks | Springer |
| An Elementary Introduction to Information Geometry | Nielsen | 2020 | Accessible 56-page survey of information manifolds and dual connections | arXiv |
| Statistical Decision Rules and Optimal Inference | Chentsov | 1982 | Uniqueness of the Fisher metric under Markov morphisms (Chentsov–Campbell theorem) | AMS |
| The Homological Nature of Entropy | Baudot & Bennequin | 2015 | Shannon entropy as degree-1 cohomology class; topos-theoretic framework | MDPI |
| Information Structures and Their Cohomology | Vigneaux | 2017 | Information structures as ringed sites; Shannon and Tsallis as degree-1 cocycles | arXiv |
| Topology of Statistical Systems (PhD thesis) | Vigneaux | 2019 | Full topos-theoretic treatment of information cohomology; discrete and quantum settings | HAL |
| Topological Information Data Analysis | Baudot et al. | 2019 | Computational I_k landscapes; higher-order mutual information applied to real data | MDPI |
| Graphical Models, Exponential Families, and Variational Inference | Wainwright & Jordan | 2008 | Unified variational framework for exponential families; dually-flat geometry and belief propagation | |
| Entropy and Diversity: The Axiomatic Approach | Leinster | 2021 | Entropy from enriched category theory; axiomatic unification of diversity measures | Cambridge |
| An Introduction to Neural Data Compression | Yang, Mandt & Theis | 2023 | Rate-distortion theory of learned compression; VAE codecs; diffusion compressors | arXiv |
| TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate | Zandieh et al. | 2025 | Random-rotation VQ achieves near-optimal MSE; 6× KV-cache compression on H100 | arXiv |
| MIT 6.441 Lecture Notes | Polyanskiy & Wu | 2016 | Concise rigorous lecture notes; precursor to the 2025 textbook | OCW |