Sparsity and Pruning in Deep Learning: Overview

This file is the index for the concepts/deep-learning-engineering/sparsity-pruning/ folder. It lists planned and written subtopic notes, organizes them by theme, and collects the canonical references for the field.


Notes in This Folder

| File | Status | Topic |
|---|---|---|
| classical-pruning.md | ✅ Written | OBD, OBS, magnitude pruning: second-order saliency theory and iterative magnitude pruning |
| compression-pipelines.md | ✅ Written | Deep Compression (prune + quantize + Huffman) and the EIE hardware accelerator |
| structured-pruning.md | ✅ Written | Filter/channel pruning, BN-scaling pruning, attention head pruning |
| sparse-training.md | ✅ Written | Lottery Ticket Hypothesis, SNIP, SET, SNFS, RigL: sparse training from scratch |
| llm-pruning.md | ✅ Written | Movement Pruning, SparseGPT, Wanda: LLM-scale compression |

Subtopic Map

Classical Theory: Hessian-Based Saliency

| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Optimal Brain Damage | Diagonal-Hessian saliency \(s_i = H_{ii} w_i^2 / 2\); prune low-\(s\) weights | LeCun et al. 1990 |
| Optimal Brain Surgeon | Full inverse-Hessian; exact weight compensation \(\delta w = -\frac{w_q}{[H^{-1}]_{qq}} H^{-1} e_q\) | Hassibi & Stork 1993 |
| Iterative Magnitude Pruning | Train → threshold → retrain; zeroth-order proxy; competitive at scale | Han et al. 2015 |
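
The two Hessian-based rules in the table can be compared on a toy problem. The sketch below is illustrative only: the 3×3 Hessian `H`, the weight vector `w`, and the helper names `obd_saliency` and `obs_prune_one` are assumptions made for this example, and the OBS saliency \(L_q = w_q^2 / (2 [H^{-1}]_{qq})\) is the standard form from Hassibi & Stork rather than something stated in the table.

```python
import numpy as np

def obd_saliency(w, hessian_diag):
    """OBD saliency s_i = H_ii * w_i^2 / 2 (diagonal-Hessian approximation)."""
    return 0.5 * hessian_diag * w ** 2

def obs_prune_one(w, hessian):
    """Prune the weight with the smallest OBS saliency and compensate the rest.

    Returns (new_w, pruned_index). Assumes `hessian` is positive definite.
    """
    h_inv = np.linalg.inv(hessian)
    # OBS saliency: L_q = w_q^2 / (2 * [H^-1]_qq)
    saliency = w ** 2 / (2.0 * np.diag(h_inv))
    q = int(np.argmin(saliency))
    # delta_w = -(w_q / [H^-1]_qq) * H^-1 e_q, where H^-1 e_q is column q of H^-1
    delta = -(w[q] / h_inv[q, q]) * h_inv[:, q]
    new_w = w + delta
    new_w[q] = 0.0  # zero in exact arithmetic; set explicitly to drop round-off
    return new_w, q

# Toy quadratic model of the loss near a minimum (hypothetical values):
# L(w + d) - L(w) = 0.5 * d^T H d
H = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 4.0]])
w = np.array([0.8, -0.1, 0.5])

s_obd = obd_saliency(w, np.diag(H))   # per-weight OBD scores
w_obs, q = obs_prune_one(w, H)        # OBS removes weight q and adjusts the others
```

On a quadratic loss the increase caused by the OBS step, `0.5 * d @ H @ d` with `d = w_obs - w`, equals the OBS saliency of the pruned weight, which is never larger than its OBD saliency because the surviving weights are allowed to move.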

Hardware-Aware Compression Pipelines

| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Deep Compression | Prune (9–13×) → k-means quantization (5 bits) → Huffman coding; 35–49× total | Han et al. 2016 |
| EIE Accelerator | Custom VLSI for compressed sparse FC layers; skips zero weights & activations | Han et al. 2016 |
| SCNN | Exploits both weight and activation sparsity in a tiled dataflow microarchitecture | Parashar et al. 2017 |

Structured Pruning

| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Filter pruning (ℓ₁) | Rank filters by ℓ₁ norm; remove whole filters so the result stays dense and cuDNN-compatible | Li et al. 2016 |
| BN scaling (Network Slimming) | ℓ₁ sparsity penalty on BN γ during training; prune channels with near-zero γ afterwards | Liu et al. 2017 |
| Attention head pruning | Most heads are redundant; L0-gate pruning identifies specialized heads | Michel et al. 2019; Voita et al. 2019 |
| Rethinking pruning | For structured methods, fine-tuning inherited weights ≈ training the pruned architecture from random init | Liu et al. 2019 |
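
The ℓ₁ filter-ranking row can be sketched in a few lines. This is a minimal illustration, not the procedure from Li et al.: the helper name `l1_filter_keep_indices`, the `keep_ratio` parameter, and the random weight tensors are all assumptions, and the fine-tuning step that normally follows is left out.

```python
import numpy as np

def l1_filter_keep_indices(conv_weight, keep_ratio):
    """Rank output filters by l1 norm and return the sorted indices to keep.

    conv_weight has shape (out_channels, in_channels, kH, kW).
    """
    norms = np.abs(conv_weight).sum(axis=(1, 2, 3))   # one l1 norm per filter
    n_keep = max(1, int(round(keep_ratio * conv_weight.shape[0])))
    keep = np.argsort(norms)[-n_keep:]                # largest-norm filters
    return np.sort(keep)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 3, 3, 3))    # hypothetical layer 1: 8 filters
W2 = rng.normal(size=(16, 8, 3, 3))   # hypothetical layer 2 consumes those 8 channels

keep = l1_filter_keep_indices(W1, keep_ratio=0.5)
W1_pruned = W1[keep]       # drop whole output filters of layer 1
W2_pruned = W2[:, keep]    # drop the matching input channels of layer 2
```

Because whole filters are removed, both pruned tensors are ordinary dense arrays of smaller shape, which is why this kind of sparsity needs no special kernels.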

Sparse Training

| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Lottery Ticket Hypothesis | Dense nets contain sparse “winning tickets” that train well from original init | Frankle & Carbin 2019 |
| SNIP | Connection sensitivity saliency at init; prune before training begins | Lee et al. 2019 |
| SET | Sparse Erdős–Rényi topology evolved during training; no dense model needed | Mocanu et al. 2018 |
| SNFS / Sparse Momentum | Gradient-magnitude momentum drives topology reallocation; up to 5× faster training | Dettmers & Zettlemoyer 2019 |
| RigL | Instantaneous gradient magnitudes update connectivity periodically; fixed FLOP budget | Evci et al. 2020 |
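
The periodic connectivity update in the RigL row can be sketched as a single drop-and-grow step on one layer. This is a simplified illustration under stated assumptions, not the published algorithm: the function name `drop_and_grow` and the toy arrays are hypothetical, growth candidates are taken to be every connection that is inactive after the drop, and the schedule that decays `k` over training is omitted.

```python
import numpy as np

def drop_and_grow(weight, mask, grad, k):
    """One connectivity update: drop k active weights, grow k inactive ones.

    weight, grad: float arrays of the same shape; mask: boolean array.
    The number of active connections is unchanged by the update.
    """
    w = weight.ravel().copy()
    m = mask.ravel().copy()
    g = grad.ravel()

    active = np.flatnonzero(m)
    k = min(k, active.size)
    # Drop: the k active connections with the smallest |w|
    drop = active[np.argsort(np.abs(w[active]))[:k]]
    m[drop] = False
    w[drop] = 0.0

    # Grow: the k inactive connections with the largest |grad|
    inactive = np.flatnonzero(~m)
    grow = inactive[np.argsort(-np.abs(g[inactive]))[:k]]
    m[grow] = True
    w[grow] = 0.0  # newly grown connections start from zero

    return w.reshape(weight.shape), m.reshape(mask.shape)

rng = np.random.default_rng(0)
M = np.zeros((4, 6), dtype=bool)
M[:, ::2] = True                      # hypothetical 50%-sparse starting mask
W = rng.normal(size=(4, 6)) * M       # weights are zero wherever the mask is off
G = rng.normal(size=(4, 6))           # stand-in for a dense gradient

W_new, M_new = drop_and_grow(W, M, G, k=3)
```

Replacing the gradient-based growth criterion with a uniformly random choice among inactive connections gives a SET-style update instead.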

LLM-Scale Pruning

| Subtopic | Key Idea | Primary Source |
|---|---|---|
| Movement Pruning | Fine-tuning saliency = weight × gradient movement; task-adaptive | Sanh et al. 2020 |
| SparseGPT | Layerwise OBS with approximate inverse-Hessian via Cholesky; 50% sparsity at 175B | Frantar & Alistarh 2023 |
| Wanda | Saliency = weight magnitude × activation ℓ₂ norm; no weight update needed | Sun et al. 2023 |
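
The Wanda score in the last row is simple enough to sketch directly. The code below is a hedged illustration: `wanda_prune` and the random `W` and `X` are hypothetical, and ranking scores within each output row is an assumption about the comparison group rather than something stated in the table.

```python
import numpy as np

def wanda_prune(weight, activations, sparsity):
    """Zero the lowest-scoring weights in each output row.

    weight: (out_features, in_features)
    activations: (n_tokens, in_features) calibration inputs to this layer
    Score: |W_ij| * ||X_j||_2, ranked within each row (assumed grouping).
    """
    act_norm = np.linalg.norm(activations, axis=0)   # l2 norm per input feature
    score = np.abs(weight) * act_norm                # broadcasts over rows
    n_prune = int(weight.shape[1] * sparsity)
    # Indices of the n_prune smallest scores in every row
    prune_idx = np.argsort(score, axis=1)[:, :n_prune]
    mask = np.ones(weight.shape, dtype=bool)
    np.put_along_axis(mask, prune_idx, False, axis=1)
    return weight * mask, mask                       # surviving weights are unchanged

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))     # hypothetical linear layer
X = rng.normal(size=(32, 8))    # hypothetical calibration activations

W_sparse, mask = wanda_prune(W, X, sparsity=0.5)
```

Unlike the OBS-style update used by SparseGPT, nothing here touches the surviving weights, which is the "no weight update needed" property noted in the table.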

Dependency Graph

flowchart TD
    A["Second-Order Methods<br/>concepts/optimization-theory/second-order-methods.md"]
    B["Classical Pruning<br/>classical-pruning.md"]
    C["Compression Pipelines<br/>compression-pipelines.md"]
    D["Structured Pruning<br/>structured-pruning.md"]
    E["Sparse Training<br/>sparse-training.md"]
    F["LLM Pruning<br/>llm-pruning.md"]
    G["Knowledge Distillation<br/>concepts/deep-learning-engineering/knowledge-distillation/knowledge-distillation.md"]
    A --> B
    B --> C
    B --> D
    B --> E
    B --> F
    C --> F
    E --> F
    G -.->|"competing paradigm"| F

Master References

| Reference | Authors | Year | Sub-theme | Key Contribution | Link |
|---|---|---|---|---|---|
| Optimal Brain Damage | LeCun, Denker, Solla | 1990 | Classical | Diagonal-Hessian saliency scores; pruning as constrained loss minimization | NeurIPS 1989 |
| Optimal Brain Surgeon | Hassibi, Stork | 1993 | Classical | Full inverse-Hessian; exact closed-form weight compensation after pruning | NeurIPS 1992 |
| Learning Weights and Connections | Han, Pool, Tran, Dally | 2015 | Magnitude | IMP pipeline; 9× AlexNet, 13× VGG-16 compression, no accuracy loss | NeurIPS 2015 |
| Deep Compression | Han, Mao, Dally | 2016 | Pipeline | Prune + quantize + Huffman; 35–49× total compression; Best Paper | ICLR 2016 |
| EIE | Han et al. | 2016 | Hardware | Custom VLSI for compressed sparse FC; 189× CPU speedup | ISCA 2016 |
| SCNN | Parashar et al. | 2017 | Hardware | Dataflow exploiting both weight and activation sparsity | ISCA 2017 |
| Pruning Filters | Li et al. | 2016 | Structured | ℓ₁-norm filter pruning; hardware-compatible structured sparsity | ICLR 2017 |
| Network Slimming | Liu et al. | 2017 | Structured | BN γ sparsity regularization; channel pruning via near-zero γ | ICCV 2017 |
| Sixteen Heads | Michel, Levy, Neubig | 2019 | Structured | Most attention heads are redundant; head importance scoring | NeurIPS 2019 |
| Analyzing Multi-Head Self-Attention | Voita et al. | 2019 | Structured | L0-gate head pruning; specialized vs. redundant heads | ACL 2019 |
| Lottery Ticket Hypothesis | Frankle, Carbin | 2019 | Sparse training | Winning ticket subnetworks; IMP with reset to original initialization; Best Paper | ICLR 2019 |
| Linear Mode Connectivity | Frankle et al. | 2020 | Sparse training | LTH stability requires rewinding to an early training checkpoint, not step 0 | ICML 2020 |
| SNIP | Lee, Ajanthan, Torr | 2019 | Sparse training | Connection sensitivity at init; prune before any training | ICLR 2019 |
| SET | Mocanu et al. | 2018 | Sparse training | Sparse Erdős–Rényi topology evolved online; no dense model | Nature Comms 2018 |
| SNFS | Dettmers, Zettlemoyer | 2019 | Sparse training | Gradient-momentum-based topology reallocation; up to 5× faster training | arXiv 2019 |
| RigL | Evci et al. | 2020 | Sparse training | Periodic gradient-magnitude topology updates; fixed FLOP training | ICML 2020 |
| Movement Pruning | Sanh, Wolf, Rush | 2020 | LLM | Fine-tuning-adaptive saliency; weight × gradient movement | NeurIPS 2020 |
| SparseGPT | Frantar, Alistarh | 2023 | LLM | Layerwise OBS at 175B scale; approximate inverse-Hessian via Cholesky | ICML 2023 |
| Wanda | Sun et al. | 2023 | LLM | Weight × activation-norm saliency; no weight update; matches SparseGPT | arXiv 2023 |
| State of Sparsity | Gale, Elsen, Hooker | 2019 | Survey | Magnitude pruning matches complex methods; pruned sparse architectures trained from scratch underperform | arXiv 2019 |
| Pruning Survey | Blalock et al. | 2020 | Survey | 81-paper meta-survey; community lacks reproducible benchmarks; ShrinkBench | MLSys 2020 |
| Rethinking Pruning | Liu et al. | 2019 | Survey | For structured pruning, architecture > inherited weights; train from random init | ICLR 2019 |