≃ Calvin's Notebook
home archive notes
  • Gradient Checkpointing ♻️
  • Memory-Bound Inference
  • Normalization-Free Transformers
  • NVIDIA GPU Hardware for Deep Learning
  • Deep Learning Engineering: Overview
  • Rotary Position Embeddings (RoPE) and Context Extension
  • Engineering Concerns for Deep Learning Training Loops
  • Weight Tying 🔗
Site proudly generated by Hakyll