DeepSeek V4 technical report: 484 days of architecture work
DeepSeek V4 can be read through two main stories: 1M-token context made open and efficient, and an architecture stack built to make that possible. The key technical pieces are mHC for stable residual flow, hybrid compressed attention for long context, Muon as a main optimizer, and a training pipeline whose description openly covers both the elegant methods and the messy engineering compromises.