Oct 19, 2025 · 1 min read
https://www.essential.ai/blog/infra layer sharding for large scale training with muon