Cameron Rohn · Episode: Ep 8 - Kimi2, Is RAG still a thing? and the coming SaaS bloodbath. · Category: frameworks_and_exercises
The sparse mixture-of-experts architecture activates only about 32B parameters per token at inference, which is what makes scaling to a trillion total parameters cost-effective.
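A minimal sketch of the top-k routing idea behind sparse MoE layers, in illustrative PyTorch. This is not Kimi K2's actual implementation; the expert count, hidden sizes, and top-k value below are toy values chosen only to show why just a fraction of the total parameters runs per token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Sparse mixture-of-experts feed-forward layer: a router scores all
    experts per token, but only the top-k experts actually execute."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); route each token to its top-k experts
        scores = F.softmax(self.router(x), dim=-1)        # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # (tokens, top_k)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for e in range(len(self.experts)):                # only chosen experts do work
            for slot in range(self.top_k):
                mask = idx[:, slot] == e
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * self.experts[e](x[mask])
        return out

# Toy dimensions for illustration only; a real trillion-parameter MoE has far
# more and far larger experts, but the same routing keeps per-token compute small.
layer = MoELayer(d_model=64, d_ff=256, n_experts=16, top_k=2)
tokens = torch.randn(8, 64)
print(layer(tokens).shape)  # torch.Size([8, 64]) -- only 2 of 16 experts ran per token
```

Because only top_k of n_experts execute per token, compute cost scales with the active parameters rather than the total parameter count, which is the point of the 32B-active-out-of-1T-total claim.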