Cameron Rohn · Episode: Ep 8 - Kimi2, Is RAG still a thing? and the coming SaaS bloodbath. · Category: frameworks_and_exercises
The sparse mixture-of-experts architecture activates only about 32B parameters per token at inference, which is what makes scaling to a trillion total parameters cost-effective.
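A minimal sketch of the top-k routing idea behind sparse MoE layers, in illustrative PyTorch. This is not Kimi K2's actual implementation; the expert count, hidden sizes, and top-k value below are toy values chosen only to show why just a fraction of the total parameters runs per token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Sparse mixture-of-experts feed-forward layer: a router scores all
    experts per token, but only the top-k experts actually execute."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); route each token to its top-k experts
        scores = F.softmax(self.router(x), dim=-1)        # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # (tokens, top_k)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for e in range(len(self.experts)):                # only chosen experts do work
            for slot in range(self.top_k):
                mask = idx[:, slot] == e
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * self.experts[e](x[mask])
        return out

# Toy dimensions for illustration only; a real trillion-parameter MoE has far
# more and far larger experts, but the same routing keeps per-token compute small.
layer = MoELayer(d_model=64, d_ff=256, n_experts=16, top_k=2)
tokens = torch.randn(8, 64)
print(layer(tokens).shape)  # torch.Size([8, 64]) -- only 2 of 16 experts ran per token
```

Because only top_k of n_experts execute per token, compute cost scales with the active parameters rather than the total parameter count, which is the point of the 32B-active-out-of-1T-total claim.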