Mixture-of-Experts Models

Mixture-of-Experts (MoE) is a design approach in which a model is divided into many specialised sub-networks - the "experts" - and a routing mechanism decides which experts to activate for any given input. Instead of running the entire model for every request, only a small fraction of the network does the work each time. Think of it like a hospital: rather than every doctor seeing every patient, a triage system directs you to the right specialist.

This means you can build models with enormous total capacity - trillions of parameters - while keeping the cost of each individual query manageable, because only a subset of those parameters is active at any time. Mixtral is an openly documented example, and GPT-4 is widely reported to use the approach as well; Mixtral 8x7B, for instance, has roughly 47 billion total parameters but activates only about 13 billion for each token.

For businesses, MoE matters because it's one of the main ways AI providers manage the tension between model capability and running costs: a model can be very large on paper yet relatively efficient in practice. The trade-off is complexity - the routing needs to work well, and the experts need to develop genuine specialisation rather than all learning the same thing.
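To make the routing idea concrete, here is a minimal sketch of a top-k MoE layer, assuming a PyTorch environment. The layer name, the sizes, the number of experts, and the choice of k are illustrative assumptions, not taken from GPT-4, Mixtral, or any other production model.

```python
# A minimal sketch of top-k expert routing, assuming PyTorch is available.
# All sizes (d_model, n_experts, k) are illustrative, not from any real model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, d_model: int = 512, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.ReLU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        # The router ("triage") scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        scores = self.router(x)                     # (n_tokens, n_experts)
        top_w, top_i = scores.topk(self.k, dim=-1)  # pick k experts per token
        top_w = F.softmax(top_w, dim=-1)            # weights over chosen experts
        out = torch.zeros_like(x)
        # Only the k selected experts run for each token; the rest stay idle,
        # which is where the compute saving comes from.
        for slot in range(self.k):
            for e in top_i[:, slot].unique():
                mask = top_i[:, slot] == e
                expert_out = self.experts[int(e)](x[mask])
                out[mask] += top_w[mask, slot].unsqueeze(-1) * expert_out
        return out

layer = TopKMoELayer()
tokens = torch.randn(16, 512)   # a batch of 16 token vectors
print(layer(tokens).shape)      # torch.Size([16, 512])
```

Production systems vectorise this routing and typically add a load-balancing objective so that tokens spread evenly across experts rather than crowding onto a few, but the control flow above is the core idea: score, select a few experts, run only those, and combine their outputs.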