Compute Requirements & Cost Drivers

The compute required to train frontier AI models has grown at a staggering rate, roughly tripling every year over the past decade. Training GPT-4 was estimated to have cost over $100 million in compute alone, and the largest models now require clusters of tens of thousands of GPUs running for months.

Training costs, while headline-grabbing, are only part of the picture. Inference, actually running a trained model to serve predictions or generate text, typically accounts for the majority of ongoing compute spending because it happens continuously and at scale. The key cost drivers are model size (larger models need more compute per query), batch size (how many requests are processed simultaneously), precision (lower numerical precision reduces cost but may affect quality), and utilisation (keeping expensive hardware busy rather than idle). A back-of-envelope model of how these drivers interact is sketched below.

Optimisation techniques such as quantisation, distillation, and pruning can dramatically reduce inference costs, sometimes by 80% or more with minimal quality loss; the second sketch below shows how such savings can stack.

For organisations deploying AI, understanding the cost structure is essential. A model that works beautifully in a research setting may be economically unviable at production scale. The compute cost per query, multiplied by expected volume, often determines whether an AI feature makes business sense, and that calculation should happen before you invest in building the feature, not after.
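As a rough illustration of the drivers above, here is a minimal back-of-envelope cost model in Python. All of the concrete numbers in the example (a 70B-parameter model, 1,000 tokens per query, 1e15 FLOP/s peak, 40% utilisation, $2 per GPU-hour) are illustrative assumptions, and the "~2 × parameters FLOPs per token" rule is a common approximation for transformer forward passes, not a measurement of any particular system.

```python
# Back-of-envelope inference cost model. Every number used below is an
# illustrative assumption, not a benchmark of any real deployment.

def cost_per_query(
    params: float,             # model parameters
    tokens_per_query: float,   # prompt + generated tokens per request
    peak_flops: float,         # GPU peak throughput (FLOP/s) at chosen precision
    utilisation: float,        # fraction of peak actually achieved (0-1)
    gpu_cost_per_hour: float,  # $ per GPU-hour
) -> float:
    """Rough $ cost of serving one query.

    Uses the common approximation that a transformer forward pass
    costs ~2 * params FLOPs per token.
    """
    flops_per_query = 2 * params * tokens_per_query
    seconds = flops_per_query / (peak_flops * utilisation)
    return seconds / 3600 * gpu_cost_per_hour

# Hypothetical example: 70B parameters, 1,000 tokens per query, a GPU
# with ~1e15 FLOP/s peak at FP16, 40% utilisation, $2 per GPU-hour.
c = cost_per_query(70e9, 1_000, 1e15, 0.40, 2.00)
print(f"cost per query: ${c:.6f}")                       # ~$0.0002
print(f"cost at 10M queries/day: ${c * 10e6:,.0f}/day")  # ~$1,944/day
```

Even with generous assumptions, the per-query figure looks negligible until it is multiplied by volume: at ten million queries a day, this hypothetical service costs thousands of dollars daily, which is exactly the kind of calculation worth doing before building the feature.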
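And a minimal sketch of how optimisation savings can stack, under assumed rather than benchmarked factors: INT8 quantisation roughly doubling FP16 throughput on hardware that supports it, and distillation to a hypothetical 13B student of a 70B teacher scaling per-token compute with model size.

```python
# Illustrative arithmetic for stacking optimisations. The factors below
# are assumptions for illustration, not benchmarks.

baseline_cost = 1.0                     # normalised cost: 70B model served at FP16

int8_cost = baseline_cost / 2           # quantisation: assume ~2x FP16 throughput
distilled_cost = int8_cost * (13 / 70)  # distillation: assume cost scales with size

print(f"quantised only:        {1 - int8_cost:.0%} saved")       # 50%
print(f"quantised + distilled: {1 - distilled_cost:.0%} saved")  # ~91%
```

Under these assumptions the combined saving clears 80% comfortably, which is how headline reductions of that size are typically achieved: by layering several techniques rather than relying on any single one.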