Energy Consumption of Training & Inference
Training large AI models requires extraordinary amounts of energy. Estimates for training GPT-4-class models range from 50 to over 100 gigawatt-hours - enough to power tens of thousands of homes for a year. The energy comes from running thousands of GPUs at near-full capacity for weeks or months, plus the cooling systems needed to keep them from overheating.

But training happens once (or a few times). Inference happens continuously, and at scale it accounts for the majority of AI's energy consumption. Each query to a large language model requires significantly more compute - and therefore more energy - than a traditional web search. As AI features are embedded in more products and used by more people, inference energy consumption is growing rapidly.

The energy intensity varies enormously by model size and task. A small classification model running on a CPU uses negligible energy per inference. A large language model generating a detailed response on a GPU uses orders of magnitude more. Efficiency improvements - better hardware, optimised software, smaller models, smarter routing - are helping, but they're currently being outpaced by growth in demand.

If you're deploying AI at scale, understanding the energy implications of your architecture choices is important both for managing costs and for meeting sustainability commitments.
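The training-energy figures above can be sanity-checked with a back-of-envelope calculation: GPUs running at near-full power for the duration of the run, scaled up by a power usage effectiveness (PUE) factor for cooling and facility overhead. The GPU count, per-GPU power draw, run length, and PUE below are illustrative assumptions, not measured values for any particular model.

```python
def training_energy_gwh(num_gpus, gpu_power_kw, days, pue):
    """Rough training-energy estimate in gigawatt-hours.

    Assumes every GPU draws gpu_power_kw continuously for the whole
    run; pue scales the total to include cooling and other overhead.
    """
    hours = days * 24
    kwh = num_gpus * gpu_power_kw * hours * pue
    return kwh / 1_000_000  # kWh -> GWh

# Illustrative assumptions: 25,000 GPUs at 1.0 kW each,
# a 100-day run, and a data-centre PUE of 1.2.
energy = training_energy_gwh(25_000, 1.0, 100, 1.2)
print(f"{energy:.1f} GWh")  # -> 72.0 GWh
```

Under these assumed parameters the estimate lands inside the 50-100 GWh range quoted above; the point is less the exact number than that the result is dominated by GPU count and run length, which is why efficiency gains in either dimension matter so much.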