Loss Functions & Objectives

A loss function is how you tell a model what "wrong" means: a mathematical formula that measures the gap between the model's prediction and the correct answer. A simple example: if the model says there is a 90% chance an email is spam, but the email is actually not spam, the loss function assigns a high penalty. If it predicts only a 10% chance of spam for that same non-spam email, the penalty is low.

The choice of loss function shapes everything about how the model learns. Different loss functions prioritise different things: some penalise big mistakes much more heavily than small ones, some treat all errors equally, and some focus on ranking outputs correctly rather than getting exact values right.

For language models, the typical pretraining objective is next-token prediction: given a sequence of text, predict the token that comes next. This deceptively simple objective, applied across trillions of tokens, produces models with remarkably broad capabilities.

For business users, the key insight is that a model optimises for exactly what its loss function measures - nothing more. If the objective doesn't capture what you actually care about, the model can score well on its training metric while performing poorly at your actual task. This gap between training objective and real-world usefulness is a recurring theme in AI.
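The spam example and the contrast between loss functions can be made concrete with a small sketch. The snippet below uses binary cross-entropy, the standard loss for yes/no probability predictions; all the specific numbers (0.9, 0.1, the toy token probabilities) are illustrative, not from any real model.

```python
import math

def binary_cross_entropy(p_spam: float, is_spam: bool) -> float:
    """Penalty for predicting probability p_spam when the true label is is_spam."""
    p_true = p_spam if is_spam else 1.0 - p_spam  # probability assigned to the truth
    return -math.log(p_true)

# The email is NOT spam:
print(binary_cross_entropy(0.9, is_spam=False))  # ~2.30: confident and wrong, high penalty
print(binary_cross_entropy(0.1, is_spam=False))  # ~0.11: nearly right, low penalty

# Squared error punishes one big miss far more than absolute error does,
# even when the total absolute error is identical:
big_miss, small_misses = [10.0], [1.0] * 10
print(sum(e ** 2 for e in big_miss))      # 100.0
print(sum(e ** 2 for e in small_misses))  # 10.0

# Next-token prediction is the same idea over a vocabulary: the loss is
# -log of the probability the model gave to the token that actually came next.
probs = {"mat": 0.6, "dog": 0.3, "sky": 0.1}  # toy distribution after "the cat sat on the"
print(-math.log(probs["mat"]))  # ~0.51
```

Note how the penalty grows sharply as a confident prediction turns out wrong: that steep gradient is what pushes the model to be well calibrated, not just roughly right.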