Overfitting, Underfitting & Regularisation

Overfitting is one of the most important concepts in machine learning: it occurs when a model memorises the training data rather than learning general patterns. An overfitted model performs brilliantly on data it has seen before but fails on anything new - like a student who memorises past exam papers but can't answer a novel question. Underfitting is the opposite: the model hasn't learned enough and performs poorly on everything. The goal is the sweet spot between the two, where the model captures genuine patterns without memorising noise and quirks specific to the training set.

Regularisation refers to the techniques used to prevent overfitting: adding penalties on large weights (L1/L2 regularisation), randomly disabling parts of the network during training (dropout), stopping training before the model starts memorising (early stopping), or augmenting the training data with variations.

For large language models, the sheer volume of training data makes classical overfitting less of a concern - with trillions of tokens, the model rarely sees the same example twice. But subtler forms of overfitting still occur, particularly during fine-tuning on smaller datasets. Understanding this trade-off helps explain why a model might work brilliantly in a demo but stumble on your specific data - the demo may have been inadvertently tuned to showcase its strengths.
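The weight-penalty idea can be sketched with a tiny one-parameter model. This is a minimal illustration, not production code: the data, function names, and hyperparameters are all invented for the example. It fits y ≈ w·x by gradient descent, once with no penalty and once with an L2 penalty added to the loss, and shows that the penalty pulls the learned weight towards zero.

```python
# Minimal sketch of L2 regularisation ("weight decay") on a one-parameter
# linear model fit by gradient descent. All names and data are illustrative.

def fit(xs, ys, lam, lr=0.01, steps=500):
    """Fit y ~ w * x by minimising MSE + lam * w**2 (the L2 penalty)."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the loss: the data-fit term plus the penalty term.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n + 2 * lam * w
        w -= lr * grad
    return w

# Noisy data roughly following y = 3x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 3.2, 5.9, 9.3, 11.8]

w_plain = fit(xs, ys, lam=0.0)  # unregularised: follows the data closely
w_reg = fit(xs, ys, lam=5.0)    # strong penalty: the weight is shrunk

print(f"unregularised w = {w_plain:.2f}, regularised w = {w_reg:.2f}")
```

The same trade-off appears here in miniature: a larger `lam` makes the model simpler (smaller weight) at the cost of fitting the training data less exactly, which is precisely the lever regularisation offers against overfitting.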