Drift Detection & Retraining

The world doesn't stand still after you deploy a model. Customer behaviour shifts, product catalogues change, language evolves, and the statistical relationships your model learned gradually stop holding. This phenomenon, called drift, is one of the most common reasons AI systems degrade over time.

Drift comes in two main forms. Data drift occurs when the distribution of incoming data changes relative to what the model was trained on: a model trained on pre-pandemic shopping behaviour will struggle with pandemic-era patterns. Concept drift occurs when the relationship between inputs and outcomes changes: the same customer features that predicted churn last year may not predict it this year.

Detecting drift requires ongoing statistical monitoring of both inputs and outputs. Techniques range from simple distribution comparisons to more formal methods such as the population stability index (PSI) and the Kolmogorov-Smirnov (KS) test.

When drift is detected, the model typically needs retraining on more recent data. This can happen on a fixed schedule (retrain monthly regardless), on a trigger (retrain when a drift metric or accuracy crosses a threshold), or continuously (online learning that updates the model incrementally). Each approach trades off cost, freshness, and stability.

The key is to have a retraining strategy before you deploy, not to discover the need for one when performance has already deteriorated.
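To make the detection techniques concrete, here is a minimal sketch of the two tests named above, comparing a training-time feature sample against live traffic. The binning scheme, clipping floor, and the 0.2 PSI rule of thumb are illustrative assumptions, not fixed standards; the KS test comes from `scipy.stats`.

```python
import numpy as np
from scipy.stats import ks_2samp

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference and a current sample."""
    # Bin edges come from the reference (training-time) distribution;
    # the outer edges are opened so no live value falls outside a bin.
    edges = np.histogram_bin_edges(expected, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf
    exp_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    act_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions to avoid division by zero and log(0).
    exp_pct = np.clip(exp_pct, 1e-6, None)
    act_pct = np.clip(act_pct, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)  # training-time feature values
live = rng.normal(0.5, 1.0, 10_000)   # live traffic: the mean has shifted

print(f"PSI: {psi(train, live):.3f}")  # > 0.2 is a common drift rule of thumb
stat, p = ks_2samp(train, live)
print(f"KS statistic: {stat:.3f}, p-value: {p:.2e}")
```

In practice a check like this runs per feature on a schedule (daily or hourly), with the reference sample frozen at training time so drift is always measured against what the model actually saw.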