Generalisation & Out-of-Distribution Robustness

Generalisation is a model's ability to handle situations it wasn't specifically trained on, which in practice is almost every real-world situation. Training data, however vast, cannot cover every possible input a model will encounter. Out-of-distribution (OOD) robustness refers to how well a model performs when inputs differ significantly from its training data. A customer service model trained primarily on English-language queries might behave unpredictably when it encounters Scots dialect, industry-specific jargon or messages that mix several languages. A model trained on well-structured business documents might struggle with informal social media posts.

The core difficulty is that models typically don't know when they're out of their depth. They will produce an output regardless, with no flag indicating that the input was unlike anything seen during training.

This matters for any business deploying AI across diverse user populations, geographic regions or use cases, because test data rarely represents the full range of real-world inputs a system will encounter. Practical mitigations include testing with deliberately diverse and unusual inputs, monitoring production data for distribution shifts, building detection systems that flag inputs that look unusual, and maintaining human fallback pathways for cases the AI isn't equipped to handle reliably. The sketch below illustrates the monitoring and flagging steps.
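As a rough illustration of those last two mitigations, here is a minimal Python sketch. It assumes you log a single numeric feature per request (input length in characters) and keep a reference sample from pre-deployment testing; the feature choice, the z-score cutoff and the KS-test threshold are all illustrative assumptions rather than recommendations, and it uses SciPy's two-sample Kolmogorov-Smirnov test for the window-level check. Production systems would typically monitor richer signals, such as embedding statistics.

```python
# A minimal sketch, assuming a single logged feature (input length) and a
# reference sample from pre-deployment testing. Feature, cutoffs and window
# sizes below are illustrative assumptions, not recommendations.
import random
import statistics

from scipy.stats import ks_2samp  # two-sample Kolmogorov-Smirnov test


def looks_unusual(value: float, reference: list[float], z_cutoff: float = 3.0) -> bool:
    """Flag a single input whose feature sits far outside the reference data."""
    mean = statistics.fmean(reference)
    stdev = statistics.stdev(reference)
    return abs(value - mean) > z_cutoff * stdev


def drift_alert(reference: list[float], window: list[float], p_threshold: float = 0.01) -> bool:
    """Flag a recent window of traffic that no longer matches the reference data."""
    result = ks_2samp(reference, window)
    return result.pvalue < p_threshold


# Hypothetical data: reference lengths drawn from test traffic, plus a live
# window where inputs have drifted towards much longer messages.
random.seed(0)
reference_lengths = [random.gauss(40, 10) for _ in range(5_000)]
live_window = [random.gauss(75, 25) for _ in range(500)]

print(looks_unusual(180.0, reference_lengths))   # True: one outlier input
print(drift_alert(reference_lengths, live_window))  # True: aggregate shift
```

The two checks are deliberately separate: the per-input check catches individual outliers the moment they arrive, while the window-level test catches gradual drift that no single input would trigger on its own.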