Edge & On-Device Optimisation

Running AI models directly on devices - phones, laptops, embedded systems - rather than in the cloud offers compelling advantages: faster response times (no network round-trip), better privacy (data never leaves the device), offline capability, and lower ongoing costs (no API fees). But device hardware is vastly less powerful than cloud data centres, so significant optimisation is required. This means combining many of the techniques in this category: aggressive quantisation, pruning, distillation, and architectures designed specifically for mobile processors. Apple's on-device AI features, Google's on-device translation, and voice assistants that work offline all rely on these techniques. The challenge is maintaining useful quality within tight hardware constraints - a model running on a phone has perhaps a thousandth of the computing power available to a cloud-hosted model.

For businesses, edge AI is particularly relevant where latency, privacy, or connectivity are critical. Real-time processing in manufacturing, health monitoring on wearable devices, and privacy-sensitive document analysis are all use cases where on-device AI makes more sense than cloud-based alternatives. The trade-off is always capability versus constraints, and the boundary of what's possible on-device shifts as both hardware and compression techniques continue to improve.
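To make the core idea concrete, here is a minimal sketch of symmetric 8-bit post-training quantisation, the simplest of the compression techniques mentioned above. The function names are hypothetical illustrations, not the API of any particular framework; real edge toolchains add per-channel scales, calibration, and hardware-specific kernels on top of this basic scheme.

```python
# Hypothetical sketch: symmetric per-tensor int8 quantisation.
# Each float32 weight (4 bytes) is mapped to an int8 (1 byte),
# a 4x storage reduction, at the cost of a small rounding error.
import random

def quantise_int8(weights):
    """Map float weights to int8 values plus one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantise(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return [v * scale for v in q]

random.seed(0)
weights = [random.gauss(0.0, 0.5) for _ in range(1000)]  # toy weight tensor
q, scale = quantise_int8(weights)
restored = dequantise(q, scale)

# Worst-case rounding error is half a quantisation step (scale / 2).
error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.5f} max reconstruction error={error:.5f}")
```

The per-tensor scale is what keeps accuracy loss small: the error introduced is bounded by half a quantisation step, which for well-behaved weight distributions is negligible compared with the 4x memory and bandwidth savings on a constrained device.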