Consent & Privacy in Training Data

When AI models are trained on data that includes personal information, consent and privacy become critical concerns. Under regulations like GDPR, individuals generally have rights over how their data is used, including the right to be informed, to object, and in some cases to have their data deleted.

Applying these principles to AI training data is genuinely difficult. A model trained on millions of data points doesn't store individual records in an easily separable way. If someone exercises their right to erasure, removing their specific contribution from a trained model may require retraining from scratch, an expensive and time-consuming process.

Some organisations address this by anonymising or aggregating data before training, but true anonymisation is harder than it appears, especially against modern re-identification techniques. Others rely on legitimate interest or other legal bases, but these are increasingly being tested by regulators.

The practical advice is to think about consent and privacy before you start collecting and training, not after. Building privacy-preserving practices into your data pipeline from the beginning is far cheaper than retrofitting them later. And if you're using a third-party AI product, ask pointed questions about what data it was trained on and whether appropriate consent was obtained.
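As a minimal sketch of what "building privacy into the pipeline" can mean in practice, the snippet below pseudonymises records before they enter a training set: it drops the direct identifier, replaces it with a salted hash (so erasure requests can still be matched to rows), and coarsens a quasi-identifier. All field names, the salt, and the records are hypothetical illustrations, not from the original text, and as the text notes this is pseudonymisation rather than true anonymisation.

```python
import hashlib

# Hypothetical raw records containing personal data; field names are
# illustrative only.
records = [
    {"user_id": "alice@example.com", "age": 34, "text": "loved the product"},
    {"user_id": "bob@example.com", "age": 41, "text": "shipping was slow"},
]

# Pseudonymisation salt: stored separately from the dataset so the raw
# IDs cannot be recovered from the training data alone.
SALT = "keep-this-secret-and-separate"

def pseudonymise(record):
    """Drop the direct identifier and coarsen quasi-identifiers.

    Caveat: a salted hash plus quasi-identifiers can still be
    re-identified with enough auxiliary data -- this reduces risk,
    it does not eliminate it.
    """
    hashed = hashlib.sha256((SALT + record["user_id"]).encode()).hexdigest()
    return {
        "pseudo_id": hashed[:16],                      # stable key for erasure requests
        "age_bucket": f"{record['age'] // 10 * 10}s",  # 34 -> "30s"
        "text": record["text"],
    }

training_data = [pseudonymise(r) for r in records]
```

Keeping a stable `pseudo_id` (rather than deleting the ID outright) is a deliberate trade-off: it lets you locate and remove a person's rows from the raw training set if they later exercise their right to erasure, though, as the text notes, it does not remove their contribution from an already-trained model.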