Data Licensing & Copyright
The legal landscape around AI training data is unsettled and moving quickly. Major lawsuits brought by news publishers, visual artists, authors, and software developers are testing whether training an AI model on copyrighted material constitutes fair use or requires a licence. The outcomes will have enormous implications for the industry.

In the meantime, organisations face practical decisions. The fact that data is publicly available doesn't necessarily mean you have the right to train on it: terms of service, copyright law, and database rights can all restrict what's permissible. Licensed datasets from commercial providers offer clearer legal footing but come with their own restrictions on use, redistribution, and derivative works. Some AI companies are proactively striking licensing deals with content creators and publishers. Others are relying on fair use arguments that the courts have yet to settle.

If you're building or procuring AI systems, understanding the licensing status of the training data is essential risk management. You should also pay attention to the output side: some licences restrict how model outputs can be used commercially, and the legal status of AI-generated content itself remains unclear in many jurisdictions.