Mozilla open-source tools help developers build ethical AI datasets

Mozilla has unveiled open-source tools to help developers build ethical AI datasets and avoid training models on copyrighted material.

Many popular large language models (LLMs) rely on vast datasets scraped from the internet, often including copyrighted works used without permission. That reliance presents a significant ethical and legal challenge.

A growing contingent within the developer community believes creating high-quality, ethically sound alternatives is not only possible but necessary. This launch directly supports that movement.

The new toolkits, products of a year-long collaboration with EleutherAI, provide developers with practical workflows, code, and demonstrations hosted on the Mozilla.ai Blueprints platform (a space dedicated to helping developers prototype AI applications using open-source components).

Ayah Bdeir, Senior Advisor for AI Strategy at the Mozilla Foundation, said: “Just like open-source software in its early days, today’s open data ecosystem depends on community contributions and shared values.

“These toolkits are part of an effort to create common resources, make dataset creation easier, and promote the infrastructure needed for ethical AI development. Our partnership with EleutherAI is grounded in our shared commitment to advance this mission.”