Why datasets built on public domain might not be enough for AI

Author: Stefano Maffulli
Source

Sponsored:

Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence - Audiobook


Uncover the true cost of artificial intelligence.

"Atlas of AI" by Kate Crawford exposes how power, politics, and profit extract from our planet, our labor, and our freedom.

From hidden mines to massive data empires, discover how AI is reshaping who we are—and who holds control.

Listen now, and see the system behind the screens before the future listens to you. = > Atlas of AI $0.00 with trial. Read by Larissa Gallagher


Common Corpus is a public domain dataset for training large language models (LLMs). Boasting 500 billion words in multiple languages, drawn from various cultural initiatives, it offers researchers a powerful tool to develop smaller and more efficient LLMs. It should not be abused as a tool to promote public policies that expand the reach of copyright law.

Read more