Common Crawl Foundation, Constellation Network, Seek to Bridge Blockchain, AI

The Common Crawl Foundation, a non-profit organization founded in 2007 and dedicated to providing a copy of the Internet to the public, and Constellation Network, a Web3 blockchain ecosystem notable for providing solutions to the US Department of Defense, have announced a strategic partnership aimed at democratizing web-crawled data and improving its accessibility and utility, using blockchain technology, for artificial intelligence (AI) and data applications.

This collaboration will explore opportunities for improving the large language models used in AI, starting with Common Crawl’s dataset. That dataset is used by 80% of large language models, spans more than 250 billion web pages crawled to date (19 billion in 2024), and comprises nearly 9 petabytes of archived crawl data. By leveraging Constellation’s decentralized network, Hypergraph, to add immutability, provenance, and auditability around the data, the partners aim to provide joint solutions for responsible and transparent AI.

With AI projected to be a $3 trillion industry by 2030, demand is growing for secure ways to share the common data sets used to train large language models, better storage for queried and cleaned data, opportunities to monetize data, and greater transparency about where data comes from.

“This partnership represents a significant step forward in securing trusted distribution of Common Crawl,” said Rich Skrenta, executive director of the Common Crawl Foundation. “By combining our comprehensive web archive with Constellation’s proven implementation of blockchain technology, researchers and developers from around the world can trust what they’re getting from Common Crawl and have a model for authenticating large open data sets, such as those used for AI training.”

“The partnership between Constellation Network and Common Crawl highlights mainstream adoption of web3 solutions outside the echo chambers of crypto,” Ben Jorgensen, CEO of Constellation Network, added. “This alignment continues Constellation’s mission of our zero trust network being used as a public good for a data-focused future. Our aim is to further attract new developers by showcasing capabilities, such as integrating immutability throughout digital workflows, and thus further differentiate ourselves from earlier generations of blockchain technology.”

The two organizations will take a phased approach to implementing this initiative, starting with a customizable subnet called a metagraph, which will integrate a subset of Common Crawl’s data. This subnet is currently live on Constellation’s test network and will soon be deployed to its public network, Hypergraph. Further details of the live metagraph will be shared in the coming weeks, along with information on how organizations and developers can participate.


