Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.
Common Crawl is a 501(c)(3) non-profit founded in 2007. We make wholesale extraction, transformation and analysis of open web data accessible to researchers.
The Common Crawl team attended the 2025 IIPC General Assembly and Web Archiving Conference in Oslo, presenting recent work and participating in discussions on web preservation.
Thom Vaughan
Thom is Principal Technologist at the Common Crawl Foundation.