WARCannon – catastrophically powerful parallel WARC processing
WARCannon was built to simplify and cheapify the process of ‘grepping the internet’. With WARCannon, you can:

- Build and test regex patterns against real Common Crawl data
- Easily load Common Crawl datasets for parallel...
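
To make the idea of "grepping" WARC data concrete, here is a minimal sketch in Python using only the standard library. It is not WARCannon's implementation or API: the helpers `make_record`, `iter_warc_records`, and `grep_warc` are hypothetical names invented for this illustration, and the sample is a tiny uncompressed, in-memory WARC-style stream rather than a real (gzip-compressed) Common Crawl file. It only shows the general shape of the task: walk the records, keep the `response` ones, and apply a regex to each body.

```python
import io
import re


def make_record(uri, payload, warc_type="response"):
    """Build one simplified WARC-style record (hypothetical helper for the demo)."""
    head = (
        f"WARC/1.0\r\n"
        f"WARC-Type: {warc_type}\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(payload)}\r\n"
        f"\r\n"
    ).encode("utf-8")
    # Each record is followed by two CRLFs that separate it from the next one.
    return head + payload + b"\r\n\r\n"


def iter_warc_records(stream):
    """Yield (headers, body) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.startswith(b"WARC/"):
            continue  # skip the blank separator lines between records
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", "0"))
        yield headers, stream.read(length)


def grep_warc(stream, pattern):
    """Map each response URI to the list of regex matches found in its body."""
    regex = re.compile(pattern)  # bytes pattern, since record bodies are bytes
    hits = {}
    for headers, body in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue  # ignore request/metadata records
        matches = regex.findall(body)
        if matches:
            uri = headers.get("warc-target-uri", "")
            hits[uri] = [m.decode("utf-8", "replace") for m in matches]
    return hits


# Tiny in-memory sample standing in for real crawl data (assumed content).
SAMPLE = (
    make_record("http://a.example/", b"contact: alice@example.com")
    + make_record("http://b.example/", b"nothing here")
    + make_record("http://c.example/", b"bob@example.com", warc_type="request")
)

hits = grep_warc(io.BytesIO(SAMPLE), rb"[\w.+-]+@example\.com")
print(hits)
```

Testing a pattern against a small local sample like this before running it over a full crawl is one way to catch overly broad or overly narrow expressions early.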