WARCannon – High Speed/Low Cost CommonCrawl RegExp In Node.js
WARCannon was built to simplify and cheapify the process of ‘grepping the internet’. With WARCannon, you can: Build and test regex patterns against real Common Crawl data Easily load Common Crawl datasets for parallel...