Create known_crawler_lists

Patrick McCann 2024-10-01 12:00:58 -04:00 committed by GitHub
parent a960e5f2ed
commit 371ea77be7

known_crawler_lists Normal file

@@ -0,0 +1,5 @@
https://github.com/privacy-tech-lab/gpc-web-crawler/blob/main/selenium-optmeowt-crawler/full-crawl-set.csv
https://github.com/InteractiveAdvertisingBureau/adstxtcrawler/blob/master/adstxt_domains_2018-02-13.txt
https://github.com/kaustubhd93/adstxt-crawler/tree/master/archives
https://github.com/zer0h/top-1000000-domains/blob/master/top-10000-domains
https://github.com/zer0h/top-1000000-domains/blob/master/top-100000-domains