Update known_crawler_lists

Patrick McCann 2024-10-01 12:17:26 -04:00 committed by GitHub
parent 371ea77be7
commit 4cf2d7a446

@@ -1,5 +1,9 @@
# not all formatted the same, for reference
https://github.com/privacy-tech-lab/gpc-web-crawler/blob/main/selenium-optmeowt-crawler/full-crawl-set.csv
https://github.com/InteractiveAdvertisingBureau/adstxtcrawler/blob/master/adstxt_domains_2018-02-13.txt
https://github.com/kaustubhd93/adstxt-crawler/tree/master/archives
https://github.com/zer0h/top-1000000-domains/blob/master/top-10000-domains
https://github.com/zer0h/top-1000000-domains/blob/master/top-100000-domains
https://github.com/Jirehlov/cfranking/blob/main/20240311-20240318/cloudflare-radar-domains-top-50000-20240311-20240318.csv
https://github.com/duckduckgo/tracker-radar/blob/main/build-data/generated/domain_map.json
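
Since the file's own comment notes that these lists are not all formatted the same, anyone consuming this file will probably want to read the URLs first and handle each format separately. The sketch below is one possible way to do that first step, not part of this commit: the hypothetical helper `read_list_urls` skips blank lines and `#` comments, and the hypothetical helper `to_raw_url` maps a GitHub `blob` page URL to its `raw.githubusercontent.com` equivalent. It assumes the branch name is a single path segment (true for `main` and `master` here) and returns `None` for anything else, such as the `tree` directory link.

```python
from urllib.parse import urlsplit


def read_list_urls(text):
    """Return the URLs in a list file, skipping blank lines and '#' comments."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def to_raw_url(url):
    """Map a github.com 'blob' URL to its raw-content URL, or return None.

    Assumption: the ref (branch) is a single path segment, e.g. 'main'.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: owner / repo / "blob" / ref / path...
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return None
    owner, repo, _, ref, *path = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, ref, *path])
```

Parsing the downloaded content (CSV, plain text, or JSON) is left out on purpose, because each list would need its own reader.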