A new web crawler launched by Meta last month is quietly scraping the web for AI training data

lemme in@lemm.ee · 22 days ago


GarrulousBrevity@lemmy.world · 22 days ago

Does that mean this new bot is ignoring sites’ robots.txt files? The Internet works because of web crawlers, and I’m not sure how this one is different.

Edited to add: Apparently one would need to add Meta-ExternalAgent to their robots.txt file unless they already had a wildcard rule, so it isn’t widely blocked yet, simply because it’s new. Letting it run for a few months before letting anyone know it exists is kinda shady.
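To illustrate the point about named vs. wildcard rules, here's a rough sketch using Python's standard `urllib.robotparser`. It assumes the crawler identifies itself with the `Meta-ExternalAgent` token mentioned above (the full user-agent string isn't given in this thread), and `ExampleOldBot` is a made-up name standing in for crawlers people were already blocking by name. Python's parser is a simplified matcher, so Meta's own crawler might interpret a file somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt files. "ExampleOldBot" is a made-up crawler name
# standing in for bots that were already blocked by name before
# Meta-ExternalAgent showed up.
ROBOTS_FILES = {
    "named_old_bots_only": """
User-agent: ExampleOldBot
Disallow: /
""",
    "wildcard_rule": """
User-agent: *
Disallow: /
""",
    "names_meta_agent": """
User-agent: ExampleOldBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /
""",
}


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines and marks it as read, so can_fetch()
    # answers from these rules without fetching anything over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for name, text in ROBOTS_FILES.items():
        allowed = is_allowed(text, "Meta-ExternalAgent", "https://example.com/article")
        print(f"{name}: Meta-ExternalAgent allowed = {allowed}")
```

Running it prints `True` for the first file and `False` for the other two: a list of named bots written before the new crawler existed doesn't stop it, while a wildcard rule or an explicit `Meta-ExternalAgent` entry does. None of this helps against a crawler that simply ignores robots.txt, since compliance is voluntary.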