Will Meta scrape and crawl through all our data now?

LilDumpy@lemmy.world · 1 year ago

Will Meta scrape and crawl through all our data now?

marsara9@lemmy.world · 1 year ago

Using Threads / ActivityPub does make it easier, though.

If they’re using a traditional crawler, you could in theory block them at the user-agent level (e.g. with Cloudflare). If they’re using the public APIs, they’d have to write an interface for each distinct piece of software (Lemmy, Kbin, Mastodon, etc.) (How my search engine works)
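To make the user-agent idea concrete, here's a rough sketch of the kind of check a proxy or app server could run. The substrings in the blocklist are my assumption of what a Meta crawler might send, and `is_blocked_user_agent` is just a made-up helper name; verify against the crawler's published docs before relying on it. It also only works if the crawler identifies itself honestly.

```python
from typing import Optional

# Assumed substrings for illustration only; a crawler can send any
# User-Agent it likes, so this stops only well-behaved bots.
BLOCKED_AGENT_SUBSTRINGS = ("facebookexternalhit", "meta-externalagent", "facebookbot")


def is_blocked_user_agent(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains a blocklisted substring."""
    if not user_agent:
        # A missing header is not treated as a match in this sketch.
        return False
    lowered = user_agent.lower()
    return any(token in lowered for token in BLOCKED_AGENT_SUBSTRINGS)
```

You'd call this on each incoming request's `User-Agent` header and return a 403 when it comes back `True`.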

But with ActivityPub we’re essentially just sending them the data in near real-time, all using the same rough structure. Individual instances may block them, but it wouldn’t be hard to set up proxies/relays that the community as a whole just isn’t aware of (e.g. a new “Lemmy” instance comes online that looks like a single-user server, but it’s actually just a relay to Meta). The only real gotcha with ActivityPub is that there’s no real way to get historical data (nothing from the past).
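Here's a small sketch of why domain blocking only goes so far. It assumes an instance filters the inbox URLs it delivers to against a domain blocklist; `deliverable_inboxes`, the `threads.net` entry, and the example hostnames are all hypothetical and not how Lemmy actually implements blocking.

```python
from urllib.parse import urlparse

# Hypothetical blocklist for illustration.
BLOCKED_DOMAINS = {"threads.net"}


def deliverable_inboxes(inbox_urls, blocked_domains=BLOCKED_DOMAINS):
    """Return the inbox URLs whose host is not a blocked domain or a subdomain of one."""
    allowed = []
    for url in inbox_urls:
        host = (urlparse(url).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in blocked_domains):
            continue  # skip delivery to blocked instances
        allowed.append(url)
    return allowed


inboxes = [
    "https://threads.net/inbox",
    "https://mastodon.example/inbox",
    "https://tiny-relay.example/inbox",  # looks like a single-user server
]
remaining = deliverable_inboxes(inboxes)
```

The `tiny-relay.example` inbox stays in `remaining`, because nothing about its domain says where it forwards the content afterwards.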

Now I still have mixed feelings about Meta joining the fediverse, but if we’re just talking about blocking them from getting the content we have here, then things get difficult.