‘Reddit can survive without search’: company reportedly threatens to block Google (www.theverge.com)
Posted by AnActOfCreation@programming.dev to Technology@lemmy.world · English · 1 year ago · 329 comments
online · 1 year ago (edited)
Speaking of this, what parts of the fediverse have added the option to block training generative AI to their respective robots.txt?
https://blog.google/technology/ai/an-update-on-web-publisher-controls/
https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
https://techcrunch.com/2023/09/28/medium-hints-at-a-nascent-media-coalition-to-block-ai-crawlers/
It looks like there’s a handful of these lines you’d have to add to robots.txt.
Is there anywhere that keeps a comprehensive list of these?
kingthrillgore · 1 year ago
I’ve been trying to find a list as well, to no avail. The ones I do know are in my own robots.txt, at volcanolair.co/robots.txt
online · 1 year ago
Someone should make a GitHub repo just to make it easier for people to find them all in one place, with sources, and update the list as we get new ones.
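For anyone wondering what the "handful of lines" mentioned above might look like, here is a minimal sketch in Python. It generates one block per crawler token and then checks the result with the standard library's robots.txt parser. The three tokens shown (Google-Extended, GPTBot, CCBot) are only illustrative and are not a comprehensive list; `build_robots_txt`, `is_blocked`, and the example.com URL are hypothetical names made up for this example. Crawlers are also free to ignore robots.txt, so treat this as a statement of preference rather than enforcement.

```python
from urllib.robotparser import RobotFileParser

# Illustrative tokens only (an assumption, not a complete list).
# Check each vendor's own documentation for the current names.
AI_CRAWLER_TOKENS = ["Google-Extended", "GPTBot", "CCBot"]


def build_robots_txt(tokens):
    """Return robots.txt text that disallows each token and allows everyone else."""
    blocks = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    # Catch-all group so ordinary search crawlers stay allowed.
    blocks.append("User-agent: *\nAllow: /")
    return "\n\n".join(blocks) + "\n"


def is_blocked(robots_txt, user_agent, url="https://example.com/"):
    """Return True if the given robots.txt text disallows user_agent for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    text = build_robots_txt(AI_CRAWLER_TOKENS)
    print(text)
    for agent in AI_CRAWLER_TOKENS + ["Googlebot"]:
        print(agent, "blocked" if is_blocked(text, agent) else "allowed")
```

Keeping the tokens in one list is deliberate: if a shared, sourced list like the one suggested above ever exists, only that list would need updating.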