Bluesky users discuss plans around user data and AI training
Social network Bluesky recently published a proposal on GitHub outlining new options it could give users to indicate whether they want their posts and data to be scraped for things like generative AI training and public archiving.
CEO Jay Graber discussed the proposal earlier this week while on stage at South by Southwest, but it attracted fresh attention on Friday night after she posted about it on Bluesky. Some users reacted with alarm to the company’s plans, which they saw as a reversal of Bluesky’s previous insistence that it will not sell user data to advertisers and will not train AI on user posts.
“Oh, hell no!” the user Sketchette wrote. “The beauty of this platform was the NOT sharing of information. Especially gen AI. Don’t you cave now.”
Graber replied that generative AI companies are “already scraping public data from across the web,” including from Bluesky, since “everything on Bluesky is public like a website is public.” So, she said, Bluesky is trying to create a “new standard” to govern that scraping, similar to the robots.txt file that websites use to communicate their permissions to web crawlers.
The debate over AI training and copyright has dragged robots.txt into the spotlight, highlighting, among other things, the fact that it is not legally enforceable. Bluesky framed its proposed standard as one that would have a similar “mechanism and expectations,” providing a “machine-readable format which good actors are expected to abide by, and carries ethical weight, but is not legally enforceable.”
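To make the robots.txt comparison concrete, here is a minimal sketch of how a well-behaved crawler might consult a robots.txt file before fetching a page, using Python's standard `urllib.robotparser` module. The robots.txt contents, the crawler name `ExampleAIBot`, and the URL are hypothetical and chosen only for illustration. Nothing in this check is enforced: a crawler that never runs it can still fetch the page, which is the limitation the article describes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; "ExampleAIBot" is a made-up crawler name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/profile/alice"  # hypothetical URL
    print(may_fetch("ExampleAIBot", url))  # the blocked crawler
    print(may_fetch("OtherBot", url))      # any other crawler
```

The check only reports what the site has asked for; honoring the answer is left entirely to the crawler.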
Under the proposal, users of the Bluesky app or other apps that use the underlying AT Protocol could go into their settings and either allow or disallow the use of their Bluesky data in four categories: generative AI, protocol bridging (i.e., connecting different social ecosystems), bulk datasets, and web archiving (such as the Internet Archive’s Wayback Machine).
If a user indicates that they do not want their data used to train generative AI, the proposal says, “Companies and research teams building AI training sets are expected to respect this intent when they see it, either when scraping websites, or doing bulk transfers using the protocol itself.”
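As a rough illustration of the allow-or-disallow choice across the four categories, the sketch below models a user's stated preferences and the check a cooperating scraper might run before using a post. This is not the schema from Bluesky's proposal: the names `Category`, `UserIntents`, and `may_use` are hypothetical, and treating an unstated preference as "do not use" is an assumption made here, not something the article specifies.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Category(Enum):
    # The four categories named in the article; the identifiers are hypothetical.
    GENERATIVE_AI = "generative_ai"
    PROTOCOL_BRIDGING = "protocol_bridging"
    BULK_DATASETS = "bulk_datasets"
    WEB_ARCHIVING = "web_archiving"


@dataclass
class UserIntents:
    # True means allowed, False means disallowed.
    # A missing key means the user has stated no preference.
    choices: Dict[Category, bool] = field(default_factory=dict)

    def stated(self, category: Category) -> Optional[bool]:
        """Return the user's explicit choice, or None if none was given."""
        return self.choices.get(category)


def may_use(intents: UserIntents, category: Category) -> bool:
    """Assumed conservative policy: use data only on an explicit allow."""
    return intents.stated(category) is True


if __name__ == "__main__":
    intents = UserIntents({
        Category.GENERATIVE_AI: False,  # user opts out of AI training
        Category.WEB_ARCHIVING: True,   # user permits archiving
    })
    for category in Category:
        print(category.value, may_use(intents, category))
```

As with robots.txt, a check like this only has an effect if the scraper chooses to run it and respect the result.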
Molly White, who writes the Citation Needed newsletter and the Web3 Is Going Just Great blog, described this as “a good proposal” and said it was “weird to see people flaming Bluesky for it,” since it is not so much “welcoming AI scraping” as “trying to add a signal to allow users to communicate their preferences about the scraping that is already happening.”
“I think the weakness with this and [Creative Commons’] similar proposal for ‘preference signals’ is that they rely on scrapers respecting these signals out of some desire to be good actors,” White continued. “We’ve already seen some of these companies blow right past robots.txt or pirate material to scrape.”