Bluesky may have said it won’t use user data to train generative AI, but someone else just published a dataset of a million Bluesky posts for “machine learning research”. The dataset is already very popular, and your data may have been scraped.
The same can and will happen with the Fediverse, right?
Probably already happened
deleted by creator
I see. Probably mastodon.social gets scraped, then 🫣
Is that a problem for a proper scraper? Give the machine a list of domains and some hints about the relevant protocols, and then the computer runs until the end of the list.
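To make that concrete, here is a rough sketch of the "list of domains plus protocol hints" loop. It is only an illustration, not how any real scraper is known to work. The names `ENDPOINTS`, `build_url`, and `crawl` are made up for this example, and the actual HTTP call is left to a `fetch` function you pass in. The only real-world detail assumed is Mastodon's public timeline path, `/api/v1/timelines/public`; many instances restrict or require authentication for it.

```python
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.parse import urlencode

# Assumption: each "protocol hint" maps to one URL path.
# Only Mastodon's public timeline endpoint is listed, for illustration.
ENDPOINTS: Dict[str, str] = {"mastodon": "/api/v1/timelines/public"}


def build_url(domain: str, protocol: str, limit: int = 20) -> str:
    """Build the timeline URL for one domain from its protocol hint."""
    query = urlencode({"local": "true", "limit": limit})
    return f"https://{domain}{ENDPOINTS[protocol]}?{query}"


def crawl(
    targets: Iterable[Tuple[str, str]],
    fetch: Callable[[str], List],
) -> Dict[str, List]:
    """Walk the (domain, protocol) list until it ends, collecting results."""
    results: Dict[str, List] = {}
    for domain, protocol in targets:
        if protocol not in ENDPOINTS:
            continue  # no hint for this protocol, so skip the domain
        try:
            results[domain] = fetch(build_url(domain, protocol))
        except Exception:
            # An unreachable or blocking server does not stop the run.
            results[domain] = []
    return results
```

The point is that the federation itself adds little friction: each extra server is one more entry in `targets`, and a server that blocks or fails is simply skipped while the loop continues.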