Someone Made a Dataset of One Million Bluesky Posts for 'Machine Learning Research'

Nexy · 1 month ago

@KurtVonnegut@mander.xyz · 1 month ago

The same can and will happen with the Fediverse, right?

@GeneralEmergency@lemmy.world · 1 month ago

Probably already happened

@Viking_Hippie@lemmy.world · 1 month ago

deleted by creator

@KurtVonnegut@mander.xyz · 1 month ago

I see. Probably mastodon.social gets scraped, then 🫣

@ladicius@lemmy.world · 1 month ago

Is that a problem for a proper scraper? Give the machine a list of domains and some hints about the relevant protocols, and then the computer runs until the end of the list.
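
Roughly what that loop looks like, as a minimal sketch and not a real scraper. Assumptions: `ENDPOINTS`, `build_url`, `crawl` and the `fetch` callback are names made up for illustration; the only real thing in here is the Mastodon public timeline path, and whether a given instance actually serves it without authentication depends on that instance's settings.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical "protocol hints": software name -> path of a public listing.
# "/api/v1/timelines/public" is Mastodon's public timeline endpoint; any other
# entry added here would have to be checked against that software's docs.
ENDPOINTS: Dict[str, str] = {
    "mastodon": "/api/v1/timelines/public?local=true",
}


def build_url(domain: str, protocol: str) -> str:
    """Combine a domain with the path suggested by its protocol hint."""
    return f"https://{domain}{ENDPOINTS[protocol]}"


def crawl(
    targets: Iterable[Tuple[str, str]],
    fetch: Callable[[str], List[dict]],
) -> Dict[str, List[dict]]:
    """Walk the (domain, protocol) list once and collect what fetch returns.

    `fetch` is injected (assumption) so the loop itself stays free of any
    particular HTTP library and can be exercised without network access.
    """
    results: Dict[str, List[dict]] = {}
    for domain, protocol in targets:
        if protocol not in ENDPOINTS:
            continue  # no hint for this software, skip the domain
        try:
            results[domain] = fetch(build_url(domain, protocol))
        except Exception:
            # One unreachable instance should not stop the rest of the list.
            results[domain] = []
    return results
```

The domain list and the fetcher are the only inputs, which is the whole point: the loop does not care how many instances there are, it just runs until the list ends.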