OpenAI says it’s “impossible” to create useful AI models without copyrighted material

@sculd@beehaw.org · 11 months ago

OpenAI says it’s “impossible” to create useful AI models without copyrighted material

FaceDeer · 11 months ago

It’s actually the other way around, Bing does websearches based on what you’ve asked it and then the answer it generates can incorporate information that was returned by the websearching. This is why you can ask it about current events that weren’t in its training data, for example - it looks the information up, puts it into its context, and then generates the response that you see. Sort of like if I asked you to write a paragraph about something that you didn’t know about, you’d go look the information up first.

but humans also can differentiate between copyrighted and public works

Not really. Here’s a short paragraph about sailboats. Is it copyrighted?

Sailboats, those graceful dancers of the open seas, epitomize the harmonious marriage of nature and human ingenuity. Their billowing sails, like ethereal wings, catch the breath of the wind, propelling them across the endless expanse of the ocean. Each vessel bears the scars of countless journeys, a testament to the resilience of both sailor and ship.

@AndrasKrigare@beehaw.org · 11 months ago

Bing does, but it still has a pre trained model that it’s using in its answer; you can give it prompts that it will answer without having to perform a search at all. That’s not a huge distinction, but I think the majority of the concern is on those types of responses. If it’s just responding with the results of a web search, I don’t think anyone is particularly concerned.

I was being specific with my word choice there, and should have emphasized more. Humans can differentiate between them, not humans always can differentiate. Copyright as a concept is something we have awareness of than (to my knowledge) is not part of the major AI models. I don’t know that an AI needs to be better than a human at that task.