Apparently, stealing other people’s work to create a product for money is now “fair use” according to OpenAI, because they are “innovating” (stealing). Yeah. Move fast and break things, huh?

“Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials,” wrote OpenAI in the House of Lords submission.

OpenAI claimed that the authors in that lawsuit “misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”

    • @maynarkh@feddit.nl
      10 months ago

      The two big arguments are:

      • Substantial reproduction of the original work, you can get back substantial portions of the original work from an AI model’s output.
      • The AI model replaces the use of the original work. In short, a work that uses copyrighted material under fair use can’t be a replacement for the initial work.
      • @intensely_human@lemm.ee
        10 months ago

        you can get back substantial portions of the original work from an AI model’s output

        Have you confirmed this yourself?

        • @chaos@beehaw.org
          10 months ago

          In its complaint, The New York Times alleges that because the AI tools have been trained on its content, they sometimes provide verbatim copies of sections of Times reports.

          OpenAI said in its response Monday that so-called “regurgitation” is a “rare bug,” the occurrence of which it is working to reduce.

          “We also expect our users to act responsibly; intentionally manipulating our models to regurgitate is not an appropriate use of our technology and is against our terms of use,” OpenAI said.

          The tech company also accused The Times of “intentionally” manipulating ChatGPT or cherry-picking the copycat examples it detailed in its complaint.

          https://www.cnn.com/2024/01/08/tech/openai-responds-new-york-times-copyright-lawsuit/index.html

          The thing is, it doesn’t really matter if you have to “manipulate” ChatGPT into spitting out training material word-for-word. The fact that it’s possible at all is proof that, intentionally or not, that material has been encoded into the model itself. That might still be fair use, but it’s a much weaker position than the original argument, which was that nothing of the original material really remains after training: it’s all synthesized and blended with everything else to create something entirely new that doesn’t replicate the original.

          • FaceDeer
            10 months ago

            You said:

            Substantial reproduction of the original work, you can get back substantial portions of the original work from an AI model’s output.

            If an AI is trained on a huge number of NYT articles and you’re only able to get it to regurgitate one of them, that’s not a “substantial portion of the original work.” That’s a minuscule portion of the original work.

          • @intensely_human@lemm.ee
            10 months ago

            So that’s a no? Confirming it yourself means doing it yourself. Have you gotten it to regurgitate a copyrighted work?