Its less about copying the work, its more like looking at patterns that appear in a work.
To bring a very rudimentary example, if I wanted a word and the first letter was Q, what would the second letter be.
Of course, statistically, the next letter is u, and its not common for words starting with Q to have a different letter after that. ML/AI is like taking these small situations, but having a ridiculous amount of parameters to come up with something based on several internal models. These paramters of course generally have some context.
Its like if you were told to read a book thoroughly, and then after was told to reproduce the same book. You probably cannot make it 1:1, but could probably get the general gist of a story. The difference between you and the machine is the machine read a lot of books, and contextually knows patterns so that it can generate something similar faster and more accurate, but not exactly the original one for one thing.
When you download Vicuna or Stable Diffusion XL, they’re a handful of gigabytes. But when you go download LAION-5B, it’s 240TB. So where did that data go if it’s being copy/pasted and regurgitated in its entirety?
Exactly! If it were just out putting exact data they wouldn’t care about making new works and just pivot as the world’s greatest source of compression.
Though there is some work researchers have done to heavily modify these models to over fit to do exactly this.
deleted by creator
deleted by creator
Its less about copying the work, its more like looking at patterns that appear in a work.
To bring a very rudimentary example, if I wanted a word and the first letter was Q, what would the second letter be.
Of course, statistically, the next letter is u, and its not common for words starting with Q to have a different letter after that. ML/AI is like taking these small situations, but having a ridiculous amount of parameters to come up with something based on several internal models. These paramters of course generally have some context.
Its like if you were told to read a book thoroughly, and then after was told to reproduce the same book. You probably cannot make it 1:1, but could probably get the general gist of a story. The difference between you and the machine is the machine read a lot of books, and contextually knows patterns so that it can generate something similar faster and more accurate, but not exactly the original one for one thing.
When you download Vicuna or Stable Diffusion XL, they’re a handful of gigabytes. But when you go download LAION-5B, it’s 240TB. So where did that data go if it’s being copy/pasted and regurgitated in its entirety?
Exactly! If it were just out putting exact data they wouldn’t care about making new works and just pivot as the world’s greatest source of compression.
Though there is some work researchers have done to heavily modify these models to over fit to do exactly this.