Stephen King: My Books Were Used to Train AI::One prominent author responds to the revelation that his writing is being used to coach artificial intelligence.

  • sab@lemmy.world
    English
    arrow-up
    3
    arrow-down
    6
    ·
    1 year ago

    Nah.

    By default an AI will draw from its entire memory, and so will have lots of different influences. But by tuning your prompt (or restricting your input dataset) you can make it so specific that it's basically creating near-perfect clones. And unlike a human, it can then produce similar works hundreds of times per minute.

    But even that is beside the point. Those works were sold under the presumption that people will read them. Not to ingest them into an LLM or text-to-image model. And now, companies like OpenAI and others profit from the models they trained without permission from the original authors. That’s just wrong.

    • AEsheron@lemmy.world
      English
      arrow-up
      3
      ·
      1 year ago

      Have you used Stable Diffusion? I defy you to make a perfect clone of any image. Take a whole week to try and refine it if you want. It is basically impossible by definition, unless you only trained it on that one image.

    • stephen01king@lemmy.zip
      English
      arrow-up
      3
      ·
      1 year ago

      If you wanna make the claim that AI can make perfect clones, you gotta provide more proof than just your own words. I personally have never managed to make that happen.

    • BetaDoggo_@lemmy.world
      English
      arrow-up
      2
      ·
      1 year ago

      Obviously restricting the input will cause the model to overfit, but that’s not an issue for most models, where billions of samples are used. In the case of Stable Diffusion, this paper had a ~0.03% success rate extracting training data after 500 attempts on each image, or ~6.23E-5% per generation. And that was on a targeted set with the highest number of duplicates in the dataset.
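
      For anyone who wants to check the arithmetic, here is a rough sketch of how the per-generation figure follows from the per-image one. It assumes the attempts are spread evenly over each image, and the function names are made up for illustration. Plugging in the rounded 0.03% gives about 6E-5%, so the small gap to 6.23E-5% is presumably just rounding of the per-image rate.

```python
def per_generation_rate(per_image_rate_pct: float, attempts_per_image: int) -> float:
    """Spread a per-image success rate (in percent) evenly over the attempts per image."""
    return per_image_rate_pct / attempts_per_image


def generations_per_hit(per_generation_pct: float) -> float:
    """Expected number of generations per successful extraction at a given percent rate."""
    return 100.0 / per_generation_pct


# Figures quoted in the comment: ~0.03% per image, 500 attempts per image.
rate = per_generation_rate(0.03, 500)
print(f"{rate:.2e}% per generation")

# Using the quoted 6.23E-5% directly (hypothetical helper, same assumption as above).
print(f"about {generations_per_hit(6.23e-5):,.0f} generations per extracted image")
```

      Put another way, at the quoted rate you would expect on the order of 1.6 million generations per extracted image, and that is on the most-duplicated subset.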

      The reason they were sold doesn’t matter. As long as the material isn’t being redistributed, copyright isn’t being violated.