• Riskable@programming.dev · 2 days ago

    They’re not illegally harvesting anything. Copyright law is all about distribution. As much as everyone loves to think that copying something without permission is breaking the law, the truth is that it isn’t. It’s only when you distribute that copy that you’re breaking the law (i.e., violating copyright).

    All those old-school notices (e.g. the “FBI Warning”) are 100% bullshit. Same for the warning the NFL spits out before games. You absolutely can record it! You just can’t share it (or show it to more than a handful of people, but that’s a different set of laws regarding broadcasting).

    I download AI (image generation) models all the time. They range in size from 2GB to 12GB. You cannot fit the petabytes of data they used to train the model into that space. No compression algorithm is that good.

    The same is true for LLMs, RVC (audio) models, and similar models/checkpoints. I mean, think about it: if AI were illegally distributing millions of copyrighted works to end users, all of that content would have to be packed into those files somehow.
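
    A back-of-the-envelope check of that claim, as a rough Python sketch (the sizes here are illustrative assumptions, not measured figures):

    ```python
    # Could a checkpoint literally contain its training set?
    # Illustrative sizes only: ~2 PiB of training data, a 12 GiB model file.
    training_data_bytes = 2 * 1024**5
    model_file_bytes = 12 * 1024**3

    ratio = training_data_bytes / model_file_bytes
    print(f"required lossless compression ratio: {ratio:,.0f}:1")
    # ~174,763:1 -- orders of magnitude beyond any real lossless compressor,
    # so the original works can't be stored verbatim in the file.
    ```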

    Instead of thinking of an AI model as a collection of copyrighted works, think of it more like a rough sketch of a mashup of copyrighted works. It’s like asking a person to make a Godzilla-themed My Little Pony: what you’d get is that person’s interpretation of what Godzilla combined with MLP would look like. Every artist would draw it differently. Every author would describe it differently. Every voice actor would voice it differently.

    Those differences are the equivalent of the random seed provided to AI models. If you throw something at a random number generator enough times you could, in theory, get the works of Shakespeare. Especially if you ask it to write something just like Shakespeare. However, that doesn’t mean the AI model literally copied his works. It’s just making its best guess (it’s literally guessing! That’s how it works!).
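
    To make “it’s literally guessing” concrete, here’s a minimal sketch of seeded sampling from a toy next-word table (the words and probabilities are made up): the seed just fixes which path through the weighted guesses you get.

    ```python
    import random

    # Toy next-word model: hand-made probabilities, purely illustrative.
    next_word = {
        "to": [("be", 0.6), ("sleep", 0.3), ("dream", 0.1)],
        "be": [("or", 0.7), ("not", 0.3)],
    }

    def generate(start, seed, length=4):
        rng = random.Random(seed)  # the seed determines every "guess"
        words = [start]
        for _ in range(length):
            options = next_word.get(words[-1])
            if not options:
                break
            tokens, weights = zip(*options)
            words.append(rng.choices(tokens, weights=weights)[0])
        return " ".join(words)

    print(generate("to", seed=1))  # one seed, one guess at a sentence
    print(generate("to", seed=2))  # a different seed, a different guess
    ```

    Nothing in there stores or copies a source text; change the seed and the same weights produce a different output.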

    • Nate Cox@programming.dev · 2 days ago

      The problem with being, like, super pedantic about definitions is that you often miss the forest for the trees.

      Illegal or not, it seems pretty obvious to me that people saying “illegal” in this thread and others probably mean “unethical”… which is pretty clearly true.

      • Riskable@programming.dev · 2 days ago

        I wasn’t being pedantic. It’s a very fucking important distinction.

        If you want to say “unethical,” then say that. Law is orthogonal to ethics, as anyone who’s studied the history of racism and sexism would understand.

        Furthermore, it’s not clear that what Meta did actually was unethical. Ethics is all about how human behavior impacts other humans (or other animals). If a behavior has a direct negative impact, it’s considered unethical. If it has no impact, or a positive one, it’s ethical.

        What impact did OpenAI, Meta, et al. have when they downloaded these copyrighted works? They were not read by humans; they were read by machines.

        From an ethics standpoint, that behavior is moot. It’s the ethical equivalent of trying to measure the environmental impact of a bit traveling across a wire. You can go deep down the rabbit hole and calculate the damage caused by mining copper and laying cables, but that’s largely a waste of time, because it loses sight of the actual question: whether copying a billion books/images/whatever into a machine somehow negatively impacts humans.

        It is not the copying of this information that matters. It’s the impact of the technologies they’re creating with it!

        That’s why I think it’s very important to point out that copyright violation isn’t the problem in these threads. It’s a path that leads nowhere.

    • Gerudo@lemm.ee · 2 days ago

      The issue I see is that they are using the copyrighted data, then making money off that data.

      • Riskable@programming.dev · 2 days ago

        …in the same way that someone who’s read a lot of books can make money by writing their own.

        • Vittelius@feddit.org · 1 day ago

          I hate to be the one to break it to you, but AIs aren’t actually people. Companies claiming that they are “this close to AGI” doesn’t make it true.

          The human brain is an exception to copyright law. Outsourcing your thinking to a machine that doesn’t actually think makes this something different, and it should therefore be treated differently.

        • blind3rdeye@lemm.ee · 1 day ago

          Do you know someone who’s read a billion books and can write a new (trashy) book in 5 mins?

          • Vespair@lemm.ee · 1 day ago

            No, but humans have differences in scale too. Should a person gifted with hyper-fast reading and writing ability be given less opportunity than a writer who takes a year to read a book and a decade to write one? IMO, if the argument comes down to scale, it’s kind of a shitty argument. Is the underlying principle faulty or not?

            • blind3rdeye@lemm.ee · 17 hours ago

              Part of my point is that a lot of everyday rules do break down at large scale. Like, ‘drink water’ is good advice, but a person can still die from drinking too much water. And having a few people go for a walk through a forest is nice, but having a million people walk through a forest is bad. And using a couple of quotes from different sources to write an article for a website is fine, but pulling in thousands of quotes by automated means doesn’t really feel like the same thing anymore.

              That’s what I’m saying. A person can’t physically read billions of books, or do the statistical work to combine them into a new piece of work. And since a person cannot do that, no law or existing rule takes that possibility into account. So I don’t think we can really say that a person is ‘allowed to’ do it; it’s just an undefined area. A person simply cannot physically do it, so the rules never had to consider it. Computer systems, on the other hand, now can. So rather than pointing to old laws, we have to decide as a society whether that’s something we’re ok with.

              I don’t know what the ‘best’ answer is, but I do think we should at least stop to think about it carefully, because there are some clear downsides that need to be considered, and probably a lot of less obvious effects that should be considered as well!

    • Mavvik@lemmy.ca · 2 days ago

      This is an interesting argument that I’ve never heard before. Isn’t the question more about whether AI-generated art counts as a “derivative work,” though? I don’t use AI at all, but from what I’ve read, these models can generate work that includes watermarks from the source data. Wouldn’t that strongly imply that they’re derivative works?

      • Riskable@programming.dev · 2 days ago

        If you studied loads of classic art and then started making your own, would that be a derivative work? Because that’s how AI works.

        The presence of watermarks in output images is just a side effect of the prompt and its similarity to the training data. If you ask for a picture of an Olympic swimmer wearing a purple bathing suit, and it turns out that only a hundred or so images in the training set match that description, and most of them included a watermark, you can end up with a kinda-sorta similar watermark in the output.

        It is absolutely 100% evidence that they used watermarked images in their training. Is that a problem, though? I wouldn’t think so, since they’re not distributing those exact images, just images that are “kinda sorta” similar.
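
        A toy illustration of that statistical effect, with invented numbers: if roughly 80% of the training images matching a narrow prompt carry a watermark, a naive sampler that just mirrors the training statistics will produce watermark-like artifacts at about that rate.

        ```python
        import random

        # Invented toy dataset: has_watermark flags for the ~100 training
        # images that match a narrow prompt, 80 of them watermarked.
        matching_images = [True] * 80 + [False] * 20

        rng = random.Random(0)
        # A naive "generator" that simply reflects training-set statistics.
        outputs = [rng.choice(matching_images) for _ in range(1000)]
        print(f"outputs with watermark-like artifacts: {sum(outputs) / len(outputs):.0%}")
        # ~80% -- the artifact emerges from frequency, not from stored images
        ```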

        If you try to get an AI to output an image that matches someone else’s image nearly exactly… is that the fault of the AI, or of the end user who specifically asked for something that would violate another’s copyright (with a derivative work)?

        • Prandom_returns@lemm.ee · 2 days ago

          Sounds like a load of techbro nonsense.

          By that logic, mirroring an image would suffice to count as a derivative work, since it’s “kinda sorta similar.” It’s not the original, and 0% of the pixels match the source.

          “And the machine, it learned to flip the image by itself! Like a human!”

          It’s a predictive keyboard on steroids; let’s not pretend it can create anything but noise without input.