So taking data without permission is bad, now?

I’m not here to say whether the R1 model is the product of distillation. What I can say is that it’s a little rich for OpenAI to suddenly be so very publicly concerned about the sanctity of proprietary data.

The company is currently involved in several high-profile copyright infringement lawsuits, including one filed by The New York Times alleging that OpenAI and its partner Microsoft infringed its copyrights and that the companies provide the Times’ content to ChatGPT users “without The Times’s permission or authorization.” Other authors and artists have suits working their way through the legal system as well.

Collectively, the contributions from copyrighted sources are significant enough that OpenAI has said it would be “impossible” to build its large language models without them. The implication is that copyrighted material had already been used to build these models long before these publisher deals were ever struck.

The filing argues, among other things, that AI model training isn’t copyright infringement because it “is in service of a non-exploitive purpose: to extract information from the works and put that information to use, thereby ‘expand[ing] [the works’] utility.’”

This kind of hypocrisy makes it difficult for me to muster much sympathy for an AI industry that has treated the swiping of other humans’ work as a completely legal and necessary sacrifice, a victimless crime that provides benefits that are so significant and self-evident that it wasn’t even worth having a conversation about it beforehand.

A last bit of irony in the Andreessen Horowitz comment: There’s some handwringing about the impact of a copyright infringement ruling on competition. Having to license copyrighted works at scale “would inure to the benefit of the largest tech companies, those with the deepest pockets and the greatest incentive to keep AI models closed off to competition.”

“A multi-billion-dollar company might be able to afford to license copyrighted training data, but smaller, more agile startups will be shut out of the development race entirely,” the comment continues. “The result will be far less competition, far less innovation, and very likely the loss of the United States’ position as the leader in global AI development.”

Some of the industry’s agita about DeepSeek is probably wrapped up in the last bit of that statement: that a Chinese company has apparently beaten an American company to the punch on something. Andreessen himself referred to DeepSeek’s model as a “Sputnik moment” for the AI business, implying that US companies need to catch up or risk being left behind. But regardless of geography, it feels an awful lot like OpenAI wants to benefit from unlimited access to others’ work while also restricting similar access to its own work.

  • maplebar@lemmy.world
    3 days ago

    Data is replicable, doesn’t matter if you call it “work” or “ideas”.

    Your mistake is thinking that “data” and “copyright” or “ownership” are the same thing. They aren’t.

    You can download a song, and thus be in possession of the data of that song, and you can even copy the file within the parameters of copyright law.

    However, simply having the data is not the same thing as owning or holding a license to the song itself, and so you are in violation of the law (where I live, at least) if you try to distribute that song or use it in a non-fair-use context.

    IF you were to copy my work and exploit it in a for-profit context for millions of dollars (and you happened to be operating in a region in which applicable copyright laws happen to apply) you’re damn right I would come after you for a slice of the pie, and I would almost certainly win. Just copying what I say and pasting it in a quote isn’t something that I can prove damages on, because it isn’t something you’re profiting on in any way, so the idea of “enforcing” it is irrelevant and obviously not worth it.

    I agree with you, corpos shouldn’t have this amount of power. But you won’t get there by trying to protect the work of artists, writers, etc. with the exact same scheme corpos pulled to protect their power and interests. Like, it didn’t work, did it?

    This is where we are going to have to disagree. I am absolutely willing to fight fire with fire by using the copyright system against big tech. I don’t make the rules, but IF rules are to exist in terms of what is or is not fair use of copyrighted material, then I DO expect those rules to apply equitably. (Whether they will or not remains to be seen, but let’s see what precedent gets set and I’ll adapt from there.)

    No copyright for me, thanks

    Can I ask you a personal question: what do you create, and do you submit it to the public domain?

    As for me, I write music, create art, make games and write computer code and do a number of other things that I absolutely claim ownership over. So, when I write a song or paint a picture who the fuck is anyone else to try to take that away from me or claim it as something that they own and control? I’ve written thousands of lines of GPL code and contributed to many hippy-dippy open source free software projects over my lifetime, and even in that kind of copyleft context we still maintain a copyright over the code we write (as seen at the top of every source and header file).

    I only ask because I find that the people who are most pro-AI and most anti-copyright are generally people who have never created anything of their own: they’ve written no songs, they’ve drawn no pictures, they’ve written no stories, and now they incorrectly see generative AI as something that “evens the playing field” by compensating for their lack of skills and drive.

    But I’ll repeat myself, AI isn’t ushering us into a post-copyright world where the little guy is empowered in any way. It’s just a bunch of useful idiots downloading completely proprietary binary blobs from the biggest, richest corporations, fooling themselves into thinking that they’re being empowered to create things when in reality they’re just beta testing a plagiarism machine on an industrial scale that’s designed to enrich the richest.

    • dreadbeef@lemmy.dbzer0.com
      3 days ago

      I’m a software engineer and I have been playing guitar nearly every day since I was 8 years old. I release everything GPL/AGPL or CC-BY-SA that I own and can. Heck, I am racking my brain every day trying to figure out ideas that can hopefully make me a living while also giving everything I have away. I don’t want to own my shit man, I just want to share what I have and hope it’s useful, and I don’t want people being assholes so I opt for the copyleft instead of liberal licenses.

      • maplebar@lemmy.world
        2 days ago

        Even in GPL and CC-BY-SA context you still retain copyright ownership over your work. I write GPL code for a modest living, and my real name copyright goes on everything I write. Likewise, you’re still asking to be credited in your CC-BY-SA music. Nothing wrong with that.

        The point is that we are making a conscious decision to license the things we create in a permissive way. Neither of us is anonymously dumping our work into the public domain because clearly we do care about ownership and copyright. That’s well within our rights as creators.

        Generative AI is exploiting our work and not even doing the bare minimum of following the licenses that we shared it under.

        • dreadbeef@lemmy.dbzer0.com
          2 days ago

          I only care about copyright while it exists. If copyright is abolished and there’s no such thing as intellectual property then I’m happy with that as well.

          No one should have the right of using police with guns to maintain the ownership of ideas and our culture. I don’t want this power. I don’t want you to have this power over my neighbors. I don’t want the Disney corporation having this power over my friends and family. But if capitalists wield that power against the common man, then I’m not against the common man wielding it against the capitalist.

          I’m okay with using copyright against only those that have more power than me. I don’t ever want to threaten a normal human being with it, only capitalists.

          I’m not against all cases of generative AI, code or visual. I’ve had legit use cases for them, and in a post-scarcity world we wouldn’t care about these things being made in a completely FLOSS way. If copyright were to last only 15 years after publication then I think the world would be much better and we wouldn’t be having this conversation. But I won’t argue AI stuff as it currently stands; it is a grift or a stockpiling.

    • proceduralnightshade@lemmy.ml
      3 days ago

      I just don’t like the premise of a market where one has to sell their artistic labor in order to survive, or thrive. I’m on board with noncommercial licenses and everything because the reality looks different, but that was not my point. And neither was it the point of the original comment you replied to.