The biggest problem about AI is not intrinsic to AI. It’s to do with the fact that it’s owned by the same few people, and I have less and less interest in what those people think, and more and more criticisms of what the effect of their work has been.
Folks, it’s just capitalism. Always has been.
I’m on board with Eno here. There are issues with the current wave of generative AI, but the main reason they’re such huge problems is that the people pushing these technologies have zero concern for the social harms they could cause or exacerbate.
I’m very much in agreement with Eno here, actually. I could easily imagine a world in which LLMs and image generators didn’t just “have use cases” but were actually revolutionary in more than a few of those cases. A world in which they were used well, for good things.
But we don’t live in that world. We live in one where this technology was almost entirely born under and shaped by megacorps. That’s never healthy for anything, be it new tech or the people using it. The circumstances in which LLMs and generative models were developed were such that nobody should be surprised we got what we did.
I think that in a better world, image generation could’ve been used for prototyping, for fun, or for enabling art from those without the time to dedicate to a craft. It could’ve been a tool like any other. LLMs could’ve carried better warnings against their hallucinations, or simply been used less for overly serious things for lack of incentive, leaving only the harmless situations. Some issues would still exist – I think training a model on small artists’ work without consent is still wrong, for example – but we’d no longer face so much of the intense electricity usage, the de facto corporate bandwagon-jumping, or the con artistry, and the issues that remained wouldn’t be happening at quite such an industrial scale.
It reminds me of how, before the “AI boom” hit, there was a fair amount of critique of copyright from leftists and FOSS advocates. There still is, to be sure; but it’s been muddied now by artists and website owners who, rightly, want these companies to stop stealing their work. These two attitudes aren’t incompatible, but it shows a disconnect all the same. And in that disconnect I think we’d do well to remember an alternate chain of events in which such a dissonance might never have occurred to begin with.
> I think that in a better world, image generation could’ve been used for prototyping, for fun, or for enabling art from those without the time to dedicate to a craft. It could’ve been a tool like any other. LLMs could’ve carried better warnings against their hallucinations, or simply been used less for overly serious things for lack of incentive, leaving only the harmless situations.
I think that world still exists, but it’s going to take corporations tripping over their hundreds of billions of dollars and burning them in a dumpster fire before they realize that the technology isn’t theirs to keep and own.
No. It’s one of the many, many problems. Even if it weren’t shovelling money into rich white dudes’ mouths and trained on stolen data, it would still use absurd amounts of energy, could be used for propaganda to further eradicate what is true, make creative people lose their jobs, and make normal people rely on machines that sell hallucinations as facts, etc. etc. This shit is mostly terrible.
They don’t even use absurd amounts of energy. I literally have one running on my computer at home.
Yep, I can generate a bunch of funny pictures on Stable Diffusion in a couple of minutes.
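For anyone curious what that looks like in practice, here’s a minimal sketch using Hugging Face’s diffusers library; the checkpoint name and GPU assumptions are mine, not anything from this thread, and it all runs locally:

```python
# Minimal local image generation sketch. Assumes:
#   pip install diffusers transformers torch
# plus a CUDA GPU with a few GB of VRAM. Nothing leaves your machine.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # one publicly hosted checkpoint
    torch_dtype=torch.float16,          # half precision to fit consumer GPUs
).to("cuda")

image = pipe("a funny picture of a cat DJing a mixing board").images[0]
image.save("funny_picture.png")
```

On a mid-range consumer GPU that’s seconds per image, which is the point: this does not require a data centre.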
So, make it not owned by the same people.
Support open-source models. Most of this shit is research papers.
The definition of “open source” AI sucks. It could just mean that the generated model weights are shared under an open source license. If you don’t have the code used to train the model under an open source license, or you can’t fully reproduce the model using the code they share and open source datasets, then calling a model “open source” feels weird as hell to me.
At the same time, I don’t know of a single modern model that only used training data that was taken with informed consent from the creators of that data.
tbf the widely used nomenclature for them is “open weights”, specifically to draw that distinction. There are genuinely open source models, in that the training data and everything is also documented, just not as many.
The OSI doesn’t require open access to training data for AI models to be considered “open source”, unfortunately. https://opensource.org/ai/open-source-ai-definition
I agree that “open weights” is a more apt description, though
If you are familiar with the concept of an NP-complete problem, the weights are just one possible solution.
The Traveling Salesman Problem is probably the easiest analogy to make. It’s as though we’re all trying to find the shortest path through a bunch of points (e.g., towns), and when someone says “here is a path that I think is pretty good”, that is analogous to sharing network weights for an AI. We can then all openly test that solution against other solutions and determine which is “best”.
What they aren’t telling you is whether people traveling that path somehow benefits them (maybe they own all the gas stations on that path, or maybe they’ve hired highwaymen to rob people on it). And figuring out whether that’s the case in a hyper-dimensional space is non-trivial.
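To make that concrete, here’s a toy sketch (plain Python, with made-up town coordinates) of the asymmetry being described: checking a published tour is cheap even though finding the best tour is hard, just as evaluating shared weights is cheap compared to training them:

```python
# Sketch of the TSP analogy: anyone can cheaply *evaluate* a shared
# solution (a tour), even though *finding* the optimal one is hard.
# Town coordinates and tours here are made up for illustration.
import math

towns = {"A": (0, 0), "B": (3, 4), "C": (6, 0), "D": (3, -2)}

def tour_length(tour):
    """Total round-trip distance visiting towns in the given order."""
    total = 0.0
    for here, there in zip(tour, tour[1:] + tour[:1]):
        (x1, y1), (x2, y2) = towns[here], towns[there]
        total += math.hypot(x2 - x1, y2 - y1)
    return total

# Two "published" candidate solutions -- analogous to released weights.
theirs = ["A", "B", "C", "D"]
ours = ["A", "C", "B", "D"]
print(tour_length(theirs), tour_length(ours))  # pick whichever is shorter
```

Verifying is trivial; knowing *why* that particular path was offered to you is the hard part.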
uh sure. My point is that sharing weights is analogous to sharing a compiled binary, not source code.
Yes, and I don’t like the common comparison to binary blobs, and I’m attempting to explain why.
It is inherently safer to blindly run weights than it is to blindly execute a binary. The issues only arise if you then blindly trust the outputs from the AI. But you should already have something in place to sanitize outputs and limit permissions, even for the most trustworthy weights.
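As a rough illustration of why, here’s a sketch of loading shared weights as plain data, assuming PyTorch and the safetensors format (the filename is hypothetical):

```python
# Sketch: weights in a data-only format parse into inert tensors;
# nothing executes on load. Assumes: pip install safetensors torch
from safetensors.torch import load_file

state = load_file("model.safetensors")  # hypothetical file; parses tensors only
for name, tensor in state.items():
    print(name, tuple(tensor.shape), tensor.dtype)
```

One hedge worth adding: that safety claim is strongest for data-only formats like safetensors. Older pickle-based checkpoints can execute arbitrary code on load, which is a big part of why the ecosystem moved away from them.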
It’s basically like hiring someone and wondering if they’re Hydra; no matter how deep your background check is, they could always decide to spontaneously defect and try to sabotage you. But that won’t matter if their decisions are always checked against enough other non-Hydra employees.
> I don’t know of a single modern model that only used training data that was taken with informed consent from the creators of that data.
I don’t know of a single modern human that only used training data that was taken with informed consent from the creators of that data.
Sorry, just to be clear, are you equating a human learning to an organization scraping creative works as inputs for their software?
Everything is a remix or a copy of derivative works. People learn from other people, from other artists, from institutions that teach entire industries. Some of it had “informed consent”; some of it was “borrowed” from other ideas without anybody’s permission. We see legal battles on a monthly basis over whether these four notes are too similar to those other four notes. Half of all movies are reboots, and some of those were themselves reboots a few times over.
“Good artists copy; great artists steal.”
No human has truly had an original thought in their head ever. It’s all stolen knowledge, whether we realize it’s stolen or not. In the age of the Internet, we steal now more than ever. We pass memes to each other with stolen material. Journalists copy entire quotes from other journalists, who then create other articles about some guy’s Twitter post, who actually got the idea from Reddit, and that article gets posted on Facebook. And then when it reaches Lemmy, we post the whole article because we don’t want the paywall.
We practice Fair Use on a regular basis by using and remixing images and videos into other videos, but isn’t that similar to an AI bot looking at an image, figuring out some weights from it, and throwing it away? I don’t own this picture of Brian Eno, but it’s certainly sitting in my browser cache, and Brian Eno and Getty Images didn’t grant me “informed consent” to do anything with it. Did I steal it? Is it legally or morally unethical to have it sitting on my hard drive? If I photoshop it and turn it into a dragon with a mic on a mixing board, and then pass it around, is that legally or morally unethical? Fair Use says no.
It’s folly to pretend that AI should be held to a standard that we ourselves aren’t upholding.
A: “Hey, did you just download all the movies of 2024?”
B: “Yeah”
A: “That’s illegal”
B: “No no, it’s not illegal. I’m gonna train my AI model. So it’s fine”
A: “With your i3 PC with 4 GB of RAM?”
B: “I’ll get better hardware later on. I might have to watch them to understand what to use for training”
A: “As long as you use it for training, it’s fine. Make sure to not watch it for your entertainment”
B: “Yeah sure, bye”
Legally, the one hosting the material is the one who is punished, not the downloader. Though, if they are using torrent software, they are both downloading and hosting. Copyright law doesn’t give a shit why they’re watching it.
If I download pictures from the internet by visiting a website, I’m not suddenly going to get punished because they’re copyrighted. Otherwise, every one of us is now in trouble because of the linked article.
are there any models where the training software, hyperparameters, seed, and full training set are released?
No? Then open source models don’t exist.
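To spell out what the “seed” part of that list buys you: pinning every source of randomness is a necessary (though nowhere near sufficient) condition for reproducing a training run. A minimal sketch, assuming PyTorch:

```python
# Sketch: pinning randomness before a training run. Necessary for
# reproduction, but not sufficient: you also need the exact code,
# data, data ordering, and often identical hardware and drivers.
import random
import numpy as np
import torch

SEED = 42  # arbitrary; the point is that it must be published
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.use_deterministic_algorithms(True)  # error out on nondeterministic ops
```

Which is exactly why “here are the weights” falls so far short of “here is how to rebuild the model”.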
Stable Diffusion. Why do you think all of these models exist?
The biggest problem with AI is copyright.
> The biggest problem ~~with AI~~ is copyright.

FTFY. Copyright is a tool of the rich. It is not for you.
What a dumb statement. Copyright has nothing to do with money. Copyright is for everyone.
enforcing it generally requires money
People and small indie companies have lost entire fortunes trying to enforce or defend completely obvious copyright cases.
Something we desperately need to abolish.
I disagree. Copyright is not bad.
Not without throwing a whole lot of other concepts in the trash (like capitalism and paywalling basic human necessities).
Capitalism will outlast copyright. They’ll adapt. The rich have existed for millennia and there’s no reason to think they would roll over as soon as copyright dies.
But, the more they push AI as their digital savior, the more copyright-free content they create. AI is just hastening copyright’s demise.