• cyd@lemmy.world
      2 days ago

      That article is overblown. People need to configure their websites to be more robust against traffic spikes, news at 11.

      Disrespecting robots.txt is bad netiquette, but honestly this sort of gentleman’s agreement is always prone to cheating. At the end of the day, when you put something on the net for people to access, you have to assume anyone (or anything) can try to access it.
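
      A concrete way to see the "gentleman's agreement": robots.txt is purely advisory, and nothing on the wire enforces it. Below is a rough Python sketch using the standard library's urllib.robotparser; the site, page, and user-agent strings are placeholders.

          import urllib.request
          import urllib.robotparser

          SITE = "https://example.com"   # placeholder site
          PAGE = SITE + "/some/page"     # placeholder page to fetch
          AGENT = "ExampleBot/1.0"       # placeholder crawler name

          # A polite crawler reads robots.txt before fetching anything.
          rp = urllib.robotparser.RobotFileParser()
          rp.set_url(SITE + "/robots.txt")
          rp.read()

          if rp.can_fetch(AGENT, PAGE):
              req = urllib.request.Request(PAGE, headers={"User-Agent": AGENT})
              body = urllib.request.urlopen(req).read()
          else:
              # Skipping this check is all it takes to "cheat"; the server
              # cannot tell a compliant crawler from a defiant one by the
              # request alone.
              body = None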

      • naught@sh.itjust.works
        1 day ago

        You think Red Hat & friends are just all bad sysadmins? SourceHut, maybe…

        I think there’s a bit of both: poorly optimized/antiquated sites and a gigantic spike in unexpected and persistent bot traffic. The typical mitigations do not work anymore.

        Not every site is, and not every site should have to be, optimized for hundreds of thousands of requests a day or more. Just because they can be doesn't mean it's worth the time, effort, or cost.

    • interdimensionalmeme@lemmy.ml
      2 days ago

      In this case they just need to publish the code as a torrent. You wouldn't set up a crawler if all the data were already available in a torrent swarm.
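
      To sketch what that could look like: the python-libtorrent bindings can build a .torrent for a source tree in a few lines. This is only an illustration under assumed paths and a placeholder tracker URL, not how any particular project actually distributes its code.

          import libtorrent as lt

          fs = lt.file_storage()
          # Assumed layout: the repository checkout lives at ./repo
          lt.add_files(fs, "repo")
          t = lt.create_torrent(fs)
          t.add_tracker("udp://tracker.example.org:6969/announce")  # placeholder
          lt.set_piece_hashes(t, ".")   # hash pieces relative to the parent dir
          with open("repo.torrent", "wb") as f:
              f.write(lt.bencode(t.generate()))

      Publishing the .torrent (and keeping at least one seeder online) would then let mirrors and scrapers pull from the swarm instead of hammering the project's own servers.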