• cyd@lemmy.world
      2 days ago

      That article is overblown. People need to configure their websites to be more robust against traffic spikes, news at 11.

      Disrespecting robots.txt is bad netiquette, but honestly this sort of gentleman’s agreement is always prone to cheating. At the end of the day, when you put something on the net for people to access, you have to assume anyone (or anything) can try to access it.
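
      A concrete way to see the "gentleman's agreement": robots.txt is purely advisory, and nothing on the wire enforces it. Below is a rough Python sketch using the standard library's urllib.robotparser; the site, page, and user-agent strings are placeholders.

          import urllib.request
          import urllib.robotparser

          SITE = "https://example.com"   # placeholder site
          PAGE = SITE + "/some/page"     # placeholder page to fetch
          AGENT = "ExampleBot/1.0"       # placeholder crawler name

          # A polite crawler reads robots.txt before fetching anything.
          rp = urllib.robotparser.RobotFileParser()
          rp.set_url(SITE + "/robots.txt")
          rp.read()

          if rp.can_fetch(AGENT, PAGE):
              req = urllib.request.Request(PAGE, headers={"User-Agent": AGENT})
              body = urllib.request.urlopen(req).read()
          else:
              # Skipping this check is all it takes to "cheat"; the server
              # cannot tell a compliant crawler from a defiant one by the
              # request alone.
              body = None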

      • naught@sh.itjust.works
        1 day ago

        You think Red Hat & friends are just all bad sysadmins? SourceHut, maybe…

        I think there’s a bit of both: poorly optimized/antiquated sites and a gigantic spike in unexpected and persistent bot traffic. The typical mitigations do not work anymore.

        Not every site is, and not every site should have to be, optimized for hundreds of thousands of requests a day or more. Just because they can be doesn't mean it's worth the time, effort, or cost.

    • interdimensionalmeme@lemmy.ml
      2 days ago

      In this case they just need to publish the code as a torrent. You wouldn't set up a crawler if all the data were already available in a torrent swarm.
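
      To sketch what that could look like: the python-libtorrent bindings can build a .torrent for a source tree in a few lines. This is only an illustration under assumed paths and a placeholder tracker URL, not how any particular project actually distributes its code.

          import libtorrent as lt

          fs = lt.file_storage()
          # Assumed layout: the repository checkout lives at ./repo
          lt.add_files(fs, "repo")
          t = lt.create_torrent(fs)
          t.add_tracker("udp://tracker.example.org:6969/announce")  # placeholder
          lt.set_piece_hashes(t, ".")   # hash pieces relative to the parent dir
          with open("repo.torrent", "wb") as f:
              f.write(lt.bencode(t.generate()))

      Publishing the .torrent (and keeping at least one seeder online) would then let mirrors and scrapers pull from the swarm instead of hammering the project's own servers.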