Hover Text:

Wait, forgot to escape a space. Wheeeeee[taptaptap]eeeeee!

Transcript

[in a yellow box:]
Whenever I learn a new skill I concoct elaborate fantasy scenarios where it lets me save the day.

Megan: Oh no! The killer must have followed her on vacation!
[Megan points to computer.]
Megan: But to find them we’d have to search through 200 MB of emails looking for something formatted like an address!
Cueball: It’s hopeless!

Off-panel voice: Everybody stand back.

Off-panel voice: I know regular expressions.

[A man swings in on a rope, toward the computer.]

tap tap
The word PERL! appears in a bubble.

[The man swings away, and the other characters cheer.]

  • otacon239@feddit.de
    9 months ago

    I learned regex in a data entry job when receiving spreadsheets of user info they entered themselves. The number of times I went from a 40-minute by-hand solution to a 30-second one was astounding.

    Need to clean up capitalization? Regex.

    Need to fix all the phone numbers with dashes in them? Regex.

    Need to make sure all the emails are valid? Regex.

    So many hours of tedium saved. It really is one of the most powerful tools to have in your back pocket.
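
A minimal Python sketch of those three cleanups, for anyone curious what they might look like. It assumes US-style 10-digit phone numbers and a deliberately loose email check. The function names are made up for illustration, and the output would still want a manual once-over.

```python
import re

def fix_capitalization(name: str) -> str:
    # Capitalize each run of letters: "jOHN smith" becomes "John Smith".
    # Names like "McDonald" would still need a human to fix them.
    return re.sub(r"[A-Za-z]+", lambda m: m.group(0).capitalize(), name.strip())

def normalize_phone(raw: str) -> str:
    # Strip every non-digit, then rebuild 10-digit numbers with dashes.
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:
        return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
    return raw  # anything unexpected is left untouched for manual review

def looks_like_email(value: str) -> bool:
    # Loose sanity check only: something, an @, something, no whitespace.
    return re.fullmatch(r"[^@\s]+@[^@\s]+", value) is not None
```

Returning the raw value for odd phone numbers keeps the regex as a first pass and leaves the strange cases for proofreading.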

        • elvith@feddit.de
          9 months ago

          Who’s gonna tell them? I’d do it but I’m still busy parsing HTML with regex… it’s working any minute now!

          • otacon239@feddit.de
            9 months ago

            What am I missing? I typically used it as a sanity check and would vet the changes. Never as a one-click modify. Or is there something else I should know about?

              • otacon239@feddit.de
                9 months ago

                Ah, yeah. It was never meant to be the be-all and end-all. Just something to clean up the complete trash before I started proofreading. Besides, these were emails the customer provided and could easily be changed afterwards. Their fault if we get bad emails in the list ¯\_(ツ)_/¯

                • elvith@feddit.de
                  9 months ago

                  You’re completely correct. In practice, it’s usually good enough to just check for “.+@.+” or “.+@.+\...+”. Why? It’s broad enough to allow almost everything and it rejects the most obvious typos. And in the end, the final verification would be to send an email there containing a link that one has to click to finalize the signup/change. Even if you had a regex that could filter every address that’s valid according to the standard, you still wouldn’t know whether it really exists.
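
A small Python sketch of that loose check, using the “.+@.+” pattern from the comment. The function name is hypothetical, and the confirmation email that does the real verification is not shown.

```python
import re

# Loose pattern from the comment: at least one character on each side of an @.
LOOSE_EMAIL = re.compile(r".+@.+")

def passes_loose_check(address: str) -> bool:
    # fullmatch requires the whole string to fit the pattern.
    return LOOSE_EMAIL.fullmatch(address) is not None
```

A check like this accepts unusual but legal addresses such as ones with a trailing underscore in the local part, and only rejects input that has no @ or nothing on one side of it.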

              • dev_null@lemmy.ml
                9 months ago

                I wrote a regex that matches 100% of email addresses and had no problems using it. It’s “.+@.+”

                • Feathercrown@lemmy.world
                  9 months ago

                  Meme aside that’s what I’d use tbh. Or the ultimate email validation: just sending the signup email and if they typed an invalid email it won’t send

      • pantyhosewimp@lemmynsfw.com
        9 months ago

        Looks like we found the intern who coded the check that rejects “trailingunderscoreisallowedyouasshole_@example.com”.

        Follow up by tech support successfully emailing me at that address to tell me to use a different email address.

      • anton@lemmy.blahaj.zone
        9 months ago

        While email addresses are technically a regular language, I have seen a regex that takes up a whole page claiming to be the first standards-compliant one.

  • SatanicNotMessianic@lemmy.ml
    9 months ago

    I’ve been in the industry for about 30 years now, and I still will break out some command line Perl when I need to regex something.

  • Diplomjodler@feddit.de
    9 months ago

    I’ve never bothered to learn regex because I can do this sort of stuff easily in Python. For any non-trivial job it’s better to have things like error handling, statistics, logs and *gasp* readability.

        • BetterDev@programming.dev
          9 months ago

          Regex is fast and useful though. It’s a tool in your toolbox that makes certain situations extremely easy, as the comic depicts.

            • DoYouNot@lemmy.world
              9 months ago

              I guess I’m just not sure what you’re doing in Python that is equivalent in some way to regex without using the re library. Like, do you mean you’re using looped ifs and raw strings to do something similar?

        • pivot_root@lemmy.world
          9 months ago

          Learning basic regex takes less than a day. Learning extended regex takes a day or two.

          If you’re often writing Python scripts to scan strings and match patterns, I can almost guarantee you’re actually wasting more of your time avoiding having to use regex.

          In fact, if you’re working with large datasets, you’re wasting even more time waiting for your scripts to finish. Python’s regular expression engine is written in C, which is considerably faster than plain Python.
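
To make the comparison concrete, here is a rough Python sketch of the same task done both ways: pulling runs of ASCII digits out of a string with a hand-written loop and with the `re` module. Both function names are invented for this example, and no timings are claimed here. The standard library's `timeit` module is one way to measure the difference on your own data.

```python
import re

def find_numbers_by_hand(text: str) -> list[str]:
    # Character-by-character scan with ifs, collecting runs of ASCII digits.
    found, current = [], []
    for ch in text:
        if "0" <= ch <= "9":
            current.append(ch)
        elif current:
            found.append("".join(current))
            current = []
    if current:  # flush a run that reaches the end of the string
        found.append("".join(current))
    return found

def find_numbers_with_re(text: str) -> list[str]:
    # Same result in one call; the matching loop runs inside the re module.
    return re.findall(r"[0-9]+", text)
```

The two functions return the same list for the same input, so the choice comes down to how much code you want to write and maintain, and how long you want to wait on large inputs.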