• thedeadwalking4242@lemmy.world
    link
    fedilink
    arrow-up
    1
    ·
    11 hours ago

    Try breaking each character in the string into its own token, it’ll have an easier time because it’ll actually know what the string is

  • Snot Flickerman@lemmy.blahaj.zone
    link
    fedilink
    English
    arrow-up
    77
    ·
    edit-2
    1 day ago

    Management: Fuck it, ship it.


    The people at the top honestly don’t give a fuck if it barely works as long as it’s an excuse to cut costs. In things like Customer Service, barely working is a bonus, because it makes customers give up before they try to get their issue solved.

  • gravitas_deficiency@sh.itjust.works
    link
    fedilink
    English
    arrow-up
    29
    ·
    edit-2
    1 day ago

    You know what? If your management is telling you to use AI generated code to “go faster”, just go ahead and do it. But fork the repo first, in case you’re still around when they get fired and someone sensible says to put it back how it was before.

  • kitnaht@lemmy.world
    link
    fedilink
    arrow-up
    51
    ·
    edit-2
    1 day ago

    I mean, I bet it failed at making a regex that worked much faster than you could fail at writing a regex that worked. Sounds like progress! :D

    • Karyoplasma@discuss.tchncs.de
      link
      fedilink
      arrow-up
      22
      ·
      1 day ago

      I am always suspicious if a regex I write doesn’t throw some form of pattern compilation error. It usually means I’m not even close to the correct solution.

  • andrew_bidlaw@sh.itjust.works
    link
    fedilink
    English
    arrow-up
    8
    ·
    1 day ago

    As it learns from our data, no wonder it fucks up at regexps. They are the arcane knowledge not accessible to us mere mortals, nor to LLMs.

    • ryathal@sh.itjust.works
      link
      fedilink
      arrow-up
      9
      arrow-down
      1
      ·
      1 day ago

      If you know even a little about how an LLM works it’s obvious why regex is basically impossible for it. I suspect perl has similar problems, but no one is capable of actually validating that.

      • Ignotum@lemmy.world
        link
        fedilink
        arrow-up
        1
        ·
        17 hours ago

        What do you mean it’s impossible for it? I know how LLMs work but I don’t know if any such limitations

        Write me a regex that matches a letter repeated four times, followed by a 3 or 4 digit number

        Here’s your regex: ([a-zA-Z])\1{3}\d{3,4}

        • ryathal@sh.itjust.works
          link
          fedilink
          arrow-up
          2
          ·
          16 hours ago

          They aren’t context aware, it’s using statistical probability. It can replicate things it’s seen a lot of like a tutorial regex. It can’t apply that to make a more complicated one. Regex in the wild isn’t really standard at all, because it’s rarely used to solve common problems. It has a bunch of random regexs from code it analyzed and will spit something out that looks similar.

  • fmstrat@lemmy.nowsci.com
    link
    fedilink
    English
    arrow-up
    7
    ·
    1 day ago

    I love regex. I know, most don’t, but I do. GPT/Claude can write some convincing code, but their regexes can be spotted a mile away.

  • cm0002@lemmy.world
    link
    fedilink
    arrow-up
    16
    ·
    edit-2
    1 day ago

    Just outta curiosity:

    Full o1 model

    “\\id:\[]]+\\\\[]]+\\\”

    Claude 3.5 Haiku:

    Never used elisp, no idea of any of this is right lmao

    • ChaoticNeutralCzech@feddit.org
      link
      fedilink
      English
      arrow-up
      8
      ·
      edit-2
      1 day ago

      o1 without Markdown misformatting:

      \\id:\\[^]]+\\\\\[^]]+\\\
      

      No idea what the rectangles are supposed to be, I just copy-pasted it

      • marcos@lemmy.world
        link
        fedilink
        arrow-up
        3
        ·
        1 day ago

        They are valid unicode points that your font doesn’t know about.

        … or at least they represent that, but I think there’s a character that looks like one too.

        • ChaoticNeutralCzech@feddit.org
          link
          fedilink
          English
          arrow-up
          2
          ·
          24 hours ago

          It’s U+E001 from a Private Use Area. The UnicodePad app renders it as something between 鉮 and 鋁 (separate boxes stricken through; I wasn’t able to find it even with Google Lens)

    • Skullgrid@lemmy.world
      link
      fedilink
      arrow-up
      1
      arrow-down
      2
      ·
      1 day ago

      I swear to god,someone must have written an intermediary language between regex and actual programming, or I’m going to eventaully do it before I blow my fucking brains out.

      • balsoft
        link
        fedilink
        arrow-up
        1
        ·
        edit-2
        14 hours ago

        Any self-respecting language (that is, a functional one) has something like this. Here are some libraries: Haskell OCaml elisp

      • BassTurd@lemmy.world
        link
        fedilink
        arrow-up
        3
        ·
        1 day ago

        How do you think that would look? Regex isn’t particularly complicated, just a bit to remember. I’m trying to picture how you would represent a regex expression in a higher level language. I think one of its biggest benefits is the ability to shove so much information into a random looking string. I suppose you could write functions like, startswith, endswith, alpha(4), or something like that, but in the end, is that better?

        • frezik@midwest.social
          link
          fedilink
          arrow-up
          4
          ·
          1 day ago

          People have unironically done that. No, it isn’t better. The fundamental mental model is the same.

          • balsoft
            link
            fedilink
            arrow-up
            1
            arrow-down
            1
            ·
            13 hours ago

            I honestly think it can be a lot more readable, especially when the regex would have been in the thousands of characters.

            • frezik@midwest.social
              link
              fedilink
              arrow-up
              1
              arrow-down
              1
              ·
              13 hours ago

              There’s a built-in feature that Perl has that only a few of the languages claiming PCRE have actually done, and it makes things a lot more readable. The /x modifier lets you put in whitespace and comments. That alone helps a lot if you stick to good indentation practices.

              If all other code was written like an obfuscated C contest, it would be horrible. For some reason, we put up with this on regex, and we don’t have to.

              https://wumpus-cave.net/post/2022/06/2022-06-06-how-to-write-regexes-that-are-almost-readable/index.html

              • balsoft
                link
                fedilink
                arrow-up
                1
                arrow-down
                1
                ·
                11 hours ago

                I agree, but then there’s also some other niceties that come from expression parsers in the language itself (as noted in the article): syntax highlighting, LSP, a more complete AST for editors like helix.

                • frezik@midwest.social
                  link
                  fedilink
                  arrow-up
                  1
                  arrow-down
                  1
                  ·
                  11 hours ago

                  Syntax highlighting works fine as long as your language has a way to distinguish regexes from common strings. Another place where Perl did it right decades ago and the industry ignored it.

          • Skullgrid@lemmy.world
            link
            fedilink
            arrow-up
            1
            ·
            1 day ago

            I want to see their unironic attempts, maybe they’re useful to me at least if they’re not better.

            The fundamental mental model is the same.

            It’s not the fundemental model that I have a problem with for Regex, it’s the fucking brainfuck tier syntax

        • Skullgrid@lemmy.world
          link
          fedilink
          arrow-up
          5
          arrow-down
          1
          ·
          1 day ago

          I suppose you could write functions like, startswith, endswith, alpha(4), or something like that,

          yes.

          but in the end, is that better?

          YES.

          startswith('text');
          lengthMustBe(5);
          onlyContain(CHARSETS.ALPHANUMERICS); 
          endswith('text');
          

          is much more legible than []],[.<{}>,]‘text’[[]]][][)()(a-z,0-9){}{><}<>{}‘text’{}][][

          • BassTurd@lemmy.world
            link
            fedilink
            arrow-up
            1
            ·
            1 day ago

            Assuming “text” in your example is a placeholder for a 5 digit alpha string, it can be written like this in regex: /[a-zA-Z0-9]{5}/

            If ”text" is literal, then your statement is impossible.

            I think that when it gets to more complex expressions like a phone number with country code that accepts different formats, the verbosity of a higher level language will be more confusing, or at least more difficult to take in quickly.

            • frezik@midwest.social
              link
              fedilink
              arrow-up
              2
              ·
              13 hours ago

              Exactly. It’s a lot like Java to me. Looks readable on the surface, but it’s actually adding a bunch of crap you don’t need and does not help anything.

              They also have to implement a long list of features. These projects tend to focus on the handful of features the authors specifically use, and the rest get sent by the wayside. Taking the Melody language that was mentioned in another message, it hasn’t even fully implemented [^A] or [abc]. We’re not even talking about somewhat obscure stuff like zero width assertions or lookaheads. These are very basic.

  • dayna@lemmygrad.ml
    link
    fedilink
    arrow-up
    1
    arrow-down
    2
    ·
    1 day ago

    Gpt4-mini, the model that’s worse at everything but like 100 times smaller or whatever. Really cutting insight you have here.

    This is like interviewing the child of a programmer and hiring based off of that.

    Next you’ll be making hiring decisions based off of optical illusions and riddles.

    • MajorHavoc@programming.dev
      link
      fedilink
      arrow-up
      1
      ·
      1 day ago

      This is like interviewing the child of a programmer and hiring based off of that.

      I could land a job that way, but I’m just that fucking good. Lol.