• Lvxferre
      1 year ago

      He did try it, but in a rather dumb way: comparing whole strings, e.g. “123 Main St, Brooklyn, NY 11217” vs. “124 Main St, Brooklyn, NY 11217”.

      It’s silly because his whole approach to the problem rests on an assumption. It’s fine to say “I don’t know”, or to code a program that says so. And yet he’s trying to dichotomise the program’s output into “same” vs. “different”.
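
      To make the whole-string comparison concrete, here is a minimal sketch of plain Levenshtein distance applied to the two addresses above. The function and variable names are my own, not taken from the text being discussed; the point is only that two different house numbers end up a single edit apart.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


addr_a = "123 Main St, Brooklyn, NY 11217"
addr_b = "124 Main St, Brooklyn, NY 11217"

# Only the "3" vs. "4" differs, so the distance is 1.
whole_string_distance = levenshtein(addr_a, addr_b)
```

      Any threshold loose enough to forgive a one-character typo would also call these two addresses “the same”, even though they are different buildings.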

      • superfes@beehaw.org
        1 year ago

        I’ve never done Levenshtein on numbers; it seems like a silly thing to do.

        Somehow I had skipped over that part of the text, thanks.

        • Lvxferre
          1 year ago

          Yup, it’s stupid. The catch is that the text is yet another example of people hyping generative bots and trying to “sell” them as the solution for everything and a bit more. One of the ways to do that is to make the alternative look worse than it is, for example by misusing the other tools at your disposal.

          Even then I wouldn’t use fuzzy string matching here; it’s bound to introduce more false positives than it’s worth, such as “Ant Street” and “Aunt Street” matching (Levenshtein distance = 1). In those cases it’s simply better to say “dunno”.
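
          A rough sketch of what “saying dunno” could look like in code, assuming a naive split into house number and remainder. `split_address` and `compare_addresses` are hypothetical helpers made up for illustration, and real address parsing is much messier than this.

```python
import re


def split_address(address: str) -> tuple[str, str]:
    """Naively split an address into (house number, remainder).

    Hypothetical helper: lowercases, collapses whitespace, and treats
    leading digits as the house number. Returns "" if there are none.
    """
    normalized = " ".join(address.lower().split())
    match = re.match(r"(\d+)\s+(.*)", normalized)
    if match is None:
        return "", normalized
    return match.group(1), match.group(2)


def compare_addresses(a: str, b: str) -> str:
    """Return "same", "different", or "dunno" instead of forcing a yes/no."""
    num_a, rest_a = split_address(a)
    num_b, rest_b = split_address(b)
    # House numbers are identifiers, so they are compared exactly, never fuzzily.
    if num_a and num_b and num_a != num_b:
        return "different"
    # Only an exact match after normalisation counts as the same address.
    if num_a == num_b and rest_a == rest_b:
        return "same"
    # Anything else (e.g. "Ant Street" vs. "Aunt Street") is left undecided.
    return "dunno"


verdict_numbers = compare_addresses(
    "123 Main St, Brooklyn, NY 11217",
    "124 Main St, Brooklyn, NY 11217",
)
verdict_streets = compare_addresses("12 Ant Street", "12 Aunt Street")
```

          The third verdict is the whole point: near-misses get handed to a human or a proper address database instead of being guessed at.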