• Lvxferre
    1 year ago

    He tried it, but in a rather dumb way: by comparing whole strings, e.g. 123 Main St, Brooklyn, NY 11217 vs. 124 Main St, Brooklyn, NY 11217.
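To make the whole-string comparison concrete, here is a minimal sketch of Levenshtein distance applied to those two addresses. The function name `levenshtein` and the implementation are my own illustration, not code from the text being discussed; it is the standard dynamic-programming formulation, kept to two rows.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


addr_a = "123 Main St, Brooklyn, NY 11217"
addr_b = "124 Main St, Brooklyn, NY 11217"
print(levenshtein(addr_a, addr_b))  # 1
```

A distance of 1 looks like "almost the same string", but the single edit sits in the house number, so the two strings name different buildings.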

    It’s silly because his whole approach to the problem rests on an assumption. It’s fine to say “I don’t know”, or to code a program that says so. And yet he’s trying to force the program’s output into a “same” vs. “different” dichotomy.

    • superfes@beehaw.org
      1 year ago

      I’ve never done Levenshtein on numbers; it seems like a silly thing to do.

      Somehow I had skipped over that part of the text, danke.

      • Lvxferre
        1 year ago

        Yup - it’s stupid. The catch is that the text is yet another example of people hyping generative bots and trying to “sell” them as the solution for everything and a bit more. One of the ways to do that is to make the alternative look worse than it is, for example by misusing the other tools at your disposal.

        Even then I wouldn’t use fuzzy string matching here; it’s bound to introduce more false positives than it’s worth. For example, Ant Street and Aunt Street would match (Levenshtein distance = 1). In those cases it’s simply better to say “dunno”.
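One way to picture the “dunno” option is a comparison with three possible answers instead of two. This is only a sketch under my own assumptions: the names `compare` and `DUNNO_THRESHOLD`, the threshold value of 2, and the lowercase-and-collapse-whitespace normalisation are all hypothetical, and a real address matcher would need far more than this.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the usual two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


# Hypothetical cut-off: distances at or below this are "too close to call".
DUNNO_THRESHOLD = 2


def compare(a: str, b: str) -> str:
    """Return "same", "different", or "dunno" for two street names."""
    # Assumed normalisation: ignore case and extra whitespace only.
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    if norm_a == norm_b:
        return "same"
    if levenshtein(norm_a, norm_b) <= DUNNO_THRESHOLD:
        # Close strings are NOT treated as a match; they are flagged
        # as undecided so something else (or someone) can check.
        return "dunno"
    return "different"


print(compare("Ant Street", "Aunt Street"))  # dunno
print(compare("ant  street", "Ant Street"))  # same
print(compare("Ant Street", "Broadway"))     # different
```

The point of the design is that a small edit distance never upgrades a pair to “same”; it only downgrades “different” to “dunno”, which avoids the Ant/Aunt false positive.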