Our data engineer insists on lowercasing everything and stripping some other formatting, like newlines, from free-text fields.

They say it’s “better for Elasticsearch”.

To me that makes no sense, and it loses information that can’t be added back. But I couldn’t really convince them otherwise. So far no real problem has come out of it, but it makes for a worse experience for the user. For example, company names that are acronyms show up in all lowercase (ibm, llc, etc.), and in free-text fields we lose the places where the user wrote in caps or added paragraphs.

What are your thoughts on this?

Disclaimer: I’m not a data engineer, just a PM on a data-related product.

  • CaptainBuckleroy@lemm.ee
    10 months ago

    The answer to your question is extremely use-case specific, and sounds like something to discuss with others at your workplace.

    • Taringano@lemm.ee (OP)
      10 months ago

      That’s fair.

      When would that be useful?

      Consider that we have no space restrictions and no need for extreme speed. All our competitors store the data as it was originally entered (we share data sources; theirs displays nicely, while ours displays everything in lowercase, etc., as mentioned).

      • CaptainBuckleroy@lemm.ee
        10 months ago

        Got it, useful info.

        I’m a software engineer, but here’s a bunch of stuff to consider, in no particular order.

        Maybe the data engineer isn’t the one to convince?

        If it saves time, how much time? Would the tools you use (I’m using the term “tools” broadly here) work differently? For example, would analytics count IBM, Ibm, and ibm as three different values?
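
        To make the analytics point concrete, here’s a minimal Python sketch. The sample values and the `count_companies` helper are made up for illustration, not taken from your system. It groups names case-insensitively for counting while keeping the first-seen spelling for display:

```python
from collections import Counter

def count_companies(names):
    """Count names case-insensitively; keep the first-seen spelling for display."""
    counts = Counter()
    display = {}
    for name in names:
        key = name.casefold()          # normalized key, used only for grouping
        counts[key] += 1
        display.setdefault(key, name)  # original spelling, kept for the UI
    return {display[key]: n for key, n in counts.items()}

# Hypothetical sample data
result = count_companies(["IBM", "Ibm", "ibm", "Acme LLC"])
print(result)
```

        The point is that normalizing for grouping doesn’t require throwing away the original value.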

        Is there a solution that’s the best of both worlds? If space isn’t an issue, can the original text be preserved and linked to each entry, so that the normalized text is used for Elasticsearch while the original is kept for display?
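
        A rough sketch of that “keep both” idea in plain Python, assuming the normalization is just lowercasing plus collapsing whitespace (I don’t know what your pipeline actually does). `Entry`, `normalize`, `make_entry`, and `matches` are hypothetical names, and the substring match only stands in for a real search index:

```python
import re
from dataclasses import dataclass

def normalize(text):
    """Collapse all whitespace (including newlines) to single spaces, then lowercase."""
    return re.sub(r"\s+", " ", text).strip().casefold()

@dataclass(frozen=True)
class Entry:
    original: str     # exactly what the user typed; shown in the UI
    search_text: str  # derived, lossy copy; used only for matching

def make_entry(text):
    return Entry(original=text, search_text=normalize(text))

def matches(entry, query):
    """Stand-in for the search engine: normalized substring match."""
    return normalize(query) in entry.search_text

# Hypothetical free-text value with caps and a paragraph break
entry = make_entry("IBM Global Services\n\nURGENT: call back")
print(matches(entry, "urgent: CALL"))
print(entry.original)
```

        Since the normalized copy can always be regenerated from the original but not the other way around, storing the original is the safer default. It’s also worth asking the data engineer whether lowercasing could happen at index time instead: as far as I know, Elasticsearch’s default analyzer for text fields already lowercases tokens when indexing.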

        Maybe “convincing” isn’t the right approach, but learning is?