• LvxferreOP
    2 years ago

    I mentioned this in another discussion, but even if Reddit is trying to milk the people who pay for API access to get data for NLP models, this is still a bad move. Even the ones who stay behind on Reddit will be less engaged; and lack of engagement in this case means shorter, decontextualised sentences, conveying almost nothing. It’s the difference between a well-thought-out reply and a “lol”.

    So the data will quickly become useless, and even the ones who might pay for it at the start will eventually say “why bother? Reddit only adds noise to our models.” and stop paying.

    And it’s a bit off-topic, but about NLP, I think that brute-forcing (just feeding the model more data) is counter-productive in the long run, too. Humans actually learn language (how to use it, not just how to utter grammatically sound but meaningless sentences) with considerably less exposure.

    • gish
      2 years ago

      Yes, very good points. I am not an ML expert by any means, but it does seem like companies are in a bit of an arms race right now, and are just trying to grow large models without doing it properly.