• dualmindblade [he/him]@hexbear.net (OP)
    1 year ago

    Interesting. Just a clarification: to my understanding, overfitting and memorization are not quite the same thing. Overfitting is when a model memorizes instead of generalizing, but very large models can and will do both. If you ask an image generator for “a reproduction of Starry Night by Van Gogh hanging on the wall”, or ask an LLM to complete “to be or not to be, that is _”, you are referring to something very specific that you’d like reproduced exactly. If the model outputs what you wanted, you would call that memorization but not overfitting. You may still want to suppress memorization, and you certainly don’t want overfitting.

    Side note: massively overparameterized models are better at both memorization and generalization, and they are naturally resistant to overfitting as I define it. That last point would have surprised early ML researchers, who had observed the opposite trend (more parameters, more overfitting), but that trend reverses once models get large enough, an effect now usually called double descent. Also, these models will sometimes memorize an example after a single pass through the data, even when it appears only once in the training set, which is quite remarkable.
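    A toy sketch of the "huge models can memorize and still generalize" point, with a big caveat: it uses random-features regression (a linear model on frozen random cosine features) as a stand-in for a large network, which is my assumption and not the kind of model discussed above. The target function, noise level, widths, sample sizes and seed are all arbitrary, hypothetical choices. Once the width p reaches the number of training points n, training error drops to essentially zero, meaning every noisy label has been memorized. Test error on clean labels typically spikes near p = n and comes back down as p keeps growing, which is the double-descent shape; the exact numbers vary with the seed, so run it and look rather than trust any particular value.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - dot(m[r][r + 1:n], x[r + 1:])) / m[r][r]
    return x


def fit_min_norm(z, y):
    """Least-squares coefficients for features z (n rows, p columns);
    the minimum-norm interpolating solution when p >= n."""
    n, p = len(z), len(z[0])
    if p >= n:
        # w = Z^T (Z Z^T)^{-1} y fits every training label exactly
        # (when Z has full row rank) with the smallest possible ||w||.
        alpha = solve([[dot(zi, zj) for zj in z] for zi in z], y)
        return [sum(alpha[i] * z[i][k] for i in range(n)) for k in range(p)]
    # p < n: ordinary least squares via the normal equations (Z^T Z) w = Z^T y.
    cols = list(zip(*z))
    return solve([[dot(ci, cj) for cj in cols] for ci in cols], [dot(c, y) for c in cols])


def random_features(w, b, xs):
    """Frozen random cosine features cos(w_j . x + b_j); only the linear head is fit."""
    return [[math.cos(dot(wj, x) + bj) for wj, bj in zip(w, b)] for x in xs]


def mse(z, coef, y):
    return sum((dot(zi, coef) - yi) ** 2 for zi, yi in zip(z, y)) / len(y)


def experiment(widths=(10, 20, 30, 40, 50, 80, 160, 400), n_train=40, d=3, noise=0.3, seed=0):
    """Return {width: (train_mse, test_mse)} for min-norm random-features regression."""
    rng = random.Random(seed)

    def target(x):  # hypothetical ground-truth function
        return math.sin(x[0]) + 0.5 * x[1]

    x_train = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_train)]
    x_test = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(200)]
    y_train = [target(x) + rng.gauss(0, noise) for x in x_train]  # noisy labels
    y_test = [target(x) for x in x_test]  # clean labels: measures generalization
    results = {}
    for p in widths:
        w = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(p)]
        b = [rng.uniform(0, 2 * math.pi) for _ in range(p)]
        z_train = random_features(w, b, x_train)
        coef = fit_min_norm(z_train, y_train)
        test_err = mse(random_features(w, b, x_test), coef, y_test)
        results[p] = (mse(z_train, coef, y_train), test_err)
    return results


if __name__ == "__main__":
    for p, (train, test) in experiment().items():
        print(f"width={p:4d}  train_mse={train:.2e}  test_mse={test:.2e}")
```

    The p >= n branch is what makes the overparameterized fits well defined: with more features than points there are infinitely many coefficient vectors that memorize the training set, and picking the minimum-norm one (which is also what gradient descent started from zero converges to for this linear model) is the implicit preference that keeps the wide fits smooth. The normal equations are only there to keep the sketch dependency-free; they square the condition number, so a real experiment would use a proper least-squares solver.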