• @k_o_t
    26 months ago

    certainly more weights contain more general information, which is pretty useful if you’re using a model as a sort of secondary search engine, but models can be very performant on certain benchmarks while containing little general knowledge

    this isn’t really by design; up until now (and it’s still the case), we simply haven’t known how to create an LLM that can generate coherent text without absorbing a huge portion of the training material

    i’ve tried several models based on facebook’s llama LLMs, and i can say that the 13B and definitely the 30B versions are comparable to chatGPT in terms of quality (maybe not in terms of the amount of information they have access to, but definitely in other regards)