People are talking about the new Llama 3.3 70b release, which has generally better performance than Llama 3.1 (approaching 3.1’s 405b performance): https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_3
However, something to note:
Llama 3.3 70B is provided only as an instruction-tuned model; a pretrained version is not available.
Is this the end of open-weight pretrained models from Meta, or is Llama 3.3 70b instruct just a better-instruction-tuned version of a 3.1 pretrained model?
Comparing the model cards: 3.1: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md 3.3: https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md
The same knowledge cutoff, same amount of training data, and same training time give me hope that it’s just a better finetune of maybe Llama 3.1 405b.
Thank you so much, that exactly answers my question with the official response (that guy works at Meta) that confirms it’s the same base model!
I was concerned primarily because in the release notes it strangely didn’t mention it anywhere, and I thought it would have been important enough to mention.