I’d say the fact that it was trained in FP8 is even bigger of a deal. Spending less than 6 million dollars to train something this good kinda changes how training is approached. Training has been a big problem because of the costs until now but this can make a big impact on how training can be approached.
For sure, it’s revolutionary in several ways at once. In general, this shows that we’ve really only scratched the surface with this tech. It’s hard to predict what other tricks people find in the coming years. Another very interesting approach I learned about recently is neurosymbolic architecture which combines deep learning with symbolic logic. The deep learning system is used to analyze raw data and identify patterns, then encode it into tokens that a symbolic logic system can use to do actual reasoning on the data. That addresses the key weakness of LLMs making it possible to actually have the system explain how it arrived at a solution.
I’d say the fact that it was trained in FP8 is even bigger of a deal. Spending less than 6 million dollars to train something this good kinda changes how training is approached. Training has been a big problem because of the costs until now but this can make a big impact on how training can be approached.
For sure, it’s revolutionary in several ways at once. In general, this shows that we’ve really only scratched the surface with this tech. It’s hard to predict what other tricks people find in the coming years. Another very interesting approach I learned about recently is neurosymbolic architecture which combines deep learning with symbolic logic. The deep learning system is used to analyze raw data and identify patterns, then encode it into tokens that a symbolic logic system can use to do actual reasoning on the data. That addresses the key weakness of LLMs making it possible to actually have the system explain how it arrived at a solution.