☆ Yσɠƚԋσʂ ☆ to Technology (English) · 10 months ago

1-bit LLM performs similarly to full-precision Transformer LLMs with the same model size and training tokens but is much more efficient in terms of latency, memory, throughput, and energy consumption. (arxiv.org)

Cross-posted to: hackernews@lemmy.smeargle.fans
will_a113 · 10 months ago

It’s actually 1.58 bits, oddly enough. The addition of 0 to the set of weight values was the significant change in this experiment. The paper isn’t too dense and has some decent tables that explain things fairly accessibly.
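The 1.58 figure comes from each weight taking one of three values, {-1, 0, 1}, which carries log2(3) ≈ 1.58 bits of information. A minimal sketch of this kind of ternary quantization, assuming the absmean-style scaling described in the paper (the function name `ternary_quantize` and the sample matrix are illustrative, not from the paper):

```python
import numpy as np

def ternary_quantize(w, eps=1e-6):
    # Scale by the mean absolute value of the weights, then
    # round each scaled weight to the nearest of {-1, 0, 1}.
    scale = np.mean(np.abs(w)) + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale

w = np.array([[0.4, -1.2, 0.05],
              [0.9, -0.02, -0.7]])
q, scale = ternary_quantize(w)
print(q)  # every entry is -1, 0, or 1
```

With 0 in the set, small weights can be dropped entirely, and matrix multiplication reduces to additions and subtractions (no multiplies), which is where the latency and energy savings come from.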