Training a model takes more power than what? Generating a single poem? Using it to generate an entire 4th grade class’s essays? Answering every question asked in Hawaii for six months? What is the scale? The break-even point, where cumulative usage has burned as much power as training did, comes at far, far less than total usage.
Have you ever run one locally? Depending on your hardware, it’s anywhere from glacial to morgue’s-AC slow. To the average person on the average computer it is nearly unusable compared to the instant gratification of the web interface.
That gives you a sense of the resources required to do the task at all, but it doesn’t scale linearly. Two computers aren’t twice as fast as one. It’s logarithmic, with diminishing returns. In the end, this means one 100-word response uses the equivalent of 3 bottles of water.
How many queries are made per hour? How does that scale over time with increased usage of the same model? More than training a model. A lot more.
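To put rough arithmetic on the break-even idea, here is a minimal sketch. Every number in it is a made-up placeholder chosen to keep the math round, not a measurement of any real model, and the function names `break_even_queries` and `hours_to_break_even` are just mine for illustration.

```python
def break_even_queries(training_energy_wh: float, energy_per_query_wh: float) -> float:
    """Number of queries after which cumulative usage equals the training energy."""
    if training_energy_wh <= 0 or energy_per_query_wh <= 0:
        raise ValueError("energy values must be positive")
    return training_energy_wh / energy_per_query_wh


def hours_to_break_even(training_energy_wh: float,
                        energy_per_query_wh: float,
                        queries_per_hour: float) -> float:
    """Hours of steady traffic needed to reach the break-even query count."""
    if queries_per_hour <= 0:
        raise ValueError("queries_per_hour must be positive")
    return break_even_queries(training_energy_wh, energy_per_query_wh) / queries_per_hour


if __name__ == "__main__":
    # HYPOTHETICAL placeholder inputs, not real figures for any model.
    training_wh = 1_000_000_000   # assumed one-time training cost, in watt-hours
    per_query_wh = 1              # assumed cost of a single response, in watt-hours
    per_hour = 1_000_000          # assumed query volume per hour

    print(break_even_queries(training_wh, per_query_wh))
    print(hours_to_break_even(training_wh, per_query_wh, per_hour))
```

Plug in whatever figures you trust. The point is only that the training cost is paid once, while the per-query cost keeps multiplying for as long as the model is in use.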