I read the guy’s blog post on SuperHOT and it sounded like it didn’t increase perplexity and kept perplexity super low with large contexts. I could have read it wrong but I thought it wasn’t supposed to increase perplexity.
The increase in perplexity is very small, but there is still some with 8K content. But it seems like with 2K its much larger. I could be misunderstanding something myself. But my little test with 2K context does suggest there’s something going on with 2K contexts on SuperHOT models
I read the guy’s blog post on SuperHOT and it sounded like it didn’t increase perplexity and kept perplexity super low with large contexts. I could have read it wrong but I thought it wasn’t supposed to increase perplexity.
The increase in perplexity is very small, but there is still some with 8K content. But it seems like with 2K its much larger. I could be misunderstanding something myself. But my little test with 2K context does suggest there’s something going on with 2K contexts on SuperHOT models