The GPT-3 Architecture, on a Napkin (dugas.ch)
I eat words to AI • 1 year ago • 2 comments • cross-posted to: machinelearning
Behohippy • 10 months ago
I’ve got a background in deep learning and I still struggle to understand the attention mechanism. I know it’s a key/value store but I’m not sure what it’s doing to the tensor when it passes through different layers.
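
One way to make the key/value intuition concrete is a minimal single-head sketch of scaled dot-product attention in plain Python. This is an illustration only, not code from the linked article: the function names and toy numbers are made up, and it leaves out the learned Q/K/V projections, multiple heads, and causal masking that a GPT-style model uses. It shows a "soft" lookup: each query is scored against every key, the scores become weights via softmax, and the output is the weighted average of the values.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    # Hypothetical helper: single-head scaled dot-product attention on lists.
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Score the query against every key (dot product, scaled by sqrt(d_k)).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Blend the values using the weights: a soft key/value lookup.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Toy data (assumed for illustration): two keys, each with its own value.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
# The first query strongly matches the first key; the second matches neither.
queries = [[8.0, 0.0], [0.0, 0.0]]

result = attention(queries, keys, values)
for row in result:
    print(row)
```

The first output row lands close to the first value and the second row is an even blend of both, so unlike a hard dictionary lookup every query gets a mixture. The output has one row per query, which is why in self-attention (where queries, keys, and values all come from the same input) the tensor keeps its sequence length from layer to layer.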