puzzling out the proofs for concepts so utterly fundamental to math by myself that it’s like if Genesis 1:3 was And God said, ‘Let there be integer,’ and there was integer
puzzling out the proofs for concepts so utterly fundamental to math by myself that it’s like if Genesis 1:3 was And God said, ‘Let there be integer,’ and there was integer
huh, yeah, i guess that’s kinda how humans do long division?
he’s hopping right past the most interesting part, though. sure, doing math with byte logic is tricky, so you have to make approximations. so in order to divide by 9 the processor does some fixed point math and says to divide by 9 we’re actually going to multiply by 2^33 / 9 which equals that long number. but how does the processor know what 2^33/9 equals? how’s it doing that? sounds like it’s very good at division, because it would take me a while to work out that value even if i started with trying to find 8589934592/9, y’know?
more to the point, how does it do 0b1000000000000000000000000000000000 / 0b1001 = 0b111000111000111000111000111001 so quickly?
You should understand that “computers very bad at division” is a relative term, on modern desktop x86 chips a single 64-bit division takes somewhere on the order of 20-40 clock cycles (which seems pretty fast when you think about how they’re running at like 4 billion cycles per second, but these same chips can do integer multiplication in 2-3 cycles so division is painfully slow by comparison).
The reason multiplying by 2^33 / 9 is faster is because you don’t have to actually compute what 2^33 is and then divide by 9 every single time - the compiler can compute that value the “slow” way a single time and bake that into the machine code, and then when the program is running it only has to deal with multiplication.
While we don’t know exactly what techniques are being used in a modern Intel/AMD chip, as far as I know the current best approach is basically just long division.
EDIT: The reason long division is slow is that all the partial results depend on the results of previous steps. This means that your division circuit ends up being a really long line of comparator circuits chained end-to-end, and long circuits are bad because it takes a long time for the signal to actually reach the end. Multiplication is fast by comparison because you can compute all the partial products in parallel, and then add them together in a kind of tree shape. The end result is a circuit which is super wide, but the “propagation delay” (the time it takes until the last input signal reaches the end) is pretty low since there’s no path which passes through more than a few adders.