Encryption-breaking, password-leaking bug in many AMD CPUs could take months to fix

pe1uca@lemmy.pe1uca.dev · 2 years ago

Encryption-breaking, password-leaking bug in many AMD CPUs could take months to fix

AbelianGrape@beehaw.org · edit-2 2 years ago

I think the mitigations are acceptable, but for people who don’t want to worry about that, yes, it could put them off choosing AMD.

To reiterate what Tavis Ormandy (who found the bug) and other hardware engineers/enthusiasts say, getting these things right is very hard. Modern CPUs apply tons of tricks and techniques to go fast, and some of them are so beneficial that we accept that they lead to security risks (see Spectre and Hertzbleed for example). We can fully disable those features if needed, but the performance cost can be extreme. In this case, the cost is not so huge.

Plus, even if someone were to attack your home computer specifically, they’d have to know how to interpret the garbage data that they are reading. Sure, there might be an encryption key in there, but they’d have to know where (and when) to look*. Indeed, mitigations for attacks like spectre and hertzbleed typically include address space randomization, so that an attacker can’t know exactly where to look.

With Zenbleed, the problem is caused by something relatively simple, which amounts to a use-after-free of an internal processor resource. The recommended mitigation at the moment is to set a “chicken bit,” which makes the processor “chicken out” of the optimization that allocates that resource in the first place. I took a look at one of AMD’s manuals and I’d guess for most code, setting the chicken bit will have almost no impact. For some floating-point heavy code, it could potentially be major, but not catastrophic. I’m simplifying by ignoring the specifics but the concept is actually entirely accurate.

* If they are attacking a specific encrypted channel, they can just try every value they read, but this requires the attack to be targeted at you specifically. This is obviously more important for server maintainers than for someone buying a processor for their new gaming PC.

SatanicNotMessianic · 2 years ago

For some floating-point heavy code, it could potentially be major, but not disastrous.

That’s a really interesting point (no pun intended)

I had run into a few situations where a particular computer architecture (eg, the Pentiums for a time) had issues with floating point errors and I remember thinking about them largely the same way. It wasn’t until later that I started working in complexity theory, by which time I completely forgot about those issues.

a one of the earliest discoveries in what would eventually become chaos and complexity theory was the butterfly effect. Edward Lorenz was doing weather modeling back in the 60s. The calculations were complex enough that the model could have to be run over several sessions, starting and stopping with partial results at each stage. Internally, the computer model used six significant figures for floating point data. When Lorenz entered the parameters to continue his runs, he used three sig figs. He found that the trivial difference in sig digs actually led to wildly different results. This is due to the nature of systems that use present states to determine next states and which also have feedback loops and nonlinearities. Like most complexity folks, I learned and told the story many times over the years.

I’ve never wondered until just now whether anyone working on those kinds of models ran into problems with floating point bugs. I can imagine problematic scenarios, but I don’t know if it ever actually happened or if it would have been detected. That would make for an interesting study.

AbelianGrape@beehaw.org · edit-2 2 years ago

These would be performance regressions, not correctness errors. Specifically, some false dependencies between instructions. The result of that is that some instructions which could be executed immediately may instead have to wait for a previous instruction to finish, even though they don’t actually need its result. In the worst case, this can be really bad for performance, but it doesn’t look like the affected instructions are too likely to be bottlenecks. I could definitely be wrong though; I’d want to see some actual data.

The pentium fdiv bug, on the other hand, was a correctness bug and was a catastrophic problem for some workloads. It was absolutely detected - it was found outside of Intel!

SatanicNotMessianic · 2 years ago

Thanks for the clarification!

I remember having to learn about fp representations in a numerical analysis class and some of the things you had to worry about back then, but by the time I ended up doing work where I’d actually have to worry about it, most of the gotchas had been taken care of so I largely stopped paying attention to the topic.