Unfortunately, due to the complexity and specialized nature of AVX-512, such optimizations are typically reserved for performance-critical applications and require expertise in low-level programming and processor microarchitecture.

  • zod000
    16 days ago

    Someone else in the comments mentioned that it is about 40% faster than the AVX2 code and slightly more than twice as fast as the SSE3 code. That’s still a nice boost, but hopefully no one was relying on the radically slow unoptimized baseline.

    • thingsiplay@beehaw.org
      16 days ago

      But my question is: how much of the speedup comes from it being written in assembly rather than a “high”-level language like C or Rust? I mean, if the AVX-512 code were written in C, would it still be 40% faster than the AVX2 code?
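
For context on what "AVX-512 written in C" could look like, here is a minimal sketch using compiler intrinsics. The function `add_f32` is a hypothetical example and is not taken from the code discussed in this thread. It assumes a compiler that provides `<immintrin.h>` and defines `__AVX512F__` when AVX-512 is enabled (for example with `-mavx512f` on GCC or Clang); without that, only the scalar loop is compiled. It shows the programming model only and makes no claim about how the generated code would compare with hand-written assembly.

```c
#include <stddef.h>

#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
static void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__AVX512F__)
    /* AVX-512 path: 16 floats per iteration, written in C via intrinsics. */
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(out + i, _mm512_add_ps(va, vb));
    }
#endif
    /* Scalar tail; this is the whole loop when AVX-512 is not enabled. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

With intrinsics the programmer still picks the vector instructions, but the compiler handles register allocation and instruction scheduling. Whether that closes the gap to hand-written assembly depends on the compiler and the routine, and would have to be measured.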