Several years ago, we decided that it was time to support SIMD code in .NET. We introduced the System.Numerics namespace with Vector2, Vector3, Vector4, Vector<T>, and related types. These types expose a general-purpose API for creating, accessing, and operating on them using hardware vector instructions (when available). They also provide a software fallback for when the hardware does not provide the appropriate instructions. This enabled a number of common algorithms to be vectorized, often with only minor refactorings. However, the generality of this approach made it difficult for programs to take full advantage of all vector instructions available on modern hardware. Additionally, modern hardware often exposes a number of specialized non-vector instructions that can dramatically improve performance. In this blog post, I’m exploring how we’ve addressed this limitation in .NET Core 3.0.