Intro

Having read through the macros section of “The Book” (Chapter 19.6), I thought I would try to hack together a simple idea using macros as a way to get a proper feel for them.

The chapter was a little light, and declarative macros (using macro_rules!), which is what I’ll be using below, seemed like a potentially very nice feature of the language … the sort of thing that really makes the language malleable. Indeed, in poking around I’ve realised, perhaps naively, that macros are a pretty common tool for rust devs (or at least more common than I knew).

I’ll rant for a bit first, which those new to rust macros may find interesting or informative (it’s kinda a little tutorial) … to see the implementation, go to “Implementation (without using a macro)” heading and what follows below.

Using a macro

Well, “declarative macros” (with macro_rules!) were pretty useful I found and easy to get going with (such that it makes perfect sense that they’re used more frequently than I thought).

  • It’s basically pattern matching on arbitrary code and then emitting new code through a templating-like mechanism (pretty intuitive).
  • The type system and rust-analyzer LSP understand what you’re emitting perfectly well in my experience. It really felt properly native to rust.

The Elements of writing patterns with “Declarative macros”

Use macro_rules! to declare a new macro

Yep, it’s also a macro!

Create a structure just like a match expression

  • Except the pattern will match on the code provided to the new macro
  • … And uses special syntax for matching on generic parts or fragments of the code
  • … And it returns new code (not an expression or value).

Write a pattern as just rust code with “generic code fragment” elements

  • You write the code you’re going to match on, but for the parts that you want to capture as they will vary from call to call, you specify variables (or more technically, “metavariables”).
    • You can think of these as the “arguments” of the macro. As they’re the parts that are operated on while the rest is literally just static text/code.
  • These variables will have a name and a type.
  • The name as prefixed with a dollar sign $ like so: $GENERIC_CODE.
  • And it’s type follows a colon as in ordinary rust: $GENERIC_CODE:expr
    • These types are actually syntax specifiers. They specify what part of rust syntax will appear in the fragment.
    • Presumably, they link right back into the rust parser and are part of how these macros integrate pretty seamlessly with the type system and borrow checker or compiler.
    • Here’s a decent list from rust-by-example (you can get a full list in the rust reference on macro “metavariables”):
      • block
      • expr is used for expressions
      • ident is used for variable/function names
      • item
      • literal is used for literal constants
      • pat (pattern)
      • path
      • stmt (statement)
      • tt (token tree)
      • ty (type)
      • vis (visibility qualifier)

So a basic pattern that matches on any struct while capturing the struct’s name, its only field’s name, and its type would be:

macro_rules! my_new_macro {
    (
        struct $name:ident {
            $field:ident: $field_type:ty
        }
    )
}

Now, $name, $field and $field_type will be captured for any single-field struct (and, presumably, the validity of the syntax enforced by the “fragment specifiers”).

Capture any repeated patterns with + or *

  • Yea, just like regex
  • Wrap the repeated pattern in $( ... )
  • Place whatever separating code that will occur between the repeats after the wrapping parentheses:
    • EG, a separating comma: $( ... ),
  • Place the repetition counter/operator after the separator: $( ... ),+

Example

So, to capture multiple fields in a struct (expanding from the example above):

macro_rules! my_new_macro {
    (
        struct $name:ident {
            $field:ident: $field_type:ty,
            $( $ff:ident : $ff_type: ty),*
        }
    )
}
  • This will capture the first field and then any additional fields.
    • The way you use these repeats mirrors the way they’re captured: they all get used in the same way and rust will simply repeat the new code for each repeated captured.

Writing the emitted or new code

Use => as with match expressions

  • Actually, it’s => { ... }, IE with braces (not sure why)

Write the new emitted code

  • All the new code is simply written between the braces
  • Captured “variables” or “metavariables” can be used just as they were captured: $GENERIC_CODE.
  • Except types aren’t needed here
  • Captured repeats are expressed within wrapped parentheses just as they were captured: $( ... ),*, including the separator (which can be different from the one used in the capture).
    • The code inside the parentheses can differ from that captured (that’s the point after all), but at least one of the variables from the captured fragment has to appear in the emitted fragment so that rust knows which set of repeats to use.
    • A useful feature here is that the repeats can be used multiple times, in different ways in different parts of the emitted code (the example at the end will demonstrate this).

Example

For example, we could convert the struct to an enum where each field became a variant with an enclosed value of the same type as the struct:

macro_rules! my_new_macro {
    (
        struct $name:ident {
            $field:ident: $field_type:ty,
            $( $ff:ident : $ff_type: ty),*
        }
    ) => {
        enum $name {
            $field($field_type),
            $( $ff($ff_type) ),*
        }
    }
}

With the above macro defined … this code …

my_new_macro! {
    struct Test {
        a: i32,
        b: String,
        c: Vec<String>
    }
}

… will emit this code …

enum Test {
    a(i32),
    b(String),
    c(Vec<String>)
}

Application: “The code” before making it more efficient with a macro

Basically … a simple system for custom types to represent physical units.

The Concept (and a rant)

A basic pattern I’ve sometimes implemented on my own (without bothering with dependencies that is) is creating some basic representation of physical units in the type system. Things like meters or centimetres and degrees or radians etc.

If your code relies on such and performs conversions at any point, it is way too easy to fuck up, and therefore worth, IMO, creating some safety around. NASA provides an obvious warning. As does, IMO, common sense and experience: most scientists and physical engineers learn the importance of “dimensional analysis” of their calculations.

In fact, it’s the sort of thing that should arguably be built into any language that takes types seriously (like eg rust). I feel like there could be an argument that it’d be as reasonable as the numeric abstractions we’ve worked into programming??

At the bottom I’ll link whatever crates I found for doing a better job of this in rust (one of which seemed particularly interesting).

Implementation (without using a macro)

The essential design is (again, this is basic):

  • A single type for a particular dimension (eg time or length)
  • Method(s) for converting between units of that dimension
  • Ideally, flags or constants of some sort for the units (thinking of enum variants here)
    • These could be methods too
#[derive(Debug)]
pub enum TimeUnits {s, ms, us, }

#[derive(Debug)]
pub struct Time {
    pub value: f64,
    pub unit: TimeUnits,
}

impl Time {
    pub fn new<T: Into<f64>>(value: T, unit: TimeUnits) -> Self {
        Self {value: value.into(), unit}
    }

    fn unit_conv_val(unit: &TimeUnits) -> f64 {
        match unit {
            TimeUnits::s => 1.0,
            TimeUnits::ms => 0.001,
            TimeUnits::us => 0.000001,
        }
    }

    fn conversion_factor(&self, unit_b: &TimeUnits) -> f64 {
        Self::unit_conv_val(&self.unit) / Self::unit_conv_val(unit_b)
    }

    pub fn convert(&self, unit: TimeUnits) -> Self {
        Self {
            value: (self.value * self.conversion_factor(&unit)),
            unit
        }
    }
}

So, we’ve got:

  • An enum TimeUnits representing the various units of time we’ll be using
  • A struct Time that will be any given value of “time” expressed in any given unit
  • With methods for converting from any units to any other unit, the heart of which being a match expression on the new unit that hardcodes the conversions (relative to base unit of seconds … see the conversion_factor() method which generalises the conversion values).

Note: I’m using T: Into<f64> for the new() method and f64 for Time.value as that is the easiest way I know to accept either integers or floats as values. It works because i32 (and most other numerics) can be converted lossless-ly to f64.

Obviously you can go further than this. But the essential point is that each unit needs to be a new type with all the desired functionality implemented manually or through some handy use of blanket trait implementations

Defining a macro instead

For something pretty basic, the above is an annoying amount of boilerplate!! May as well rely on a dependency!?

Well, we can write the boilerplate once in a macro and then only provide the informative parts!

In the case of the above, the only parts that matter are:

  • The name of the type/struct
  • The name of the units enum type we’ll use (as they’ll flag units throughout the codebase)
  • The names of the units we’ll use and their value relative to the base unit.

IE, for the above, we only need to write something like:

struct Time {
    value: f64,
    unit: TimeUnits,
    s: 1.0,
    ms: 0.001,
    us: 0.000001
}

Note: this isn’t valid rust! But that doesn’t matter, so long as we can write a pattern that matches it and emit valid rust from the macro, it’s all good! (Which means we can write our own little DSLs with native macros!!)

To capture this, all we need are what we’ve already done above: capture the first two fields and their types, then capture the remaining “field names” and their values in a repeating pattern.

Implementation of the macro

The pattern

macro_rules! unit_gen {
    (
        struct $name:ident {
            $v:ident: f64,
            $u:ident: $u_enum:ident,
            $( $un:ident : $value:expr ),+
        }
    )
}
  • Note the repeating fragment doesn’t provide a type for the field, but instead captures and expression expr after it, despite being invalid rust.

The Full Macro

macro_rules! unit_gen {
    (
        struct $name:ident {
            $v:ident: f64,
            $u:ident: $u_enum:ident,
            $( $un:ident : $value:expr ),+
        }
    ) => {
        #[derive(Debug)]
        pub struct $name {
            pub $v: f64,
            pub $u: $u_enum,
        }
        impl $name {
            fn unit_conv_val(unit: &$u_enum) -> f64 {
                match unit {
                $(
                    $u_enum::$un => $value
                ),+
                }
            }
            fn conversion_factor(&self, unit_b: &$u_enum) -> f64 {
                Self::unit_conv_val(&self.$u) / Self::unit_conv_val(unit_b)
            }
            pub fn convert(&self, unit: $u_enum) -> Self {
                Self {
                    value: (self.value * self.conversion_factor(&unit)),
                    unit
                }
            }
        }
        #[derive(Debug)]
        pub enum $u_enum {
            $( $un ),+
        }
    }
}

Note the repeating capture is used twice here in different ways.

  • The capture is: $( $un:ident : $value:expr ),+

And in the emitted code:

  • It is used in the unit_conv_val method as: $( $u_enum::$un => $value ),+
    • Here the ident $un is being used as the variant of the enum that is defined later in the emitted code
    • Where $u_enum is also used without issue, as the name/type of the enum, despite not being part of the repeated capture but another variable captured outside of the repeated fragments.
  • It is then used in the definition of the variants of the enum: $( $un ),+
    • Here, only one of the captured variables is used, which is perfectly fine.

Usage

Now all of the boilerplate above is unnecessary, and we can just write:

unit_gen!{
    struct Time {
        value: f64,
        unit: TimeUnits,
        s: 1.0,
        ms: 0.001,
        us: 0.000001
    }
}

Usage from main.rs:

use units::Time;
use units::TimeUnits::{s, ms, us};

fn main() {

    let x = Time{value: 1.0, unit: s};
    let y = x.convert(us);

    println!("{:?}", x);
    println!("{:?}", x);
}

Output:

Time { value: 1.0, unit: s }
Time { value: 1000000.0, unit: us }
  • Note how the struct and enum created by the emitted code is properly available from the module as though it were written manually or directly.
  • In fact, my LSP (rust-analyzer) was able to autocomplete these immediately once the macro was written and called.

Crates for unit systems

I did a brief search for actual units systems and found the following

dimnesioned

dimensioned documentation

  • Easily the most interesting to me (from my quick glance), as it seems to have created the most native and complete representation of physical units in the type system
  • It creates, through types, a 7-dimensional space, one for each SI base unit
  • This allows all possible units to be represented as a reduction to a point in this space.
    • EG, if the dimensions are [seconds, meters, kgs, amperes, kelvins, moles, candelas], then the Newton, m.kg / s^2 would be [-2, 1, 1, 0, 0, 0, 0].
  • This allows all units to be mapped directly to this consistent representation (interesting!!), and all operations to then be done easily and systematically.

Unfortunately, I’m not sure if the repository is still maintained.

uom

uom documentation

  • This might actually be good too, I just haven’t looked into it much
  • It also seems to be currently maintained

F#

Interestingly, F# actually has a system built in!

  • SorteKanin@feddit.dk
    link
    fedilink
    English
    arrow-up
    2
    ·
    5 months ago

    Right you can always do vec![Length<YourUnitHere>; 5] and then you have a vec of 5 length values with a certain unit (or you could do a simple array but then you can’t change the length). But you won’t be able to have different units in different values in that array/vec, since that would be different types and you can’t have different types in an array/vec. You could have different types in a tuple, but then you can’t vary the length.

    These limitations are usually not a problem, especially not for something like units where you can easily convert between them.

    • maegul (he/they)OPM
      link
      fedilink
      English
      arrow-up
      1
      ·
      5 months ago

      Oh yea … that works too of course!!

      But you won’t be able to have different units in different values in that array/vec,

      Oh that wasn’t the aim. The idea behind thinking about arrays was more to head in the direction of having unit-types along with numpy style arrays (ndarray being the only crate I know of for such a tool) … so that calculations and arithmetic with scalars and arrays can get pretty seamless, but with the safety of a units system too.