• LalSalaamComradeOP
      English
      arrow-up
      3
      ·
      16 days ago

      No, I am only interested in implementations. I’ve come across a few, for example:

      • the Pascal string (a length-prefixed string; classic Pascal uses a 1-byte length prefix)
      • the alt-byte terminated string (used by CDC and the ZX80)
      • the bit method (in pre-1960s mainframes)
      • the record method (aka the struct method you were talking about, which is probably the default C++/Rust implementation), etc.

      Are there any other custom data structures that are both faster and safer than the default? What about ropes?
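To make two of the layouts in the list above concrete, here is a minimal sketch, not any particular system's implementation. It assumes a 1-byte length prefix for the Pascal-style form and a configurable terminator byte for the terminated form; all four function names are hypothetical.

```python
def encode_length_prefixed(data):
    """Pascal-style: one length byte, then the payload (assumed max 255 bytes)."""
    if len(data) > 255:
        raise ValueError("payload too long for a 1-byte length prefix")
    return bytes([len(data)]) + data


def decode_length_prefixed(buf, offset=0):
    """Return (payload, offset of the next record); the length is read in O(1)."""
    n = buf[offset]
    start = offset + 1
    end = start + n
    if end > len(buf):
        raise ValueError("truncated record")
    return buf[start:end], end


def encode_terminated(data, terminator=0):
    """Terminator-style: payload followed by a sentinel byte (0 means NUL)."""
    if terminator in data:
        raise ValueError("payload contains the terminator byte")
    return data + bytes([terminator])


def decode_terminated(buf, offset=0, terminator=0):
    """Return (payload, offset of the next record); finding the end is an O(n) scan."""
    end = buf.index(terminator, offset)  # raises ValueError if the sentinel is missing
    return buf[offset:end], end + 1


print(encode_length_prefixed(b"hi"))  # b'\x02hi'
print(encode_terminated(b"hi"))       # b'hi\x00'
```

The trade-off shows up in the decoders: the prefixed form knows its length up front but caps the size, while the terminated form has no size cap but cannot contain its own sentinel.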

      • Arthur BesseA
        English
        arrow-up
        2
        ·
        edit-2
        16 days ago

        CBOR uses variable-sized length prefixes. Strings of 0 to 23 bytes need just one byte of overhead; that becomes two bytes for lengths up to 255 and three bytes for lengths up to 65535. Above that it takes five bytes of overhead, which per the CBOR spec (RFC 8949) covers lengths up to 2^32 − 1 (about 4 GiB), after which the prefix grows to nine bytes; I didn’t test that far.

        How I empirically determined those numbers:

        $ python -c 'import cbor; overhead=0; print({ length:overhead for length in range(65537) if overhead < (overhead:=len(cbor.dumps("a"*length))-length) })'

        {0: 1, 24: 2, 256: 3, 65536: 5}
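The same thresholds can be reproduced without the `cbor` package. The sketch below builds the header for a definite-length text string (major type 3) by hand, as I understand the encoding in RFC 8949; `cbor_text_header` is a hypothetical helper name, not part of any library.

```python
def cbor_text_header(n):
    """Header bytes for a definite-length CBOR text string of n UTF-8 bytes."""
    major = 3 << 5  # major type 3 (text string) sits in the top three bits
    if n < 0:
        raise ValueError("length must be non-negative")
    if n < 24:
        return bytes([major | n])                      # length stored inline
    if n < 1 << 8:
        return bytes([major | 24]) + n.to_bytes(1, "big")
    if n < 1 << 16:
        return bytes([major | 25]) + n.to_bytes(2, "big")
    if n < 1 << 32:
        return bytes([major | 26]) + n.to_bytes(4, "big")
    if n < 1 << 64:
        return bytes([major | 27]) + n.to_bytes(8, "big")
    raise ValueError("length does not fit in 64 bits")


# Record each length at which the overhead grows, as in the one-liner above.
steps = {}
previous = 0
for length in range(65537):
    overhead = len(cbor_text_header(length))
    if overhead > previous:
        steps[length] = overhead
        previous = overhead

print(steps)  # {0: 1, 24: 2, 256: 3, 65536: 5}
```

By the same rule, the next step would come at a length of 2**32, where the header becomes 9 bytes.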

  • smpl@discuss.tchncs.de
    English
    arrow-up
    1
    ·
    15 days ago

    Yes and no. Sometimes a NUL-terminated string allows you to write the simplest algorithms. Apart from NUL-terminated strings, I use structs with a buffer pointer and a length, or with a start and an end pointer, when that makes the implementation simpler. NULL-terminated arrays are also often an efficient way to keep your algorithms simple. Go for the data representation that allows you to write the simplest algorithms.
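As a rough illustration of the start/end form, here is a sketch in which integer indexes into a shared `bytes` buffer stand in for the two pointers a C struct would hold. `Span` and `trim` are hypothetical names; the point is only that trimming moves the two ends and never copies or rewrites the buffer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    buf: bytes   # shared backing buffer, never modified
    start: int   # index of the first byte in the span
    end: int     # index one past the last byte in the span

    def __len__(self):
        return self.end - self.start

    def to_bytes(self):
        return self.buf[self.start:self.end]


def trim(span):
    """Strip spaces and tabs from both ends by moving start and end inward."""
    start, end = span.start, span.end
    while start < end and span.buf[start] in b" \t":
        start += 1
    while end > start and span.buf[end - 1] in b" \t":
        end -= 1
    return Span(span.buf, start, end)


line = b"  hello \t"
print(trim(Span(line, 0, len(line))).to_bytes())  # b'hello'
```

With a NUL-terminated string the same trim would have to write a new terminator into the buffer or copy the text, which is the kind of difference that can decide which representation gives the simpler algorithm.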