• LalSalaamComradeOP
    14 days ago

    I am talking about modern, or slightly dated but easy-to-implement, alternatives to C strings: for example, the pointer+length encoding used in Rust (which I think is also called the record method), or Pascal-style strings.
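
For readers unfamiliar with the two layouts named above, here is a minimal C sketch of each. The type and function names (`str_view`, `sv_from_cstr`, `sv_slice`, `pascal_str`, `ps_from_cstr`) are hypothetical and chosen only for illustration; this is not how Rust or any Pascal runtime actually defines its string type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer+length view, similar in spirit to a Rust slice:
   the bytes are NOT required to end in '\0'. */
typedef struct {
    const char *ptr;
    size_t      len;
} str_view;

/* Build a view over an existing C string (one O(n) scan, done once). */
static str_view sv_from_cstr(const char *s) {
    assert(s != NULL);
    str_view v = { s, strlen(s) };
    return v;
}

/* Take a sub-view [start, end) without copying or modifying the buffer. */
static str_view sv_slice(str_view v, size_t start, size_t end) {
    assert(start <= end && end <= v.len);
    str_view out = { v.ptr + start, end - start };
    return out;
}

/* Hypothetical Pascal-style layout: one length byte, then the data.
   The single-byte prefix caps the string at 255 bytes. */
typedef struct {
    unsigned char len;
    char          data[255];
} pascal_str;

/* Copy a C string into the Pascal-style layout; returns -1 if too long. */
static int ps_from_cstr(pascal_str *out, const char *s) {
    assert(out != NULL && s != NULL);
    size_t n = strlen(s);
    if (n > 255) return -1;
    out->len = (unsigned char)n;
    memcpy(out->data, s, n);
    return 0;
}
```

With these definitions, `sv_slice(sv_from_cstr("hello world"), 6, 11)` yields a 5-byte view of `world` that points into the original buffer.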

    • Blue_Morpho@lemmy.world
      14 days ago

      You answered your own question. Strings that carry their length are better than null-terminated ones. Null termination was a mistake in the original C library, and probably a hack carried over because the PDP-11 used the ASCIZ format.

      • letsgo@lemm.ee
        14 days ago

        Lower performance, though. At each iteration through the string you need to compare a counter with the length, and if you want strings longer than 255 characters that counter has to be multibyte. With null-terminated strings (NTS) you don’t need the counter or the multibyte comparison, and strings can be indefinitely long. You only need to check whether the byte you just looked at is zero, which most CPUs do for free, so you just use a branch-if-[not-]zero instruction.
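
The two loop shapes being compared can be sketched in C as follows. The function names `count_char_nts` and `count_char_len` are hypothetical, and the sketch makes no claim about which loop is faster on a given compiler or CPU; it only shows where the termination test sits in each.

```c
#include <assert.h>
#include <stddef.h>

/* NUL-terminated scan: the loop condition tests the byte just loaded. */
static size_t count_char_nts(const char *s, char c) {
    assert(s != NULL);
    size_t hits = 0;
    for (; *s != '\0'; ++s) {
        if (*s == c) ++hits;
    }
    return hits;
}

/* Length-counted scan: the loop condition compares an index with len. */
static size_t count_char_len(const char *s, size_t len, char c) {
    assert(s != NULL || len == 0);
    size_t hits = 0;
    for (size_t i = 0; i < len; ++i) {
        if (s[i] == c) ++hits;
    }
    return hits;
}
```

Both return 3 for the letter `a` in `banana`. They differ only when the data contains a zero byte, which the first loop treats as the end.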

        The terminating null also gives you a fairly obvious visual clue where the end of the string is when you’re debugging with a memory dump. Can you tell where the end of this string is: “ABCDEFGH”? What about now: “ABCD\0EFGH”?

        • Blue_Morpho@lemmy.world
          14 days ago

          It’s slower only in the one case of iterating over an 8-bit ASCII string, as in programs written 30 years ago, and faster in more common uses. A multibyte counter doesn’t matter when everything is 64-bit. A 64-bit length is long enough for everything but the edgiest of edge cases. You take a performance hit if you aren’t aligned.
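
As one illustration of the "more common uses" where a stored length helps, the sketch below shows length lookup, equality, and bounded append on a counted string. The `counted_str` type and the `cs_*` functions are hypothetical names, not any library's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: pointer plus length, no terminator needed. */
typedef struct {
    const char *ptr;
    size_t      len;
} counted_str;

/* O(1): the length is stored, so no scan like strlen() is required. */
static size_t cs_length(counted_str s) {
    return s.len;
}

/* Unequal lengths are rejected before any byte is compared. */
static bool cs_equal(counted_str a, counted_str b) {
    if (a.len != b.len) return false;
    return a.len == 0 || memcmp(a.ptr, b.ptr, a.len) == 0;
}

/* Append src to a buffer holding dst_len bytes out of cap.
   Returns the new length, or (size_t)-1 if src does not fit. */
static size_t cs_append(char *dst, size_t dst_len, size_t cap, counted_str src) {
    assert(dst != NULL && dst_len <= cap);
    if (src.len > cap - dst_len) return (size_t)-1;
    if (src.len != 0) memcpy(dst + dst_len, src.ptr, src.len);
    return dst_len + src.len;
}
```

By contrast, `strcat` has to walk the destination to find its end on every call, while `cs_append` goes straight to `dst + dst_len`.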

          > Can you tell where the end of this string is: “ABCDEFGH”? What about now: “ABCD\0EFGH”?

          No, because Unicode and binary formats mean a string can contain anything.
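
A small C sketch of the point about Unicode: the two characters `AB` encoded as UTF-16LE contain zero bytes, so a scan for the first zero reports the wrong end. The names `utf16le_ab` and `nts_length` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "AB" encoded as UTF-16LE: every second byte is zero. */
static const unsigned char utf16le_ab[] = { 0x41, 0x00, 0x42, 0x00 };

/* Length as a NUL-terminated scan sees it: stops at the first zero byte. */
static size_t nts_length(const unsigned char *bytes) {
    assert(bytes != NULL);
    return strlen((const char *)bytes);
}
```

Here `nts_length(utf16le_ab)` stops after one byte, while `sizeof utf16le_ab` is 4. A zero byte in a memory dump therefore marks the end of the text only if the encoding guarantees it cannot appear inside the text.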

    • SubArcticTundra
      14 days ago

      Another alternative I’ve seen is strings that are not null terminated but where the allocated memory actually begins at ptr[-1] and contains the length of the string. The benefit is that you still get a char array starting at ptr[0].
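
A minimal C sketch of that layout, assuming a `size_t` header rather than a single byte at `ptr[-1]`. The names `lp_header`, `lp_new`, `lp_len`, and `lp_free` are hypothetical. Windows `BSTR` and the SDS strings used in Redis are real examples of keeping the length just before the returned pointer, though their exact layouts differ from this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout:  [ size_t len ][ len bytes of data ]
                                       ^ pointer handed to the caller */
typedef struct {
    size_t len;
    char   data[];   /* C99 flexible array member */
} lp_header;

/* Allocate header + data in one block and return a pointer to the data. */
static char *lp_new(const char *src, size_t len) {
    assert(src != NULL || len == 0);
    lp_header *h = malloc(sizeof *h + len);
    if (h == NULL) return NULL;
    h->len = len;
    if (len != 0) memcpy(h->data, src, len);
    return h->data;
}

/* Step back from the data pointer to read the stored length.
   Only valid for pointers returned by lp_new(); anything else is
   undefined behaviour, which is the safety concern with this scheme. */
static size_t lp_len(const char *s) {
    assert(s != NULL);
    const lp_header *h = (const lp_header *)(s - offsetof(lp_header, data));
    return h->len;
}

/* Free the whole block, starting from the hidden header. */
static void lp_free(char *s) {
    if (s != NULL) free(s - offsetof(lp_header, data));
}
```

The returned pointer can be indexed like an ordinary `char` array, and `lp_len` recovers the length in constant time even when the data contains zero bytes.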

      • LalSalaamComradeOP
        14 days ago

        But wouldn’t this be potentially unsafe? What programming language has this type of implementation, by the way?

      • tunetardis@lemmy.ca
        14 days ago

        This reminds me of when I had to roll my own dynamic memory allocator for an obscure platform. (Something I never want to do again!) I stuck metadata in the negative space just before the returned pointer like you say. In my case, it was complicated by the fact that you had to worry about the memory alignment of the returned pointer to make sure it works with SIMD and all that. Ugh. But I guess with strings (or at least 8-bit-encoded strings), alignment should not be an issue.
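
The "metadata in the negative space" idea, combined with an alignment requirement, can be sketched in C roughly as below. This is not the allocator described in the comment: as a simplifying assumption it wraps `malloc` instead of managing its own heap, and the names `meta_aligned_alloc` and `meta_aligned_free` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return a block of `size` bytes aligned to `align` (a power of two).
   The raw malloc() pointer is stashed in the bytes just before it. */
static void *meta_aligned_alloc(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    /* Room for the stashed pointer plus worst-case alignment padding. */
    size_t extra = sizeof(unsigned char *) + align - 1;
    if (size > SIZE_MAX - extra) return NULL;
    unsigned char *raw = malloc(size + extra);
    if (raw == NULL) return NULL;
    /* First address past the metadata slot, rounded up to `align`. */
    uintptr_t start   = (uintptr_t)(raw + sizeof raw);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
    unsigned char *user = raw + (aligned - (uintptr_t)raw);
    /* memcpy avoids assuming the metadata slot is pointer-aligned. */
    memcpy(user - sizeof raw, &raw, sizeof raw);
    return user;
}

/* Recover the raw pointer from just before the block and free it. */
static void meta_aligned_free(void *p) {
    if (p == NULL) return;
    unsigned char *raw;
    memcpy(&raw, (unsigned char *)p - sizeof raw, sizeof raw);
    free(raw);
}
```

For byte strings, where any address is acceptably aligned, the padding step disappears and the layout reduces to a fixed-size header in front of the data.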