  • @brombek
    22 years ago

    Good post. Even if it is dated, it is still very relevant.

    Microsoft is still using BOMs and UTF-16 (originally UCS-2) for most things, including queries and strings in SQL Server! Java strings are also UTF-16 in memory. Rust uses UTF-8 for its main string type and can convert strings to 32-bit Unicode scalar values (its char type) and back.
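    A small std-only Rust sketch of the round trip described above, assuming nothing beyond the standard library. It decodes a UTF-8 String into 32-bit chars, rebuilds the String, and also counts UTF-16 code units to show why UTF-16 is not the same as UCS-2: the emoji sits outside the Basic Multilingual Plane, so UTF-16 needs a surrogate pair for it and UCS-2 cannot represent it at all. The helper names encoding_sizes and round_trip are made up for this example.

```rust
/// Return (UTF-8 bytes, Unicode scalar values, UTF-16 code units) for `s`.
fn encoding_sizes(s: &str) -> (usize, usize, usize) {
    (s.len(), s.chars().count(), s.encode_utf16().count())
}

/// Decode a UTF-8 string into 32-bit `char`s, then re-encode it as UTF-8.
fn round_trip(s: &str) -> String {
    let scalars: Vec<char> = s.chars().collect();
    scalars.into_iter().collect()
}

fn main() {
    let s = "naïve 😀";
    let (bytes, chars, units) = encoding_sizes(s);
    println!("UTF-8 bytes: {bytes}, code points: {chars}, UTF-16 units: {units}");

    // U+1F600 needs two UTF-16 code units (a surrogate pair).
    let pair: Vec<u16> = "😀".encode_utf16().collect();
    println!("surrogate pair: {:04X?}", pair);

    assert_eq!(round_trip(s), s);
}
```

    For "naïve 😀" this gives 11 UTF-8 bytes, 7 code points and 8 UTF-16 units; the extra UTF-16 unit is the second half of the surrogate pair.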

    I sometimes get CP1252 (Windows-1252) encoded events from Windows Server 2019 that I have to convert to UTF-8 before storing them in the log database! I also have lots of old emails encoded in who knows what sequence of encodings.
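    Converting Windows-1252 to UTF-8 is mostly a byte-to-code-point mapping: every byte except 0x80 to 0x9F has the same value as its Unicode code point, and those 32 bytes need a lookup table. The sketch below assumes the input really is Windows-1252 and maps the five bytes the code page leaves undefined to U+FFFD (the WHATWG Encoding Standard instead maps them to C1 control characters). In practice a maintained crate is the safer choice; cp1252_to_utf8 is a hypothetical name used only here.

```rust
/// Unicode code points for bytes 0x80..=0x9F in Windows-1252.
/// `None` marks the five bytes the code page leaves undefined.
const CP1252_HIGH: [Option<char>; 32] = [
    Some('\u{20AC}'), None, Some('\u{201A}'), Some('\u{0192}'),
    Some('\u{201E}'), Some('\u{2026}'), Some('\u{2020}'), Some('\u{2021}'),
    Some('\u{02C6}'), Some('\u{2030}'), Some('\u{0160}'), Some('\u{2039}'),
    Some('\u{0152}'), None, Some('\u{017D}'), None,
    None, Some('\u{2018}'), Some('\u{2019}'), Some('\u{201C}'),
    Some('\u{201D}'), Some('\u{2022}'), Some('\u{2013}'), Some('\u{2014}'),
    Some('\u{02DC}'), Some('\u{2122}'), Some('\u{0161}'), Some('\u{203A}'),
    Some('\u{0153}'), None, Some('\u{017E}'), Some('\u{0178}'),
];

/// Decode Windows-1252 bytes into a UTF-8 `String`.
fn cp1252_to_utf8(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| match b {
            // These bytes differ from ISO-8859-1, so look them up.
            0x80..=0x9F => CP1252_HIGH[(b - 0x80) as usize].unwrap_or('\u{FFFD}'),
            // Every other byte equals its Unicode code point.
            _ => char::from(b),
        })
        .collect()
}

fn main() {
    // Windows-1252 bytes for "café", a space, an en dash, a space and "€5".
    let raw: [u8; 9] = [0x63, 0x61, 0x66, 0xE9, 0x20, 0x96, 0x20, 0x80, 0x35];
    let text = cp1252_to_utf8(&raw);
    println!("{text}");
    println!("{} bytes in, {} bytes out", raw.len(), text.len());
}
```

    The output grows from 9 to 14 bytes, because é, the en dash and € take 2, 3 and 3 bytes in UTF-8.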

    Plan 9 is all UTF-8, since that is where UTF-8 was first created and used. Most FLOSS now uses UTF-8 by default or is at least compatible with it.

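    Also, as with HTML, an email declares its encoding in a header: the charset parameter of Content-Type, which each MIME part can also carry. Make sure it is set correctly, and remember that it does not cover the Subject header; a non-ASCII subject needs its own RFC 2047 encoding, or things may not render correctly.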
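    In an email, the body encoding is declared by the charset parameter of the Content-Type header, while a non-ASCII Subject is written as an RFC 2047 "encoded-word" such as =?UTF-8?B?...?=. The std-only sketch below builds such a subject with a hand-written Base64 encoder. It is an illustration, not a mail library: it assumes UTF-8 throughout, and it does not split long subjects, although RFC 2047 limits each encoded-word to 75 characters. The names base64_encode and encode_subject are hypothetical.

```rust
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard Base64 (RFC 4648) with '=' padding.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::new();
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling missing ones.
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = (u32::from(chunk[0]) << 16) | (u32::from(b1) << 8) | u32::from(b2);
        // A chunk of k bytes yields k + 1 output characters; pad the rest with '='.
        for i in 0..4 {
            if i <= chunk.len() {
                out.push(B64[((n >> (18 - 6 * i)) & 0x3F) as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}

/// Leave ASCII subjects alone; wrap anything else in a UTF-8 "B" encoded-word.
fn encode_subject(subject: &str) -> String {
    if subject.is_ascii() {
        subject.to_string()
    } else {
        format!("=?UTF-8?B?{}?=", base64_encode(subject.as_bytes()))
    }
}

fn main() {
    println!("Subject: {}", encode_subject("café menu"));
    println!("MIME-Version: 1.0");
    println!("Content-Type: text/plain; charset=UTF-8");
    println!("Content-Transfer-Encoding: 8bit");
}
```

    Mail clients decode the encoded-word back to "café menu" regardless of where the Subject appears among the headers; the body is then decoded using the declared charset.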