I don’t know how Python 3.10’s strings work internally. Does CPython choose between 8-bit, 16-bit, and 32-bit storage per character at runtime?
For example:
    for line in open('read1.py'):
        print(line)
Can the line string be stored with 8-bit, 16-bit, or 32-bit characters, decided separately in each iteration? Is a line 8-bit by default, becoming a 32-bit string only if that line contains an emoji?
If CPython used UTF-8 internally, it wouldn’t need 4 versions of the split function:
https://github.com/python/cpython/blob/1402d2ceca8ccef8c3538906b3f547365891d391/Objects/unicodeobject.c#L9757
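
To probe this from Python itself, here is a rough sketch I would expect to work. It assumes CPython with the flexible string representation (PEP 393, Python 3.3+) and that sys.getsizeof reflects the character buffer; other implementations may report something different. The helper name bytes_per_char is mine, not anything from the standard library. It compares two lengths of the same text so that the fixed object header cancels out.

```python
import sys

def bytes_per_char(sample: str, n: int = 1000) -> int:
    """Estimate storage per character (hypothetical helper, CPython-specific).

    Subtracting the sizes of two repetitions of the same text cancels the
    fixed per-object overhead and leaves only the character data.
    """
    short = sample * n
    long_ = sample * (2 * n)
    return (sys.getsizeof(long_) - sys.getsizeof(short)) // (n * len(sample))

ascii_width = bytes_per_char("a")            # plain ASCII
latin1_width = bytes_per_char("\u00e9")      # e with acute accent, fits in Latin-1
bmp_width = bytes_per_char("\u20ac")         # euro sign, needs 16 bits
astral_width = bytes_per_char("\U0001F600")  # emoji, needs 32 bits
mixed_width = bytes_per_char("a\U0001F600")  # ASCII mixed with an emoji

print(ascii_width, latin1_width, bmp_width, astral_width, mixed_width)
```

If the width really is chosen per string object, the mixed case should report the same width as the emoji-only case, because the widest character in a string would decide the storage for every character in it.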