From: Ken Hornstein
Subject: Re: [Nmh-workers] nmh architecture discussion: format engine character set
Date: Tue, 11 Aug 2015 12:28:39 -0400
>I am in no way an expert on this. But, I won't let that stop me.
Welcome to the club! I think we're all in the same boat in that
regard.
>It seems to me that the only solution is to use Unicode internally.
>Disgusting as it seems to those of us who are old enough to hoard
>bytes, we might want to consider using something other than UTF-8
>for the internal representation. Using UTF-16 wouldn't be horrible
>but I recall that the Unicode folks made a botch of things so that
>one really needs 24 bits now, which really means using 32 internally.
AFAICT ... there is probably no advantage to using UTF-16 or UTF-32
over UTF-8.
People might think that you gain something because with UTF-16 two
bytes == 1 character. But that's only true for things in the Basic
Multilingual Plane, and people are now telling us 🖕 because they want
to send emoji in email, which are NOT part of the BMP; that means we
have to start dealing with 💩 like surrogate pairs. And really, even
with just the BMP, combining characters toss that idea out the window.
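A minimal C sketch of the surrogate-pair problem (illustration only, not nmh code; the function name `utf16_encode` is hypothetical): a code point in the BMP fits in one 16-bit unit, but anything above U+FFFF, such as U+1F4A9 (💩), has to be split into a high and a low surrogate, so "two bytes == 1 character" no longer holds.

```c
#include <assert.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
 * Returns the number of 16-bit code units written to out[]:
 * 1 for a BMP code point, 2 for a surrogate pair. */
static int utf16_encode(uint32_t cp, uint16_t out[2])
{
    /* Valid scalar values only: no surrogates, nothing above U+10FFFF. */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                      /* BMP: one unit */
        return 1;
    }
    cp -= 0x10000;                                  /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));       /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));     /* low surrogate */
    return 2;
}
```

For example, `utf16_encode(0x41, u)` returns 1, while `utf16_encode(0x1F4A9, u)` returns 2 with `u[0] == 0xD83D` and `u[1] == 0xDCA9`.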
UTF-32 lets you say 4 bytes == 1 character ... but do we care about
'characters' or 'column positions'?
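To illustrate why neither UTF-16 nor UTF-32 hands you column positions, here is a small C sketch (hypothetical helper `utf8_codepoints`, assuming valid UTF-8 input) that counts code points, which is also the number of 32-bit units the text would take in UTF-32. A decomposed "é", written as "e" followed by U+0301 COMBINING ACUTE ACCENT, is 3 bytes in UTF-8 and 2 code points, yet it occupies a single column on screen.

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in a NUL-terminated, valid UTF-8 string by skipping
 * continuation bytes (bit pattern 10xxxxxx). The result equals the number
 * of 32-bit units the same text would need in UTF-32. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

`utf8_codepoints("e\xCC\x81")` returns 2 even though `strlen` reports 3 and the terminal shows one column. Actual display widths would need something along the lines of `wcwidth(3)`, which in turn depends on the user's locale.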
So given that, I think sticking with UTF-8 is preferable; it has the
nice property that we can represent text as C strings, and it's just
ASCII if you're living in a 7-bit world.
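The "UTF-8 works as C strings" property falls out of how the encoding is built. A minimal sketch (hypothetical function `utf8_encode`, not taken from nmh): code points below 0x80 are emitted unchanged as a single byte, and every byte of a multi-byte sequence has its high bit set, so no encoded byte of a non-NUL code point is ever '\0' and `strlen`, `strcpy` and friends keep working (they count bytes, not characters).

```c
#include <assert.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out[] (room for 4 bytes).
 * Returns the number of bytes written. ASCII encodes as itself;
 * lead and continuation bytes of longer sequences are all >= 0x80. */
static int utf8_encode(uint32_t cp, unsigned char out[4])
{
    /* Valid scalar values only: no surrogates, nothing above U+10FFFF. */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x80) {                                /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                               /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                             /* 1110xxxx + 2 */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));    /* 11110xxx + 3 */
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For instance, U+00E9 (é) encodes as `C3 A9`, and U+1F4A9 as the four bytes `F0 9F 92 A9`.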
>On the output side, we just have to do the best we can if characters in
>the input locale can't be represented in the output locale. This is
>independent of the internal representation.
Well, this works great if your locale is UTF-8. But ... what happens
if your email address contains UTF-8, and your locale setting is
ISO-8859-1?
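One possible "do the best we can" policy for the UTF-8 vs. ISO-8859-1 case, sketched in C under the assumption that the input is valid UTF-8. The function name `utf8_to_latin1` and the '?' substitution are illustrative choices, not what nmh does. The sketch relies on ISO-8859-1 mapping byte values 0x00-0xFF directly onto U+0000-U+00FF, so an address like "josé@example.com" survives, while an emoji in the address does not. Real code would more likely go through `iconv(3)`; this only shows where the information loss happens.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Best-effort conversion of valid UTF-8 to ISO-8859-1 (Latin-1).
 * Code points U+0000-U+00FF map to the byte of the same value; anything
 * above U+00FF is replaced with '?'. Each UTF-8 sequence yields one output
 * byte, so out needs at most strlen(in) + 1 bytes.
 * Returns the number of lossy substitutions made. */
static size_t utf8_to_latin1(const char *in, char *out)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t lossy = 0;

    assert(in != NULL && out != NULL);
    while (*p != '\0') {
        uint32_t cp;
        int extra;

        /* Decode the lead byte: payload bits and continuation count. */
        if (*p < 0x80)      { cp = *p;        extra = 0; }
        else if (*p < 0xE0) { cp = *p & 0x1F; extra = 1; }
        else if (*p < 0xF0) { cp = *p & 0x0F; extra = 2; }
        else                { cp = *p & 0x07; extra = 3; }
        p++;

        /* Append 6 bits per continuation byte; stops early at NUL. */
        for (; extra > 0 && (*p & 0xC0) == 0x80; extra--, p++)
            cp = (cp << 6) | (*p & 0x3F);

        if (cp > 0xFF) {            /* not representable in Latin-1 */
            cp = '?';
            lossy++;
        }
        *out++ = (char)cp;
    }
    *out = '\0';
    return lossy;
}
```

With this sketch, "jos\xC3\xA9@example.com" comes out as the Latin-1 bytes for "josé@example.com" with no substitutions, but an address containing U+1F4A9 gets a '?' in its place and the function reports one substitution, so the converted address no longer matches the original.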
--Ken