
Re: [Nmh-workers] EAI?


From: Ken Hornstein
Subject: Re: [Nmh-workers] EAI?
Date: Sun, 09 Aug 2015 01:58:18 -0400

>Should nmh try to get out in front with email address
>internationalization (EAI)?  See resources below.

I've thought about what it would take.

>From the MUA perspective, IIUC, it relies on native support on the
>host to handle unencoded UTF-8 addresses.  Would nmh support just be a
>matter of 1) not encoding addresses (controlled by a switch) in
>outgoing messages and 2) when showing a message, indicating that an
>address couldn't be displayed?

I think it's slightly more complicated than that (see below).

>Does anyone have experience using it?  Gmail supports it, according
>to the article below.

I think the lack of people with such an address means it's still pretty
uncommon, right?

Lyndon writes later:

>Since we require a Posix environment, that means utf8 locale support must 
>be in place, thus all the OS bits are there waiting to be used.
>
>But to do this properly we really need to overhaul the code base to 
>process everything internally as utf8.  That's not a trivial task, but we 
>have to do it, sooner or later.

Here are my unformed thoughts:

- It's not so easy to deal with characters that aren't in your native locale
  using the POSIX API; xlocale makes this easier, but it's still a pain.

- A super-brief scan suggests to me that SMTPUTF8 support is not widespread
  at this point.  But that will no doubt change.

- Right now our address parser will reject stuff that contains 8-bit
  characters; we need to fix that.  In fact, we need to throw out that
  address parser and get a new one; I made some progress on that using
  flex and bison.

- It's unclear to me how much UTF-8 verification an MUA is supposed to deal
  with; are we, for example, supposed to check for overlong UTF-8 encodings?
  Valid UTF-8 sequences?

- I do not believe we have to process everything internally as UTF-8, but I
  could be persuaded I'm wrong.  The real kicker is the format engine;
  right now we sort-of cheat a lot. %(decode) basically does a one-stop
  decoding and conversion to the native character set.  This has a lot of
  advantages, but also means we need to sit down and decide what the
  format engine is really supposed to be working on; for example, is the
  format engine supposed to be dealing with strings before or after
  RFC 2047 decoding?

- SMTPUTF8 looks relatively straightforward to implement, at least.

- I would rather not make ICU or IDN a build requirement, but it may be
  unavoidable.
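
To make the UTF-8 verification question above concrete, here is a rough
sketch of what a strict check involves. It is illustrative Python rather
than nmh code, and the function name utf8_problem is made up for this
example. It rejects overlong encodings, surrogates, and code points above
U+10FFFF, which are the cases a lenient decoder might let through.

```python
from typing import Optional


def utf8_problem(raw: bytes) -> Optional[str]:
    """Return None if raw is strictly valid UTF-8, else a short reason.

    Hypothetical helper for illustration only; not part of nmh.
    """
    i, n = 0, len(raw)
    while i < n:
        b = raw[i]
        if b < 0x80:                      # plain ASCII
            i += 1
            continue
        # Pick sequence length, payload bits of the lead byte, and the
        # smallest code point that legitimately needs this length.
        if 0xC2 <= b <= 0xDF:
            need, cp, low = 1, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:
            need, cp, low = 2, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF4:
            need, cp, low = 3, b & 0x07, 0x10000
        elif b in (0xC0, 0xC1):           # can only start overlong forms
            return "overlong"
        else:                             # stray continuation or 0xF5..0xFF
            return "invalid lead byte"
        if i + need >= n:
            return "truncated"
        for c in raw[i + 1:i + 1 + need]:
            if (c & 0xC0) != 0x80:
                return "bad continuation"
            cp = (cp << 6) | (c & 0x3F)
        if cp < low:
            return "overlong"
        if 0xD800 <= cp <= 0xDFFF:
            return "surrogate"
        if cp > 0x10FFFF:
            return "out of range"
        i += need + 1
    return None


if __name__ == "__main__":
    for sample in ("jörg@example.com".encode("utf-8"), b"\xc0\xaf",
                   b"\xe0\x80\xaf", b"\xed\xa0\x80", b"\xe2\x82"):
        print(sample, utf8_problem(sample))
```

Whether an MUA should do all of this itself, or leave it to the MTA, is
exactly the open question; the sketch only shows that the checks are cheap.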

--Ken


