Re: mail-extract-address-components extract modified full name

From: Simon Josefsson
Subject: Re: mail-extract-address-components extract modified full name
Date: Tue, 27 Jul 2004 18:28:09 +0200
User-agent: Gnus/5.110003 (No Gnus v0.3) Emacs/21.3.50 (gnu/linux)

Stefan Monnier <address@hidden> writes:

>> like the approach you propose.  XEmacs users have reported even
>> Latin-2 problems with the current implementation (Emacs does not have
>> those problems, though, but it suggests the implementation could be
>> improved).
> [ The below is all "IIRC". ]
> The function is supposed to receive ASCII input, so it's no wonder it might
> break in other circumstances.  Why ASCII input?
> Because the way things are defined in the RFCs, you should split the address
> before doing the un-quoting of base64 and QP thingies.
> I.e. after unquoting, the string might not be parsable any more (because
> one of the QP chars could be a ", a \, a <, or something like that).
> So the usual answer is that if you call the function with non-ASCII input,
> you're not using it properly.  But of course, it's not that simple since you
> might want to call that function e.g. on an email message that is being
> written and that hasn't been QP-encoded yet.
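
To make the ordering problem concrete, here is a minimal sketch in
Python rather than Emacs Lisp, using the standard library's email
package as a stand-in for mail-extr*.  The helper name decode_words and
the example address are made up for illustration.  It decodes a QP
encoded-word whose payload contains a comma, once before splitting and
once after.

```python
from email.header import decode_header
from email.utils import getaddresses


def decode_words(text):
    """Decode RFC 2047 encoded-words (QP or base64) in text to a str."""
    pieces = []
    for chunk, charset in decode_header(text):
        # decode_header returns bytes for decoded parts, str otherwise.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        pieces.append(chunk)
    return "".join(pieces)


# Hypothetical header value: the display name "Doe, John" is QP-encoded,
# so the comma is hidden inside the encoded-word.
RAW = "=?iso-8859-1?q?Doe=2C_John?= <john@example.org>"

# Wrong order: decode first, then split.  The decoded comma now looks
# like an address separator.
decoded_first = getaddresses([decode_words(RAW)])

# Right order: split the still-ASCII header, then decode each name.
split_first = [(decode_words(name), addr)
               for name, addr in getaddresses([RAW])]

print(len(decoded_first), decoded_first)
print(len(split_first), split_first)
```

Decoding first should produce two bogus entries, while splitting first
keeps the single address with its full display name.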

I agree, and have been arguing the same thing when people complain
that mail-extr* cannot handle their weird input.

Unfortunately, it is a losing discussion: I can't claim that
mail-extr* is only intended for use with all-ASCII valid RFC 822
input, since that isn't what it implements.  It is just a big hack, and
could be massaged into behaving (badly) for any purpose.

One example is that BBDB reportedly uses mail-extr* to split the
e-mail addresses it stores locally, in ~/.bbdb, which naturally aren't
QP encoded.  This probably illustrates a class of applications that
deal with mail addresses but aren't proper mail readers or writers, so
it wouldn't make sense for them to use QP.

IMHO, there should be two packages:

1) Proper RFC (2)822 parser.  There is rfc822.el but it is
   insufficient, and I'm not sure it is correct -- it uses regexps a
   lot, but I recall that the "correct" 2822 grammar, expressed as
   regexps, is much more complex than what rfc822.el does.
   Naturally, it should only accept valid RFC 822 input, which is
   ASCII only.

   (Incidentally, the QP encoder/decoder needs to use this package,
   since QP must only be applied to certain RFC 2822 grammatical
   terminals, not all text, and I believe the current QP
   encoder/decoder doesn't do this properly.)

2) Ad-hoc approach that splits real-world textual e-mail addresses,
   including non-ASCII ones, into their components.  Might use the
   proper parser, at least partially.  Perhaps similar to what Katsumi
   Yamaoka proposed.
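
As a rough illustration of the split between the two packages, here is
a sketch in Python, not Emacs Lisp.  The names parse_strict and
split_lenient, the regexps, and the example addresses are all
hypothetical.  email.utils.parseaddr only stands in for a real RFC
2822 parser, which it is not, and the lenient half covers just three
common shapes of address.

```python
import re
from email.utils import parseaddr

_ADDR = r"[^\s()<>]+@[^\s()<>]+"
# The three shapes handled by the lenient splitter (an assumption, far
# from exhaustive): "Name <addr>", "addr (Name)", and a bare "addr".
_PATTERNS = (
    re.compile(r"^\s*(?P<name>.*?)\s*<(?P<addr>" + _ADDR + r")>\s*$"),
    re.compile(r"^\s*(?P<addr>" + _ADDR + r")\s*\((?P<name>.*)\)\s*$"),
    re.compile(r"^\s*(?P<addr>" + _ADDR + r")\s*$"),
)


def parse_strict(header_value):
    """Package 1 stand-in: accept ASCII-only input or raise ValueError."""
    if not header_value.isascii():
        raise ValueError("non-ASCII input is not valid RFC 822 syntax")
    name, addr = parseaddr(header_value)
    if "@" not in addr:
        raise ValueError("no address found")
    return name, addr


def split_lenient(text):
    """Package 2 stand-in: best-effort split, non-ASCII names allowed."""
    for pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            name = match.groupdict().get("name") or ""
            return name.strip("\"' "), match.group("addr")
    return None  # could not make sense of the input


print(parse_strict("John Doe <john@example.org>"))
print(split_lenient("Jörg Müller <joerg@example.org>"))
print(split_lenient("joerg@example.org (Jörg Müller)"))
```

The point of keeping them apart is that the strict function can refuse
input outright, while the lenient one returns its best guess (or None)
for things like address-book entries.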

When these two packages exist, each current use of mail-extr* should
be investigated to find out what is really intended there.

At some point, I counted the number of functions in Emacs that
implement something similar to what the mail-extr* functions do
(i.e. take a textual e-mail address and split it up) and found ~5-10
versions, all with their own problems.

Sadly, I keep writing rants about the situation instead of working on
solving it...  Perhaps that is partly because this is not
straightforward to solve; you will probably have to implement one API
first, tinker with it to get experience with it, and then rewrite it
slightly, and so on.  Sounds like real work.  Perhaps someone else has
a clearer vision of how to implement it, and time to try it out.

