Re: proper names

From: Paul Eggert
Subject: Re: proper names
Date: Wed, 06 Sep 2006 11:40:58 -0700
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)

Nice idea.  A couple of thoughts that I hope can help improve it.

First, the problem occurs not only with proper names, but with any
text string that is English but not ASCII.  English mostly uses plain
ASCII, but there are exceptions.  Many of the exceptions are words
like naïve that most people will accept in ASCII versions, but some
English words (e.g., soupçon) simply do not look right in ASCII.

I assume that the module, perhaps with some relatively-minor changes
(please see below), can handle this more general problem as well, so
perhaps its name should be more general as well.  E.g., "nonascii"
rather than "propername".

Second, this code does not look right:

>       /* See whether the translation contains the original name.  */
>       if (strstr (translation, name) != NULL)

Some translations might contain the name accidentally.
As an extreme case, the proper name "Z" (see
might have a translation that contains the letter "Z".

Conversely, perhaps the translation might contain the original proper
name using slightly-different letters, which look nicer but which
strstr thinks are different.  For example, the original proper name
might be "Georgia O'Keefe" but the translation might say "<something
or other> (Georgia O’Keefe)", with a right single quotation mark
(U+2019) rather than an apostrophe (U+0027).  We don't want the output
to be "<something or other> (Georgia O’Keefe) (Georgia O'Keefe)".
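
To make both failure modes concrete, here is a minimal sketch.  The
helper name contains_original and the two sample translations are
invented for illustration (they are not from the patch), and the
strings are assumed to be UTF-8:

```c
#include <stdbool.h>
#include <string.h>

/* The check under discussion: does TRANSLATION contain NAME,
   byte for byte?  (Hypothetical helper, not the patch's code.)  */
static bool
contains_original (const char *translation, const char *name)
{
  return strstr (translation, name) != NULL;
}

/* Made-up translation in which "Z" occurs only by accident,
   inside another word.  */
static const char accidental[] = "Zed";

/* Made-up translation that does cite the original name, but with
   U+2019 (UTF-8 bytes E2 80 99) in place of U+0027.  */
static const char typographic[] = "Example (Georgia O\xe2\x80\x99Keefe)";
```

With these definitions, contains_original (accidental, "Z") is true
even though the translator never cited the original, while
contains_original (typographic, "Georgia O'Keefe") is false even
though the translator did.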

If the convention is that translations of the form "X (Y)" are to be
treated specially, I suggest that the code simply look for the last
character being ')', rather than looking for an instance of "Y" in the
translation.  But I think it'd be simpler and better overall just to
ask translators to supply the original if they desire it.  After all,
"(" and ")" might not be correct anyway, or it might not be correct to
put the original name second, or whatever.
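
For what it's worth, the last-character test could be as small as the
sketch below.  The function name is hypothetical, and I am assuming a
UTF-8 translation, where the byte 0x29 can only ever be an actual ')':

```c
#include <stdbool.h>
#include <string.h>

/* Return true if TRANSLATION is nonempty and its last byte is ')'.
   (Hypothetical helper; assumes a UTF-8 encoded string.)  */
static bool
ends_with_close_paren (const char *translation)
{
  size_t len = strlen (translation);
  return len > 0 && translation[len - 1] == ')';
}
```

This accepts "X (Y)" whatever the spelling of Y, and rejects a bare
"X" or an empty string.  It cannot tell whether the parenthesized
text is really the original name, which is one more reason to leave
the matter to translators.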

This change would simplify the code and make it more general, so that
it can handle arbitrary non-ASCII strings rather than just proper
names.

