Re: ".*utf\\(-?8\\)\\>" versus ".*[._]utf" versus "address@hidden>"

From: Dave Love
Subject: Re: ".*utf\\(-?8\\)\\>" versus ".*[._]utf" versus "address@hidden>"
Date: 21 Dec 2001 15:15:25 +0000

>>>>> Paul Eggert writes:

 > The regular expression ".*utf\\(-?8\\)\\>" in
 > locale-charset-language-names seems to be inconsistent with the
 > regular expression ".*[._]utf" in locale-preferred-coding-systems.
 > Shouldn't one or the other regular expression (or both) be changed?

 > I think the regular expression should contain [._]; I'm not so sure
 > about the \\(-?8\\)\\> part.

The utf-8 part is consistent with the other entries, isn't it?  I
assume it's appropriate for a specification of simply `utf-8' to set
up the generic utf-8 language environment, like `iso-8859-1' et al.

I've seen suggestions that `utf' is sometimes used as a synonym for
`utf-8'; obviously I should have noted the source.  I doubt it's a big
deal to remove it if it's likely to cause problems.
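
To make the mismatch between the two quoted patterns concrete, here is
a minimal sketch that tries each one against a few sample names.  The
helper `my-locale-regexp-matches-p', the two `my-...' variables and the
sample locale names are made up for illustration; only the two regexp
strings come from the message above.  It pins the standard syntax table
because `\\>' depends on what counts as a word character.

```elisp
;; Illustration only: the helper and variable names are hypothetical.
(defun my-locale-regexp-matches-p (regexp name)
  "Return t if REGEXP matches the locale NAME, ignoring case."
  (let ((case-fold-search t))
    ;; `\\>' (end of word) is syntax-table dependent, so use the
    ;; standard table rather than whatever the current buffer has.
    (with-syntax-table (standard-syntax-table)
      (and (string-match regexp name) t))))

;; The two patterns as quoted in this thread.
(defvar my-utf8-name-regexp ".*utf\\(-?8\\)\\>")
(defvar my-utf-coding-regexp ".*[._]utf")

(my-locale-regexp-matches-p my-utf8-name-regexp "en_US.utf-8")  ; t
(my-locale-regexp-matches-p my-utf8-name-regexp "utf8")         ; t
(my-locale-regexp-matches-p my-utf8-name-regexp "en_US.utf")    ; nil
(my-locale-regexp-matches-p my-utf-coding-regexp "en_US.utf")   ; t
(my-locale-regexp-matches-p my-utf-coding-regexp "utf-8")       ; nil
```

So, as written, the first pattern insists on the `8' but accepts a bare
`utf-8', while the second accepts a bare `utf' suffix but insists on a
preceding `.' or `_'.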

 > Also, locale-charset-language-names ends with this:

 >      ("address@hidden>" . "Latin-9")
 >      (".*utf\\(-?8\\)\\>" . "UTF-8")))

 > Shouldn't the UTF-8 pattern come before the euro pattern and the other
 > patterns?  It seems to me that the current order mishandles locales
 > like "address@hidden", which are present on Solaris 8.

I guess so.  I think I just added it to the end without considering
the issue.  You're the expert.
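
For the record, a rough sketch of why the order matters when the first
matching entry wins.  This is not the real table or the real lookup
code: the `@example' modifier, the locale name and all `my-...' names
are invented for illustration, and `assoc-default' is only assumed to
behave like the actual first-match lookup.

```elisp
;; Illustration only: hypothetical entries, not the real alist.
(defvar my-names-modifier-first
  '((".*@example\\>" . "Latin-9")
    (".*utf\\(-?8\\)\\>" . "UTF-8")))

(defvar my-names-utf8-first
  '((".*utf\\(-?8\\)\\>" . "UTF-8")
    (".*@example\\>" . "Latin-9")))

(defun my-locale-language-name (locale alist)
  "Return the value of the first entry in ALIST whose regexp matches LOCALE."
  (let ((case-fold-search t))
    (with-syntax-table (standard-syntax-table)
      ;; `assoc-default' with `string-match' stops at the first hit.
      (assoc-default locale alist 'string-match))))

(my-locale-language-name "xx_YY.utf-8@example" my-names-modifier-first)
;; => "Latin-9"
(my-locale-language-name "xx_YY.utf-8@example" my-names-utf8-first)
;; => "UTF-8"
```

Both patterns match such a name, so whichever comes first decides the
language environment.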

Locales are a mess...

