Re: RFE: Please allow unicode ID chars in identifiers

bug-bash


From:	Chet Ramey
Subject:	Re: RFE: Please allow unicode ID chars in identifiers
Date:	Tue, 13 Jun 2017 20:26:25 -0400
User-agent:	Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Thunderbird/52.1.1

On 6/13/17 7:58 PM, L A Walsh wrote:
> 
> 
> Chet Ramey wrote:
>> On 6/2/17 6:23 PM, L A Walsh wrote:
>>
>>  
>>> As for unsupported systems, there is a reason they are no longer
>>> supported.  The world is already using UTF-8.  It's only a few
>>> luddites clinging to ascii as a last refuge. ;-)
>>>
>>> What display/OS do you have that you can't run UTF-8 on?
>>>     
>>
>> This is a red herring. de_DE.UTF-8 and zh_KH.UTF-8 don't use the same
>> character set.
>>   
> ---
>    They use the same encoding.  Whether or not they use
> the same character set is up to someone's preference.  Looking at my
> local fonts, it looks like 'Code2000' covers both of those ranges:
> Germany and Khmer ?  But using 1 font for both isn't necessary.

That's not relevant to the issue of whether or not a particular character
is classified as alphabetic in one locale and not another. That is the
largest issue with locale-specific identifiers.  I'm not as concerned
with how a particular character displays; that's not important to
interpreting the script.
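
A minimal sketch of the classification point, for illustration only. It asks bash whether one character matches [[:alpha:]] under different LC_ALL settings. The helper name is_alpha_in is hypothetical, and en_US.UTF-8 is assumed to be installed; if it is not, bash prints a warning and keeps the previous locale. This shows only the coarsest case (C versus a UTF-8 locale); whether two UTF-8 locales disagree on a given character depends on the system's locale definitions.

```shell
#!/usr/bin/env bash
# is_alpha_in LOCALE CHAR  (hypothetical helper)
# Succeeds if CHAR matches [[:alpha:]] when LC_ALL is set to LOCALE.
# The subshell keeps the locale change from leaking into the caller.
is_alpha_in() {
    ( LC_ALL=$1; [[ $2 == [[:alpha:]] ]] )
}

ch=$'\xc3\xa9'   # UTF-8 bytes for U+00E9 (e with acute accent)

for loc in C en_US.UTF-8; do
    if is_alpha_in "$loc" "$ch"; then
        printf '%s: alphabetic\n' "$loc"
    else
        printf '%s: not alphabetic\n' "$loc"
    fi
done
```

The same bytes in the same script can pass or fail the test depending on the locale of whoever runs it, which is the portability problem for identifiers.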

> Forgive me if I'm misremembering, but hasn't Greg argued against
> the ability to supply "libraries" of re-usable scripts due to
> the ease with which names could conflict with each other and cause
> script incompatibilities?

I'm sure he has. It's a genuine problem without namespaces, so you have
to adopt some naming convention that provides pseudo-namespace
functionality.

> If it is the case that script libraries had access to unicode
> var & func names (and used it), wouldn't that significantly
> decrease the chances of conflict?  Right now, what, ...
> maybe A-Za-z_0-9 + maybe a few others == that's about 64 chars?

That's 63 choices for each character in your identifier, so the space of
possible names grows quickly with length. You can easily choose some prefix
(as readline does with _rl_ and rl_) and reduce the potential for clashes.
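
As a rough sketch of that convention in a script library: every global name carries a library prefix, with a leading underscore for private helpers, mirroring readline's rl_ and _rl_ split. The library name strlib and all function and variable names here are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical library "strlib". Public names start with strlib_,
# private names with _strlib_, so nothing lands in the bare global namespace.

_strlib_sep=','                # private variable

_strlib_join_impl() {          # private helper
    local IFS=$_strlib_sep     # "$*" joins arguments with the first char of IFS
    printf '%s\n' "$*"
}

strlib_join() {                # public entry point
    _strlib_join_impl "$@"
}

# A caller's own function named "join" does not collide with the library.
join() { printf 'caller join\n'; }

strlib_join a b c
join
```

The prefix costs a few characters per name but works with plain ASCII identifiers in every locale.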

> Even if a character doesn't display in your locality, doesn't
> mean it wouldn't work -- i.e. if I don't have a Cyrillic font
> installed, that doesn't mean the script wouldn't work -- as
> the characters would still be encoded as their Unicode values.

A character that is classified as being a valid alphabetic in one
locale may not be such in another, regardless of its encoding.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://cnswww.cns.cwru.edu/~chet/
