help-gnu-emacs

Re: How to compare strings?


From: David Kastrup
Subject: Re: How to compare strings?
Date: Mon, 30 Apr 2007 00:25:59 +0200
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.98 (gnu/linux)

Eli Zaretskii <eliz@gnu.org> writes:

>> From: David Kastrup <dak@gnu.org>
>> Date: Sun, 29 Apr 2007 18:23:09 +0200
>> 
>> how do I compare strings in the sort order of the current language
>> environment?
>
> I don't understand the question.  I'm sure you are aware that in the
> Emacs internal representation of strings, each character has a
> distinct codepoint.  That is, unlike outside Emacs, where the same
> code can stand for different characters depending on the locale
> (because each locale assumes a certain default encoding of text),
> inside Emacs Latin-1 è and Latin-2 č are two different characters
> represented by two different codes, even though their respective 8-bit
> encodings are identical (\350 or hex E8).

And?

> In the above example, these two internal codes are 2280 and 2408
> decimal.  (In Emacs 23, these codes will change, but will still be
> different.)
>
> Thus, as long as the string was decoded correctly, comparing such
> strings is a simple matter of using string< and its ilk.

But that does not establish the sort order of a language, only the
sort order of Unicode (or MULE) code points, which is something
entirely different.
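
A minimal sketch of the difference, in Python rather than Emacs Lisp
(the word list and the helper name codepoint_sorted are made up for
illustration).  A plain comparison orders strings by code point, so
the capitalized word comes first and the word starting with "ä" lands
after "zebra", which is not where a German dictionary would put it:

```python
# Illustrative word list; "\u00e4" is the precomposed letter a-umlaut.
words = ["zebra", "apfel", "\u00e4pfel", "Zebra"]

def codepoint_sorted(items):
    """Sort by raw Unicode code points, as a plain '<' comparison does."""
    return sorted(items)

by_codepoint = codepoint_sorted(words)
# by_codepoint is ["Zebra", "apfel", "zebra", "\u00e4pfel"]:
# 'Z' (90) < 'a' (97) < 'z' (122) < a-umlaut (228)
```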

>> Does Emacs have a concept of sort order depending on language?  If
>> not, why not?
>
> Because characters that have different order depending on the
> language have different codepoints inside Emacs, and thus the issue
> doesn't exist.
>
> Or am I missing something?

You are seemingly talking about something entirely different.  I can't
even make sense of your explanations.

Different languages have different orders of sorting characters.  Look
up the man pages of strcoll and strxfrm.  Pick up some telephone
directories or dictionaries of such languages.  Please note that this
is only partly related to the coding scheme (UTF-8, Latin-1, etc.).
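
For a feel of what strcoll and strxfrm provide, here is a rough sketch
using Python's locale module, which wraps the C library functions of
the same names.  The helper names and the word list are made up for
illustration, and the resulting order depends entirely on which
locales are installed and selected on the machine, so no particular
output is claimed:

```python
import functools
import locale

# Adopt the collation rules of the user's environment (LANG, LC_COLLATE, ...).
# If that locale is not installed, keep the default "C" locale.
try:
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    pass

def sort_with_strcoll(items):
    """Sort with pairwise locale-aware comparisons (strcoll)."""
    return sorted(items, key=functools.cmp_to_key(locale.strcoll))

def sort_with_strxfrm(items):
    """Sort by transformed keys (strxfrm); each key is computed only once."""
    return sorted(items, key=locale.strxfrm)

words = ["zebra", "apfel", "\u00e4pfel"]
collated = sort_with_strcoll(words)  # order depends on the active locale
```

In the "C" locale both helpers fall back to plain code order; under a
language-specific locale they follow that language's collation rules.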

For example, in some languages, accented letters will be right behind
the corresponding unaccented letter.
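
A toy sketch of such an ordering, assuming the reading that an
accented letter counts as a letter of its own placed directly after
its base letter.  The function accent_after_base_key is hypothetical,
and real collation tables are considerably more involved than this:

```python
import unicodedata

def accent_after_base_key(word):
    """Toy sort key: each character is ranked by its base letter first,
    then by the character itself, so an accented 'e' sorts directly
    after plain 'e' and before 'f'."""
    return tuple(
        (unicodedata.normalize("NFD", ch)[0], ch)  # (base letter, actual letter)
        for ch in word
    )

# "\u00e9" is the precomposed letter e-acute.
words = ["zebra", "\u00e9cole", "ecole", "eden"]

by_codepoint = sorted(words)
# ["ecole", "eden", "zebra", "\u00e9cole"]: e-acute (233) sorts after 'z' (122)

by_toy_rule = sorted(words, key=accent_after_base_key)
# ["ecole", "eden", "\u00e9cole", "zebra"]: e-acute right behind plain 'e'
```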

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

