Re: strip accents and sorting [was: BibTeX issues]

emacs-devel

From:	Roland Winkler
Subject:	Re: strip accents and sorting [was: BibTeX issues]
Date:	Fri, 30 Aug 2019 11:27:33 -0500

On Thu Aug 29 2019 martin rudalics wrote:
>  > But (string-lessp "ä-umlaut" "ö-combine") gives nil
> 
> But (string-collate-lessp "ä-umlaut" "ö-combine") gives t

...not for me, which is likely due to my locale setting LC_COLLATE=C.

I could use instead, say, LC_COLLATE=en_US.utf8.  Then the above
call of string-collate-lessp yields t.  But this also implies case
folding and ignoring dots in directory listings, which is not what I
want.  In other words, these locales have too many features bundled
together.
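
For a single comparison, one possible way around changing the locale
of the whole session is the optional LOCALE argument of
`string-collate-lessp'.  The following is only a minimal sketch: the
helper name `my-collate-lessp-in' is made up for illustration, and I
assume the named locale is installed on the system.  It picks the
collation rules per call, but it does not unbundle the features of
that locale.

```elisp
;; Hypothetical helper: compare S1 and S2 under the collation rules of
;; LOCALE, leaving the session's LC_COLLATE untouched.
(defun my-collate-lessp-in (locale s1 s2)
  "Return non-nil if S1 sorts before S2 according to LOCALE."
  (string-collate-lessp s1 s2 locale))

;; "ä" as one precomposed character versus "ö" as "o" + U+0308.
;; The result depends on the locales available on the system, and an
;; unknown locale name may signal an error.
(my-collate-lessp-in "POSIX" "\u00e4-umlaut" "o\u0308-combine")
;; (my-collate-lessp-in "en_US.UTF-8" "\u00e4-umlaut" "o\u0308-combine")
```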

Maybe these feature sets of different locales are documented
*somewhere* in a neat way, and there is a locale with a feature set
that does exactly what I want.  But to the best of my knowledge this
documentation resides outside Emacs, so things get rather
complicated when the locale affects an Emacs session in important or
possibly subtle ways.

> so it should be fairly easy to fix `sort-lines' and friends
> accordingly.

In that sense I am not sure I would like to see `sort-lines' and
friends be fixed "accordingly".  If at all, I'd vote for a user
option, which I would likely use to disable such behavior.

On the other hand, Eli pointed out in his reply about accented
characters being represented via a single character as compared to
using combining characters:

> The Unicode Standard mandates that they be handled identically,
> including in searching and sorting.  We don't yet implement that
> 100%, but see char-fold.el for a partial (and not very efficient)
> implementation during search.

So I would assume that the locale should not matter at all in the
context of Unicode combining characters.  (Or there should be a way
to control exactly this aspect of Unicode combining characters with
no additional (mis)features bundled with it.)
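
To illustrate what I mean by handling only this one aspect: a rough
sketch, assuming the normalization functions in ucs-normalize.el, is
to bring both strings into the same normalization form (NFC here)
before an ordinary `string-lessp'.  The name `my-nfc-string-lessp' is
hypothetical, and this is not what `sort-lines' does today.

```elisp
(require 'ucs-normalize)

;; Hypothetical predicate: locale-independent comparison that treats a
;; precomposed character and its combining sequence alike.
(defun my-nfc-string-lessp (s1 s2)
  "Like `string-lessp', but compare the NFC forms of S1 and S2."
  (string-lessp (ucs-normalize-NFC-string s1)
                (ucs-normalize-NFC-string s2)))

;; "ä" as one precomposed character versus "ö" as "o" + U+0308.
(string-lessp "\u00e4-umlaut" "o\u0308-combine")        ; => nil
(my-nfc-string-lessp "\u00e4-umlaut" "o\u0308-combine") ; => t
```

Such a predicate could be passed to `sort', as in
(sort names #'my-nfc-string-lessp).  It only makes the two
representations compare alike; accented characters are still ordered
by code point relative to each other and to un-accented characters.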

I understand that it is a different matter how accented characters
are sorted relative to each other and also relative to un-accented
characters.  So it can make a lot of sense to have different locales
for that aspect.

Maybe I am missing something here.  (And I have not yet looked in
more detail at char-fold.el mentioned by Eli, which could be a
better way to go within the Emacs world.)

Roland
