bug-coreutils

Re: uniq i18n implementation


From: Paul Eggert
Subject: Re: uniq i18n implementation
Date: Thu, 10 Aug 2006 16:21:49 -0700
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)

Pádraig Brady <address@hidden> writes:

> I was also using the string length comparison
> shortcut on the wide string. I'm unsure whether
> this is valid (on all platforms).

Me too, which is why the current code is cautious about this sort of thing.

>> Sorry, I'm not familiar with the ICU code.  Is it free software and is
>> it well maintained?  Where else is it being used, outside ICU itself?
>
> I am not familiar with it myself, but note
> it's used for various things in python, mozilla, openoffice, ...

OK, well, when we know more about it perhaps we can consider using it.

>> we might have "X" < "Y" < "Z" (using C-locale comparison), but "Z"
>> < "X" (using some other locale's comparison).  This will lead to
>> inconsistencies, which will be hard to document and will confuse
>> users.
>
> Garbage In Garbage Out.

Subject to memory limits, programs like "sort" and "uniq" should work
on all inputs, not just the "nice" ones.

> As for confusing users my solution was to print
> a warning indicating the invalid input.

If that is the best we can do (and it is done in some places already)
then we'll do that.  But I'd prefer a more-general approach.

>> Worse, it can
>> even lead to buffer overruns: e.g., qsort has undefined behavior if
>> you pass it a comparison function that is not a total order.
>
> Thanks for pointing that out.
> I'll look into that.

The current coreutils code avoids the problem by using 'exit' or
'longjmp' to break out of 'qsort' etc. when strcoll reports an error;
this avoids the undefined behavior.  It's kind of ugly, which is partly
why I'd like a cleaner solution.
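
A minimal sketch of that escape pattern, for illustration only.  It is
not the coreutils source: the names compare_failed, compare_lines and
sort_lines are hypothetical, and it assumes the usual POSIX convention
of clearing errno before strcoll and testing it afterwards.

```c
#include <errno.h>
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; a sketch, not the coreutils code.  */
static jmp_buf compare_failed;

/* qsort callback: compare two 'char const *' elements with strcoll.
   If strcoll reports an error via errno, do not return a made-up
   result; jump out of qsort instead.  */
static int
compare_lines (void const *a, void const *b)
{
  char const *const *x = a;
  char const *const *y = b;
  errno = 0;
  int diff = strcoll (*x, *y);
  if (errno != 0)
    longjmp (compare_failed, 1);
  return diff;
}

/* Sort the N strings in LINES using the current locale's collation.
   Return 0 on success, -1 if any comparison failed.  */
static int
sort_lines (char const **lines, size_t n)
{
  if (setjmp (compare_failed) != 0)
    return -1;
  qsort (lines, n, sizeof *lines, compare_lines);
  return 0;
}
```

After a -1 return the order of the array should be treated as
unspecified, and a qsort implementation that allocates scratch memory
may leak it when jumped out of.  That is part of what makes the
approach ugly.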