bug-textutils
Re: incorrect output from JOIN


From: Bob Proulx
Subject: Re: incorrect output from JOIN
Date: Fri Dec 6 06:38:03 2002
User-agent: Mutt/1.4i

Barry Gould <address@hidden> [2002-12-05 10:15:08 -0800]:
> 
> >What does the output of 'locale' say?
> 
> LANG=en_US.iso885915

First off, let me say that I am not very knowledgeable about how
specific locales are intended to behave.  But as an outsider looking
in, the behavior of the en_US locale seems completely insane to me.  I
don't know if this is specific to RH systems or endemic to en_US in
general, as I have never compared it to non-RH systems.  I just avoid
en_US altogether.

So I can't really say if the behavior is expected or not.  I just know
it is unreasonable.  And doubly unreasonable that a vendor might set
that for the user without letting them know that they are going to be
getting non-standard, and weirdly non-standard behavior at that, by
default.  And that they can only get normal behavior by specifically
saying they don't want the weird behavior.  That is just not friendly.

If you search the archives of this list you will see that this single
issue accounts for a huge part of the total list volume.  It affects
any program that has anything to do with sorting.

> >  export LC_ALL=POSIX
> 
> That solves the problem. (either LANG= or LC_ALL=POSIX)
> 
> Is it safe to permanently set that to POSIX?

Yes.  However, unsetting LANG should be sufficient as well and will
have identical effect, as long as no LC_* variables are set elsewhere
in your environment.  The problem is not LANG itself but the en_US
locale to which it was set.  Other locales appear fine.  At least I
have not heard of any problems with other locales.  But I can't prove
the nonexistence of something either, so that statement is
meaningless.
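
To make the LANG versus LC_ALL point concrete, here is a small sketch
of the precedence POSIX specifies for these variables: LC_ALL wins if
set, then the specific LC_* variable (LC_COLLATE for sorting), then
LANG, and finally the POSIX locale if none of them is set.  The
function name effective_collate is made up for illustration and is not
part of any tool; it only inspects the environment and does not check
whether the named locale is actually installed.

```shell
# Hypothetical helper: print the locale name that governs collation,
# following the POSIX precedence LC_ALL > LC_COLLATE > LANG > POSIX.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Nothing set at all: programs fall back to the POSIX locale.
        printf '%s\n' POSIX
    fi
}

# Show what the current environment resolves to.
effective_collate
```

With only LANG=en_US.iso885915 set this prints that name; after either
"unset LANG" or "export LC_ALL=POSIX" it prints POSIX, which is why
the two fixes behave the same in that situation.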

A caveat is that setting LC_ALL=POSIX is a very US-centric view of the
world.  If you are not a native speaker of US English then you will
not be able to see program messages in your native language.  So don't
conclude that LANG should never be set to your native language.  The
entire purpose of locales is to enable internationalization.  As near
as I can determine this works for other locales.  There is nothing
intrinsically wrong with setting LANG.  Just make sure it is set to a
working locale.

It is a shame that en_US sounds like something an English-speaking
person in the US would want.  But in reality it only causes trouble
and is the one locale that should be avoided.  What an irony!

> I just d/l'd & compiled coreutils 4.5.3 from gnu.org...
> it also has the same problem output.

Ahem, the problem is not with coreutils but rather either with LANG
having been set to en_US at all, or with the en_US tables on the
operating system, neither of which is part of coreutils.  I hope you
don't mind me having a knee-jerk reaction to that statement.

> Also, if I SORT the files, then I only get 2 lines output from join!
> (they were already sorted in some sense, but not by sort)
> Then, if I export LC_ALL=POSIX, and run join again, I only get 1 line of 
> output!
> (re-sorting fixes that)
> 
> I'm a bit confused as I would have assumed that if both files were sorted 
> in the same manner, then join shouldn't have any reason to have trouble 
> with them.

A very odd behavior!  At first guess I agree with you and would have
thought the same thing.  Oh well, it is broken, enough said.
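
For what it is worth, the safe pattern is to sort both files and run
join under one explicitly chosen locale, so that both programs agree
on the ordering.  Here is a minimal sketch of that.  The file names
and sample data are made up for illustration and are not your actual
files.

```shell
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'b 2' 'A 1' 'a 3' > "$tmp/left"
printf '%s\n' 'a x' 'A y' 'b z' > "$tmp/right"

# Sort both inputs on the join field under one explicit locale...
LC_ALL=POSIX sort -k 1,1 "$tmp/left"  > "$tmp/left.sorted"
LC_ALL=POSIX sort -k 1,1 "$tmp/right" > "$tmp/right.sorted"

# ...and run join under that same locale so it expects the same order.
joined=$(LC_ALL=POSIX join "$tmp/left.sorted" "$tmp/right.sorted")
printf '%s\n' "$joined"

# Clean up the scratch directory.
rm -rf "$tmp"
```

Setting LC_ALL on each command line keeps the choice local to those
commands, so the rest of the session can keep whatever locale it had.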

> BTW, can someone let RedHat know about this? (Last time I emailed them 
> about a bug they were somewhat hostile as I don't have paid support.)

They don't listen to me either.

In any case, I am glad you have an understanding of the problem and
also know how to fix it.

Bob



