bug-coreutils

bug#16468: join


From: Eric Blake
Subject: bug#16468: join
Date: Thu, 16 Jan 2014 11:10:11 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0

[re-adding the list, with permission]

On 01/16/2014 10:46 AM, barry kesner wrote:
> Eric,
>   Thanks for response.
>  I now realize it wants sorted alpha input not numerical.  999 1000 1001 is
> how it is sorted.

I think there have been requests in the past to enhance 'join' so that
it can have more fine-tuned control over how its fields are selected.
Maybe something like sharing code so that 'join -1 k1,1n' would behave
as if file 1 had been sorted with 'sort -k1,1n'.  But right now, that
functionality doesn't exist.

> 
>   How do you tell join this without resorting.  The files are huge!

Unfortunately, there isn't any really good way, short of re-processing
the files to make the data appear sorted in the order join expects.
That said, it certainly appears that for your given data, you can write
a sed filter that can reprocess on a line-by-line basis, and feed that
into join, without the penalty of having to re-sort the entire file and
without having to have the processed file stored in your file system all
at once.  It also seems possible to write a post filter to get back to
the style of the line in the original file.  Here, extensions such as bash's
  join <(infilter file1) <(infilter file2) | outfilter
make this easier to type than the alternative of using named fifos to
stay within POSIX semantics (the trick is then to write the correct sed
scripts to serve as infilter and outfilter).
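
As a rough sketch of what such filters might look like (not tested
against your real data): the names infilter, outfilter and join_numeric
are hypothetical, and the scripts assume that field 1 is a non-negative
integer of at most 10 digits with no leading zeros, that it is followed
by a space, and that both files are already in numeric order on that
field.  Padding the key to a fixed width makes numeric order and byte
order agree, so join sees input it considers sorted.

```bash
#!/usr/bin/env bash
# Hypothetical filters; they assume field 1 is a non-negative integer of
# at most 10 digits (no leading zeros), followed by a space.

# Pad the key with leading zeros to exactly 10 digits, line by line.
infilter() {
  sed -e 's/^/000000000/' -e 's/^0*\([0-9]\{10\}\) /\1 /' "$1"
}

# Strip the padding again, keeping at least one digit of the key.
outfilter() {
  sed -e 's/^0*\([0-9]\)/\1/'
}

# Join two numerically ordered files without re-sorting either of them.
# LC_ALL=C makes join compare the padded keys byte by byte.
join_numeric() {
  LC_ALL=C join <(infilter "$1") <(infilter "$2") | outfilter
}
```

With that in place, 'join_numeric file1 file2' streams both files
through sed once, and nothing padded is ever written to disk.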

> 
> I can't find LC_COLLATE?

It's an environment variable, like LC_ALL, that affects your locale.
Running 'locale' will show you your current locale settings, including
LC_COLLATE.  Setting LC_ALL in the environment is shorthand that forces
all other categories to behave the same, so it's easier to test whether
'LC_ALL=C command' has an effect than it is to figure out which locale
category(ies) matter.
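
A quick way to see the difference between the two orderings, using
your three example keys (a small illustration only; it says nothing
about which locale your files were originally sorted in):

```bash
# Byte-wise (C locale) order, which is what 'LC_ALL=C join' expects:
printf '999\n1000\n1001\n' | LC_ALL=C sort      # 1000, 1001, 999

# Numeric order, which is how the files are sorted now:
printf '999\n1000\n1001\n' | LC_ALL=C sort -n   # 999, 1000, 1001
```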

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


