bug#26029: Problems with join
From: Assaf Gordon
Subject: bug#26029: Problems with join
Date: Thu, 9 Mar 2017 17:20:43 +0000
User-agent: Mutt/1.5.23 (2014-03-12)
Hello Reuti and all,
Reuti wrote:
[…] The strange thing seems to be, that "-j1 2" is handled like "-1
2".
My investigations revealed: on a Mac the man page of `join` explains
the behavior. The options -j, -j1 and -j2 are listed with the BSD
version of `join` as being there for compatibility. This leads to the
assumption, that nowadays -1 and -2 should better be used.
Thanks for investigating and pointing this out!
Join's manual section was recently expanded; I wish I had been aware of
this nuance before I wrote the patch. I will send a patch with
improved documentation.
On Thu, Mar 09, 2017 at 05:29:13PM +0100, Reuti wrote:
On 09.03.2017 at 16:32, Peter Kluge <address@hidden> wrote:
I prefer the "POSIX"-Standard teaching to my participants.
Aha, I didn't check this. Then the "-j" option should be moved to a new section
"Deprecated" in the man/info page of the coreutils version too. (And mention the special
handling of -j1 resp. -j2, while -j3 … works as one expects.)
I would humbly suggest other wording: I'm not sure '-j' is deprecated.
It is useful, and does work as expected in most cases.
But, it should be better documented to warn against this edge-case.
Reuti wrote:
-j FIELD equivalent to '-1 FIELD -2 FIELD'
does not work in all cases essentially.
It 'just works' in most cases, but indeed we should improve the
documentation about edge cases.
First, this is the relevant section that handles the '-j' parameter:
https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/join.c#n1079
Second, let's ensure '-jN' works in the common cases,
when it is *not* followed by a number:
Two input files:
$ cat a.txt
1 2 3 aaa
2 3 4 bbb
$ cat b.txt
1 2 3 XXX
2 3 4 YYY
'-j1' alone is equivalent to '-1 1 -2 1':
$ join -1 1 -2 1 a.txt b.txt
1 2 3 aaa 2 3 XXX
2 3 4 bbb 3 4 YYY
$ join -j1 a.txt b.txt
1 2 3 aaa 2 3 XXX
2 3 4 bbb 3 4 YYY
'-j2' alone is equivalent to '-1 2 -2 2':
$ join -1 2 -2 2 a.txt b.txt
2 1 3 aaa 1 3 XXX
3 2 4 bbb 2 4 YYY
$ join -j2 a.txt b.txt
2 1 3 aaa 1 3 XXX
3 2 4 bbb 2 4 YYY
'-j3' alone is equivalent to '-1 3 -2 3':
$ join -1 3 -2 3 a.txt b.txt
3 1 2 aaa 1 2 XXX
4 2 3 bbb 2 3 YYY
$ join -j3 a.txt b.txt
3 1 2 aaa 1 2 XXX
4 2 3 bbb 2 3 YYY
So, in the most common cases, '-jN' works for all Ns
(for "all" being 1,2,3 but really, who needs more than 3 numbers? :) ).
This may differ from BSD's join.
Now comes the tricky part:
If '-j1' or '-j2' is followed by another parameter,
and that parameter turns out *not* to be a valid field number,
it is treated like '-j 1' or '-j 2' (that is, '-1 N -2 N'), and join
just "does the right thing":
$ join -j2 -i a.txt b.txt
2 1 3 aaa 1 3 XXX
3 2 4 bbb 2 4 YYY
This is implemented here:
https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/join.c#n1171
And the result is that most of the time, join "just works" (IMHO, but
other opinions welcomed).
The unexpected behaviour occurs when '-j1' or '-j2' is followed by a
number: the number then sets the key field for that file alone.
E.g. '-j1 2' is equivalent to '-1 2' (and the key for the second
file is not set, thus defaults to 1):
$ join -j1 2 a.txt b.txt
2 1 3 aaa 3 4 YYY
$ join -1 2 a.txt b.txt
2 1 3 aaa 3 4 YYY
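
The cases described above can be summarised in a small shell sketch.
It follows only the wording of this message, not join.c itself, so the
real parser may differ in corner cases (for example a field number of 0,
or an input file whose name is a number). The function name
'interpret_j' is hypothetical and exists only for illustration:

```shell
#!/bin/sh
# interpret_j DIGITS NEXT_ARG
# Hypothetical helper (not part of coreutils): prints the explicit -1/-2
# options that "-jDIGITS NEXT_ARG" behaves like, per the description above.
interpret_j() {
    digit=$1
    next=$2
    case $digit in
        1|2) ;;   # only -j1 and -j2 get the special handling
        *)
            # -j3, -j4, ...: same key field for both files
            printf '%s\n' "-1 $digit -2 $digit"
            return ;;
    esac
    case $next in
        ''|*[!0-9]*)
            # Next argument is not a number: same key field for both files
            printf '%s\n' "-1 $digit -2 $digit" ;;
        *)
            # Next argument is a number: key field for file DIGITS only
            printf '%s\n' "-$digit $next" ;;
    esac
}

interpret_j 2 -i      # prints: -1 2 -2 2
interpret_j 1 2       # prints: -1 2
interpret_j 3 a.txt   # prints: -1 3 -2 3
```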
Is the above a satisfactory explanation?
If so, it'll be more-or-less what I'll add to the manual.
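
For anyone who wants to avoid the ambiguity altogether, here is a
minimal sketch using only the POSIX '-1' and '-2' options on the same
two sample files. The scratch directory and the variable names are my
own choices for illustration:

```shell
#!/bin/sh
# Recreate the two sample files from this message in a scratch directory.
tmpdir=$(mktemp -d)
printf '1 2 3 aaa\n2 3 4 bbb\n' > "$tmpdir/a.txt"
printf '1 2 3 XXX\n2 3 4 YYY\n' > "$tmpdir/b.txt"

# Key is field 2 in both files, stated explicitly.
both=$(join -1 2 -2 2 "$tmpdir/a.txt" "$tmpdir/b.txt")

# Key is field 2 in the first file only; the second file keeps the
# default key (field 1).
first_only=$(join -1 2 "$tmpdir/a.txt" "$tmpdir/b.txt")

printf '%s\n' "$both"
echo '---'
printf '%s\n' "$first_only"

rm -rf "$tmpdir"
```

The first result matches the '-j2' output shown earlier, and the second
matches the '-j1 2' output, but neither command depends on how '-j' is
parsed.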
I see that this has been implemented back in 2005, here:
https://git.savannah.gnu.org/cgit/coreutils.git/commit/src/join.c?id=f9118c1c2e35b
with the comment:
"Parse obsolete options -j1 and -j2
so that it is a pure extension to POSIX 1003.1-2001."
My guess is that, since this usage is not mentioned in the
documentation, it is considered undocumented and discouraged
(and indeed, I don't think I've ever encountered it, or previously
seen a bug report or question about it, so it's rather rare).
We could add a warning to the man page - what do others think?
regards,
- assaf