Re: [coreutils] join feature: auto-format
From: Pádraig Brady
Subject: Re: [coreutils] join feature: auto-format
Date: Thu, 06 Jan 2011 12:05:01 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3

On 07/10/10 19:25, Pádraig Brady wrote:
> On 07/10/10 18:43, Assaf Gordon wrote:
>> Pádraig Brady wrote, On 10/07/2010 06:22 AM:
>>> On 07/10/10 01:03, Pádraig Brady wrote:
>>>> On 06/10/10 21:41, Assaf Gordon wrote:
>>>>>
>>>>> The "--auto-format" feature simply builds the "-o" format line
>>>>> automatically, based on the number of columns from both input files.
>>>>
>>>> Thanks for persisting with this and presenting a concise example.
>>>> I agree that this is useful and can't think of a simple workaround.
>>>> Perhaps the interface would be better as:
>>>>
>>>> -o {all (default), padded, FORMAT}
>>>>
>>>> where padded is the functionality you're suggesting?
>>>
>>> Thinking more about it, we mightn't need any new options at all.
>>> Currently -e is redundant if -o is not specified.
>>> So how about changing that so that if -e is specified
>>> we operate as above by auto inserting empty fields?
>>> Also I wouldn't base on the number of fields in the first line,
>>> instead auto padding to the biggest number of fields
>>> on the current lines under consideration.
>>
>> My concern is the principle of "least surprise" - if there are existing
>> scripts/programs that specify "-e" without "-o" (doesn't make sense, but
>> still possible) - this change will alter their behavior.
>>
>> Also, implying/forcing 'auto-format' when "-e" is used without "-o" might be
>> a bit confusing.
>
> Well seeing as -e without -o currently does nothing,
> I don't think we need to worry too much about changing that behavior.
> Also to me, specifying -e EMPTY implicitly means I want
> fields missing from one of the files replaced with EMPTY.
>
> Note POSIX is more explicit, and describes our current operation:
>
> -e EMPTY
> Replace empty output fields in the list selected by -o with EMPTY
>
> So changing that would be an extension to POSIX.
> But I still think it makes sense.
> I'll prepare a patch soon, to do as I describe above,
> unless there are objections.

The attached patch changes `join` (diverging from what's done on other platforms)
so that `join -e` automatically pads fields missing from one file,
so that the same number of fields is output from each file.
Previously -e was only used for missing fields specified with -o or -j.
With this change join now does:

$ cat file1
a 1 2
b 1
d 1 2

$ cat file2
a 3 4
b 3 4
c 3 4

$ join -a1 -a2 -1 1 -2 1 -e. file1 file2
a 1 2 3 4
b 1 . 3 4
c . . 3 4
d 1 2 . .

$ join -a1 -a2 -1 1 -2 4 -e. file1 file2
. . . . a 3 4
. . . . b 3 4
. . . . c 3 4
a 1 2 . .
b 1 .
d 1 2 . .

$ join -a1 -a2 -1 4 -2 1 -e. file1 file2
. a 1 2 . . .
. b 1 . .
. d 1 2 . . .
a . . 3 4
b . . 3 4
c . . 3 4

$ join -a1 -a2 -1 4 -2 4 -e. file1 file2
. a 1 2 a 3 4
. a 1 2 b 3 4
. a 1 2 c 3 4
. b 1 . a 3 4
. b 1 . b 3 4
. b 1 . c 3 4
. d 1 2 a 3 4
. d 1 2 b 3 4
. d 1 2 c 3 4

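For comparison, when the number of columns in each file is known in advance, the first result above can be obtained without the patch by spelling out the field list with -o. The following is only a sketch, assuming GNU join and the two sample files shown above; the scratch directory and the `out` variable are there for illustration and are not part of the proposal.

```shell
# Recreate the two sample inputs from this message in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'a 1 2' 'b 1' 'd 1 2' > "$tmp/file1"
printf '%s\n' 'a 3 4' 'b 3 4' 'c 3 4' > "$tmp/file2"

# Explicit field list: the join field (0), then fields 2 and 3 of each file.
# -e fills in any field named by -o that is missing from a line.
out=$(join -a1 -a2 -e. -o 0,1.2,1.3,2.2,2.3 "$tmp/file1" "$tmp/file2")
printf '%s\n' "$out"

rm -r "$tmp"
```

This should print the same four lines as the first `join -e.` example. The drawback is that the -o list has to be written by hand and must match the widest line in each file, which is the bookkeeping the auto-padding is meant to remove.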
While -e without -o was previously a no-op, and so could safely be extended IMHO,
this will also change the behavior when both -e and -j are specified.
Previously if -j > 1 was specified, and that field was missing,
then -e would be used in its place, rather than the empty string.
This still does that, but also does the padding.
Without the -j issue I'd be 80:20 for just extending -e to auto pad,
but given -j I'm 50:50. The alternative is to select this with,
say, '-o padded', but that's less discoverable, and complicates
the interface somewhat.
cheers,
Pádraig.
join-auto-format.diff
Description: Text Data