[coreutils] join feature: auto-format
From: Assaf Gordon
Subject: [coreutils] join feature: auto-format
Date: Wed, 06 Oct 2010 16:41:09 -0400
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.9) Gecko/20100918 Icedove/3.1.4
Hello,
I'd like to (re)suggest a feature for the join program - the ability to
automatically build an output format line (similar to, but easier than, using "-o").
I've previously mentioned it here (but got no favorable responses):
http://lists.gnu.org/archive/html/bug-coreutils/2009-11/msg00151.html
Several people have been using this option for a year now (on our local
servers), so I thought I might try to suggest it again.
The full patch is attached, and also available here:
http://cancan.cshl.edu/labmembers/gordon/files/join_auto_format_2010_10_06.patch
Here's the common use case:
Given two tabular files with a common key in the first column and many numeric
(or other) values in the other columns, the user wants to join them together easily.
One requirement is that empty/missing values should be populated with "00".
File 1
======
bar 10 13 15 16 11 32
foo 10 10 11 12 13 14
File 2
======
bar 99 91 90 93 91 93
baz 90 91 99 96 97 95
Desired joined output
=====================
bar 10 13 15 16 11 32 99 91 90 93 91 93
baz 00 00 00 00 00 00 90 91 99 96 97 95
foo 10 10 11 12 13 14 00 00 00 00 00 00
This is already technically possible; the parameters would be:
"-a1 -a2 -e 00 -o 0,1.2,1.3,1.4,1.5,1.6,1.7,2.2,2.3,2.4,2.5,2.6,2.7"
But building the "-o" parameter by hand is cumbersome and error-prone (imagine
files with dozens of columns, which is very common in my case).
The "--auto-format" feature simply builds the "-o" format line automatically,
based on the number of columns from both input files.
The auto-generated format order is: Key-column, all columns (except key) from
first file, all columns (except key) from second file.
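As a rough sketch of that ordering rule (written in Python for illustration, not
the C code from the patch; the function name build_auto_format and its
parameters are made up here, and the key is assumed to be column 1 unless
stated otherwise):

```python
def build_auto_format(ncols1, ncols2, key1=1, key2=1):
    """Build a join "-o"-style field list from the two files' column counts.

    Order: key ("0"), then every non-key column of file 1,
    then every non-key column of file 2.
    """
    fields = ["0"]
    fields += [f"1.{i}" for i in range(1, ncols1 + 1) if i != key1]
    fields += [f"2.{i}" for i in range(1, ncols2 + 1) if i != key2]
    return ",".join(fields)


# Both example files have 7 columns (key + 6 values).
print(build_auto_format(7, 7))
# prints: 0,1.2,1.3,1.4,1.5,1.6,1.7,2.2,2.3,2.4,2.5,2.6,2.7
```

For the two example files this reproduces the hand-written "-o" list.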
The parameters for the above use case become:
"-a1 -a2 -e 00 --auto-format"
If "--auto-format" is not specified, there's no change to the rest of the
workflow.
If both "--auto-format" and "-o XXXX" are specified, the "-o" takes precedence.
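To make the intended result of "-a1 -a2 -e 00" with an auto-built format
concrete, here is a minimal Python emulation. It is only a sketch under several
assumptions that join itself does not make: keys are unique within each file,
every line of a file has the same number of columns, the key is the first
column, and keys are ordered with Python's default string sort rather than the
locale's collation. The function name outer_join is hypothetical.

```python
def outer_join(lines1, lines2, fill="00"):
    """Emulate an outer join on column 1, padding missing values with `fill`."""
    rows1 = [line.split() for line in lines1]
    rows2 = [line.split() for line in lines2]
    # Number of non-key columns in each file, taken from the first line.
    n1 = len(rows1[0]) - 1
    n2 = len(rows2[0]) - 1
    d1 = {r[0]: r[1:] for r in rows1}
    d2 = {r[0]: r[1:] for r in rows2}
    out = []
    for key in sorted(d1.keys() | d2.keys()):
        # Key, then file 1 values, then file 2 values; pad unpaired lines.
        fields = [key] + d1.get(key, [fill] * n1) + d2.get(key, [fill] * n2)
        out.append(" ".join(fields))
    return out


file1 = ["bar 10 13 15 16 11 32", "foo 10 10 11 12 13 14"]
file2 = ["bar 99 91 90 93 91 93", "baz 90 91 99 96 97 95"]
for line in outer_join(file1, file2):
    print(line)
```

Run on the two example files, this prints the three lines of the desired joined
output shown earlier.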
Please let me know what you think about it.
Best regards,
-gordon
Attachment: join_auto_format_2010_10_06.patch
Description: Text Data