bug-coreutils

join with header line support


From: Assaf Gordon
Subject: join with header line support
Date: Fri, 30 Oct 2009 19:02:53 -0400
User-agent: Mozilla-Thunderbird 2.0.0.22 (X11/20090707)

Hello,

I'd like to suggest a small feature for 'join':

"--header" makes join pair the first line of each file unconditionally, without
comparing the join fields or checking their sort order.
This allows joining files that have header lines.

Example:
===============
$ cat 1.txt
ID      Color   Name
1       green   Alice
2       red     Bob
3       blue    Carol
4       black   Dave


$ cat 2.txt
ID      Age
2       55
4       24

$ join --check-order --header -j 1 -a 1 -e unknown -o "0 1.3 2.2" 1.txt 2.txt
ID      Name    Age
1       Alice   unknown
2       Bob     55
3       Carol   unknown
4       Dave    24

===============

Although the above can be accomplished with a combination of other utilities (cut, head,
paste, sed or similar), having this feature built into join makes life a lot easier,
especially when joining several files (using pipes) or selecting specific output
fields (with "-o"): join then takes care of extracting the right field headers into
the header line.

The following patch adds the "--header" option. If "--header" is not used, the
regular program flow is unchanged.

Comments are welcome. This patch is released under GPLv3 or later.
If you're willing to accept this patch, I'll be happy to assign copyright to 
GNU, etc.

thanks,
 gordon

=============================

--- join.orig.c 2009-09-23 04:25:44.000000000 -0400
+++ join.c      2009-10-30 19:00:01.000000000 -0400
@@ -146,6 +146,7 @@ static struct option const longopts[] =
  {"ignore-case", no_argument, NULL, 'i'},
  {"check-order", no_argument, NULL, CHECK_ORDER_OPTION},
  {"nocheck-order", no_argument, NULL, NOCHECK_ORDER_OPTION},
+  {"header", no_argument, NULL, 'H'},
  {GETOPT_HELP_OPTION_DECL},
  {GETOPT_VERSION_OPTION_DECL},
  {NULL, 0, NULL, 0}
@@ -157,6 +158,10 @@ static struct line uni_blank;
/* If nonzero, ignore case when comparing join fields.  */
static bool ignore_case;

+/* If nonzero, treat the first line of each file as column headers:
+   join them without checking for ordering.  */
+static bool join_header_lines;
+
void
usage (int status)
{
@@ -191,6 +196,7 @@ by whitespace.  When FILE1 or FILE2 (not
  --check-order     check that the input is correctly sorted, even\n\
                      if all input lines are pairable\n\
  --nocheck-order   do not check that the input is correctly sorted\n\
+  --header          treat the first line in each file as field headers\n\
"), stdout);
      fputs (HELP_OPTION_DESCRIPTION, stdout);
      fputs (VERSION_OPTION_DESCRIPTION, stdout);
@@ -616,6 +622,15 @@ join (FILE *fp1, FILE *fp2)
  initseq (&seq2);
  getseq (fp2, &seq2, 2);

+  if (join_header_lines && seq1.count && seq2.count)
+    {
+      prjoin (seq1.lines[0], seq2.lines[0]);
+      prevline[0] = NULL;
+      prevline[1] = NULL;
+      advance_seq (fp1, &seq1, true, 1);
+      advance_seq (fp2, &seq2, true, 2);
+    }
+
  while (seq1.count && seq2.count)
    {
      size_t i;
@@ -1052,6 +1067,10 @@ main (int argc, char **argv)
                         &nfiles, &prev_optc_status, &optc_status);
          break;

+        case 'H':
+          join_header_lines = true;
+          break;
+
        case_GETOPT_HELP_CHAR;

        case_GETOPT_VERSION_CHAR (PROGRAM_NAME, AUTHORS);