Re: [RFC] join -t ''

From: Pádraig Brady
Subject: Re: [RFC] join -t ''
Date: Tue, 26 Jan 2010 12:01:27 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko/20091204 Thunderbird/3.0

I'm thinking of pushing both the doc change
(that join -t '\0' usually operates on the whole line)
and the new functionality
(that join -t '' always operates on the whole line),
as mentioned below...

On 29/10/09 12:15, Pádraig Brady wrote:
It's quite common to want `join` to operate on the whole line.
For example, see: https://bugzilla.redhat.com/show_bug.cgi?id=531355
In addition `sort` by default operates on the whole line.
So I think there should be an easy way for join to do the same.
The logical way for me is to specify an empty separator with -t ''
as is done in the patch below. Would this be useful?
If not I'll at least document the -t '\0' option which achieves
the same thing iff there are no NUL characters in the line.
Note '\0' support was added in f9118c1c


diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index df7e963..57f6f11 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -5458,6 +5458,8 @@ locales and options if the output of @command{sort} is fed to
  sort a file on its default join field, but if you select a non-default
  locale, join field, separator, or comparison options, then you should
  do so consistently between @command{join} and @command{sort}.
+If @samp{join -t ''} is specified then the whole line is considered which
+matches the default operation of sort.

  If the input has no unpairable lines, a @acronym{GNU} extension is
  available; the sort order can be any order that considers two fields
@@ -5559,7 +5561,10 @@ option---are subject to the specified @var{field-list}.
  Use character @var{char} as the input and output field separator.
  Treat as significant each occurrence of @var{char} in the input file.
  Use @samp{sort -t @var{char}}, without the @option{-b} option of
-@command{sort}, to produce this ordering.
+@command{sort}, to produce this ordering.  If @samp{join -t ''} is specified,
+the whole line is considered, matching the default operation of sort.
+If @samp{-t '\0'} is specified then the @acronym{ASCII} @sc{nul}
+character is used to delimit the fields.

  @item -v @var{file-number}
  Print a line for each unpairable line in file @var{file-number}
diff --git a/src/join.c b/src/join.c
index d734a91..8c9b9d3 100644
--- a/src/join.c
+++ b/src/join.c
@@ -204,7 +204,8 @@ the remaining fields from FILE1, the remaining fields from FILE2, all\n\
  separated by CHAR.\n\
  Important: FILE1 and FILE2 must be sorted on the join fields.\n\
-E.g., use `sort -k 1b,1' if `join' has no options.\n\
+E.g., use ` sort -k 1b,1 ' if `join' has no options,\n\
+or use ` join -t '' ' if `sort' has no options.\n\
  Note, comparisons honor the rules specified by `LC_COLLATE'.\n\
  If the input is not sorted and some lines cannot be joined, a\n\
  warning message will be given.\n\
@@ -1024,8 +1025,8 @@ main (int argc, char **argv)
              unsigned char newtab = optarg[0];
              if (! newtab)
-              error (EXIT_FAILURE, 0, _("empty tab"));
-            if (optarg[1])
+              newtab = '\n'; /* '' =>  process the whole line.  */
+            else if (optarg[1])
                  if (STREQ (optarg, "\\0"))
                    newtab = '\0';

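For anyone reading the src/join.c hunk in isolation, the patched handling of the -t argument can be sketched as a standalone function. This is only an illustration under stated assumptions: parse_tab_arg is a hypothetical helper that does not exist in join.c, and returning false for any other multi-character argument is an assumption, since the hunk is cut off before that branch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, NOT part of join.c: map the argument of -t to a
   separator byte following the logic of the patched hunk.
   Returns false when the argument is longer than one character and is
   not the two-character string \0 (assumed behaviour; the quoted hunk
   does not show what join does in that case).  */
bool
parse_tab_arg (const char *optarg, unsigned char *tab)
{
  unsigned char newtab = optarg[0];

  if (! newtab)
    newtab = '\n';              /* '' => process the whole line.  */
  else if (optarg[1])
    {
      if (strcmp (optarg, "\\0") == 0)
        newtab = '\0';          /* '\0' => NUL delimits the fields.  */
      else
        return false;           /* assumption: rejected as invalid */
    }

  *tab = newtab;
  return true;
}
```

With this mapping, -t '' selects newline as the separator. Assuming a line handed to the field splitter never contains its own terminating newline, every line then forms a single field, so the join key is the whole line even when the line contains NUL characters.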