bug-coreutils

Re: merge sort temporary files


From: Paul Eggert
Subject: Re: merge sort temporary files
Date: Fri, 14 May 2004 00:01:15 -0700
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)

Rather than add a new option, I'd prefer to change 'sort' to cater to
your (relatively common) case instead of to (relatively contrived)
cases like `cat F | sort -m -o F - G', where people should know that
they're getting into trouble anyway.

Here's a proposed patch to solve your problem that way instead.

2004-05-13  Paul Eggert  <address@hidden>

        Improve performance of `sort -m' on large files, at the cost of
        making some contrived examples unsafe.  POSIX allows this
        optimization.  Performance problem reported by Jonathan Baker in
        <http://mail.gnu.org/archive/html/bug-coreutils/2004-05/msg00071.html>.

        * src/sort.c (first_same_file): Do not treat input pipes
        differently from other files.
        * doc/coreutils.texi (sort invocation): Document that "sort -m -o F"
        might write F before reading all the input.
        * NEWS: Likewise.

Index: NEWS
===================================================================
RCS file: /home/meyering/coreutils/cu/NEWS,v
retrieving revision 1.206
diff -p -u -r1.206 NEWS
--- NEWS        11 May 2004 16:48:42 -0000      1.206
+++ NEWS        14 May 2004 06:35:30 -0000
@@ -20,6 +20,12 @@ GNU coreutils NEWS                      
 
 ** New features
 
+  For efficiency, `sort -m' no longer copies input to a temporary file
+  merely because the input happens to come from a pipe.  As a result,
+  some relatively-contrived examples like `cat F | sort -m -o F - G'
+  are no longer safe, as `sort' might start writing F before `cat' is
+  done reading it.  This problem cannot occur unless `-m' is used.
+
   pwd now works even when run from a working directory whose name
   is longer than PATH_MAX.
 
Index: doc/coreutils.texi
===================================================================
RCS file: /home/meyering/coreutils/cu/doc/coreutils.texi,v
retrieving revision 1.180
diff -p -u -r1.180 coreutils.texi
--- doc/coreutils.texi  9 May 2004 19:42:19 -0000       1.180
+++ doc/coreutils.texi  14 May 2004 06:32:53 -0000
@@ -3265,9 +3265,13 @@ starting with 1.  So to sort on the seco
 @opindex --output
 @cindex overwriting of input, allowed
 Write output to @var{output-file} instead of standard output.
-If necessary, @command{sort} reads input before opening
+Normally, @command{sort} reads all input before opening
 @var{output-file}, so you can safely sort a file in place by using
 commands like @code{sort -o F F} and @code{cat F | sort -o F}.
+However, @command{sort} with @option{--merge} (@option{-m}) can open
+the output file before reading all input, so a command like @code{cat
+F | sort -m -o F - G} is not safe as @command{sort} might start
+writing @file{F} before @command{cat} is done reading it.
 
 @vindex POSIXLY_CORRECT
 On newer systems, @option{-o} cannot appear after an input file if
Index: src/sort.c
===================================================================
RCS file: /home/meyering/coreutils/cu/src/sort.c,v
retrieving revision 1.284
diff -p -u -r1.284 sort.c
--- src/sort.c  26 Apr 2004 15:37:33 -0000      1.284
+++ src/sort.c  14 May 2004 05:45:52 -0000
@@ -1878,9 +1878,7 @@ sortlines_temp (struct line *lines, size
 }
 
 /* Return the index of the first of NFILES FILES that is the same file
-   as OUTFILE.  If none can be the same, return NFILES.  Consider an
-   input pipe to be the same as OUTFILE, since the pipe might be the
-   output of a command like "cat OUTFILE".  */
+   as OUTFILE.  If none can be the same, return NFILES.  */
 
 static int
 first_same_file (char * const *files, int nfiles, char const *outfile)
@@ -1910,7 +1908,7 @@ first_same_file (char * const *files, in
            ? fstat (STDIN_FILENO, &instat)
            : stat (files[i], &instat))
           == 0)
-         && (S_ISFIFO (instat.st_mode) || SAME_INODE (instat, outstat)))
+         && SAME_INODE (instat, outstat))
        return i;
     }
 
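
For readers who want to see the effect of the last hunk in isolation,
here is a minimal standalone sketch of the check as it stands after the
patch. It is not the code in src/sort.c: the names same_inode and
first_same_file_sketch are hypothetical, same_inode only approximates
what the SAME_INODE macro is assumed to do (compare device and inode
numbers), and the real function may handle details this sketch omits.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for SAME_INODE: two stat results name the same
   file when both the inode number and the device number match.  */
static int
same_inode (const struct stat *a, const struct stat *b)
{
  return a->st_ino == b->st_ino && a->st_dev == b->st_dev;
}

/* Return the index of the first of NFILES FILES that is the same file
   as OUTFILE, or NFILES if none is.  "-" stands for standard input.  */
static int
first_same_file_sketch (const char *const *files, int nfiles,
                        const char *outfile)
{
  struct stat outstat;

  /* If OUTFILE cannot be examined (for example, it does not exist yet),
     no input can be the same file.  */
  if (stat (outfile, &outstat) != 0)
    return nfiles;

  for (int i = 0; i < nfiles; i++)
    {
      struct stat instat;
      int rc = (strcmp (files[i], "-") == 0
                ? fstat (STDIN_FILENO, &instat)
                : stat (files[i], &instat));

      /* No S_ISFIFO test here: an input pipe is no longer assumed to be
         the output of something like "cat OUTFILE".  */
      if (rc == 0 && same_inode (&instat, &outstat))
        return i;
    }

  return nfiles;
}
```

With this check, `sort -m -o F F G' still reports F as an input that
aliases the output, while `cat F | sort -m -o F - G' does not, which is
exactly the case the NEWS entry warns about.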