Re: merge sort temporary files
From: Paul Eggert
Subject: Re: merge sort temporary files
Date: Fri, 14 May 2004 00:01:15 -0700
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)

Instead of adding a new option, I'd rather change 'sort' to cater to
your (relatively common) case than to (relatively contrived) cases
like `cat F | sort -m -o F - G', where people should know that
they're getting into trouble anyway.

Here's a proposed patch that solves your problem that way instead.

2004-05-13  Paul Eggert  <address@hidden>

	Improve performance of `sort -m' on large files, at the cost of
	making some contrived examples unsafe.  POSIX allows this
	optimization.  Performance problem reported by Jonathan Baker in
	<http://mail.gnu.org/archive/html/bug-coreutils/2004-05/msg00071.html>.
	* src/sort.c (first_same_file): Do not treat input pipes
	differently from other files.
	* doc/coreutils.texi (sort invocation): Document that "sort -m -o F"
	might write F before reading all the input.
	* NEWS: Likewise.

Index: NEWS
===================================================================
RCS file: /home/meyering/coreutils/cu/NEWS,v
retrieving revision 1.206
diff -p -u -r1.206 NEWS
--- NEWS 11 May 2004 16:48:42 -0000 1.206
+++ NEWS 14 May 2004 06:35:30 -0000
@@ -20,6 +20,12 @@ GNU coreutils NEWS
** New features
+ For efficiency, `sort -m' no longer copies input to a temporary file
+ merely because the input happens to come from a pipe. As a result,
+ some relatively-contrived examples like `cat F | sort -m -o F - G'
+ are no longer safe, as `sort' might start writing F before `cat' is
+ done reading it. This problem cannot occur unless `-m' is used.
+
pwd now works even when run from a working directory whose name
is longer than PATH_MAX.
Index: doc/coreutils.texi
===================================================================
RCS file: /home/meyering/coreutils/cu/doc/coreutils.texi,v
retrieving revision 1.180
diff -p -u -r1.180 coreutils.texi
--- doc/coreutils.texi 9 May 2004 19:42:19 -0000 1.180
+++ doc/coreutils.texi 14 May 2004 06:32:53 -0000
@@ -3265,9 +3265,13 @@ starting with 1. So to sort on the seco
@opindex --output
@cindex overwriting of input, allowed
Write output to @var{output-file} instead of standard output.
-If necessary, @command{sort} reads input before opening
+Normally, @command{sort} reads all input before opening
@var{output-file}, so you can safely sort a file in place by using
commands like @code{sort -o F F} and @code{cat F | sort -o F}.
+However, @command{sort} with @option{--merge} (@option{-m}) can open
+the output file before reading all input, so a command like @code{cat
+F | sort -m -o F - G} is not safe as @command{sort} might start
+writing @file{F} before @command{cat} is done reading it.
@vindex POSIXLY_CORRECT
On newer systems, @option{-o} cannot appear after an input file if
Index: src/sort.c
===================================================================
RCS file: /home/meyering/coreutils/cu/src/sort.c,v
retrieving revision 1.284
diff -p -u -r1.284 sort.c
--- src/sort.c 26 Apr 2004 15:37:33 -0000 1.284
+++ src/sort.c 14 May 2004 05:45:52 -0000
@@ -1878,9 +1878,7 @@ sortlines_temp (struct line *lines, size
}
/* Return the index of the first of NFILES FILES that is the same file
- as OUTFILE. If none can be the same, return NFILES. Consider an
- input pipe to be the same as OUTFILE, since the pipe might be the
- output of a command like "cat OUTFILE". */
+ as OUTFILE. If none can be the same, return NFILES. */
static int
first_same_file (char * const *files, int nfiles, char const *outfile)
@@ -1910,7 +1908,7 @@ first_same_file (char * const *files, in
? fstat (STDIN_FILENO, &instat)
: stat (files[i], &instat))
== 0)
- && (S_ISFIFO (instat.st_mode) || SAME_INODE (instat, outstat)))
+ && SAME_INODE (instat, outstat))
return i;
}
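
For readers who want to see the effect of the src/sort.c hunk outside
the diff, here is a minimal standalone sketch of the post-patch check.
It is not the code in sort.c: the names first_same_file_sketch and
SAME_FILE_ID are hypothetical, SAME_FILE_ID is assumed to do what
SAME_INODE does (compare device and inode numbers), and treating "-"
as standard input is an assumption based on the fstat (STDIN_FILENO,
...) call visible in the hunk.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for the SAME_INODE macro used in the patch:
   two stat results name the same file when both the inode number and
   the device number match.  */
#define SAME_FILE_ID(a, b) \
  ((a).st_ino == (b).st_ino && (a).st_dev == (b).st_dev)

/* Sketch of the post-patch check.  Return the index of the first of
   NFILES FILES that is the same file as OUTFILE, or NFILES if none
   is.  Assumption: the name "-" stands for standard input.  */
static int
first_same_file_sketch (char const *const *files, int nfiles,
                        char const *outfile)
{
  struct stat outstat;
  int i;

  assert (0 <= nfiles);

  /* If OUTFILE cannot be examined (for example, it does not exist
     yet), no input can be the same file.  */
  if (stat (outfile, &outstat) != 0)
    return nfiles;

  for (i = 0; i < nfiles; i++)
    {
      struct stat instat;
      int status = (strcmp (files[i], "-") == 0
                    ? fstat (STDIN_FILENO, &instat)
                    : stat (files[i], &instat));

      /* Unlike the pre-patch test, an input FIFO is no longer assumed
         to alias OUTFILE; only a matching device/inode pair counts.  */
      if (status == 0 && SAME_FILE_ID (instat, outstat))
        return i;
    }

  return nfiles;
}
```

With inputs "G" and "F" and output "F", the sketch reports index 1, so
a caller can still protect that input.  When standard input is a pipe
fed by `cat F', its device/inode pair does not match F, so nothing is
reported; that is the contrived case the NEWS entry describes as no
longer safe.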