bug-coreutils

bug#61300: wc -c doesn't advance stdin position when it's a regular file


From: Pádraig Brady
Subject: bug#61300: wc -c doesn't advance stdin position when it's a regular file
Date: Mon, 6 Feb 2023 19:38:24 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Thunderbird/109.0

On 06/02/2023 06:27, Stephane Chazelas wrote:
On 2023-02-05 20:59, Paul Eggert wrote:
On 2023-02-05 11:59, Pádraig Brady wrote:
[...]
Let's leave that as-is, please. If 'wc' can output the correct value
without reading its input, POSIX does not require 'wc' to do the read,
and it seems perverse to modify 'wc' to go to the effort to refuse to
tell the user useful information that the user requested and that 'wc'
knows.
[...]

But while I would agree it's very unlikely to ever be hit in practice,
as I can't think of any reason why one would call wc with its input not
open for reading, wc is meant to report how many bytes it has read, not
the size of its input (though POSIX seems ambiguous on that).

See also (with Pádraig's patch applied):

$ { echo test > file; wc -c; echo test2 >&0; cat file; } 0> file
5
test
test2

wc has lseek()ed to the end of the file even though it was opened in
write-only mode. Compare with:

$ { echo test > file; wc -lc; echo test2 >&0; cat file; } 0> file
wc: 'standard input': Bad file descriptor
0 0
test2

Some more thoughts on this.

Note the orig thread with motivation for the st_size optimization is at:
https://lists.gnu.org/archive/html/coreutils/2016-03/msg00020.html
Note also wc -c has had an st_size optimization for all sizes
since the very first coreutils implementation.

A similar edge case to Stephane's above is also seen when doing
the lseek(near_end)+read() method, as shown by:

  $ { truncate -s 32768 file; wc -c; wc -c; } 0> file
  wc: 'standard input': Bad file descriptor
  28679
  wc: 'standard input': Bad file descriptor
  0
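
The 28679 above is consistent with wc seeking to a point just short of
the end of the file before its final read() and crediting the skipped
bytes up front. A minimal sketch of that arithmetic, assuming a
4096-byte st_blksize and a seek target of size - size % (blksize + 1);
near_end_offset is a hypothetical name used for illustration, not a
function in wc:

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical helper: the offset to seek to before the final read().
   Assumes the target is SIZE rounded down to a multiple of BLKSIZE + 1.  */
off_t
near_end_offset (off_t size, off_t blksize)
{
  assert (0 <= size && 0 < blksize);
  return size - size % (blksize + 1);
}
```

Under those assumptions near_end_offset (32768, 4096) gives 28679, and
near_end_offset (131072, 4096) gives 127007, which matches the value
reported for the 131072-byte /sys file later in this message.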

One possible solution to avoid the above issue is:

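  start_pos = lseek(0, SEEK_CUR);
  bytes += lseek(near_end); did_lseek = true;
  while (read())
    {
      /* the seek was not safe: go back and count from scratch */
      if (did_lseek && (read error == EBADF || read error == EINVAL))
        { lseek(start_pos); did_lseek = false; bytes = 0; continue; }
    }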
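
A possible C rendering of the pseudocode above, as a sketch only and
not the code in wc. The name count_bytes_sketch, the 64 KiB buffer and
the near_end formula (size - size % (st_blksize + 1)) are assumptions
made for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open() flags, for callers that open the descriptor */
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not the wc implementation.  Counts the bytes from
   the current offset of FD to end of input.  Returns -1 with errno set
   if the input cannot be read.  */
off_t
count_bytes_sketch (int fd)
{
  assert (0 <= fd);

  struct stat st;
  if (fstat (fd, &st) != 0)
    return -1;

  off_t start_pos = lseek (fd, 0, SEEK_CUR);
  off_t bytes = 0;
  bool did_lseek = false;

  /* Shortcut: skip most of a regular file and credit the skipped bytes.  */
  if (0 <= start_pos && S_ISREG (st.st_mode) && 0 < st.st_blksize)
    {
      off_t near_end = st.st_size - st.st_size % ((off_t) st.st_blksize + 1);
      if (start_pos < near_end && 0 <= lseek (fd, near_end, SEEK_SET))
        {
          bytes = near_end - start_pos;
          did_lseek = true;
        }
    }

  char buf[65536];
  for (;;)
    {
      ssize_t n = read (fd, buf, sizeof buf);
      if (0 < n)
        bytes += n;
      else if (n == 0)
        return bytes;
      else if (errno == EINTR)
        continue;
      else if (did_lseek && (errno == EBADF || errno == EINVAL))
        {
          /* The shortcut was not safe: undo the seek, count from scratch.  */
          if (lseek (fd, start_pos, SEEK_SET) < 0)
            return -1;
          did_lseek = false;
          bytes = 0;
        }
      else
        return -1;
    }
}
```

With a descriptor opened write-only on a 32768-byte file, this sketch
returns -1 with errno set to EBADF and leaves the file offset where it
started, instead of reporting a partial count and leaving the offset
near the end.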

That would also fix an issue I saw for one file in /sys, where:
  /sys/devices/pci0000:00/0000:00:02.0/rom
  st_size = 131072, available bytes = 0, wc -c = 127007 (EINVAL)

Using that method for all file sizes, rather than just using st_size,
would work but would also penalize performance in the common case.
Consider cached stats on a network file system, for example.
So I guess that, to be able to keep the st_size optimization
with stdin, consistent with other cases, we could also verify that
stdin is readable and restrict the optimization to that case.
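
One way such a readable check could look, as a sketch under
assumptions: the access mode is taken from fcntl(F_GETFL), and the
names fd_is_readable_sketch and st_size_shortcut_sketch are
hypothetical, not functions in wc:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: true if FD was opened with read access.  */
bool
fd_is_readable_sketch (int fd)
{
  int flags = fcntl (fd, F_GETFL);
  if (flags < 0)
    return false;
  int mode = flags & O_ACCMODE;
  return mode == O_RDONLY || mode == O_RDWR;
}

/* Hypothetical helper: use st_size only for a readable regular file.
   On success, stores the bytes between the current offset and st_size
   in *REMAINING.  Advancing the file offset is left out of this sketch.  */
bool
st_size_shortcut_sketch (int fd, off_t *remaining)
{
  assert (remaining != NULL);

  struct stat st;
  if (!fd_is_readable_sketch (fd)
      || fstat (fd, &st) != 0 || !S_ISREG (st.st_mode))
    return false;

  off_t pos = lseek (fd, 0, SEEK_CUR);
  if (pos < 0)
    return false;

  *remaining = pos < st.st_size ? st.st_size - pos : 0;
  return true;
}
```

A caller would fall back to the plain read() loop whenever
st_size_shortcut_sketch returns false, so a write-only stdin would
produce the read error rather than a size.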

Note this is only an issue for stdin. Files specified on the command line
are explicitly opened, so they should get a permission error at that stage.

Note also that if you really want to read, you can always use `cat | wc -c`
rather than just `wc -c`, so I'm still not sure we should
add the readable restriction for stdin. I'm not strongly against it
though, since it is such an edge case.

cheers,
Pádraig




