bug-gnulib

Re: fflush after ungetc


From: Eric Blake
Subject: Re: fflush after ungetc
Date: Thu, 06 Mar 2008 22:58:16 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080213 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[Adding the Austin Group]

According to Bruno Haible on 3/6/2008 3:46 PM:
| Do you know the wording that the newest POSIX has about this?

I think that Interp 002 is incomplete in the face of ungetc.

http://www.opengroup.org/austin/interps/uploads/40/6806/AI-002.txt

It looks like the intent of POSIX is to specify exactly what happens to
the underlying file description when a stream is flushed, particularly
since other processes can observe the results when the file description is
duplicated across process boundaries.

|
| The following test program, run on various platforms, gives inconclusive
| results.
|
| ========================== foo.c =========================
| #include <stdio.h>
| int
| main (int argc, char **argv)
| {
|   /* Check that fflush after a non-backup ungetc() call discards the ungetc
|      buffer.  */
|   int c;
|
|   c = fgetc (stdin);
|   printf ("c = '%c'\n", c);
|
|   c = fgetc (stdin);
|   printf ("c = '%c'\n", c);
|
|   c = ungetc ('@', stdin);
|   printf ("ungetc result = '%c'\n", c);
|
|   fflush (stdin);
|
|   c = fgetc (stdin);
|   printf ("c = '%c'\n", c);
|
|   c = fgetc (stdin);
|   printf ("c = '%c'\n", c);
|
|   return 0;
| }
| =============================================================
|
| $ gcc foo.c
| $ ./a.out < foo.c
| $ cat foo.c | ./a.out
|
| On glibc-2.3.6: Different results.
| When reading from the regular file:
| c = '#'
| c = 'i'
| ungetc result = '@'
| c = 'n'
| c = 'c'

Or in other words, fflush changed the stream offset from 1 to 2,
discarding the ungetc data.

| When reading from the pipe:
| c = '#'
| c = 'i'
| ungetc result = '@'
| c = '@'
| c = 'n'

Or in other words, fflush failed, leaving the ungetc data intact.

|
| On MacOS X: The same result in both cases:
| c = '#'
| c = 'i'
| ungetc result = '@'
| c = '@'
| c = 'n'

Or in other words, regardless of whether fflush succeeds, ungetc data was
left intact.

|
| On HP-UX 11:
| When reading from the regular file:
| c = '#'
| c = 'i'
| ungetc result = '@'
| c = 'i'
| c = 'n'

Or in other words, the stream position was left intact at 1, but the
ungetc data was lost.

| When reading from the pipe:
| c = '#'
| c = 'i'
| ungetc result = '@'
| c = <EOF>
| c = <EOF>

Bug.  C99 is quite clear that implementations shall provide at least one
byte of ungetc buffering for all streams, and that it cannot fail if there
was a prior fgetc.
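
As a minimal sketch of that guarantee on a non-seekable stream (with no
fflush involved), the hypothetical helper below pushes one byte back onto a
pipe-backed stream and reads it again.  The name pushback_on_pipe and the
use of POSIX pipe() and fdopen() are assumptions made for illustration;
they are not part of the test program quoted above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns the byte read back after pushing '@' onto
   a pipe-backed stream, or EOF if setup or any stdio call fails.  */
int
pushback_on_pipe (void)
{
  int fds[2];
  FILE *fp;
  int result = EOF;

  if (pipe (fds) != 0)
    return EOF;
  if (write (fds[1], "#i", 2) != 2)
    {
      close (fds[0]);
      close (fds[1]);
      return EOF;
    }
  close (fds[1]);               /* the reader sees EOF after two bytes */

  fp = fdopen (fds[0], "r");
  if (fp == NULL)
    {
      close (fds[0]);
      return EOF;
    }

  /* Two reads, then one byte of pushback, which must be readable again.  */
  if (fgetc (fp) == '#' && fgetc (fp) == 'i' && ungetc ('@', fp) == '@')
    result = fgetc (fp);

  fclose (fp);
  return result;
}
```

A conforming implementation should return '@' here; the HP-UX pipe result
above differs only once fflush is called between the ungetc and the read.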

Is the intent of Interp 002 to make this behavior portable?  Or are we
resigned to documenting that fflush after ungetc produces unspecified results?

Next, consider this example:

$ echo 'm4exit-hello' > file
$ ( m4; cat ) < file

According to POSIX, m4 MUST leave the underlying file at the next
unprocessed byte, such that cat must pick up where m4 left off.  So I
claim this MUST print "-hello".  Now, when implementing m4, you MUST read
the byte '-' from the file, to decide whether the user is invoking "m4exit" or
"m4exit(1)", for example.  But if m4 is implemented with streams, it is
much simpler conceptually to call ungetc('-', stdin) when it is determined
that input is not '(', dispatch to the 'm4exit' handler, and then rely on
the auto-fflush() behavior of exit().
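
The lookahead described above can be sketched as follows.  This is not GNU
m4's actual code; next_byte_is is a hypothetical helper that reads one byte,
pushes the same byte back, and reports whether it matched.

```c
#include <stdio.h>

/* Hypothetical helper: peek at the next byte of FP without consuming it.
   Returns 1 if it equals WANTED, 0 otherwise (including at EOF).  */
int
next_byte_is (FILE *fp, int wanted)
{
  int c = fgetc (fp);

  if (c == EOF)
    return 0;
  /* Push back the byte that was just read, so that the stream position
     is again the next unprocessed byte.  */
  ungetc (c, fp);
  return c == wanted;
}
```

With input "-hello", next_byte_is (stdin, '(') returns 0 and the following
fgetc still returns '-'.  Whether a later process sharing the file
description also sees '-' is the question that depends on fflush.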

In this case, it then makes more sense for fflush() to preserve the
current stream offset, rather than the offset that was present prior to
the ungetc().  So for seekable input, this would argue against glibc's
implementation, and for either MacOS or HP-UX.  Without these semantics,
m4 would have to resort to an explicit fseek to the current stream
position, since the auto-fflush of exit() would leave the underlying file
description at the wrong offset.  (At any rate, GNU m4 already has to do
some explicit operations in an atexit hook on glibc systems, where the
exit() behavior violates POSIX because the fflush is not automatic, but
that is beside the point of this discussion).

The remaining question is what to do about the ungetc buffer from a
single-process standpoint.  The above example of interprocess behavior
pushed back what was read.  But if a different byte is pushed back, POSIX
is clear that the underlying file is not altered to contain that new byte.
Therefore, in interprocess communications, the second process would read
the original byte (as was done in HP-UX) rather than the ungetc buffer (as
was done in MacOS).  But for a single process, the wording for ungetc is
clear that only a successful file positioning function discards the ungetc
buffer, and does not list fflush as a file positioning function (and while
fflush on a write stream may change the position as data is flushed,
nothing in fflush, not even with Interp 002, mentions changing the stream
offset for read streams).  So here, it seems like MacOS (or glibc's pipe)
behavior is better.  Furthermore, if fflush leaves the ungetc buffer
intact, you can still do fseek(stdin,0,SEEK_CUR) to reread the real file
contents rather than the ungetc buffer.
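
A minimal sketch of that fseek idiom, assuming a seekable binary stream.
The helper name reread_real_byte is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: discard any pushed-back bytes by seeking to the
   current position, then return the next byte actually stored in the
   file, or EOF on failure.  */
int
reread_real_byte (FILE *fp)
{
  /* A successful file positioning call discards the ungetc buffer.  */
  if (fseek (fp, 0L, SEEK_CUR) != 0)
    return EOF;
  return fgetc (fp);
}
```

After reading '#' and 'i' from a file and calling ungetc ('@', fp), this
helper should return 'i', because the pushback moved the position indicator
back to 1 and the seek dropped the '@'.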

My concern is that this statement, added by Interp 002, is ambiguous:
"the file offset of the underlying open file description shall be adjusted
so that the next operation on the open file description deals with the
byte after the last one read".  Does reading the ungetc buffer count as an
operation on the open file description, or is the next operation on the
open file description deferred until after the ungetc buffer is exhausted?
In other words, when calling fflush immediately after ungetc, is the
offset of the file description set to the current stream position (in the
example above, 1, as in MacOS) or to the stream position where the ungetc
buffer ends (2, as in glibc)?
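
One way to observe which interpretation a platform follows is to query the
file description directly after the flush.  The sketch below assumes a
seekable stream and the POSIX calls fileno() and lseek(); the helper name
offset_after_ungetc_fflush is hypothetical, and the value it returns is
expected to vary between implementations.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read two bytes, push back '@', flush, and report
   where the underlying file description ended up.  Returns (off_t) -1
   if a read, the pushback, or the lseek fails.  */
off_t
offset_after_ungetc_fflush (FILE *fp)
{
  if (fgetc (fp) == EOF || fgetc (fp) == EOF)
    return (off_t) -1;
  if (ungetc ('@', fp) == EOF)
    return (off_t) -1;

  /* The result is deliberately ignored: fflush on an input stream fails
     on some platforms, and the offset is still worth inspecting.  */
  fflush (fp);

  return lseek (fileno (fp), 0, SEEK_CUR);
}
```

Under the two readings above, the result would be 1 or 2.  An implementation
that does not touch the description offset at all would instead report
however far its buffer had read ahead.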

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH0Nl484KuGfSFAYARAoEpAKCrW8pm2LXtkuMope2lUe8aanxaSgCgofxC
iexFnfiVRlgIZ58EfBCez7w=
=7wN/
-----END PGP SIGNATURE-----



