[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: java.io.DataInputStream.readLine misbehaviour
Re: java.io.DataInputStream.readLine misbehaviour
Sun, 02 Nov 2003 17:33:30 +0100
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030312
Mark Wielaard wrote:
On Fri, 2003-10-31 at 19:12, Guilhem Lavaux wrote:
Continuing the Classpath-Kaffe merge, I noticed that
DataInputStream is failing one of kaffe's regression test
(InputStreamTest). It seems the failure comes from
readLine(): readLine() is a little too conservative compared
to JDK's implementation. The real behaviour seems to be the
readLine() is invoked => it reads bytes until it finds a
'\r' => it reads the next byte
* if it is a '\n' it eats it
* if it isn't it keeps it in an internal buffer until
someone calls another read methods.
But in any case the byte isn't put back in the input buffer.
The Classpath implementation clearly took the lowest overhead for the
non-readLine case. For the benefit of the other mailinglist readers,
here is the actual comment from Classpaths (same for libgcj)
// FIXME: The following code tries to adjust the stream back one
// character if the next char read is '\n'. As a last resort,
// it tries to mark the position before reading but the bottom
// line is that it is possible that this method will not properly
// deal with a '\r' '\n' combination thus not fulfilling the
// DataInput contract for readLine. It's not a particularly
// safe approach threadwise since it is unsynchronized and
// since it might mark an input stream behind the users back.
// Along the same vein it could try the same thing for
// ByteArrayInputStream and PushbackInputStream, but that is
// probably overkill since this is deprecated & BufferedInputStream
// is the most likely type of input stream.
// The alternative is to somehow push back the next byte if it
// isn't a '\n' or to have the reading methods of this class
// keep track of whether the last byte read was '\r' by readLine
// and then skip the very next byte if it is '\n'. Either way,
// this would increase the complexity of the non-deprecated methods
// and since it is undesirable to make non-deprecated methods
// less efficient, the following seems like the most reasonable
That's a bug in Classpath then. Trying to 'read ahead' after \r fails on
for those systems whose 'end of line' is a plain \r. See this note from
Apple: http://developer.apple.com/technotes/tn/tn1157.html . Short
story: you're reading lines over network from a machine on mac os x: you
read \r, try to read ahead but the next character never comes since
the darwin machine already sent you its end of line, so you hang
forever, i.e. util the connection is terminated.
In fact, while you're at it, check out the rationale behind kaffe's
implementation as described in this thread:
, where i go into some detail how to deal with the \r vs \r\n issue. ;)
The FilterInputStream kludge was the only way to prevent fixing one bug
on the cost of introducing the other I could come up with, and I think
it's the only possible way to do it.
BTW I don't think we have to worry about the unsynchronized thing since
streams are already not thread safe.
They should be. My experience from working on kaffe's readers and
writers and testing against JDK was that the JDK implementation was very
thread safe, whereas kaffe's IO had thread safety kludged in afterwards
by me for some classes. From what I remember, Classpath code wasn't
thread safe at all, either.
For what it's worth, yes, there is code out there that expects threads
to be able to read from readers/writers and not trip over their feet ;)
AFAIK, the Java APIs explicitely mention when a method/class is not
thread safe, so by default we should assume that APIs are to be thread
safe. I mean, it's built in the laguage, why would you want to limit
yourself to single threads when you're doing something as important as IO?
The real problem is what happens when someone calls one of
the inherited method from FilterInputStream which are not
overloaded (e.g. available).
Kaffe's answer was to create a mini-buffer with default
access in FilterInputStream. It is obvious it is clumsy
because it slightly changes the API, but the only other
option would be to overload all read functions.
That is overhead for every FilterInputStream, that should be avoided.
But I agree that the Classpath way is not very clean.
Would a solution be to do the following in the constructor?
public DataInputStream (InputStream in)
super (in.markSupported() ? in : new BufferedInputStream(in, 1));
No. Even if the superstream supports marking, then a mark/rewind based
solution would delete a previous mark set by user of DataInputStream
class each time a \r occurs. That would be a quite ugly bug to find.
And then use the fact that in will always be a BufferedInputStream in
the readLine method? This does create a small overhead for
DataInputStream in case it isn't already wrapped around something that
buffers already, but it would be easy to make it always work correctly.
It doesn't necessarily mean it will always be a buffered input stream.
There are input streams that support marking that are not
Another question is how often (ever?) does this (mixing readLine() with
other DataInputStream and/or FilterInputStream calls happen in real
programs. The regression test of kaffe looks contrived, it even uses
StringInputStream which is an ugly (and thankfully deprecated) class.
Happened in real life with very popular libraries like xerces. See the
kaffe mailing list thread. User was not pleased at all, and wanted to
revert back to broken 'read ahead, and unread' method. I had to figure
out a way to fix both issues, that's the whole story. If someone can
find a more elegant way to do it, that doesn't create any new bugs, I'm
all ears. I've been there before and spent some time researching this,
though. You may have more luck. ;)