classpath
XML parsing problems


From: Chris Burdess
Subject: XML parsing problems
Date: Sun, 25 Dec 2005 10:00:31 +0000
User-agent: Mutt/1.5.10i

Hi

We discovered over IRC that there is a major problem with XML parsing using
the StAX driver, caused by a bug in BufferedInputStream. I'm therefore
reverting the default XML parser to aelfred2 until this is resolved.

The bug is in both gnu.xml.stream.XMLInputStreamReader and
java.io.BufferedInputStream - the former uses almost identical code to the
latter in order to provide mark/reset functionality.

As I understand it, the problem can occur when the mark is set near the end
of the buffer. If the mark is set at position 2047 in the buffer and we then
read 2 bytes, refill() will have been called to load the next block. A
subsequent reset() then returns to position 2047 in the new buffer, which is
2K further along in the original stream than the point where the mark was set.
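
For anyone who wants to check a given class library, the scenario above can
be sketched as a small standalone test. This is only an illustration, not
code from Classpath: the class name MarkResetCheck, its helper methods and
the fill pattern (offset modulo 251, so that bytes 2K apart differ) are all
assumptions made up for the example. It marks at offset 2047 of a stream
wrapped in a 2K BufferedInputStream, reads 2 bytes across the buffer
boundary, resets, and reports which byte comes back.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class MarkResetCheck {

    // Hypothetical fill pattern: the byte stored at a given stream offset.
    // 251 is prime, so offsets exactly 2K apart hold different values.
    public static int expectedByte(int offset) {
        return offset % 251;
    }

    // Reads up to markPos, marks, reads 2 more bytes (forcing a refill when
    // markPos is at the end of the buffer), resets, and returns the next byte.
    public static int byteAfterReset(int markPos, int bufSize) {
        byte[] data = new byte[bufSize * 2];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) expectedByte(i);
        }
        try (InputStream in =
                 new BufferedInputStream(new ByteArrayInputStream(data), bufSize)) {
            for (int i = 0; i < markPos; i++) {
                if (in.read() < 0) {
                    throw new IOException("unexpected end of stream");
                }
            }
            in.mark(16);      // read limit comfortably larger than 2 bytes
            in.read();        // last byte of the first buffer
            in.read();        // first byte after the refill
            in.reset();       // should return to stream offset markPos
            return in.read(); // unsigned value, 0..255
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int got = byteAfterReset(2047, 2048);
        int want = expectedByte(2047);
        if (got == want) {
            System.out.println("ok");
        } else {
            System.out.println("mark/reset lost its place: got " + got
                               + ", want " + want);
        }
    }
}
```

With a correct mark/reset implementation the byte read after reset() is the
one at stream offset 2047. With the behaviour described above I would expect
the byte from offset 4095 instead.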

As the StAX parser relies heavily on mark/reset behaviour to function
correctly, it will not reliably parse entities larger than 2K (whether it
fails depends on which structures fall on the 2K boundaries).

If anyone has a robust solution to this problem please apply it; I will try
to address it but may not have much free time before the new year/release.
-- 
Chris Burdess
  "They that can give up essential liberty to obtain a little safety
  deserve neither liberty nor safety." - Benjamin Franklin
