classpathx-xml

Re: [Classpathx-xml] Aelfred2 problem with whitespace


From: Chris Burdess
Subject: Re: [Classpathx-xml] Aelfred2 problem with whitespace
Date: Sat, 7 Aug 2004 15:47:48 +0100

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Musachy Barroso wrote:
For instance, if I parse the following document:
    <root>&lt;head&gt; &amp; &lt;body&gt;</root>
the spaces before and after the "&amp;" are lost!!

I checked it out: the spaces are being reported through the
ignorableWhitespace callback instead of characters.
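
One way to see which callback receives the spaces is a small probe handler that records the two callbacks separately. This is only a sketch: the class name WhitespaceProbe and its probe method are made up for illustration, and it runs against whatever SAX parser the JDK's SAXParserFactory supplies, not Aelfred2, so it shows what to look for rather than reproducing the bug.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic class, not part of Aelfred2 or SAX.
class WhitespaceProbe {

    // Returns { text seen via characters(), text seen via ignorableWhitespace() }.
    static String[] probe(String xml) throws Exception {
        final StringBuilder chars = new StringBuilder();
        final StringBuilder ignorable = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                chars.append(ch, start, length);
            }

            @Override
            public void ignorableWhitespace(char[] ch, int start, int length) {
                ignorable.append(ch, start, length);
            }
        };

        // Assumption: the JDK default parser; swap in the parser under test here.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);

        return new String[] { chars.toString(), ignorable.toString() };
    }

    public static void main(String[] args) throws Exception {
        String[] r = probe("<root>&lt;head&gt; &amp; &lt;body&gt;</root>");
        System.out.println("characters:          [" + r[0] + "]");
        System.out.println("ignorableWhitespace: [" + r[1] + "]");
    }
}
```

If the second line of output is non-empty for this document, the parser is routing content whitespace to the wrong callback.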

I got this from the SAX faq (http://www.saxproject.org/?selected=faq):

The ContentHandler.characters() callback is missing data!

    Please read the JavaDoc for this method. A parser may split text
    into any number of separate chunks, and some characters may be
    reported using ignorableWhitespace() instead of this callback.

    If you want all the text inside an element, you need to collect
    the text from the various characters callbacks into a buffer. Only
    when you see the endElement event can you be sure that you have seen
    all the text, and some of it may really "belong" to child elements.
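
The buffering the FAQ describes could look roughly like the following. It is a minimal sketch, not code from the FAQ or from Aelfred2: TextCollector and collect are hypothetical names, and the stack of buffers is just one way to keep a child element's text from being mixed into its parent's.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: one buffer per open element, flushed at endElement.
class TextCollector extends DefaultHandler {
    private final Deque<StringBuilder> open = new ArrayDeque<>();
    final List<String> results = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        open.push(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may call this several times for one run of text,
        // so append each chunk to the innermost open element's buffer.
        if (!open.isEmpty()) {
            open.peek().append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only now is this element's own text known to be complete.
        results.add(qName + "=" + open.pop());
    }

    static List<String> collect(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        // Assumption: the JDK default (non-namespace-aware) parser.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<root>&lt;head&gt; &amp; &lt;body&gt;</root>"));
    }
}
```

Note that this only gathers what arrives through characters(); text that a parser reports through ignorableWhitespace() would still be missing unless that callback appends to the same buffer.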

However, this whitespace is clearly NOT ignorable! We should report
whitespace via the ignorableWhitespace callback only if the element has
no text children or, exceptionally, in the case of a mixed content model
where there is no text between elements (since doing otherwise may be
too computationally expensive).

I'd like to consider this a bug.
- -- Chris Burdess
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (Darwin)

iD8DBQFBFOuV6dl1DEqHgrgRAvfMAJ9297N2c1G0JIoU2NRLCXAugafoVgCeJ6ct
J0E9wKo2Ma3Rqbsx097QBs4=
=HP+q
-----END PGP SIGNATURE-----




