For instance, if I parse the following document:
<root><head> & <body></root>
the spaces before and after the "&" are lost.
I looked into it, and the spaces are being reported through
ignorableWhitespace() instead of characters().
This is covered in the SAX FAQ (http://www.saxproject.org/?selected=faq):
The ContentHandler.characters() callback is missing data!
Please read the JavaDoc for this method. A parser may split text
into any number of separate chunks, and some characters may be
reported using ignorableWhitespace() instead of this callback.
If you want all the text inside an element, you need to collect
the text from the various characters callbacks into a buffer. Only
when you see the endElement event can you be sure that you have seen
all the text, and some of it may really "belong" to child elements.
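
For what it's worth, here is a minimal sketch of the buffering approach the FAQ describes, assuming the JAXP parser bundled with the JDK. The class name TextCollector and the collect() helper are made up for illustration and are not part of SAX. It appends everything from both characters() and ignorableWhitespace() to one buffer and only reads the buffer once the root element's endElement arrives. The test document is a well-formed variant of the one above (empty child elements and an escaped "&amp;"), which is an assumption on my part.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects all text inside the root element.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private int depth = 0;      // current element nesting level
    private String rootText;    // set when the root element closes

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run of text.
        buffer.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        // Keep whitespace the parser reports here instead of via characters().
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
        if (depth == 0) {
            // Only now is the text of the root element complete.
            rootText = buffer.toString();
        }
    }

    static String collect(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.rootText;
    }

    public static void main(String[] args) throws Exception {
        // prints "[ & ]"
        System.out.println("[" + collect("<root><head/> &amp; <body/></root>") + "]");
    }
}
```

Note that this lumps the text of child elements in with the parent's text, which is the "some of it may really belong to child elements" caveat from the FAQ. If you need per-element text, you would keep one buffer per open element instead (e.g. on a stack).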