Re: [Classpathx-xml] GNUJAXP.JAR Bug in gnu.xml.pipeline.TextConsumer constructor
Tue, 02 Apr 2002 18:00:46 -0800
> when the TextConsumer class is used to emit plain XML
> (not specially handled XHTML), the XML declaration is
> missing in the output TextConsumer produces.
Not necessarily a bug. And your ad-hoc fix is a NOP,
since the default encoding (given no XML declaration)
is already UTF-8 ... as it says in the XML REC! :)
> super (w, isXhtml ? "US-ASCII" : null);
> The XMLWriter class does not emit the XML declaration if no encoding
> is given.
... and (see the XMLWriter spec) if it can't figure out what the
real encoding is, by asking the OutputStreamWriter that one
hopes you passed.
It'd be very wrong if that code was given a writer that buffered,
say, a "Big5" encoded stream and then always stuck a "UTF-8"
label on it. Far better not to put any encoding declaration into
the output stream, and require the caller to use external encoding
declarations (like "text/xml;charset=...") to resolve this stuff.
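To make that concrete, here is a rough sketch (not the actual XMLWriter code) of how a writer-based serializer could decide whether it is safe to emit an encoding declaration. The class and method names are hypothetical; the only real API relied on is java.io.OutputStreamWriter.getEncoding() and java.nio.charset.Charset. Anything that isn't an OutputStreamWriter gets no declaration at all, rather than a guessed "UTF-8" label.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of gnu.xml.
class EncodingProbe {
    // Returns the canonical charset name if the writer can tell us,
    // or null when the real encoding is unknowable (StringWriter, etc.).
    static String detectEncoding(Writer w) {
        if (w instanceof OutputStreamWriter) {
            // May be a historical name such as "UTF8"; null if the stream is closed.
            String name = ((OutputStreamWriter) w).getEncoding();
            return name == null ? null : Charset.forName(name).name();
        }
        return null;
    }

    // Emits a declaration only when the encoding is actually known.
    static String xmlDeclaration(Writer w) {
        String enc = detectEncoding(w);
        return enc == null
            ? ""
            : "<?xml version=\"1.0\" encoding=\"" + enc + "\"?>";
    }

    public static void main(String[] args) {
        Writer known = new OutputStreamWriter(
            new ByteArrayOutputStream(), StandardCharsets.ISO_8859_1);
        System.out.println(xmlDeclaration(known));
        System.out.println(xmlDeclaration(new StringWriter()).isEmpty());
    }
}
```

With a StringWriter (or any writer wrapping something else) the sketch emits nothing, which is the "leave it to external declarations" behavior described above.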
> I think both XMLWriter and TextConsumer are at fault here. I understand
> the reason for omitting the XML declaration when (X)HTML output is desired.
You've got it backwards. XHTML defaults to straight US-ASCII;
anything else has notable portability issues. (But you can emit
specially encoded XHTML if you really want to -- just set it up
> ... TextConsumer should have a constructor of the form
> public TextConsumer (Writer w, String encoding, boolean isXhtml)
If you want an encoding declaration, just make sure you're
passing an OutputStreamWriter. Or invoke setWriter ()
directly (labeling the encoding as you please) instead of
doing it implicitly in the constructor.
The TextConsumer API is set up so you actually have to
work to grab enough rope to hang yourself (as it were :) by
goofing up encodings. But all the tools for it are there,
if you really need them.
> The TextConsumer constructor with a Writer argument is sometimes
> preferable over the OutputStream variant - for example when you
> want output to a String - there's a StringWriter standard class but
> not a StringBufferOutputStream.
In such a case, yet another way to ensure there's an XML
declaration is to just write it directly to the output stream
yourself... :) But I'd not encourage such tricks, since strings
are by definition _not encoded_ ... putting any kind of
encoding declaration in them is error prone, since you may
not know the encoding that'll be used when it's eventually
encoded onto some OutputStream.
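A small sketch of why that's error prone, using only java.lang.String and java.nio.charset (the class name and the sample document are made up for illustration). The same String, carrying a hard-coded "UTF-8" declaration, gets encoded two different ways; only one of the resulting byte streams matches its own label.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the document text is an invented example.
class DeclaredVsActual {
    // A String that already carries an encoding declaration.
    static final String DOC =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?><p>caf\u00e9</p>";

    // Encodes with cs, then decodes the bytes as the declared UTF-8.
    // Returns true only if the declaration matched what was written.
    static boolean labelIsHonest(Charset cs) {
        byte[] bytes = DOC.getBytes(cs);
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return DOC.equals(decoded);
    }

    public static void main(String[] args) {
        // Label and bytes agree.
        System.out.println(labelIsHonest(StandardCharsets.UTF_8));
        // Label says UTF-8, bytes are Latin-1: the accented character is damaged.
        System.out.println(labelIsHonest(StandardCharsets.ISO_8859_1));
    }
}
```

The String itself never changed; whoever eventually picks the charset for the OutputStream determines whether the embedded label is true.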