classpath-patches

From: Chris Burdess
Subject: Re: [cp-patches] gnu/xml/transform/StreamSerializer.java: compatibilityMode setting
Date: Sun, 13 Feb 2005 09:36:32 +0000

Ito Kazumitsu wrote:
gnu/xml/transform/StreamSerializer.java sets compatibilityMode to true
for any encoding other than UTF* or UCS*.

This is very inconvenient for people using non-ASCII encodings,
especially CJK encodings, where most of the characters are > 127.

With compatibilityMode set to true, the output grows too large
and becomes completely unreadable to humans.
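To give a rough sense of the size increase described above, here is a small Java sketch. It assumes, hypothetically, that compatibility mode writes every character above 127 as a decimal numeric character reference; the exact format StreamSerializer uses may differ. The class name EscapeSizeDemo is invented for illustration.

```java
// Hypothetical sketch: ASSUMES compatibility mode writes every character above
// 127 as a decimal numeric character reference (&#NNNN;). The actual format
// used by StreamSerializer may differ.
class EscapeSizeDemo {

    /** Replaces every code point above 127 with a decimal character reference. */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Three kanji (U+65E5 U+672C U+8A9E); each takes 2 bytes in EUC-JP.
        String japanese = "\u65E5\u672C\u8A9E";
        String escaped = escapeNonAscii(japanese);
        System.out.println(escaped);          // &#26085;&#26412;&#35486;
        System.out.println(escaped.length()); // 24
    }
}
```

Three kanji that occupy 6 bytes in EUC-JP become 24 ASCII characters once escaped: four times the size, and no longer readable as text.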

I would like some means of setting compatibilityMode to false,
and this is what I did.

--- gnu/xml/transform/StreamSerializer.java.orig Fri Dec 24 07:38:44 2004
+++ gnu/xml/transform/StreamSerializer.java     Sun Feb 13 08:43:33 2005
@@ -106,6 +106,12 @@
             compatibilityMode = false;
           }
       }
+    String sysprop = System.getProperty(
+        "gnu.xml.transform.StreamSerializer.compatibilityMode");
+    if (sysprop != null)
+      {
+        compatibilityMode = ("true".equalsIgnoreCase(sysprop));
+      }
     this.eol = (eol != null) ? eol : System.getProperty("line.separator");
     namespaces = new HashMap();
   }
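The effect of the patch can be restated as a stand-alone function. This is a hypothetical sketch: the UTF*/UCS* prefix test paraphrases the description at the top of the mail rather than StreamSerializer's real code, and CompatibilityModeSketch is an invented name.

```java
import java.util.Locale;

// Hypothetical sketch of the decision the patched constructor would make.
// The UTF/UCS prefix rule paraphrases the mail, not StreamSerializer's source.
class CompatibilityModeSketch {

    static final String PROPERTY =
        "gnu.xml.transform.StreamSerializer.compatibilityMode";

    static boolean compatibilityMode(String encoding, String sysprop) {
        String enc = encoding.toUpperCase(Locale.ROOT);
        // Default: escape non-ASCII output unless the encoding is UTF* or UCS*.
        boolean mode = !(enc.startsWith("UTF") || enc.startsWith("UCS"));
        // Patch: an explicit system property overrides the default either way.
        if (sysprop != null) {
            mode = "true".equalsIgnoreCase(sysprop);
        }
        return mode;
    }

    public static void main(String[] args) {
        System.out.println(compatibilityMode("EUC-JP", null));    // true
        System.out.println(compatibilityMode("EUC-JP", "false")); // false
        System.out.println(compatibilityMode("UTF-8", null));     // false
        // What a real run would see, depending on -D flags passed to java:
        System.out.println(compatibilityMode("EUC-JP", System.getProperty(PROPERTY)));
    }
}
```

With the patch applied, a user would pass something like `-Dgnu.xml.transform.StreamSerializer.compatibilityMode=false` on the java command line; an explicit value wins over the encoding-based default in both directions.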

Unfortunately your patch is almost guaranteed to produce non-well-formed XML.

I agree that compatibilityMode is a hack. What's really needed is a way to detect whether a character is a valid member of a given encoding, and what its value in that encoding is. If it isn't a valid member of that encoding, it needs to be escaped using a numeric character reference (e.g. &#x65E5;). We can't just use String.getBytes(encoding), as this silently loses the characters that are not valid members of the encoding (typically replacing them with '?', ASCII 63).
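The approach described above can be sketched with the standard java.nio.charset API: CharsetEncoder.canEncode reports whether a character is representable, and only unrepresentable characters are replaced by a numeric character reference. This is a minimal illustration, not StreamSerializer's implementation; SelectiveEscaper is a hypothetical name, and ISO-8859-1 is used only because every JVM is required to support it.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: escape only the characters the target charset cannot represent.
// Markup characters (&, <, >) are NOT handled here.
class SelectiveEscaper {

    static String escapeUnencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                out.append(ch); // representable: write the character itself
            } else {
                // not representable: hexadecimal numeric character reference
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        // U+00E9 exists in ISO-8859-1; U+65E5 does not.
        String text = "caf\u00E9 \u65E5";
        System.out.println(escapeUnencodable(text, latin1));
        // String.getBytes substitutes the charset's replacement byte instead.
        byte[] bytes = text.getBytes(latin1);
        System.out.println(bytes[bytes.length - 1]); // 63, i.e. '?'
    }
}
```

Unlike String.getBytes, nothing is lost: U+65E5 survives as &#x65e5; instead of collapsing to '?', while the accented character is written directly. Markup characters such as & and < still need their usual escaping, which this sketch leaves out.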
--
Chris Burdess
