Re: [cp-patches] FYI: Patch: character encoder/decoder cleanup/fixes


From: Archie Cobbs
Subject: Re: [cp-patches] FYI: Patch: character encoder/decoder cleanup/fixes
Date: Thu, 18 Nov 2004 11:57:41 -0600 (CST)

Jeroen Frijters wrote:
> The string isn't valid Unicode so the UTF-8 encoder is within its rights
> to encode the surrogate as an invalid character.

Correct (unfortunately :-)

> > Yes, which is how I came across this bug. There are classes 
> > in Classpath that store arbitrary binary data within String
> > objects.
> 
> Class files don't use UTF-8 to encode strings, they use the format used
> by DataOutputStream.writeUTF (what Sun calls "modified UTF").

Right, though it would be nice if there were an encoder
for "modified UTF" as well.
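
As a rough illustration of what such an encoder might look like, here is a
minimal sketch that follows the per-character rules documented for
DataOutput.writeUTF, but without the two-byte length prefix. The class name
ModifiedUtfSketch and the method encode are hypothetical; nothing like this
is claimed to exist in Classpath.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, not part of Classpath or the JDK.
class ModifiedUtfSketch {
    // Encodes each char using the writeUTF rules; no length prefix is emitted.
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                // Plain ASCII (except NUL): one byte.
                out.write(c);
            } else if (c <= 0x07FF) {
                // U+0000 and U+0080..U+07FF: two bytes, so NUL becomes C0 80.
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                // U+0800..U+FFFF, including unpaired surrogates: three bytes.
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // The lone surrogate from this thread encodes as: ed a2 aa
        for (byte b : encode("\ud8aa")) {
            System.out.printf("%02x ", b & 0xFF);
        }
        System.out.println();
    }
}
```

Because every char is encoded on its own, surrogates are never checked for
pairing, which is why arbitrary strings can survive the round trip.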

> So maybe all we need to do is make sure that
> DataOutputStream.writeUTF/DataInputStream.readUTF can roundtrip *any*
> string (even if it has invalid Unicode characters).

Definitely. Here's a test case (this one works):

    import java.io.*;
    public class xx {
        public static void main(String[] args) throws Exception {
            // A lone high surrogate: not valid Unicode on its own.
            String s = "\ud8aa";
            // Write it out in "modified UTF" form.
            ByteArrayOutputStream bas = new ByteArrayOutputStream();
            DataOutputStream das = new DataOutputStream(bas);
            das.writeUTF(s);
            das.close();
            // Read it back and check that the round trip is lossless.
            DataInputStream dis = new DataInputStream(
              new ByteArrayInputStream(bas.toByteArray()));
            String t = dis.readUTF();
            System.out.println(s.equals(t));
        }
    }

My error was assuming that "UTF-8" encoding and Java's "modified UTF"
were the same thing when in fact they are different.
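
To make the difference concrete, the sketch below prints the bytes produced
by the standard "UTF-8" charset next to those produced by
DataOutputStream.writeUTF (with its two-byte length prefix stripped) for
three sample strings: a NUL character, the lone surrogate from the test case
above, and a surrogate pair. The class name UtfCompare and its helper methods
are made up for this illustration, and what the "UTF-8" charset emits for an
unpaired surrogate may vary between runtimes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical comparison harness, for illustration only.
class UtfCompare {
    // Bytes written by writeUTF, minus the leading two-byte length.
    static byte[] modifiedUtf(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(s);
            out.flush();
            byte[] all = bytes.toByteArray();
            return Arrays.copyOfRange(all, 2, all.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Formats a byte array as space-separated hex pairs.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // NUL, a lone high surrogate, and a surrogate pair (U+1F600).
        String[] samples = { "\0", "\ud8aa", "\ud83d\ude00" };
        for (String s : samples) {
            System.out.println("UTF-8: " + hex(s.getBytes(StandardCharsets.UTF_8))
                + "   modified UTF: " + hex(modifiedUtf(s)));
        }
    }
}
```

Modified UTF writes NUL as two bytes (c0 80) and encodes each half of a
surrogate pair separately as three bytes, whereas standard UTF-8 writes NUL
as a single zero byte and the pair as one four-byte sequence.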

-Archie

__________________________________________________________________________
Archie Cobbs      *        CTO, Awarix        *      http://www.awarix.com

