bug-tar

Re: [Bug-tar] star <-> GNU tar interchange issue


From: Nathan Stratton Treadway
Subject: Re: [Bug-tar] star <-> GNU tar interchange issue
Date: Mon, 25 Mar 2013 01:00:36 -0400
User-agent: Mutt/1.5.20 (2009-06-14)

On Sat, Mar 23, 2013 at 19:16:57 -0000, Mark wrote:
> I looked at a hex dump of the test_star.tar archive. For all files except
> the ...78.bin file, the o-umlaut character is represented by two bytes:
> 0xC3 0xB6. For the ...78.bin file the o-umlauts are represented by C3 83
> C2 B6 (see offsets 0x0C69 and 0x0C9D in the file).

This doesn't explain why it's happening in the first place, but I did
notice that the four-byte string appears to be the result of a double
latin1 -> UTF-8 conversion.

That is, the o-umlaut character in latin1 is the F6 byte; when
represented in UTF-8 that expands to the two bytes C3 B6.

Those bytes, if then treated as latin1 characters instead of UTF-8 for
some reason, would display as "Ã¶", and after another round of latin1 ->
UTF-8 conversion, would end up as C3 83 C2 B6....
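
The byte sequences above can be reproduced with a few lines of Python.
This is only a sketch of the suspected mechanism, not a claim about what
star or GNU tar actually does internally; the variable names and the
undo_double_encoding helper are made up for illustration.

```python
# Start from the o-umlaut character (U+00F6).
original = "\u00f6"

latin1_bytes = original.encode("latin-1")   # single byte F6
utf8_once = original.encode("utf-8")        # two bytes C3 B6

# Misread the UTF-8 bytes as latin1 text: each byte becomes one character.
misread = utf8_once.decode("latin-1")       # U+00C3 U+00B6

# Convert that misread text to UTF-8 a second time.
utf8_twice = misread.encode("utf-8")        # four bytes C3 83 C2 B6


def undo_double_encoding(data: bytes) -> bytes:
    """Reverse one extra latin1 -> UTF-8 round (hypothetical helper)."""
    return data.decode("utf-8").encode("latin-1")


print(latin1_bytes.hex(" "))                      # f6
print(utf8_once.hex(" "))                         # c3 b6
print(utf8_twice.hex(" "))                        # c3 83 c2 b6
print(undo_double_encoding(utf8_twice).hex(" "))  # c3 b6
```

The helper only works on names that really were double-encoded; if the
decoded text contains any character above U+00FF, the encode("latin-1")
step raises UnicodeEncodeError.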


                                                        Nathan

----------------------------------------------------------------------------
Nathan Stratton Treadway  -  address@hidden  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


