Re: character encoding question

From: Peter Dyballa
Subject: Re: character encoding question
Date: Wed, 20 Feb 2013 11:44:26 +0100

On 20.02.2013 at 07:34, Eric Abrahamsen wrote:

> (string-as-unibyte "中") --> \344\270\255
> I understand that each of these three sections is a byte, also in octal.
> What's the correspondence between these bytes and the multibyte
> character's octal codepoint? Are there any functions that will get from
> one to the other?

The correspondence is defined by the Unicode Consortium. Unicode code points
can be represented in several encodings: UTF-7, UTF-8, UTF-16 (little-endian
or big-endian), UTF-32, and possibly others. Wikipedia is a good place to
start.

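In the example above a Unicode character (#o47055) is represented by a
sequence of three bytes. The first byte, \344, has the bit pattern 1110xxxx,
which in UTF-8 introduces a three-byte sequence, and the other two bytes have
the continuation pattern 10xxxxxx, so the text is encoded in UTF-8. The
character is U+4E2D, a CJK ideograph.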
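
The arithmetic behind that correspondence can be sketched as follows. This is
only an illustration in Python, not something from Emacs itself, and the
helper name decode_utf8_3byte is made up for the example. It strips the UTF-8
marker bits from the three bytes and joins the remaining 4 + 6 + 6 payload
bits into the code point. Inside Emacs, encode-coding-string and
decode-coding-string with the utf-8 coding system do the same conversion.

```python
# The three bytes from the example, written in octal as Emacs displays them.
raw = bytes([0o344, 0o270, 0o255])


def decode_utf8_3byte(data):
    """Decode exactly one three-byte UTF-8 sequence into its code point."""
    if len(data) != 3:
        raise ValueError("expected exactly three bytes")
    lead, cont1, cont2 = data
    # The lead byte must look like 1110xxxx.
    if lead & 0xF0 != 0xE0:
        raise ValueError("not a three-byte UTF-8 lead byte")
    # Both continuation bytes must look like 10xxxxxx.
    if cont1 & 0xC0 != 0x80 or cont2 & 0xC0 != 0x80:
        raise ValueError("bad UTF-8 continuation byte")
    # Keep 4 payload bits from the lead byte and 6 from each continuation.
    return ((lead & 0x0F) << 12) | ((cont1 & 0x3F) << 6) | (cont2 & 0x3F)


code_point = decode_utf8_3byte(raw)
print(oct(code_point), hex(code_point), chr(code_point))
```

The octal value printed is the #o47055 mentioned above, and the hexadecimal
value is the 4E2D of U+4E2D.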

> Second question: If emacs can't guess the encoding of a file, it gives
> you an error message showing the bytes it can't decode, plus the
> charsets it tried to use. How do I replicate that process manually?

C-x RET r (revert-buffer-with-coding-system). The command prompts for a
coding system and rereads the file using it.
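
The idea of trying one encoding after another can also be sketched outside
Emacs. The following Python fragment is an assumption-laden illustration, not
the procedure Emacs actually uses: it decodes the same three bytes with a
hand-picked list of codecs and records which ones fail. The function name
try_codecs and the candidate list are made up for the example.

```python
# The three bytes from the first question.
raw = bytes([0o344, 0o270, 0o255])


def try_codecs(data, codecs):
    """Return a dict mapping codec name to decoded text, or None on failure."""
    results = {}
    for name in codecs:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # This codec cannot interpret the byte sequence.
            results[name] = None
    return results


results = try_codecs(raw, ["ascii", "utf-8", "latin-1"])
for name, text in results.items():
    print(name, "->", repr(text) if text is not None else "cannot decode")
```

Note that latin-1 decodes without an error but yields three unrelated
characters, so a successful decode alone does not prove the guess was right.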



"Debugging? Klingons do not debug! Our software does not coddle the weak."
