Re: character encoding question

help-gnu-emacs

From:	Peter Dyballa
Subject:	Re: character encoding question
Date:	Wed, 20 Feb 2013 11:44:26 +0100

On 20.02.2013 at 07:34, Eric Abrahamsen wrote:

> (string-as-unibyte "中") --> \344\270\255
> 
> I understand that each of these three sections is a byte, also in octal.
> What's the correspondence between these bytes and the multibyte
> character's octal codepoint? Are there any functions that will get from
> one to the other?

The correspondence is defined by the Unicode standard, which the Unicode
Consortium maintains. A Unicode code point can be serialized in several
encoding forms: UTF-7, UTF-8, UTF-16 (least or most significant byte first),
UTF-32, maybe more. Wikipedia is a good place to start.

In the example above, a Unicode character (#o47055) is represented by a
sequence of three bytes. All three bytes are greater than 127 and follow the
UTF-8 pattern of one lead byte plus two continuation bytes, so this is its
UTF-8 encoding. It's U+4E2D, a CJK ideograph.
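
As for functions that get from one to the other, here is a minimal sketch. It
assumes the bytes really are UTF-8, and `my-utf8-bytes' and
`my-utf8-3byte-to-codepoint' are hypothetical helper names, not part of Emacs.
The built-ins doing the work are `encode-coding-string' and
`decode-coding-string'; the bit arithmetic only spells out what UTF-8 does for
a three-byte sequence.

```elisp
;; Sketch only: `my-utf8-bytes' and `my-utf8-3byte-to-codepoint' are
;; hypothetical helper names, not part of Emacs.

(defun my-utf8-bytes (char)
  "Return the UTF-8 encoding of CHAR as a list of byte values."
  (string-to-list (encode-coding-string (string char) 'utf-8)))

(defun my-utf8-3byte-to-codepoint (b1 b2 b3)
  "Combine a three-byte UTF-8 sequence B1 B2 B3 into a code point.
B1 carries the top 4 bits, B2 and B3 carry 6 bits each."
  (logior (ash (logand b1 #x0F) 12)
          (ash (logand b2 #x3F) 6)
          (logand b3 #x3F)))

;; Code point to bytes, printed in octal like the example above.
(mapcar (lambda (b) (format "%o" b)) (my-utf8-bytes ?中))
;; => ("344" "270" "255")

;; Bytes back to the code point, by hand and via the decoder.
(format "#o%o" (my-utf8-3byte-to-codepoint #o344 #o270 #o255))
;; => "#o47055"
(string-to-char (decode-coding-string "\344\270\255" 'utf-8))
;; => 20013, i.e. #x4E2D
```

The hand-rolled version handles only three-byte sequences; for anything else
the decoder is the safer route.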

> Second question: If emacs can't guess the encoding of a file, it gives
> you an error message showing the bytes it can't decode, plus the
> charsets it tried to use. How do I replicate that process manually?

C-x RET r runs revert-buffer-with-coding-system. It re-reads the file and
prompts you for the coding system to decode it with.
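
To try the same thing from Lisp, something like the following might do. It is
a rough sketch, not what Emacs does internally: `my-try-decodings' is a
hypothetical helper, and the list of candidate coding systems is only an
example. `detect-coding-string' and `decode-coding-string' are the real
built-ins.

```elisp
;; Sketch only: `my-try-decodings' is a hypothetical helper, not part
;; of Emacs.  It decodes the same raw bytes with several candidates so
;; the results can be compared by eye.

(defun my-try-decodings (bytes coding-systems)
  "Decode the unibyte string BYTES with each of CODING-SYSTEMS.
Return an alist of (CODING-SYSTEM . DECODED-STRING)."
  (mapcar (lambda (cs)
            (cons cs (decode-coding-string bytes cs)))
          coding-systems))

;; Ask Emacs which coding systems it considers plausible for the bytes.
;; The result depends on the language environment and priority settings.
(detect-coding-string "\344\270\255")

;; Decode with a few candidates; only one will look sensible.
(my-try-decodings "\344\270\255" '(utf-8 latin-1 big5))
```

Passing a non-nil second argument to `detect-coding-string' returns only the
highest-priority guess instead of the whole list.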

--
Greetings

  Pete

"Debugging? Klingons do not debug! Our software does not coddle the weak."


