Re: how to calculate the size of string in bytes?

From: Eli Zaretskii
Subject: Re: how to calculate the size of string in bytes?
Date: Tue, 18 Aug 2015 22:49:58 +0300

> Date: Tue, 18 Aug 2015 21:30:49 +0200
> Cc: address@hidden
> From:  <address@hidden>
> I was having difficulties in understanding you

Sorry about that.  It's a complex issue to explain in a few words.

> Now I understand: Emacs's internal (raw) coding system can represent
> "characters not expressible in utf-8".

More accurately, it can represent characters outside the Unicode codespace.

And please don't call that "raw"; the internal representation of
characters used by Emacs is known as 'utf-8-emacs'.
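
Since the subject asks how to get a string's size in bytes, here is a
minimal sketch.  It assumes you want one of two numbers: the size of
the internal (utf-8-emacs) representation, which `string-bytes'
reports, or the size after encoding, which is the length of the
unibyte string returned by `encode-coding-string'.  The helper name
`my-string-byte-sizes' is hypothetical.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-string-byte-sizes (string)
  "Return a cons (INTERNAL . UTF-8) of byte counts for STRING.
INTERNAL is the size of Emacs's internal representation; UTF-8 is
the size of STRING after encoding it with `utf-8'."
  (cons (string-bytes string)
        (length (encode-coding-string string 'utf-8))))

;; ASCII text takes one byte per character either way.
(my-string-byte-sizes "abc")            ; => (3 . 3)

;; A character outside the Unicode codespace is still a valid Emacs
;; character, because (max-char) is well above #x10FFFF.
(characterp #x110000)                   ; => t
(my-string-byte-sizes (string #x110000))
```

For text that will be written to a file or sent over a network, the
encoded length is usually the number that matters; `string-bytes'
only describes Emacs's own representation.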

> The function encode-coding-string passes those bytes silently
> through, outputting an invalid utf-8 sequence.

Yes, although interactive commands, such as saving a buffer, will
normally complain and ask for a better encoding.
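
To see the silent pass-through, and to check for it before encoding,
here is a sketch.  The predicate `my-utf-8-safe-string-p' is
hypothetical; it assumes "safe" means every character is a Unicode
scalar value (at most #x10FFFF and not a surrogate), which is what
valid UTF-8 requires.  The unsafe example is a raw byte:
`string-to-multibyte' turns the unibyte "\377" into the eight-bit
character standing for the byte #xFF.

```elisp
(require 'cl-lib)

;; Hypothetical predicate, for illustration only.
(defun my-utf-8-safe-string-p (string)
  "Return non-nil if every character in STRING is a Unicode scalar value.
Such a string encodes to valid UTF-8 with `utf-8'."
  (cl-every (lambda (c)
              (and (<= c #x10FFFF)                      ; inside the codespace
                   (not (and (>= c #xD800) (<= c #xDFFF))))) ; not a surrogate
            string))

(let ((raw (string-to-multibyte "\377")))  ; raw byte #xFF as a character
  ;; No error is signaled: the byte is passed through as-is,
  ;; producing a lone #xFF, which is not valid UTF-8.
  (list (my-utf-8-safe-string-p raw)
        (encode-coding-string raw 'utf-8)))
```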

> So I venture the guess that when the Emacs buffer contains something
> expressible as valid utf-8, 'utf-8 and 'raw are equivalent


> (what about combining characters?)

Emacs doesn't normalize/compose/decompose characters when it encodes
text (with the notable exception of the utf-8-hfs encoding).
Applications that want this should do it themselves, e.g., using the
facilities in ucs-normalize.el.
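
As an illustration, here is a sketch assuming a decomposed "é" (e
followed by U+0301 COMBINING ACUTE ACCENT).  Encoding it with utf-8
keeps both characters, so it comes out one byte longer than the
precomposed form that `ucs-normalize-NFC-string' from ucs-normalize.el
produces.

```elisp
(require 'ucs-normalize)

(let* ((decomposed "e\u0301")                            ; e + COMBINING ACUTE ACCENT
       (composed (ucs-normalize-NFC-string decomposed))) ; U+00E9
  ;; `encode-coding-string' does not normalize, so two spellings of
  ;; the same visible character encode to different byte counts.
  (list (length (encode-coding-string decomposed 'utf-8))   ; 3 bytes
        (length (encode-coding-string composed 'utf-8))))   ; 2 bytes
```

The -region variants, such as `ucs-normalize-NFC-region', do the same
for text in a buffer.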

