Re: [Help-source-highlight] Unicode files ?

From: Dario Teixeira
Subject: Re: [Help-source-highlight] Unicode files ?
Date: Tue, 30 Mar 2010 07:29:41 -0700 (PDT)


> About working all the time in utf-8, do you mean (for example)
> converting utf-16 or anything else to utf-8 then working in utf-8 ? Or
> only supporting utf-8 files?

What I meant is that if Source-highlight were to use internally only one
of the various Unicode encodings, then I would vote for UTF-8, since it's
the most common one (except perhaps in CJK countries) and therefore would
not generally require either an external application or Source-highlight's
frontend to convert between different encodings.

I am not familiar with Source-highlight's internals, so I cannot tell
you what the best choice is architecture-wise.  Nevertheless, I see two
broad options:

a) Parameterise the encoding in such a way that the internal functions
   that operate on strings would change depending on whether we were
   dealing with single-byte, UTF-8, UTF-16, etc.

b) Use only one Unicode encoding internally (e.g. UTF-8), and make it the
   frontend's or external application's responsibility to convert to/from
   this encoding.  If whoever implements this option is not comfortable
   with variable-length encodings, then by all means use a fixed-length
   encoding like UTF-32 (aka UCS-4).
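
To make option (b) a bit more concrete, here is a rough sketch in Python.
It is only an illustration and not Source-highlight code: the names
to_internal, from_internal, highlight_core and highlight are all
hypothetical, and the "highlighting" is a trivial stand-in that wraps the
keyword "int" in markers.  The point is merely that the core sees nothing
but UTF-8, while conversion happens at the edges.

```python
def to_internal(raw: bytes, source_encoding: str) -> bytes:
    """Frontend step (hypothetical): convert input bytes to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def from_internal(utf8: bytes, target_encoding: str) -> bytes:
    """Frontend step (hypothetical): convert UTF-8 back to the caller's encoding."""
    return utf8.decode("utf-8").encode(target_encoding)


def highlight_core(utf8: bytes) -> bytes:
    """Stand-in for the core, which only ever handles UTF-8.

    Searching for an ASCII keyword at the byte level is safe in UTF-8,
    because bytes below 0x80 never occur inside a multi-byte sequence.
    """
    return utf8.replace(b"int", b"<b>int</b>")


def highlight(raw: bytes, encoding: str) -> bytes:
    """Convert in, run the UTF-8-only core, convert back out."""
    return from_internal(highlight_core(to_internal(raw, encoding)), encoding)


if __name__ == "__main__":
    source = "int café = 1;"
    result = highlight(source.encode("utf-16"), "utf-16")
    print(result.decode("utf-16"))
```

If the core used UTF-32 instead, only the two conversion functions would
change; the trade-off is four bytes per code point in exchange for
fixed-width indexing.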

Dario Teixeira

