

Re: [Help-source-highlight] Unicode files ?

From: Lorenzo Bettini
Subject: Re: [Help-source-highlight] Unicode files ?
Date: Tue, 30 Mar 2010 12:12:59 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko/20100322 Thunderbird/3.0.3


Hopefully, in C++ it should be easier to handle Unicode in a more transparent way...

In any case, I would say that source-highlight should support Unicode as a configuration/compilation option, so that you would *build* source-highlight with Unicode support.
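As a minimal sketch of what such a build-time option could look like, the internal string type can be chosen when compiling. `SH_UNICODE`, `SH_TEXT`, `sh_string` and `make_keyword` are hypothetical names for illustration only; they are not existing source-highlight flags or types.

```cpp
#include <string>

// Hypothetical build switch: compile with -DSH_UNICODE to get wide strings.
// SH_UNICODE is an illustrative name, not a real source-highlight option.
#ifdef SH_UNICODE
using sh_char = wchar_t;
#define SH_TEXT(s) L##s   // turns "abc" into L"abc"
#else
using sh_char = char;
#define SH_TEXT(s) s      // plain narrow literal
#endif

// All internal text would use this type, whichever configuration is built.
using sh_string = std::basic_string<sh_char>;

// Code written against sh_string compiles unchanged in both configurations.
inline sh_string make_keyword() {
    return sh_string(SH_TEXT("while"));
}
```

The design choice here is that the rest of the code never names `char` or `wchar_t` directly, so switching the build option does not require touching every string operation.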


On 03/30/2010 11:59 AM, Lionel Fumery wrote:
Hi Lorenzo, Martin (and others maybe),

Thanks for your answers. Again, as I discovered Source-highlight very
recently, I don't know whether Unicode is an important feature for you or
not... I sometimes read source code from Japanese or Chinese developers,
and I am French myself, so it's not unusual to store code or text files
in Unicode (I mostly work with Visual Studio).

Unicode files (UTF-8, for example, which is widely used on the Internet)
can store a character in 1 to 4 bytes (the original UTF-8 design allowed
up to 6). So of course they are harder to work with: functions such as
length() count bytes, not characters.
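A minimal sketch of the problem, assuming valid UTF-8 input: `std::string::length()` counts bytes, while counting characters (code points) means skipping UTF-8 continuation bytes, which always have the bit pattern `10xxxxxx`. The function name `utf8_length` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 string by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8;
// malformed sequences are not detected.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;  // ASCII byte or lead byte: starts a new code point
        }
    }
    return count;
}
```

For the French word "été" stored as UTF-8 (`"\xC3\xA9t\xC3\xA9"`), `length()` returns 5 while `utf8_length` returns 3.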

*1) First you have to know whether the file is Unicode or not.* Such
files should start with a header (a byte order mark, or BOM), described here:

(Note that "bad" Unicode text files, i.e. Unicode text files without any
header, are quite common, but there's no need to address them here.)
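Step 1 could be sketched as follows: compare the first bytes of the file with the standard byte order marks and report how many bytes to skip. This is an illustration under the assumption that the whole file has already been read into a `std::string`; `Encoding`, `has_prefix` and `detect_bom` are hypothetical names. Files without a BOM come back as `Unknown`.

```cpp
#include <cstddef>
#include <string>

enum class Encoding { Unknown, UTF8, UTF16LE, UTF16BE, UTF32LE, UTF32BE };

// True if `data` begins with the n-byte sequence `sig`.
static bool has_prefix(const std::string& data, const unsigned char* sig,
                       std::size_t n) {
    if (data.size() < n) return false;
    for (std::size_t i = 0; i < n; ++i) {
        if (static_cast<unsigned char>(data[i]) != sig[i]) return false;
    }
    return true;
}

// Detect the encoding from a BOM and store its size in bom_size so the
// caller can skip it. Returns Unknown (bom_size = 0) when there is no BOM.
Encoding detect_bom(const std::string& data, std::size_t& bom_size) {
    static const unsigned char utf32le[] = {0xFF, 0xFE, 0x00, 0x00};
    static const unsigned char utf32be[] = {0x00, 0x00, 0xFE, 0xFF};
    static const unsigned char utf8[]    = {0xEF, 0xBB, 0xBF};
    static const unsigned char utf16le[] = {0xFF, 0xFE};
    static const unsigned char utf16be[] = {0xFE, 0xFF};

    // UTF-32LE must be tested before UTF-16LE: both start with FF FE.
    if (has_prefix(data, utf32le, 4)) { bom_size = 4; return Encoding::UTF32LE; }
    if (has_prefix(data, utf32be, 4)) { bom_size = 4; return Encoding::UTF32BE; }
    if (has_prefix(data, utf8, 3))    { bom_size = 3; return Encoding::UTF8; }
    if (has_prefix(data, utf16le, 2)) { bom_size = 2; return Encoding::UTF16LE; }
    if (has_prefix(data, utf16be, 2)) { bom_size = 2; return Encoding::UTF16BE; }
    bom_size = 0;
    return Encoding::Unknown;
}
```

One caveat of this approach: a UTF-16LE file whose first character is U+0000 is indistinguishable from UTF-32LE by the BOM alone.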

*2) The second thing is to convert the whole file to a "fixed bytes per
character" format*, so you can work with it. A wide char format (16-bit
wchar_t, as on Windows; on Linux wchar_t is usually 32 bits) is a good
choice most of the time.
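A sketch of step 2 for the UTF-16LE case, assuming the BOM has already been stripped. Strictly speaking, 16-bit units are not fixed-width for characters outside the Basic Multilingual Plane (they need two units, a surrogate pair), so this sketch swaps in UTF-32 (`std::u32string`), where each element is exactly one code point. The function name is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-16LE bytes (BOM already removed) into UTF-32, where every
// element of the result is exactly one code point. A high surrogate
// (D800-DBFF) followed by a low surrogate (DC00-DFFF) is combined; an
// unpaired surrogate becomes U+FFFD. A trailing odd byte is ignored.
std::u32string utf16le_to_utf32(const std::string& bytes) {
    // Read the little-endian 16-bit unit starting at byte offset i.
    auto unit = [&](std::size_t i) -> char32_t {
        return static_cast<char32_t>(static_cast<unsigned char>(bytes[i])) |
               (static_cast<char32_t>(static_cast<unsigned char>(bytes[i + 1])) << 8);
    };
    std::u32string out;
    std::size_t i = 0;
    while (i + 1 < bytes.size()) {
        char32_t u = unit(i);
        i += 2;
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < bytes.size()) {
            char32_t lo = unit(i);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                out.push_back(0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00));
                i += 2;
                continue;
            }
        }
        if (u >= 0xD800 && u <= 0xDFFF) u = 0xFFFD;  // unpaired surrogate
        out.push_back(u);
    }
    return out;
}
```

After this conversion, `size()` on the result is the number of characters, which is what the highlighting code would need.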

Here is a FAQ explaining how to read Unicode files:

I can provide some C source code snippets for this.

*3) And then you can work with wchar functions*.

I don't know much about the Linux side, but with Visual Studio it's
simply a matter of using wcslen(), wcscpy(), wcscat() instead of
strlen(), strcpy(), strcat().
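A small illustration of step 3 using the standard C wide-character functions from `<cwchar>` (`std::wcscpy`, `std::wcscat`, `std::wcslen`), which are also available outside Visual Studio. The function name `wide_demo` and the 32-element buffer are arbitrary choices for this sketch.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Build "été test" with wcscpy/wcscat and measure it with wcslen, the
// wide counterparts of strcpy/strcat/strlen. The buffer size is an
// arbitrary choice that is large enough for this example.
std::size_t wide_demo(std::wstring& result) {
    wchar_t buf[32];
    std::wcscpy(buf, L"\u00E9t\u00E9");  // "été": 3 wide characters
    std::wcscat(buf, L" test");          // append 5 more
    result = buf;
    return std::wcslen(buf);             // counts wchar_t units, not bytes
}
```

Since "é" lies in the Basic Multilingual Plane, it takes one `wchar_t` both with 16-bit and 32-bit `wchar_t`, so `wcslen` gives the character count here.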

I'm going to take a look at the Source-Highlight code to see whether this
would be easy to add...


Lorenzo Bettini wrote:
Lionel Fumery wrote:

I'm new to Source-Highlight; I just started a week ago, in fact. It
works fine, but I have some trouble understanding how it handles Unicode files.

For example, create a simple text file, saved as Unicode, containing only
the word "test". If you open this file with a hexadecimal editor, the
content will be FF FE 74 00 65 00 73 00 74 00. In this sequence, FF FE
is the Unicode marker (the UTF-16 little-endian byte order mark).
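To see where those bytes come from, here is a sketch that encodes a string as UTF-16LE with a leading BOM (U+FEFF) and prints the bytes the way a hex editor would. The function name is hypothetical; the byte layout (low byte first) is what "little-endian" means.

```cpp
#include <cstdio>
#include <initializer_list>
#include <string>

// Encode text as UTF-16LE preceded by a BOM and return the bytes as a
// space-separated hex dump, like the view in a hexadecimal editor.
std::string utf16le_hex_dump(const std::u16string& text) {
    std::u16string units = u"\uFEFF" + text;  // U+FEFF is the BOM
    std::string dump;
    char hex[4];
    for (char16_t u : units) {
        // Little-endian: low byte first, then high byte.
        for (unsigned byte : {u & 0xFFu, (u >> 8) & 0xFFu}) {
            std::snprintf(hex, sizeof hex, "%02X ", byte);
            dump += hex;
        }
    }
    if (!dump.empty()) dump.pop_back();  // drop the trailing space
    return dump;
}
```

Calling `utf16le_hex_dump(u"test")` yields `FF FE 74 00 65 00 73 00 74 00`, the same sequence as in the hex editor.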

When highlighting this file:
source-highlight test.txt --line-number

the resulting HTML file is incorrect: <pre><tt><font
color="#000000">1:</font> ??t?e?s?t?</tt></pre>
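One plausible explanation of the `??t?e?s?t?` output is that each of the 10 bytes is treated as a separate character, and the bytes that are not printable ASCII (the BOM and the 00 high bytes) end up displayed as `?`. The sketch below only mimics that assumption; it is not source-highlight's code, and how the real `?` characters appear depends on the viewer.

```cpp
#include <string>

// Illustration only: mimic a byte-oriented tool that treats every byte
// as one character, showing non-printable or non-ASCII bytes as '?'.
// This is an assumed model of the symptom, not source-highlight's logic.
std::string as_single_bytes(const std::string& bytes) {
    std::string shown;
    for (unsigned char c : bytes) {
        shown += (c >= 0x20 && c < 0x7F) ? static_cast<char>(c) : '?';
    }
    return shown;
}
```

Applied to the bytes FF FE 74 00 65 00 73 00 74 00, this produces exactly `??t?e?s?t?`: one symbol per byte, instead of the four characters of "test".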

As you can see, Unicode is simply not handled.

Could you please tell me whether this is supported by Source-Highlight,
and how I can enable it?

Thank you a lot for your help!

Hi Lionel

actually, I have never dealt with Unicode characters, so source-highlight
probably does not support them...

does anybody have any idea how to add such support to a C++ program?
Is it just a matter of using wchar_t for strings?


Help-source-highlight mailing list

Lorenzo Bettini, PhD in Computer Science, DI, Univ. Torino
ICQ# lbetto, 16080134     (GNU/Linux User # 158233)

