Hi Lorenzo, Martin (and others maybe),
Thanks for your answers. Again, as I discovered Source-highlight very
recently, I don't know if Unicode is an important feature for you or
not... I sometimes read source code from Japanese or Chinese developers,
and am French myself, so it's not unusual for me to store code or text
files in Unicode (I mostly work with Visual Studio).
Unicode files (UTF-8 for example, which is widely used on the Internet)
can store each character on 1 to 4 bytes. So of course they are difficult
to work with: byte-oriented functions (strlen() and so on) no longer
return the number of characters.
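
To illustrate why byte-oriented length functions mislead on UTF-8 text, here is a minimal sketch. The helper name utf8_code_points is my own (it is not part of Source-Highlight), and it assumes the input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 string.
// Continuation bytes have the bit pattern 10xxxxxx, so every byte that
// does NOT match that pattern starts a new code point.
// Assumption: the input is valid UTF-8 (no validation is done here).
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the French word "café" encoded as UTF-8 (the bytes 63 61 66 C3 A9), std::string::size() reports 5 bytes while utf8_code_points() reports 4 characters.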
*1) First you have to know if the file is Unicode or not.* Such files
usually start with a header (the byte order mark, or BOM), described here:
http://en.wikipedia.org/wiki/Byte_order_mark
(Note that Unicode text files without any header are quite common,
especially UTF-8 ones, but no need to address this here.)
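
Step 1 could be sketched roughly as follows. The Encoding enum and the detect_bom name are hypothetical (my own, not from Source-Highlight); the byte sequences are the standard byte order marks. Files without a header simply come back as Unknown.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for this sketch.
enum class Encoding { Unknown, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Guess the encoding from a byte order mark at the start of a buffer.
// Returns Encoding::Unknown when no BOM is found.
Encoding detect_bom(const unsigned char* data, std::size_t size) {
    // UTF-32 must be tested before UTF-16, because the UTF-32LE mark
    // (FF FE 00 00) begins with the UTF-16LE mark (FF FE). A UTF-16LE file
    // whose first character is U+0000 would be misdetected; this sketch
    // accepts that ambiguity.
    if (size >= 4 && data[0] == 0xFF && data[1] == 0xFE &&
        data[2] == 0x00 && data[3] == 0x00) {
        return Encoding::Utf32LE;
    }
    if (size >= 4 && data[0] == 0x00 && data[1] == 0x00 &&
        data[2] == 0xFE && data[3] == 0xFF) {
        return Encoding::Utf32BE;
    }
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
        return Encoding::Utf8;
    }
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        return Encoding::Utf16LE;
    }
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
        return Encoding::Utf16BE;
    }
    return Encoding::Unknown;
}
```

A file starting with FF FE 74 00 would be reported as Utf16LE, and a plain ASCII file as Unknown.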
*2) The second thing is to convert the whole file to a "fixed bytes per
character" format*, so you can work with it. A wide char format (wchar_t,
which is 16 bits with Visual Studio) is a good choice most of the time,
at least for characters in the Basic Multilingual Plane.
Here is a FAQ explaining how to read Unicode files:
http://www.cl.cam.ac.uk/~mgk25/unicode.html
I can provide some C source code snippets to match this.
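
As a rough sketch of step 2, the function below decodes UTF-16LE bytes into one fixed-size unit per character. Two assumptions to flag: the name utf16le_to_utf32 is my own, and I use char32_t / std::u32string (C++11) rather than wchar_t, because wchar_t is 16 bits with Visual Studio but typically 32 bits on Linux. Invalid surrogates are replaced with U+FFFD, which is one possible policy among others.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode UTF-16LE bytes into UTF-32 code points.
// A leading byte order mark (FF FE) is skipped. Unpaired surrogates are
// replaced with U+FFFD, and a trailing odd byte is ignored.
std::u32string utf16le_to_utf32(const unsigned char* data, std::size_t size) {
    std::u32string out;
    std::size_t i = 0;
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        i = 2;  // skip the BOM
    }
    while (i + 1 < size) {
        // Little endian: low byte first, high byte second.
        char32_t unit = static_cast<char32_t>(data[i] | (data[i + 1] << 8));
        i += 2;
        char32_t cp = unit;
        if (unit >= 0xD800 && unit <= 0xDBFF) {
            // High surrogate: it needs a low surrogate right after it.
            cp = 0xFFFD;
            if (i + 1 < size) {
                char32_t low =
                    static_cast<char32_t>(data[i] | (data[i + 1] << 8));
                if (low >= 0xDC00 && low <= 0xDFFF) {
                    cp = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
                    i += 2;
                }
            }
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate without a preceding high surrogate
        }
        out.push_back(cp);
    }
    return out;
}
```

Fed the bytes FF FE 74 00 65 00 73 00 74 00, this returns the four code points of "test".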
*3) And then you can work with wchar functions*.
I don't know much about the Linux side, but with Visual Studio it's
simply a matter of using wcslen(), wcscpy(), wcscat() instead of
strlen(), strcpy(), strcat().
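
For step 3, a small sketch of the wide-character counterparts in use. The join_wide helper and its 64-character buffer are only illustrative assumptions; wcslen, wcscpy and wcscat are the standard functions from <cwchar>. Recent Visual Studio versions may warn about wcscpy/wcscat and suggest the _s variants instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Concatenate two wide strings with the C wide-character functions,
// the same way strcpy()/strcat() would be used on char strings.
// Returns an empty string if the result would not fit in the buffer.
std::wstring join_wide(const wchar_t* a, const wchar_t* b) {
    const std::size_t capacity = 64;  // illustrative fixed buffer size
    wchar_t buffer[capacity];
    // wcslen counts wide characters, not bytes.
    if (std::wcslen(a) + std::wcslen(b) >= capacity) {
        return std::wstring();
    }
    std::wcscpy(buffer, a);  // wide counterpart of strcpy
    std::wcscat(buffer, b);  // wide counterpart of strcat
    return std::wstring(buffer);
}
```

In C++ code, std::wstring already covers most of this, so join_wide(L"te", L"st") gives the same result as std::wstring(L"te") + L"st".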
I'm going to take a look at the Source-Highlight code to see if this
could be easy to add...
Best,
Lionel
Lorenzo Bettini wrote:
Lionel Fumery wrote:
Hi,
I'm new to Source-Highlight; I just started a week ago, in fact. It
works fine, but I have trouble understanding how it handles Unicode files.
For example, create a simple text file, saved as Unicode, with only
the word "test". If you open this text file with a hexadecimal
editor, the content will be FF FE 74 00 65 00 73 00 74 00. In this
sequence, FF FE is the Unicode marker (the byte order mark).
When highlighting this file:
source-highlight test.txt --line-number
the resulting HTML file is incorrect:
<pre><tt><font color="#000000">1:</font> ??t?e?s?t?</tt></pre>
As you can see, Unicode handling is simply missing.
Could you please explain to me whether this is supported by
Source-Highlight, and how I can enable it?
Thank you a lot for your help!
Hi Lionel
Actually, I have never dealt with Unicode characters, so source-highlight
probably does not support them...
Has anybody got any idea how to add such support to a C++ program?
Is it just a matter of using wchar for strings?
thanks
Lorenzo
_______________________________________________
Help-source-highlight mailing list
address@hidden
http://lists.gnu.org/mailman/listinfo/help-source-highlight