[XForms] UTF-8 support development release

From: Jens Thoms Toerring
Subject: [XForms] UTF-8 support development release
Date: Mon, 30 Jun 2014 23:49:50 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Hi everyone,

   here's another fresh development release; this time, besides
supporting TrueType and OpenType fonts, it also works with UTF-8:


This version is, of course, also available from the git
repository: it's the 'utf8' branch.

   For most of the left-pondians among you (if you're from the
northern parts) this probably isn't a big deal (unless you
also want e.g. Spanish texts to be displayed properly - for
someone used to it the difference between e.g. "Piña colada"
or just "Pina colada" is quite noticeable), but until now I
couldn't even be sure that my surname would be reproduced
correctly in an XForms browser. ;-)

   We now actually have three versions of the library:

a) with TTF/OTF support and UTF-8 always on
b) uses the X11 bitmap fonts and UTF-8
c) uses X11 fonts but no UTF-8

   The distinction between b) and c) is not based on a compile-time
setting but depends on the capabilities of the X11 version, i.e.,
if that supports UTF-8 it will be available and used; otherwise
the library will revert to allowing just ASCII characters
(but I think that all X11 versions in the last few years had
UTF-8 support, so c) is only relevant for very old installations).

   In the long run I'd, of course, prefer to reduce this back
to one version (the number of possible problems most likely
grows exponentially), but for a transition period I'll try to
keep all of them up-to-date.

   Should you have been using non-ASCII characters in either
source code (e.g., for labels or other text you send to an
XForms function), files generated by fdesign or files that,
for example, get read into a browser, the best way to get
everything to work is probably by converting the encodings of
those files. A useful tool for doing that is 'iconv'.
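
   As a rough sketch of what such a conversion involves: the
iconv(3) library interface behind the tool can also be called
directly from C. The helper name latin1_to_utf8() and the choice
of ISO-8859-1 as the source encoding are assumptions made for
this illustration only; neither is part of XForms.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of XForms): converts the
 * NUL-terminated ISO-8859-1 string 'in' to UTF-8, storing the
 * result (including the trailing '\0') in 'out'.
 * Returns 0 on success and -1 on failure (conversion not
 * supported, invalid input or output buffer too small). */
int latin1_to_utf8(const char *in, char *out, size_t out_size)
{
    iconv_t cd;
    char   *src      = (char *) in;  /* iconv() takes non-const pointers */
    char   *dst      = out;
    size_t  src_left = strlen(in);
    size_t  dst_left;
    size_t  res;

    if (out_size == 0)
        return -1;
    dst_left = out_size - 1;         /* keep room for the trailing '\0' */

    cd = iconv_open("UTF-8", "ISO-8859-1");   /* (to, from) */
    if (cd == (iconv_t) -1)
        return -1;

    res = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    if (res == (size_t) -1)
        return -1;

    *dst = '\0';
    return 0;
}
```

For whole files the command line tool is simpler, something along
the lines of 'iconv -f ISO-8859-1 -t UTF-8 old.c > new.c' (the
file names are made up).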

   If you've set up your system to use a non-UTF-8 encoding by
default (like, for example, 'de_DE.iso88591') don't despair:
entering text into input fields should still work correctly,
as should copy-and-paste.

   Of course, this is the first release of the new UTF-8 branch
and I'm sure that I have overlooked a number of details. Over
the next few days I'll go through the code and try to figure out
what further issues need to be dealt with. Any observations
from you will definitely help!

   Some things to consider:

a) In your own programs don't assume that the result of strlen()
   on a string equals the number of characters in the string.
   I've written a small set of functions for dealing with UTF-8
   strings and characters (see 'lib/utf-8.c') - would you like
   me to make them part of the public interface (perhaps after
   some more cleaning up)?

b) The meaning of the fl_set_input_maxchars() function has become
   ambiguous: should that be the maximum number of characters in
   the input field or the maximum number of bytes? While the name
   says 'maxchars', I've decided to make this mean the number of
   bytes instead, because there might be programs that copy the
   result of fl_get_input() into an array of that size, expecting
   the string never to be longer. Thus changing the meaning to
   "number of characters" could suddenly introduce buffer-overflow
   bugs in otherwise perfectly working programs. I guess it will
   be better to introduce a new function, named e.g.
   fl_set_input_utf8_maxchars(), to allow a limit on the number of
   actual characters.
   BTW, I also had to modify fl_set_input_maxchars() a bit in
   that it now automatically truncates the content of an input
   field if the new limit is lower than the number of bytes
   already stored in the input field.

c) For those that have created their own object types there's a
   small change. The main function for handling an object, as well
   as those for pre- and post-handlers, had the following signature:

     typedef int ( * FL_HANDLEPTR )( FL_OBJECT * obj,
                                     int         event,
                                     FL_Coord    mx,
                                     FL_Coord    my,
                                     int         key,
                                     void      * xev );

   I changed the type of the 'key' argument to a new type,
   'FL_Char':

     typedef int ( * FL_HANDLEPTR )( FL_OBJECT * obj,
                                     int         event,
                                     FL_Coord    mx,
                                     FL_Coord    my,
                                     FL_Char     key,
                                     void      * xev );

   'FL_Char' is an unsigned integer type, capable of storing
   at least 32 bits. In most cases this should be no problem
   (except a compiler warning if you set the warning level
   high enough).
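
   To illustrate points a) and b) above, here's a minimal sketch
of how the number of bytes and the number of characters of a UTF-8
string differ, and of how a byte limit can be applied without
cutting a multi-byte character in half. The function names
utf8_char_count() and utf8_truncate_len() are made up for this
example; they are not the functions from 'lib/utf-8.c', and whether
the library itself truncates at character boundaries is not stated
here. Both functions assume valid UTF-8 input.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of characters (code points) in a
 * valid, NUL-terminated UTF-8 string. Each byte that is not a
 * continuation byte (bit pattern 10xxxxxx) starts a new character. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    for (; *s; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Hypothetical helper: largest length, not exceeding 'max_bytes',
 * at which 's' can be cut without splitting a multi-byte character. */
size_t utf8_truncate_len(const char *s, size_t max_bytes)
{
    size_t len = strlen(s);

    if (len <= max_bytes)
        return len;

    len = max_bytes;
    /* s[len] would be the first byte dropped; if it is a
     * continuation byte the cut is inside a character, so
     * step back to the start of that character */
    while (len > 0 && ((unsigned char) s[len] & 0xC0) == 0x80)
        len--;
    return len;
}
```

For the UTF-8 encoded string "Piña" strlen() reports 5 bytes while
utf8_char_count() reports 4 characters, and with a limit of 3 bytes
utf8_truncate_len() gives 2, leaving just "Pi" instead of "Pi"
followed by half of the 'ñ'.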

   Finally, concerning the selection of default fonts: I've been
doing a bit of comparison between the fonts in the old version
and how they look now and, as far as I can see, the best match
is (in terms of width of strings and similarity)

    Helvetica   ->   DejaVu Sans Condensed
    Courier     ->   DejaVu Mono
    Times       ->   DejaVu Serif Condensed
    Charter     ->   DejaVu Serif

This should help minimize the number of changes you'll have
to make to your existing layouts. Of course, that's what I see
on my machine - given the uncertainties of which X11 fonts are
actually used you might have other experiences. Please let me

   If you use UTF-8 extensively don't be too disappointed if
some glyphs don't get properly displayed when using the default
fonts. The DejaVu fonts already contain a lot of glyphs (at least
if compared to other fonts, and they now make up nearly 2/3 of
the data volume of a download), but they're still missing a lot.
I've packaged the newest version available (but if you already
have them installed, your system might use an older version).
But don't expect e.g. Chinese (or Klingon) texts etc. to be
rendered correctly - those glyphs aren't part of these fonts (yet).
And the UTF-8 support in the X11 fonts is, of course, even a lot
                         Best regards, Jens
  \   Jens Thoms Toerring  ________      address@hidden
   \_______________________________      http://toerring.de
