[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v2] vl: set LC_CTYPE early in main() for all cod
From: |
Markus Armbruster |
Subject: |
Re: [Qemu-devel] [PATCH v2] vl: set LC_CTYPE early in main() for all code |
Date: |
Tue, 16 Apr 2019 09:49:09 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) |
Daniel P. Berrangé <address@hidden> writes:
> Localization is not a feature whose impact is limited to the UI
> frontends. Other parts of QEMU rely in localization.
Typically by accident :)
> In particular the
> USB MTP driver needs to be able to convert filenames from the locale
> specified character set into UTF-16 / UCS-2 encoded wide characters.
Oookay. I guess that makes some sense.
> setlocale() is only set from two of the UI frontends though, and worse,
> there is inconsistent behaviour with GTK setting LC_CTYPE to C.UTF-8,
> while ncurses honours whatever is set in the user's environment.
>
> This causes the USP MTP driver to behave differently depending on which
> UI frontend is activated.
>
> Furthermore, the curses settings are dangerous because LC_CTYPE will affect
> the is{upper,lower,alnum} functions which much QEMU code assumes to have
> C locale sorting behaviour. This also breaks QMP if the env requests a
> non-UTF-8 locale, since QMP is defined to use UTF-8 encoding for JSON.
Can you elaborate on how QMP breaks with a non-UTF-8 locale? Ah, you do
in the comment you add to vl.c.
> This problematic curses code was introduced in:
>
> commit 2f8b7cd587558944532f587abb5203ce54badba9
> Author: Samuel Thibault <address@hidden>
> Date: Mon Mar 11 14:51:27 2019 +0100
>
> curses: add option to specify VGA font encoding
>
> This patch moves the GTK frontend setlocale() handling into the main()
> method. This ensures QMP and other QEMU code has a predictable C.UTF-8.
>
> Eventually QEMU should set LC_ALL, honouring the full user environment,
> but this needs various cleanups in QEMU code first.
Eventually, we'll all be dead.
> Hardcoding LC_CTYPE
> to C.UTF-8 is a partial regression vs the above curses commit, since it
> will break the curses wide character handling for non-UTF-8 locales but
> this is unavoidable until QEMU is cleaned up to cope with non-UTF-8
> locales fully.
>
> Setting of LC_MESSAGES is left in the GTK code since only the GTK
> frontend is using translation of strings. This lets us avoid the
> platform portability problem where LC_MESSAGES is not provided by
> locale.h on MinGW. GTK pulls it in indirectly from libintl.h via
> gi18n.h header, but we don't want to pull that into the global
> QEMU namespace.
>
> Signed-off-by: Daniel P. Berrangé <address@hidden>
> ---
>
> Changed in v2:
>
> - Leave LC_MESSAGES setting in gtk code to avoid platform
> portability problems.
>
> ui/curses.c | 2 --
> ui/gtk.c | 32 +++++++++++---------------------
> vl.c | 40 ++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 51 insertions(+), 23 deletions(-)
>
> diff --git a/ui/curses.c b/ui/curses.c
> index cc6d6da684..403cd57913 100644
> --- a/ui/curses.c
> +++ b/ui/curses.c
> @@ -27,7 +27,6 @@
> #include <sys/ioctl.h>
> #include <termios.h>
> #endif
> -#include <locale.h>
> #include <wchar.h>
> #include <langinfo.h>
> #include <iconv.h>
> @@ -716,7 +715,6 @@ static void curses_display_init(DisplayState *ds,
> DisplayOptions *opts)
> }
> #endif
>
> - setlocale(LC_CTYPE, "");
> if (opts->u.curses.charset) {
> font_charset = opts->u.curses.charset;
> }
> diff --git a/ui/gtk.c b/ui/gtk.c
> index e96e15435a..e6a41c79ab 100644
> --- a/ui/gtk.c
> +++ b/ui/gtk.c
> @@ -2208,12 +2208,12 @@ static void gtk_display_init(DisplayState *ds,
> DisplayOptions *opts)
>
> s->free_scale = FALSE;
>
> - /* Mostly LC_MESSAGES only. See early_gtk_display_init() for details. For
> - * LC_CTYPE, we need to make sure that non-ASCII characters are
> considered
> - * printable, but without changing any of the character classes to make
> - * sure that we don't accidentally break implicit assumptions. */
> + /*
> + * See comment in main() for why it has only setup LC_CTYPE,
> + * as opposed to LC_ALL. Given that we need to enable
> + * LC_MESSAGES too for menu translations.
> + */
> setlocale(LC_MESSAGES, "");
> - setlocale(LC_CTYPE, "C.UTF-8");
> bindtextdomain("qemu", CONFIG_QEMU_LOCALEDIR);
> textdomain("qemu");
>
> @@ -2262,22 +2262,12 @@ static void gtk_display_init(DisplayState *ds,
> DisplayOptions *opts)
>
> static void early_gtk_display_init(DisplayOptions *opts)
> {
> - /* The QEMU code relies on the assumption that it's always run in
> - * the C locale. Therefore it is not prepared to deal with
> - * operations that produce different results depending on the
> - * locale, such as printf's formatting of decimal numbers, and
> - * possibly others.
> - *
> - * Since GTK+ calls setlocale() by default -importing the locale
> - * settings from the environment- we must prevent it from doing so
> - * using gtk_disable_setlocale().
> - *
> - * QEMU's GTK+ UI, however, _does_ have translations for some of
> - * the menu items. As a trade-off between a functionally correct
> - * QEMU and a fully internationalized UI we support importing
> - * LC_MESSAGES from the environment (see the setlocale() call
> - * earlier in this file). This allows us to display translated
> - * messages leaving everything else untouched.
> + /*
> + * GTK calls setlocale() by default, importing the locale
> + * settings from the environment. QEMU's main() method will
> + * have set LC_MESSAGES and LC_CTYPE to allow GTK to display
> + * translated messages, including use of wide characters. We
> + * must not allow GTK to change other aspects of the locale.
> */
> gtk_disable_setlocale();
> gtkinit = gtk_init_check(NULL, NULL);
> diff --git a/vl.c b/vl.c
> index c696ad2a13..87a55cddb3 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -49,6 +49,7 @@ int main(int argc, char **argv)
> #define main qemu_main
> #endif /* CONFIG_COCOA */
>
> +#include <locale.h>
>
> #include "qemu/error-report.h"
> #include "qemu/sockets.h"
> @@ -3022,6 +3023,45 @@ int main(int argc, char **argv, char **envp)
> char *dir, **dirs;
> BlockdevOptionsQueue bdo_queue = QSIMPLEQ_HEAD_INITIALIZER(bdo_queue);
>
> + /*
> + * Ideally we would set LC_ALL, but QEMU currently isn't able to cope
> + * with arbitrary localization settings. In particular there are two
> + * known problems
> + *
> + * - The QMP monitor needs to use the C locale rules for numeric
> + * formatting. This would need a double/int -> string formatter
> + * that is locale independant.
Aha.
Reminds me of the story of a FORTRAN compiler that rejected only
far-away customers' programs...
> + *
> + * - The QMP monitor needs to encode all data as UTF-8. This needs
> + * to be updated to use iconv(3) to explicitly convert the current
> + * locale's charset into utf-8
Really? What locale-dependent text gets sent to QMP?
> + *
> + * - Lots of codes uses is{upper,lower,alnum,...} functions, expecting
> + * C locale sorting behaviour. Most QEMU usage should likely be
> + * changed to g_ascii_is{upper,lower,alnum...} to match code
> + * assumptions, without being broken by locale settnigs.
> + *
> + * We do still have two requirements
> + *
> + * - Ability to correct display translated text according to the
> + * user's locale
> + *
> + * - Ability to handle multibyte characters, ideally according to
> + * user's locale specified character set. This affects ability
> + * of usb-mtp to correctly convert filenames to UCS16 and curses
> + * & GTK frontends wide character display.
> + *
> + * The second requirement would need LC_CTYPE to be honoured, but
> + * this conflicts with the 2nd & 3rd problems listed earlier. For
> + * now we make a tradeoff, trying to set an explicit UTF-8 localee
> + *
> + * Note we can't set LC_MESSAGES here, since mingw doesn't define
> + * this constant in locale.h Fortunately we only need it for the
> + * GTK frontend and that uses gi18n.h which pulls in a definition
> + * of LC_MESSAGES.
> + */
> + setlocale(LC_CTYPE, "C.UTF-8");
> +
> module_call_init(MODULE_INIT_TRACE);
>
> qemu_init_cpu_list();
We should've stayed out of the GUI business.