Re: MinGW vs. setlocale

From: Eli Zaretskii
Subject: Re: MinGW vs. setlocale
Date: Sun, 15 Jun 2014 20:23:17 +0300

> Date: Thu, 12 Jun 2014 21:18:51 +0300
> From: Eli Zaretskii <address@hidden>
> CC: address@hidden
> I still have one problem left: the Turkish character-mapping tests
> are failing.  I think that's because somehow the LC_ALL environment
> variable gets set to "C".  With the current libunistring code, that
> setting in the environment overrides what's been set by 'setlocale',
> and the Turkish language rules are not used.
> I will fix this in libunistring

Done, and all the tests in i18n.test now pass on Windows.  But I also
needed another small change in i18n.c, see below.
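
For readers following along, the ordering problem quoted above (an
environment variable winning over what 'setlocale' established) can be
sketched like this.  This is only an illustration of the lookup order,
not the actual libunistring code; the function names
effective_locale_name and effective_locale_self_check are made up for
the example.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return the locale name that language-specific rules should follow.
   Assumption for this sketch: whatever the program set via setlocale
   takes priority, and the environment is only a fallback.  */
const char *
effective_locale_name (void)
{
  static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
  const char *name = setlocale (LC_CTYPE, NULL);  /* query, don't change */
  size_t i;

  if (name != NULL && *name != '\0')
    return name;

  /* Fallback only: consult the environment in the usual order.  */
  for (i = 0; i < sizeof vars / sizeof vars[0]; i++)
    {
      const char *value = getenv (vars[i]);
      if (value != NULL && *value != '\0')
        return value;
    }
  return "C";
}

/* Small usage check: after an explicit setlocale call, the result
   follows that call regardless of what LC_ALL says in the
   environment.  */
void
effective_locale_self_check (void)
{
  assert (setlocale (LC_ALL, "C") != NULL);
  assert (strcmp (effective_locale_name (), "C") == 0);
}
```

With this order, an LC_ALL=C left in the environment no longer masks a
Turkish locale that the program selected explicitly.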

> By the way, how do I run a single test from test-suite?

I'd still love to know the answer to this one.

> > >> --8<---------------cut here---------------start------------->8---
> > >> scheme@(guile-user)> ,m (ice-9 i18n)
> > >> scheme@(ice-9 i18n)> (locale-decimal-point (make-locale LC_ALL "fr_FR"))
> > >> $2 = ","
> > >> scheme@(ice-9 i18n)> (locale-thousands-separator (make-locale LC_ALL "fr_FR"))
> > >> $3 = " "
> > >> --8<---------------cut here---------------end--------------->8---
> > >
> > > I did try that, and saw a strange thing: the thousands separator is
> > > displayed as "\xa0".  That is very strange, because nl_langinfo does
> > > return " " for the French locale, as expected.  Why would the blank be
> > > translated into NBSP?  Can this also be due to libunistring problems?
> > 
> > NBSP is actually a better answer than just space, because it’d be unwise
> > to introduce a break in the middle of a number.
> But nl_langinfo returns a blank.  So who converts that to NBSP?

Answering myself here: no one.  What I thought was a blank was
actually NBSP, which GDB displayed as a blank.  That is what the
Windows French locale returns as the thousands separator.  I guess
i18n.test shouldn't assume the separator is a blank, but should
instead use the actual character the locale reports.
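
A debugger display can hide the difference between a blank and NBSP,
so it helps to look at the raw bytes.  Below is a minimal sketch of
doing that with localeconv; the helper names hex_bytes and
thousands_sep_hex are hypothetical, and the locale name to pass
depends on the platform.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Write the bytes of S into OUT as lowercase hex pairs separated by
   single spaces.  Return 0 on success, -1 if OUT is too small.  */
int
hex_bytes (const char *s, char *out, size_t out_size)
{
  size_t len, needed, i;

  assert (s != NULL && out != NULL);
  len = strlen (s);
  /* Two hex digits per byte, a space between bytes, and the NUL.  */
  needed = len ? len * 3 : 1;
  if (out_size < needed)
    return -1;

  out[0] = '\0';
  for (i = 0; i < len; i++)
    sprintf (out + i * 3, i + 1 < len ? "%02x " : "%02x",
             (unsigned char) s[i]);
  return 0;
}

/* Switch to LOCALE_NAME and dump its thousands separator in hex.
   Return -1 if the locale is not available or OUT is too small.  */
int
thousands_sep_hex (const char *locale_name, char *out, size_t out_size)
{
  if (setlocale (LC_ALL, locale_name) == NULL)
    return -1;
  return hex_bytes (localeconv ()->thousands_sep, out, out_size);
}
```

A plain blank shows up as "20", while NBSP shows up as "a0" in a
single-byte Latin codepage or as "c2 a0" in UTF-8, so the two cannot
be confused the way they can in a debugger display.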

> > So does ‘number->locale-string’ return "123\xa0456" for you?
> No, I get "123456".  I will revisit this after I finish fixing
> libunistring,

Revisited and fixed.  It turned out i18n.scm needed to be recompiled,
because it records the supported values of nl_langinfo arguments in
the .go file.  So whenever more supported values are added to
nl_langinfo (which was what I did for GROUPING), i18n.scm should be
recompiled.  (Shouldn't this happen automatically?)

> > >> >   UNRESOLVED: i18n.test: format ~h: French: 12345.5678
> > >> >   UNRESOLVED: i18n.test: format ~h: English: 12345.5678
> > >> >
> > >> > ~h is not supported on Windows.
> > >> 
> > >> ~h is implemented using ‘number->locale-string’.
> > >
> > > Maybe I'm confused, but isn't ~h about position directive in formats?
> > 
> > Yes, but that’s implemented in Scheme, in ice-9/format.scm.
> Thanks for the pointer, I guess I will need to take a better look at
> that.

Once i18n.scm was recompiled, the ~h format test also started working.

There are a couple of other locale-related tests that fail on Windows,
like the one that reads Unicode text from strings.  I decided not to
fix those, because they test a fundamentally non-portable operation.

Here are the changes I needed for i18n.c to get the tests to succeed.
They have to do with non-portable assumptions about which of the
various nl_langinfo constants are defined together.  E.g., there's no
reason to assume that if INT_FRAC_DIGITS isn't defined, FRAC_DIGITS
won't be defined either.  As luck would have it, MinGW has some of
these but not the others, so the conditionals failed, and Guile did
not convert P_SIGN_POSN to one of the symbolic values, leaving it at
its numerical value instead.

--- libguile/i18n.c~2   2014-06-15 14:21:53 +0300
+++ libguile/i18n.c     2014-06-15 14:58:09 +0300
@@ -1583,9 +1583,13 @@ SCM_DEFINE (scm_nl_langinfo, "nl-langinf
-#if (defined FRAC_DIGITS) && (defined INT_FRAC_DIGITS)
+#if defined FRAC_DIGITS || defined INT_FRAC_DIGITS
        case FRAC_DIGITS:
        case INT_FRAC_DIGITS:
          /* This is to be interpreted as a single integer.  */
          if (*c_result == CHAR_MAX)
            /* Unspecified.  */
@@ -1597,12 +1601,18 @@ SCM_DEFINE (scm_nl_langinfo, "nl-langinf
-#if (defined P_CS_PRECEDES) && (defined INT_N_CS_PRECEDES)
+#if defined P_CS_PRECEDES || defined N_CS_PRECEDES ||  \
+  defined INT_P_CS_PRECEDES || defined INT_N_CS_PRECEDES || \
+  defined P_SEP_BY_SPACE || defined N_SEP_BY_SPACE
        case P_CS_PRECEDES:
        case N_CS_PRECEDES:
        case INT_P_CS_PRECEDES:
        case INT_N_CS_PRECEDES:
-#if (defined P_SEP_BY_SPACE) && (defined N_SEP_BY_SPACE)
+#ifdef P_SEP_BY_SPACE
        case P_SEP_BY_SPACE:
        case N_SEP_BY_SPACE:
@@ -1613,11 +1623,16 @@ SCM_DEFINE (scm_nl_langinfo, "nl-langinf
-#if (defined P_SIGN_POSN) && (defined INT_N_SIGN_POSN)
+#if defined P_SIGN_POSN || defined N_SIGN_POSN || \
+  defined INT_P_SIGN_POSN || defined INT_N_SIGN_POSN
+#ifdef P_SIGN_POSN
        case P_SIGN_POSN:
        case N_SIGN_POSN:
        case INT_P_SIGN_POSN:
        case INT_N_SIGN_POSN:
          /* See `(libc) Sign of Money Amount' for the interpretation of the
             return value here.  */
          switch (*c_result)
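
As a side illustration of the portability point (not a replacement for
the patch above), here is a small self-contained sketch in which each
case label is guarded by its own constant, so the presence of one
constant implies nothing about another.  The DEMO_* macros and the
function names are hypothetical stand-ins for the real nl_langinfo
items.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical item codes.  DEMO_INT_FRAC_DIGITS is deliberately left
   undefined to mimic a platform that provides only some constants.  */
#define DEMO_FRAC_DIGITS 1
#define DEMO_P_SIGN_POSN 2
/* #define DEMO_INT_FRAC_DIGITS 3 */

/* Say how the value of ITEM should be interpreted.  */
const char *
classify_item (int item)
{
  switch (item)
    {
#ifdef DEMO_FRAC_DIGITS
    case DEMO_FRAC_DIGITS:
#endif
#ifdef DEMO_INT_FRAC_DIGITS
    case DEMO_INT_FRAC_DIGITS:
#endif
#if defined DEMO_FRAC_DIGITS || defined DEMO_INT_FRAC_DIGITS
      /* The shared body is compiled if at least one label exists.  */
      return "integer";
#endif
#ifdef DEMO_P_SIGN_POSN
    case DEMO_P_SIGN_POSN:
      return "sign-position";
#endif
    default:
      return "string";
    }
}

/* Usage check: defined items get their special handling, and the
   code that belongs to the undefined item (3) falls to the default.  */
void
classify_item_self_check (void)
{
  assert (strcmp (classify_item (DEMO_FRAC_DIGITS), "integer") == 0);
  assert (strcmp (classify_item (DEMO_P_SIGN_POSN), "sign-position") == 0);
  assert (strcmp (classify_item (3), "string") == 0);
}
```

The price is more preprocessor noise, but a missing constant can then
disable only its own label rather than a whole group of cases.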

