Accessing the environment's locale encoding settings

From: Ludovic Courtès
Subject: Accessing the environment's locale encoding settings
Date: Wed, 16 Nov 2011 01:13:51 +0100
User-agent: Gnus/5.110018 (No Gnus v0.18) Emacs/24.0.90 (gnu/linux)

Hi Bruno,

In Guile, strings coming from the C world are assumed to be encoded in
the current locale encoding.  Like in C, the current locale is set using
‘setlocale’, and it’s up to the user to write (setlocale LC_ALL "") to
set the locale according to the relevant environment variables.
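
In C terms, the user-side call and the query that follows it look roughly like this. This is a minimal sketch that assumes a POSIX system providing ‘nl_langinfo (CODESET)’ from <langinfo.h>. The helper name ‘environment_codeset’ is hypothetical.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: adopt the locale described by LC_ALL, LC_CTYPE,
   LANG, etc., then report the codeset of the resulting LC_CTYPE.
   Returns NULL if the environment names a locale that is not installed.
   Note that this changes the process's global locale.  */
static const char *
environment_codeset (void)
{
  if (setlocale (LC_ALL, "") == NULL)
    return NULL;

  /* The returned string belongs to the C library and may be
     overwritten by later calls to nl_langinfo or setlocale.  */
  return nl_langinfo (CODESET);
}
```

Until such a call happens, a C program runs in the “C” locale, whatever $LANG says.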

The problem comes with command-line arguments: they most likely have to
be converted from the locale encoding, yet the user hasn’t had a chance
to call ‘setlocale’ by the time they are decoded.  Up to and including
2.0.3, they were assumed to be ASCII instead, and we’re looking into
fixing it [0].
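
Once an encoding name is known, converting one argument is a single iconv pass. The sketch below assumes POSIX ‘iconv’ and UTF-8 as the target encoding. Both are assumptions for illustration, not a description of Guile’s internal conversion path, and ‘convert_arg_to_utf8’ is a hypothetical name.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string ARG, encoded in
   FROM_ENCODING, to a freshly malloc'd UTF-8 string.  Returns NULL on
   failure (unknown encoding, invalid input, buffer too small, or out of
   memory).  */
static char *
convert_arg_to_utf8 (const char *arg, const char *from_encoding)
{
  if (arg == NULL || from_encoding == NULL)
    return NULL;

  iconv_t cd = iconv_open ("UTF-8", from_encoding);
  if (cd == (iconv_t) -1)
    return NULL;

  size_t in_left = strlen (arg);
  /* Assume at most 4 bytes of UTF-8 per input byte, plus the
     terminator.  If that is not enough, iconv fails with E2BIG and
     we return NULL.  */
  size_t out_size = 4 * in_left + 1;
  char *out = malloc (out_size);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *in_ptr = (char *) arg;  /* POSIX iconv takes a non-const char **.  */
  char *out_ptr = out;
  size_t out_left = out_size - 1;
  size_t rc = iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left);
  iconv_close (cd);

  if (rc == (size_t) -1)
    {
      free (out);
      return NULL;
    }
  *out_ptr = '\0';
  return out;
}
```

For example, converting the Latin-1 bytes "caf\xe9" from “ISO-8859-1” yields the UTF-8 bytes "caf\xc3\xa9".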

The trick we came up with is to look at $LANG, $LC_ALL, & co. and try to
determine what the locale encoding “would be” if (setlocale LC_ALL "")
were called [1].

To do that, I use a variant of ‘localcharset’ in Gnulib, with a
special case for the C locale:

#include <stdlib.h>   /* getenv */
#include <string.h>   /* strchr, strcmp, strcpy, memcpy */

/* Return the name of the locale encoding suggested by environment
   variables, even if it's not current, or NULL if no encoding is
   defined.  Based on Gnulib's `localcharset.c'.  */
static const char *
locale_encoding (void)
{
  static char buf[2 + 10 + 1];
  const char *locale, *codeset = NULL;

  /* Allow user to override the codeset, as set in the operating system,
     with standard language environment variables.  */
  locale = getenv ("LC_ALL");
  if (locale == NULL || locale[0] == '\0')
    {
      locale = getenv ("LC_CTYPE");
      if (locale == NULL || locale[0] == '\0')
        locale = getenv ("LANG");
    }

  if (locale != NULL && locale[0] != '\0')
    {
      /* If the locale name contains an encoding after the dot, return it.  */
      const char *dot = strchr (locale, '.');

      if (dot != NULL)
        {
          const char *modifier;

          /* Skip the dot itself so that "UTF-8", not ".UTF-8", is
             returned.  */
          dot++;

          /* Look for the possible @... trailer and remove it, if any.  */
          modifier = strchr (dot, '@');
          if (modifier == NULL)
            return dot;
          if ((size_t) (modifier - dot) < sizeof (buf))
            {
              memcpy (buf, dot, modifier - dot);
              buf[modifier - dot] = '\0';
              return buf;
            }
        }
      else if (strcmp (locale, "C") == 0)
        {
          strcpy (buf, "ASCII");
          return buf;
        }

      /* No codeset could be extracted (e.g. "de_DE"): return the locale
         name itself, which, as in Gnulib, would still have to be
         resolved through charset aliases.  */
      codeset = locale;
    }

  return codeset;
}

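
To exercise the parsing logic without touching the environment, the same steps can be pulled into a pure function. This is a sketch, not part of the patch: ‘encoding_from_locale_name’ is a hypothetical name, it writes into a caller-supplied buffer instead of a static one, and it returns NULL when the codeset after the dot is empty or does not fit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, side-effect-free version of the parsing step: given a
   locale name such as "fr_FR.ISO-8859-1@euro", return the encoding it
   suggests (copied into BUF, of SIZE bytes), the locale name itself if
   it has no codeset part, or NULL if nothing usable can be derived.  */
static const char *
encoding_from_locale_name (const char *locale, char *buf, size_t size)
{
  if (locale == NULL || locale[0] == '\0')
    return NULL;

  const char *dot = strchr (locale, '.');
  if (dot != NULL)
    {
      const char *start = dot + 1;
      /* The codeset ends at the "@modifier" trailer, if any.  */
      size_t len = strcspn (start, "@");
      if (len == 0 || len >= size)
        return NULL;
      memcpy (buf, start, len);
      buf[len] = '\0';
      return buf;
    }

  if (strcmp (locale, "C") == 0)
    {
      if (size < sizeof "ASCII")
        return NULL;
      strcpy (buf, "ASCII");
      return buf;
    }

  /* No explicit codeset: hand back the whole name, which would still
     need resolving through an alias table.  */
  return locale;
}
```

With this split, ‘locale_encoding’ reduces to the getenv cascade followed by one call to the parser.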
What do you think of this approach?

Should we be checking for charset aliases?  If so, we’d need help from
Gnulib since ‘get_charset_aliases’ is internal.
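
Without access to Gnulib’s internal ‘get_charset_aliases’, one stopgap would be a small local table. The sketch below is hypothetical: the entries are illustrative spellings seen in glibc locale names (such as “en_US.utf8” as listed by ‘locale -a’), not Gnulib’s charset.alias data, and ‘canonicalize_codeset’ is not an existing API.

```c
#include <stddef.h>
#include <strings.h>

/* Hypothetical alias table: a few codeset spellings that commonly
   appear in locale names, mapped to canonical names.  Illustrative
   only; it is not Gnulib's charset.alias data.  */
static const struct
{
  const char *alias;
  const char *canonical;
} codeset_aliases[] =
  {
    { "utf8",      "UTF-8" },
    { "iso88591",  "ISO-8859-1" },
    { "iso885915", "ISO-8859-15" },
    { "eucjp",     "EUC-JP" },
  };

/* Return the canonical name for CODESET if it is a known alias
   (compared case-insensitively), otherwise CODESET unchanged.  */
static const char *
canonicalize_codeset (const char *codeset)
{
  if (codeset == NULL)
    return NULL;

  for (size_t i = 0; i < sizeof codeset_aliases / sizeof codeset_aliases[0]; i++)
    if (strcasecmp (codeset, codeset_aliases[i].alias) == 0)
      return codeset_aliases[i].canonical;

  return codeset;
}
```

A table like this only covers the spellings someone thought to list, which is the argument for reusing Gnulib’s data instead.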


