bug-guile

bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite


From: Linas Vepstas
Subject: bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string
Date: Mon, 9 Jan 2017 21:34:36 -0600

This short C program illustrates the issue.  The locale, the output port, etc.
are all UTF-8.  The bad results are no surprise: the code currently in git for
scm_puts etc. explicitly ignores the locale setting and always assumes
latin1; it's hard-coded in there.
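
To make the failure mode concrete, here is a minimal sketch in plain C (no
libguile) of what happens when UTF-8 bytes are treated as latin1: each input
byte is taken as one latin1 character and then re-encoded as UTF-8 for the
port. The function name latin1_bytes_to_utf8 is made up for this
illustration; it is not Guile's code, only a model of the effect described
above.

```c
#include <stddef.h>
#include <string.h>

/* Re-encode `in` as UTF-8 while (wrongly) treating each input byte as one
   Latin-1 character.  `out` must hold at least 2 * strlen(in) + 1 bytes.
   Returns the number of bytes written, not counting the terminating NUL.
   Hypothetical helper for illustration; not part of libguile. */
size_t latin1_bytes_to_utf8(const char *in, char *out)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (*p < 0x80) {
            /* ASCII bytes pass through unchanged. */
            out[n++] = (char)*p;
        } else {
            /* Code points 0x80..0xFF need two UTF-8 bytes each. */
            out[n++] = (char)(0xC0 | (*p >> 6));   /* lead byte: 0xC2 or 0xC3 */
            out[n++] = (char)(0x80 | (*p & 0x3F)); /* continuation byte */
        }
    }
    out[n] = '\0';
    return n;
}
```

Fed the UTF-8 bytes of "Småland", this yields the bytes that a UTF-8 terminal
shows as "SmÃ¥land": the two-byte sequence for "å" is read as two separate
latin1 characters, so every non-ASCII character turns into two.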

--linas

#include <libguile.h>

void *wrap_eval(void* p)
{
   char *wtf = "(setlocale LC_ALL \"\")";
   SCM eval_str = scm_from_utf8_string(wtf);
   scm_eval_string(eval_str);

   return NULL;
}

void *wrap_puts(void* p)
{
   char *wtf = p;

   SCM port = scm_current_output_port ();

   scm_puts("the port-encoding is=", port);
   scm_puts(scm_to_utf8_string(scm_port_encoding(port)), port);

   scm_puts("\nThe string to display is =", port);
   scm_puts (wtf, port);

   scm_puts("\nWas expecting to see this=", port);
   SCM str = scm_from_utf8_string(wtf);
   scm_display(str, port);
   scm_puts("\n\n", port);

   return NULL;
}

int main(int argc, char* argv[])
{
   scm_with_guile(wrap_eval, 0x0);

   char * wtf = "Ćićolina";
   scm_with_guile(wrap_puts, wtf);

   wtf = "Thủ Dầu Một";
   scm_with_guile(wrap_puts, wtf);

   wtf = "Småland";
   scm_with_guile(wrap_puts, wtf);

   wtf = "Hòa Phú Phú Tân";
   scm_with_guile(wrap_puts, wtf);

   wtf = "係 拉 丁 字 母";
   scm_with_guile(wrap_puts, wtf);
}

The output is always this:

the port-encoding is=UTF-8
The string to display is =Ćićolina
Was expecting to see this=Ćićolina

the port-encoding is=UTF-8
The string to display is =Thủ Dầu Một
Was expecting to see this=Thủ Dầu Một

the port-encoding is=UTF-8
The string to display is =Småland
Was expecting to see this=Småland

the port-encoding is=UTF-8
The string to display is =Hòa Phú Phú Tân
Was expecting to see this=Hòa Phú Phú Tân

the port-encoding is=UTF-8
Was expecting to see this=係 拉 丁 字 母 æ¯


What's cool is that all this stuff works in email!

--linas

On Mon, Jan 9, 2017 at 4:03 PM, Andy Wingo <address@hidden> wrote:
> On Sun 08 Jan 2017 19:16, Linas Vepstas <address@hidden> writes:
>
>> There appears to be a regression in guile-2.2 with utf8 handling
>> in the scm_puts() scm_lfwrite() and scm_c_put_string() functions.
>>
>> In guile-2.0, one could give these utf8-encoded strings, and these
>> would display just fine.  In 2.2 they get mangled.
>
> Could it be this from NEWS:
>
>   ** Better locale support in Guile scripts
>
>   When Guile is invoked directly, either from the command line or via a
>   hash-bang line (e.g. "#!/usr/bin/guile"), it now installs the current
>   locale via a call to `(setlocale LC_ALL "")'.  For users with a unicode
>   locale, this makes all ports unicode-capable by default, without the
>   need to call `setlocale' in your program.  This behavior may be
>   controlled via the GUILE_INSTALL_LOCALE environment variable; see the
>   manual for more.
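
For comparison, the C-level counterpart of the `(setlocale LC_ALL "")` call
mentioned in that NEWS entry can be sketched as below. This is only an
illustration using the standard setlocale() and POSIX nl_langinfo() calls;
the helper name install_locale_and_get_codeset is made up here, and the
sketch does not claim to show how Guile itself installs the locale.

```c
#include <langinfo.h>
#include <locale.h>

/* Install the locale named by the environment (LANG, LC_ALL, ...) and
   return the name of its character encoding, e.g. "UTF-8".
   If setlocale() fails, the process stays in the "C" locale and the
   codeset of that locale is reported instead.
   Hypothetical helper for illustration; not part of libguile. */
const char *install_locale_and_get_codeset(void)
{
    setlocale(LC_ALL, "");
    return nl_langinfo(CODESET);
}
```

Printing the returned string before running the test program is a quick way
to confirm that the process really is in a UTF-8 locale, independently of
what the port encoding reports.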




