Re: wcwidth replacement problems

From: Bruno Haible
Subject: Re: wcwidth replacement problems
Date: Sun, 24 Aug 2008 12:29:06 +0200
User-agent: KMail/1.5.4


Alexander V. Lukyanov wrote:
> I'm trying to use wcwidth replacement on Solaris 8 and have noticed some
> problems. At first, the test for replacement did not detect the problem and
> thus did not replace the system function (patch for this is attached).

The comment in your diff says:

> +    dnl On Solaris 8, wcwidth(0x2022) (BULLET) returns -1.

This is not the case for me:

$ uname -srm
SunOS 5.8 sun4u
$ cat foo.c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
int main ()
{
  /* Use the locale from the environment so that wcwidth sees UTF-8.  */
  if (setlocale (LC_ALL, "") == NULL) { printf ("bad locale\n"); exit (1); }
  printf ("wcwidth (0x00AB) = %d\n", wcwidth (0x00AB));
  printf ("wcwidth (0x00BB) = %d\n", wcwidth (0x00BB));
  printf ("wcwidth (0x2022) = %d\n", wcwidth (0x2022));
  printf ("wcwidth (0xd856) = %d\n", wcwidth (0xd856));
  return 0;
}
$ export LC_ALL=fr_FR.UTF-8
$ ./a.out 
wcwidth (0x00AB) = 1
wcwidth (0x00BB) = 1
wcwidth (0x2022) = 2
wcwidth (0xd856) = -1

This all looks fine. (Giving the BULLET a width of 2 is a bit strange, but
not really wrong.)

Can you show the results of the same test program on your Solaris 8 machine?

> Then I have noticed that the replacement function is slow

Correct. Do you have suggestions for speeding up the replacement function?

> and broken.
> At least rpl_wcwidth(0x00AB) returns 0, but it should return 1 for the 
> character.

Oops, right. I'm fixing it through the attached patch, and adding an additional
unit test.

> The slowness is probably caused by checking the charset string every time
> wcwidth is called. I'm not sure which way to fix it would be correct, probably
> caching the check result will help.

When would the cache be invalidated? You cannot hook into setlocale().
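
To make the invalidation question concrete, here is a minimal sketch of one
conceivable approach. It is not something proposed in this thread: it re-reads
the LC_CTYPE locale name on every call and recomputes only when the name has
changed. The names locale_is_utf8_cached and compute_is_utf8 are hypothetical,
and compute_is_utf8 is only a stand-in for the real charset check.

```c
#include <locale.h>
#include <string.h>

static char cached_locale[256];   /* locale name the cache was built for */
static int cached_is_utf8;        /* cached result of the charset check */
static int cache_valid;

/* Hypothetical stand-in for the expensive charset check.  */
static int
compute_is_utf8 (const char *name)
{
  return strstr (name, "UTF-8") != NULL || strstr (name, "utf8") != NULL;
}

/* Return nonzero if the current LC_CTYPE locale looks like UTF-8.
   The cache is invalidated by comparing the locale name on each call,
   since there is no hook into setlocale().  Not thread-safe.  */
int
locale_is_utf8_cached (void)
{
  const char *name = setlocale (LC_CTYPE, NULL);
  if (name == NULL)
    name = "";
  if (!cache_valid || strcmp (name, cached_locale) != 0)
    {
      if (strlen (name) >= sizeof cached_locale)
        {
          /* Name too long to cache: compute without caching.  */
          cache_valid = 0;
          return compute_is_utf8 (name);
        }
      strcpy (cached_locale, name);
      cached_is_utf8 = compute_is_utf8 (name);
      cache_valid = 1;
    }
  return cached_is_utf8;
}
```

Whether this is any faster depends on how cheap the setlocale() query and the
string comparison are on a given platform, so it would need measuring.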

> BTW, why not use this one: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c ?
> It's public domain.

It also has its bugs [1]. Additionally, it's slower because it uses binary
search rather than direct table accesses.
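
The difference between the two lookup styles can be sketched as follows. This
is an illustration only, with a tiny sample data set covering U+0000..U+03FF;
the names in_ranges_bsearch and in_bitmap are hypothetical, and neither table
is the actual data of either implementation.

```c
#include <stddef.h>

struct interval { unsigned int first; unsigned int last; };

/* Sample data only: sorted, non-overlapping ranges.  */
static const struct interval nonspacing_ranges[] = {
  { 0x00AD, 0x00AD },
  { 0x0300, 0x036F }
};

/* Same sample data as a bitmap: one bit per code point, least
   significant bit first, covering U+0000..U+03FF.  */
static const unsigned char nonspacing_bitmap[0x400 / 8] = {
  [0x00AD / 8] = 1u << (0x00AD % 8),                  /* U+00AD */
  [0x0300 / 8] = 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
                 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff  /* U+0300..U+036F */
};

/* Binary search: O(log n) comparisons and branches per call.  */
int
in_ranges_bsearch (unsigned int uc)
{
  size_t lo = 0;
  size_t hi = sizeof nonspacing_ranges / sizeof nonspacing_ranges[0];
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (uc < nonspacing_ranges[mid].first)
        hi = mid;
      else if (uc > nonspacing_ranges[mid].last)
        lo = mid + 1;
      else
        return 1;
    }
  return 0;
}

/* Direct table access: one bounds check, one load, one shift.  */
int
in_bitmap (unsigned int uc)
{
  if (uc >= 0x400)
    return 0;
  return (nonspacing_bitmap[uc >> 3] >> (uc & 7)) & 1;
}
```

Both functions give the same answers for this sample data; the bitmap trades
table size for a constant number of operations per lookup.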

Thanks for the reports!


[1] http://mail.nl.linux.org/linux-utf8/2007-07/msg00000.html

2008-08-24  Bruno Haible  <address@hidden>

        Fix uc_width(0x00AB) bug, introduced on 2007-07-08.
        * lib/uniwidth/width.c (nonspacing_table_data): Set bit for 0x00AD,
        not 0x00AB.
        Reported by Alexander V. Lukyanov <address@hidden>.

--- lib/uniwidth/width.c.orig   2008-08-24 12:26:12.000000000 +0200
+++ lib/uniwidth/width.c        2008-08-24 11:47:40.000000000 +0200
@@ -1,5 +1,5 @@
 /* Determine display width of Unicode character.
-   Copyright (C) 2001-2002, 2006-2007 Free Software Foundation, Inc.
+   Copyright (C) 2001-2002, 2006-2008 Free Software Foundation, Inc.
    Written by Bruno Haible <address@hidden>, 2002.
    This program is free software: you can redistribute it and/or modify it
@@ -36,7 +36,7 @@
   /* 0x0000-0x01ff */
   0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00, /* 0x0000-0x003f */
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, /* 0x0040-0x007f */
-  0xff, 0xff, 0xff, 0xff, 0x00, 0x08, 0x00, 0x00, /* 0x0080-0x00bf */
+  0xff, 0xff, 0xff, 0xff, 0x00, 0x20, 0x00, 0x00, /* 0x0080-0x00bf */
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x00c0-0x00ff */
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x0100-0x013f */
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x0140-0x017f */
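
For readers checking the one-byte change above: assuming the table stores one
bit per code point, eight code points per byte, least significant bit first
(which is what the two byte values in the hunk suggest), the byte position and
bit mask can be worked out as in this small sketch. The helper names
byte_index_for and bit_mask_for are hypothetical and not part of width.c.

```c
/* Byte offset of code point uc within a row of the table that starts
   at code point block_start (8 code points per byte).  */
unsigned int
byte_index_for (unsigned int uc, unsigned int block_start)
{
  return (uc - block_start) / 8;
}

/* Mask selecting code point uc within its byte, assuming the least
   significant bit corresponds to the lowest code point.  */
unsigned char
bit_mask_for (unsigned int uc)
{
  return (unsigned char) (1u << (uc % 8));
}
```

In the row for 0x0080-0x00bf, both 0x00AB and 0x00AD fall into byte 5 (the
sixth byte). The mask for 0x00AB is 0x08 and the mask for 0x00AD is 0x20,
which matches the replacement of 0x08 by 0x20 in the patch.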
