Re: wcwidth replacement problems
From: Bruno Haible
Subject: Re: wcwidth replacement problems
Date: Sun, 24 Aug 2008 12:29:06 +0200
User-agent: KMail/1.5.4
Hi,
Alexander V. Lukyanov wrote:
> I'm trying to use wcwidth replacement on Solaris 8 and have noticed some
> problems. At first, the test for replacement did not detect the problem and
> thus did not replace the system function (patch for this is attached).
The comment in your diff says:
> + dnl On Solaris 8, wcwidth(0x2022) (BULLET) returns -1.
This is not the case for me:
$ uname -srm
SunOS 5.8 sun4u
$ cat foo.c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>   /* for exit() */
#include <wchar.h>
int main ()
{
  /* Use the locale from the environment (LC_ALL is set below). */
  if (setlocale (LC_ALL, "") == NULL) { printf ("bad locale\n"); exit (1); }
  printf ("wcwidth (0x00AB) = %d\n", wcwidth (0x00AB));
  printf ("wcwidth (0x00BB) = %d\n", wcwidth (0x00BB));
  printf ("wcwidth (0x2022) = %d\n", wcwidth (0x2022));
  printf ("wcwidth (0xd856) = %d\n", wcwidth (0xd856));
  return 0;
}
$ export LC_ALL=fr_FR.UTF-8
$ ./a.out
wcwidth (0x00AB) = 1
wcwidth (0x00BB) = 1
wcwidth (0x2022) = 2
wcwidth (0xd856) = -1
Which all looks fine. (Giving the BULLET a width of 2 is a bit strange, but
not really wrong.)
Can you show the results of the same test program on your Solaris 8 machine?
> Then I have noticed that the replacement function is slow
Correct. Do you have suggestions for speeding up the replacement function?
> and broken.
>
> At least rpl_wcwidth(0x00AB) returns 0, but it should return 1 for the
> character.
> 0x00AB is LEFT-POINTING DOUBLE ANGLE QUOTATION MARK.
Oops, right. I'm fixing it through the attached patch, and adding an additional
unit test.
> The slowness is probably caused by checking the charset string every time
> wcwidth is called. I'm not sure which way to fix it would be correct, probably
> caching the check result will help.
When would the cache be invalidated? You cannot hook into setlocale().
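To make the invalidation question concrete, here is a minimal sketch of one possible scheme, not what gnulib does: key the cache on the locale name returned by setlocale (LC_CTYPE, NULL) and redo the check only when that name changes. The names expensive_is_utf8_check, expensive_check_calls and cached_is_utf8_locale are hypothetical, and the substring test only stands in for the real charset check. Each call still pays for one setlocale query and one strcmp, and the static state is not thread-safe.

```c
#include <locale.h>
#include <string.h>

/* Counts how often the slow path runs (for illustration only). */
int expensive_check_calls = 0;

/* Hypothetical stand-in for the costly charset check. */
static int expensive_is_utf8_check (const char *locale_name)
{
  expensive_check_calls++;
  return strstr (locale_name, "UTF-8") != NULL
         || strstr (locale_name, "utf8") != NULL;
}

/* Return the cached result, recomputing it when the LC_CTYPE locale
   name differs from the one seen on the previous call.  */
int cached_is_utf8_locale (void)
{
  static char cached_name[256];
  static int cached_result;
  static int cache_valid;

  const char *name = setlocale (LC_CTYPE, NULL);
  if (name == NULL)
    name = "";

  if (!cache_valid || strcmp (name, cached_name) != 0)
    {
      if (strlen (name) >= sizeof cached_name)
        return expensive_is_utf8_check (name);  /* too long to cache */
      strcpy (cached_name, name);
      cached_result = expensive_is_utf8_check (name);
      cache_valid = 1;
    }
  return cached_result;
}
```

Whether this is a net win depends on how the cost of the setlocale query and the strcmp compares with the check being cached; that would need measuring.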
> BTW, why not use this one: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c ?
> It's public domain.
It also has its bugs [1]. Additionally, it's slower because it uses binary
search rather than direct table accesses.
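As a rough illustration of the difference, the sketch below answers the same membership question two ways for code points below 0x0100: a binary search over sorted intervals, and a single indexed access into a bitmap. The function names in_intervals and in_bitmap and the interval list are made up for this example; the bitmap bytes are the first four rows of the patched table in the diff at the end of this message, assuming one bit per code point with the lowest bit first.

```c
#include <stddef.h>

struct interval { unsigned int first; unsigned int last; };

/* Illustrative data: the code points flagged below 0x0100. */
static const struct interval flagged_intervals[] = {
  { 0x0000, 0x001F },  /* C0 controls */
  { 0x007F, 0x009F },  /* DEL and C1 controls */
  { 0x00AD, 0x00AD }   /* SOFT HYPHEN */
};

/* Binary search: a loop with O(log n) comparisons per lookup. */
int in_intervals (unsigned int uc)
{
  size_t lo = 0;
  size_t hi = sizeof flagged_intervals / sizeof flagged_intervals[0];
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (uc < flagged_intervals[mid].first)
        hi = mid;
      else if (uc > flagged_intervals[mid].last)
        lo = mid + 1;
      else
        return 1;
    }
  return 0;
}

/* Bitmap for 0x0000-0x00ff, one bit per code point. */
static const unsigned char flagged_bits[32] = {
  0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00, /* 0x0000-0x003f */
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, /* 0x0040-0x007f */
  0xff, 0xff, 0xff, 0xff, 0x00, 0x20, 0x00, 0x00, /* 0x0080-0x00bf */
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00  /* 0x00c0-0x00ff */
};

/* Direct access: one shift, one load, one mask, no loop.
   This sketch covers only 0x0000-0x00ff.  */
int in_bitmap (unsigned int uc)
{
  if (uc >= 0x0100)
    return 0;
  return (flagged_bits[uc >> 3] >> (uc & 7)) & 1;
}
```

Both functions return 1 for 0x00AD and 0 for 0x00AB; the bitmap version does so in constant time, at the price of a larger table once the full Unicode range is covered.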
Thanks for the reports!
Bruno
[1] http://mail.nl.linux.org/linux-utf8/2007-07/msg00000.html
2008-08-24 Bruno Haible <address@hidden>
Fix uc_width(0x00AB) bug, introduced on 2007-07-08.
* lib/uniwidth/width.c (nonspacing_table_data): Set bit for 0x00AD,
not 0x00AB.
Reported by Alexander V. Lukyanov <address@hidden>.
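The one-byte change in the patch below (0x08 becomes 0x20) can be checked by hand. The sketch assumes the table stores eight code points per byte with the lowest bit first; that layout is inferred from the patch, not quoted from width.c, and the helper names are hypothetical.

```c
#include <stdio.h>

/* Byte offset of a code point in a bitmap with 8 code points per byte. */
unsigned int bitmap_byte_index (unsigned int uc)
{
  return uc >> 3;
}

/* Bit mask of a code point within its byte, lowest bit first. */
unsigned char bitmap_bit_mask (unsigned int uc)
{
  return (unsigned char) (1u << (uc & 7));
}

/* Print where a code point lands within the 0x0080-0x00bf row. */
void show_position (unsigned int uc)
{
  printf ("U+%04X: byte %u of row 0x0080-0x00bf, mask 0x%02x\n",
          uc,
          bitmap_byte_index (uc) - bitmap_byte_index (0x0080),
          (unsigned int) bitmap_bit_mask (uc));
}
```

Under that assumption, 0x00AB and 0x00AD share the same byte (index 5 in the row, counting from 0), with masks 0x08 and 0x20 respectively, which matches the single value changed in the diff.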
--- lib/uniwidth/width.c.orig 2008-08-24 12:26:12.000000000 +0200
+++ lib/uniwidth/width.c 2008-08-24 11:47:40.000000000 +0200
@@ -1,5 +1,5 @@
/* Determine display width of Unicode character.
- Copyright (C) 2001-2002, 2006-2007 Free Software Foundation, Inc.
+ Copyright (C) 2001-2002, 2006-2008 Free Software Foundation, Inc.
Written by Bruno Haible <address@hidden>, 2002.
This program is free software: you can redistribute it and/or modify it
@@ -36,7 +36,7 @@
/* 0x0000-0x01ff */
0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00, /* 0x0000-0x003f */
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, /* 0x0040-0x007f */
- 0xff, 0xff, 0xff, 0xff, 0x00, 0x08, 0x00, 0x00, /* 0x0080-0x00bf */
+ 0xff, 0xff, 0xff, 0xff, 0x00, 0x20, 0x00, 0x00, /* 0x0080-0x00bf */
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x00c0-0x00ff */
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x0100-0x013f */
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x0140-0x017f */