bug-coreutils

Re: new modules for Unicode normalization


From: Pádraig Brady
Subject: Re: new modules for Unicode normalization
Date: Mon, 23 Feb 2009 00:37:53 +0000
User-agent: Thunderbird 2.0.0.6 (X11/20071008)

Bruno Haible wrote:
> Pádraig Brady wrote:
>>> I cannot estimate how much of these 10 MB get actually loaded into a
>>> process' working set. ...
>> $ uconv -x NFC&
>> $ sudo bin/ps_mem.py | grep uconv
>>  Private  +   Shared  =  RAM used       Program
>>   1.9 MiB + 788.0 KiB =   2.7 MiB       uconv
> 
> A great tool! 

Cheers :)
It gets a surprising number of downloads.
I must look at getting it into procps or somewhere.

> Let's see how it compares to a normalization program built
> against gnulib:
> 
> $ /arch/x86-linux/inst-icu/3.6/bin/uconv -x NFC &
> 
> $ ./unorm NFC &
> 
> # ./bin/ps_mem.py |grep uconv
>   2.3 MiB + 126.5 KiB =   2.4 MiB       uconv
> # ./bin/ps_mem.py |grep unorm
> 100.0 KiB +  11.5 KiB = 111.5 KiB       unorm
> 
> So, it uses about 20 times less memory.

Impressive, though uconv provides more functionality
(equivalent to iconv + unorm).
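
For anyone wanting to reproduce a Private/Shared split like the one above
without the script, here is a minimal sketch that sums the relevant fields
from the text of /proc/PID/smaps. It is a simplification and should not be
read as how ps_mem.py itself works: the function name is made up for
illustration, the sample input is invented, and shared pages are counted in
full rather than divided among the processes sharing them.

```python
import re

def private_shared_kib(smaps_text):
    """Sum the Private_* and Shared_* fields (in KiB) of smaps-format text."""
    private = shared = 0
    for line in smaps_text.splitlines():
        # Only the Clean/Dirty counters are summed; other fields are skipped.
        m = re.match(r"(Private|Shared)_(?:Clean|Dirty):\s+(\d+) kB", line)
        if not m:
            continue
        if m.group(1) == "Private":
            private += int(m.group(2))
        else:
            shared += int(m.group(2))
    return private, shared

# Invented sample in the smaps field format, NOT a real measurement.
SAMPLE = """\
Size:                 12 kB
Rss:                  12 kB
Shared_Clean:          8 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Size:                  8 kB
Shared_Clean:          4 kB
Private_Dirty:         4 kB
"""

if __name__ == "__main__":
    private, shared = private_shared_kib(SAMPLE)
    print(f"{private} KiB private + {shared} KiB shared")
```

On a live system the input would come from reading /proc/PID/smaps for the
process of interest, which generally needs root for processes owned by
other users.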

> Now let me try to compare the speed (averaged over 100 consecutive runs, to
> eliminate the effects of load dependent CPU frequency scaling):

I usually do `/etc/init.d/cpuspeed stop` before any benchmarking.
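
The averaging over consecutive runs described above can be sketched roughly
as follows. This is only an illustration of the idea, not the procedure
Bruno actually used: the helper name and its parameters are hypothetical,
and output is discarded rather than written to a file as in the quoted
commands.

```python
import subprocess
import sys
import time

def mean_wall_time(argv, runs=100, stdin_path=None):
    """Return the mean wall-clock time in seconds of `runs` executions of argv."""
    total = 0.0
    for _ in range(runs):
        # Feed the command from a file if one is given, else from /dev/null.
        stdin = open(stdin_path, "rb") if stdin_path else subprocess.DEVNULL
        try:
            start = time.perf_counter()
            subprocess.run(argv, stdin=stdin,
                           stdout=subprocess.DEVNULL, check=True)
            total += time.perf_counter() - start
        finally:
            if stdin_path:
                stdin.close()
    return total / runs

if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; substitute the real
    # normalizer invocation and input file when benchmarking.
    seconds = mean_wall_time([sys.executable, "-c", "pass"], runs=5)
    print(f"mean wall time: {seconds:.3f}s")
```

The measured time includes process startup, as `time` does, so for short
inputs it is dominated by exec and library loading rather than by the
normalization itself.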

> $ time uconv -x NFD < $UCD51/NormalizationTest.txt > /tmp/1
> real    0m0.612s
> $ time ./unorm NFD < $UCD51/NormalizationTest.txt > /tmp/2
> real    0m0.261s
> 
> $ time uconv -x NFKC < $UCD51/NormalizationTest.txt > /tmp/1
> real    0m0.598s
> $ time ./unorm NFKC < $UCD51/NormalizationTest.txt > /tmp/2
> real    0m0.309s
> 
> So, it's also twice as fast.
> 
> The gnulib modules used are:
>   uninorm/filter
>   uninorm/nfc
>   uninorm/nfd
>   uninorm/nfkc
>   uninorm/nfkd
>   unistr/u8-uctomb
>   unistr/u8-mbtoucr

That is also impressive. Can you share the code for unorm
so I can do my own testing?
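
For testing, one independent cross-check of either tool's output is to
verify lines of NormalizationTest.txt against a third implementation. Below
is a minimal sketch using Python's standard unicodedata module. The function
names are hypothetical, only four of the file's documented invariants are
checked, and the interpreter's bundled Unicode version may differ from
UCD 5.1, so mismatches on newer or older characters would not necessarily
indicate a bug.

```python
import unicodedata

def parse_fields(line):
    """Decode the five ';'-separated hex columns of a NormalizationTest.txt line."""
    data = line.split("#", 1)[0]          # drop the trailing comment
    cols = data.split(";")[:5]            # c1..c5; ignore what follows
    return ["".join(chr(int(cp, 16)) for cp in col.split()) for col in cols]

def check_line(line):
    """Check c2 == NFC(c1), c3 == NFD(c1), c4 == NFKC(c1), c5 == NFKD(c1)."""
    c1, c2, c3, c4, c5 = parse_fields(line)
    n = unicodedata.normalize
    return (n("NFC", c1) == c2 and n("NFD", c1) == c3
            and n("NFKC", c1) == c4 and n("NFKD", c1) == c5)

# Illustrative line in the file's format: U+1E0A decomposes to U+0044 U+0307.
SAMPLE = "1E0A;1E0A;0044 0307;1E0A;0044 0307; # LATIN CAPITAL LETTER D WITH DOT ABOVE"

if __name__ == "__main__":
    print(check_line(SAMPLE))
```

Running the same check over every data line of the file (skipping comment
lines and the `@Part` headers) would give a baseline to compare uconv and
unorm against.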

I'm now swayed a bit more in favour of another `unorm` util.
However, I'm still 60/40 against. What do others think?

cheers,
Pádraig.