bug-coreutils

Re: new modules for Unicode normalization


From: Bruno Haible
Subject: Re: new modules for Unicode normalization
Date: Sun, 22 Feb 2009 18:21:36 +0100
User-agent: KMail/1.9.9

Pádraig Brady wrote:
> > I cannot estimate how much of these 10 MB get actually loaded into a
> > process' working set. ...
> 
> $ uconv -x NFC&
> $ sudo bin/ps_mem.py | grep uconv
>  Private  +   Shared  =  RAM used       Program
>   1.9 MiB + 788.0 KiB =   2.7 MiB       uconv

A great tool! Let's see how it compares to a normalization program built
against gnulib:

$ /arch/x86-linux/inst-icu/3.6/bin/uconv -x NFC &

$ ./unorm NFC &

# ./bin/ps_mem.py |grep uconv
  2.3 MiB + 126.5 KiB =   2.4 MiB       uconv
# ./bin/ps_mem.py |grep unorm
100.0 KiB +  11.5 KiB = 111.5 KiB       unorm

So, it uses about 22 times less memory (2.4 MiB versus 111.5 KiB).
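
For anyone who wants to check the ratio, here is a minimal sketch that
recomputes it from the two "RAM used" totals printed by ps_mem.py above.
The helper name mem_ratio is made up for illustration (it is not part of
ps_mem.py or gnulib), and the sketch assumes an awk is installed:

```shell
#!/bin/sh
# mem_ratio: hypothetical helper, for illustration only.
# $1 = RAM used by the larger process, in MiB
# $2 = RAM used by the smaller process, in KiB
# Prints how many times larger the first figure is, to one decimal place.
mem_ratio() {
  awk -v big_mib="$1" -v small_kib="$2" \
    'BEGIN { printf "%.1f\n", big_mib * 1024 / small_kib }'
}

# Totals from the ps_mem.py output above: uconv 2.4 MiB, unorm 111.5 KiB.
mem_ratio 2.4 111.5
```

With the figures above this prints 22.0.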

Now let me try to compare the speed (averaged over 100 consecutive runs, to
eliminate the effects of load-dependent CPU frequency scaling):

$ time uconv -x NFD < $UCD51/NormalizationTest.txt > /tmp/1
real    0m0.612s
user    0m0.597s
sys     0m0.014s
$ time ./unorm NFD < $UCD51/NormalizationTest.txt > /tmp/2
real    0m0.261s
user    0m0.247s
sys     0m0.014s

$ time uconv -x NFKC < $UCD51/NormalizationTest.txt > /tmp/1
real    0m0.598s
user    0m0.583s
sys     0m0.014s
$ time ./unorm NFKC < $UCD51/NormalizationTest.txt > /tmp/2
real    0m0.309s
user    0m0.297s
sys     0m0.011s

So, it's also about twice as fast (roughly 2.3x for NFD and 1.9x for NFKC).
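
The averaging over 100 runs and the resulting ratios could be reproduced
roughly as follows. This is only a sketch, not the script actually used for
the timings above: avg_ms and speedup are hypothetical helper names, and the
sketch assumes GNU date (for the %N nanoseconds format) and an awk.

```shell
#!/bin/sh
# avg_ms: hypothetical helper, for illustration only.
# Runs a command RUNS times with stdin taken from INPUT, discards its
# output, and prints the mean wall-clock time per run in whole milliseconds.
# Assumes GNU date, whose %N format gives nanoseconds.
# Usage: avg_ms RUNS INPUT COMMAND [ARGS...]
avg_ms() {
  runs=$1
  input=$2
  shift 2
  i=0
  start=$(date +%s%N)
  while [ "$i" -lt "$runs" ]; do
    "$@" < "$input" > /dev/null
    i=$((i + 1))
  done
  end=$(date +%s%N)
  echo $(( (end - start) / runs / 1000000 ))
}

# speedup: hypothetical helper; ratio of two timings, to two decimal places.
speedup() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

# Ratios of the "real" times quoted above (uconv time / unorm time).
speedup 0.612 0.261   # NFD
speedup 0.598 0.309   # NFKC

# Example invocation with the commands from this mail (not run here):
#   avg_ms 100 "$UCD51/NormalizationTest.txt" ./unorm NFD
```

For the timings quoted above, the two speedup calls print 2.34 and 1.94.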

The gnulib modules used are:
  uninorm/filter
  uninorm/nfc
  uninorm/nfd
  uninorm/nfkc
  uninorm/nfkd
  unistr/u8-uctomb
  unistr/u8-mbtoucr

Bruno



