Re: [bug-gawk] feature request: iconv/recode dynamic extension

From: Franta Hanzlík
Subject: Re: [bug-gawk] feature request: iconv/recode dynamic extension
Date: Sat, 22 Dec 2018 21:32:26 +0100

On Sat, 22 Dec 2018 12:11:59 -0700
address@hidden wrote:

> Hi.
> You've already had some good responses; it looks like I don't need
> to really do anything. An extension to provide access to iconv would
> be useful, but I don't have the cycles for that (unless you want to
> discuss my consulting rates).
> For searching your data, instead of looping, you may want to rework
> things to take advantage of gawk's associative arrays and the ability
> to see if some value exists as a subscript in an array.  I strongly
> recommend working your way through the gawk manual, particularly Part I
> thereof, to learn more.
> Best of luck,
> Arnold

Hi Arnold,
thanks for your response. Unfortunately, I cannot use associative arrays
(I think): items such as name, surname, street, city and others are not
unique, so it isn't possible to use them as array indexes. Concatenating
all the values to build a unique key probably isn't the way either in
this case, because people make typing mistakes and some of their contact
details change. For maximum accuracy I want to catch these minor
differences too, so I compute something like the probability that the
form data match a particular user in the DB.
I don't think an associative array can be used in this case.
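
To make the kind of match probability described above concrete, here is a minimal sketch, not the code actually used. The field names, the weights, and the position-by-position comparison are all illustrative assumptions; a real scorer would probably use an edit distance instead.

```shell
# Hypothetical sketch: score how well a form record matches a DB record.
# Field names, weights and the similarity measure are assumptions.
match_score_demo() {
    awk '
    # Fraction of positions at which two strings agree (crude typo tolerance).
    function similarity(a, b,    i, n, same) {
        n = (length(a) > length(b)) ? length(a) : length(b)
        if (n == 0)
            return 1
        same = 0
        for (i = 1; i <= n; i++)
            if (substr(a, i, 1) == substr(b, i, 1))
                same++
        return same / n
    }

    # Weighted average of the per-field similarities, in the range 0..1.
    function match_score(form, db,    k, total, wsum) {
        for (k in WEIGHT) {
            total += WEIGHT[k] * similarity(form[k], db[k])
            wsum  += WEIGHT[k]
        }
        return total / wsum
    }

    BEGIN {
        WEIGHT["name"] = 1; WEIGHT["surname"] = 2; WEIGHT["city"] = 1
        form["name"] = "Jan"; form["surname"] = "Novak"; form["city"] = "Plzen"
        db["name"]   = "Jan"; db["surname"]   = "Nowak"; db["city"]   = "Plzen"
        # Print the score as a whole percentage.
        printf "%d%%\n", match_score(form, db) * 100 + 0.5
    }
    '
}

match_score_demo    # prints 90%
```

Only the surname differs here ("Novak" against "Nowak", 4 of 5 positions equal), so the weighted score is (1 + 2*0.8 + 1) / 4 = 0.9.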
On the other hand, they are perhaps useful in the dedia() function, as
Wolfgang recommended in a previous post:

BEGIN{ dd["ü"] = "u"; dd["ö"] = "o"; dd["ó"] = "o"; dd["ä"] = "a" }

function dedia(s,    r, i, c){          # r, i, c are local variables
    r = "";
    for( i = 1; i <= length(s); ++i ){
        c = substr( s, i, 1 );
        if ( c in dd ){                 # <====== array membership test
            c = dd[c];
        }
        r = r c;
    }
    return r;
}
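
One caveat with the per-character loop: gawk's substr() returns whole characters only when it runs in a multibyte (for example UTF-8) locale; in the C locale it works on bytes, so a two-byte "ü" would never be found in dd. The sketch below is a variant, not Wolfgang's code, that uses gsub() to replace each mapped sequence instead, which avoids that dependency. The shell wrapper name dedia_gsub is an assumption made for illustration.

```shell
# dedia_gsub: strip the mapped diacritics from each line of standard input.
# Hypothetical wrapper; the mapping table is the one from the post above.
dedia_gsub() {
    awk '
    BEGIN { dd["ü"] = "u"; dd["ö"] = "o"; dd["ó"] = "o"; dd["ä"] = "a" }

    # Replace every mapped sequence in s with its ASCII counterpart.
    function dedia(s,    k) {
        for (k in dd)
            gsub(k, dd[k], s)
        return s
    }

    { print dedia($0) }
    '
}

printf '%s\n' "Müller, Jörg" | dedia_gsub    # prints: Muller, Jorg
```

This makes one pass over the string per table entry, so it suits a small mapping table; the keys must not contain regexp metacharacters, because gsub() treats them as patterns.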

And yes, I do always study the gawk manual and associated docs
(GAWK: Effective AWK Programming, etc.).

Franta H.

> Franta Hanzlík <address@hidden> wrote:
> > Hello,
> > not sure whether it is a good idea, but I think this may be useful
> > for others also: I'm just doing some word processing in gawk, and
> > part of it is comparing two strings. These strings are plaintext ASCII
> > strings obtained by removing diacritics from the original Latin-1
> > and Latin-2 strings - thus I need a conversion such as
> >  "??????????????????????????" -> "aaeeooscyiuuu".
> > For now I solve this by calling an external conversion program, such as
> >
> > iconv -f UTF-8 -t US-ASCII//TRANSLIT <<< "??????????????????????????????"
> >    or
> > recode -f u8..flat <<< "??????????????????????????????"
> >
> > but for thousands of strings it is too slow (and resource expensive).
> >
> > There are perhaps a lot of similar text conversion cases where a gawk
> > dynamic extension for these needs would be very useful.
> >
> > Eventually, if this idea isn't totally bad, I can try to program
> > it, but I have no programming skills - thus can you please give me
> > some advice on how to do this?
> > -- 
> > Thanks in advance, Franta Hanzlik
> >  

Best regards,
František Hanzlík

Luční 502           Linux/Unix/LAN/Internet       Tel: +420-372-222302
33209 Štěnovice    e-mail:address@hidden      Fax: +420-372-222302
Czech Republic        http://hanzlici.cz/         GSM: +420-604-117319
This mail contains no viruses; it was sent from the Linux operating system
