bug-gawk

Re: [bug-gawk] Gawk Enhancement Suggestion


From: Ed Morton
Subject: Re: [bug-gawk] Gawk Enhancement Suggestion
Date: Sun, 18 Dec 2011 11:09:08 -0600
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20111105 Thunderbird/8.0

Arnold - in case it's a useful reference, below is an awk function that implements the functionality I'd like us to have in a built-in gawk cut() function, along with a sample "body" just to show how it behaves. In reality I wouldn't expect people to pass the cut() arguments in as awk variables; they'd probably be hard-coded.

    Ed.

$ cat cut.awk
function cut(inStr,tgtFldNrsStr,inSep,outSep,
        errMsg,numTgtFldNrs,tmpArr,tmpSepArr,sepArr,i,rangeStart,
        rangeEnd,fldNr,tgtFldNrsArr,numFlds,inArr,outStr,numSelFlds)
{
    # Rationalize the input separator and input string to account for expected,
    # special-case behavior when the input separator is a single blank char, " ":
    inSep = (inSep == "" ? FS : inSep)

    if (inSep == " ") {
        gsub(/(^[[:space:]]+|[[:space:]]+$)/,"",inStr)
        inSep="[[:space:]]+"
    }

    # Create arrays of the fields and the separators between the fields:
    numFlds = split(inStr,inArr,inSep)

    for (fldNr=1; fldNr<=numFlds; fldNr++) {
        inStr = substr(inStr,length(inArr[fldNr])+1)
        match(inStr,inSep)
        sepArr[fldNr] = (outSep == "" ? substr(inStr,1,RLENGTH) : outSep)
        inStr = substr(inStr,RLENGTH+1)
    }

    # Create an array of the field numbers to be selected:
    numTgtFldNrs = split(tgtFldNrsStr,tmpArr,/[,-]/)
    split(tgtFldNrsStr,tmpSepArr,/[^,-]+/)

    for (i=1;i<=numTgtFldNrs;i++) {
        rangeStart = tmpArr[i]

        if (tmpSepArr[i+1] == "-") {
            if ( tmpArr[i+1] == "" ) {
                tmpArr[i+1] = numFlds
            }
            rangeEnd = tmpArr[++i]
        }
        else {
            rangeEnd = rangeStart
        }

        if ( (rangeStart ~ /^[[:digit:]]+$/) && (rangeEnd ~ /^[[:digit:]]+$/) ) {
            if (rangeStart > rangeEnd) {
                errMsg = "invalid decreasing range"
            }
            else if (rangeStart < 1) {
                errMsg = "fields and positions are numbered from 1"
            }
        }
        else {
            errMsg = "invalid field list"
        }

        if (errMsg) {
            print errMsg | "cat>&2"
            return ""
        }

        for (fldNr=rangeStart; fldNr<=rangeEnd; fldNr++) {
            tgtFldNrsArr[fldNr]
        }
    }

    # Form a string of the selected fields with their preceding separators
    # (after the first field):
    for (fldNr=1; fldNr<=numFlds; fldNr++) {
        if (fldNr in tgtFldNrsArr) {
            outStr = (numSelFlds++ ? outStr sepArr[fldNr-1] : "") inArr[fldNr]
        }
    }

    return outStr
}

{
    print cut($0,flds,fs,ofs)
}

$ cat file
    a b    c  d
$
$ awk -v flds="2,3" -f cut.awk file
b    c
$ awk -v flds="1-3" -f cut.awk file
a b    c
$ awk -v flds="2-" -f cut.awk file
b    c  d
$  awk -v flds="1,2-4" -v fs="[[:blank:]]+" -v ofs="#" -f cut.awk file
#a#b#c

On 12/14/2011 4:22 PM, Ed Morton wrote:
Arnold - I'll take a look and get back to you. I expect I can code up what I have in mind (which isn't quite what's in that example) as an awk function that people could copy/paste but I was really hoping to get a new builtin string function out of this.

Thanks,

    Ed.

On 12/6/2011 2:38 PM, Aharon Robbins wrote:
Hi Ed.

Re the below. See the node "Cut Program" in the gawk manual.
Adaptation into a function should be straightforward.

A diff to redo that section as a function + code to call it will
be cheerily reviewed and most likely accepted.

Thanks,

Arnold

Date: Fri, 02 Dec 2011 07:36:50 -0600
From: Ed Morton <address@hidden>
To: address@hidden
Subject: [bug-gawk] Gawk Enhancement Suggestion

Arnold - could we get a "cut()" function for gawk similar to the UNIX
one of the same name that just lets people select specific fields or
ranges of fields? Below are some details (copied from something I posted
at comp.lang.awk).

Regards,

      Ed.

A function that returns a string of selected fields is very often all we need. e.g.:

print cut($0,3)       # print from the 3rd field to the end of the record using FS as separator
print cut($0,3,",")   # print from the 3rd field to the end of the record using "," as separator
print cut($0,"3-7")   # print from the 3rd field to the 7th field
print cut($0,"3,5,7") # print the 3rd, 5th, and 7th fields
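The field-list syntax here follows the -f option of the UNIX cut utility. As a minimal sketch, the list can be expanded into the set of field numbers it selects. This assumes the semantics of the cut.awk function posted later in this thread, where "3" selects only field 3 and "3-" runs to the last field, and it handles only the N, N-M, and N- forms. expand_list is a hypothetical shell helper, not part of gawk; the error texts are borrowed from cut.awk:

```shell
#!/bin/sh
# expand_list is a hypothetical helper: it prints, in ascending order, the
# field numbers that a cut-style list ("3-7", "3,5,7", "2-") selects from a
# record with nf fields, or exits non-zero on an invalid list.
expand_list() {
    awk -v list="$1" -v nf="$2" '
    BEGIN {
        n = split(list, parts, ",")
        for (i = 1; i <= n; i++) {
            p = parts[i]
            if (p ~ /^[0-9]+$/) {                 # single field, e.g. "5"
                lo = hi = p + 0
            } else if (p ~ /^[0-9]+-[0-9]*$/) {   # closed "3-7" or open "2-"
                split(p, r, "-")
                lo = r[1] + 0
                hi = (r[2] == "" ? nf + 0 : r[2] + 0)
                if (r[2] != "" && lo > hi) {
                    print "invalid decreasing range" > "/dev/stderr"
                    exit 1
                }
            } else {
                print "invalid field list" > "/dev/stderr"
                exit 1
            }
            if (lo < 1) {
                print "fields and positions are numbered from 1" > "/dev/stderr"
                exit 1
            }
            for (f = lo; f <= hi && f <= nf; f++)
                sel[f] = 1                        # a set, so "3,3" selects 3 once
        }
        out = ""
        for (f = 1; f <= nf; f++)
            if (f in sel)
                out = out (out == "" ? "" : " ") f
        print out
    }'
}

expand_list "3-7" 10     # prints: 3 4 5 6 7
expand_list "3,5,7" 10   # prints: 3 5 7
expand_list "2-" 4       # prints: 2 3 4
```

Keeping the selection as a set rather than a list means repeated or overlapping entries such as "1-3,2" still yield each field once, in record order, which is also how cut(1) behaves.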

The separating substring between the fields would just be whatever separator
preceded each selected field after the first. So, with this input (with leading blanks):

      a b    c  d

the following code snippets would produce the output that follows them:

print cut($0,"1,3")
a    c

print cut($0,"1,3",/[[:blank:]]+/)
   b

print cut($0,"1-3")
a b    c

print cut($0,"2,4")
b  d

print cut($0,"1-4")
a b    c  d

Obviously $0 can be replaced by any string, and cut() should take an optional 4th
arg to specify the separator in its returned string:

print cut($0,"1,3",FS,OFS)
a c

print cut($0,"1,3",/[[:blank:]]+/,"#")
#b

print cut($0,"1-3",FS,OFS)
a b c

print cut($0,"2,4",FS,":")
b:d

print cut($0,"1-4",FS,"|")
a|b|c|d
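Below is a minimal sketch of the proposed four-argument behavior in portable awk. It is run from the shell in the same style as the cut.awk session earlier in this thread. cut_demo is a hypothetical wrapper, and the sketch makes a few assumptions of its own:

- The separator is always treated as a regular expression and found with match(), so fields and separators are located the same way.
- The default blank separator is spelled [ \t\n]+ rather than [[:space:]]+, so older awks without POSIX character classes accept it.
- Only the N, N-M, and N- list forms are handled.

```shell
#!/bin/sh
# Minimal sketch of the proposed cut(str, list, sep, osep) in portable awk.
# cut_demo is a hypothetical wrapper: it reads records on stdin and prints
# cut($0, ...) for each one.
#   $1 list : cut-style field list, e.g. "1,3", "2-4", "2-"
#   $2 sep  : separator ERE; "" or " " means default blank splitting
#   $3 osep : output separator; "" keeps each field's original preceding separator
cut_demo() {
    awk -v flds="$1" -v fs="$2" -v ofs="$3" '
    function cut(str, list, sep, osep,
                 n, fld, sp, np, parts, r, i, f, lo, hi, sel, out, nsel) {
        if (sep == "" || sep == " ") {       # default: trim, then split on blank runs
            sub(/^[ \t\n]+/, "", str)
            sub(/[ \t\n]+$/, "", str)
            sep = "[ \t\n]+"
        }
        # Walk the string with match() so fields and separators come from the
        # same ERE; sp[k] holds the separator text that followed field k.
        n = 0
        while (match(str, sep) && RLENGTH > 0) {
            fld[++n] = substr(str, 1, RSTART - 1)
            sp[n] = substr(str, RSTART, RLENGTH)
            str = substr(str, RSTART + RLENGTH)
        }
        fld[++n] = str

        # Expand the field list into the set sel[]; "N-" runs to the last field.
        np = split(list, parts, ",")
        for (i = 1; i <= np; i++) {
            if (parts[i] ~ /^[0-9]+$/) {
                lo = hi = parts[i] + 0
            } else if (parts[i] ~ /^[0-9]+-[0-9]*$/) {
                split(parts[i], r, "-")
                lo = r[1] + 0
                hi = (r[2] == "" ? n : r[2] + 0)
                if (r[2] != "" && lo > hi) return ""   # decreasing range
            } else {
                return ""                              # invalid field list
            }
            if (lo < 1) return ""                      # fields are numbered from 1
            for (f = lo; f <= hi && f <= n; f++) sel[f] = 1
        }

        # Join the selected fields, each one after the first preceded by its
        # original separator, or by osep when one is given.
        for (f = 1; f <= n; f++) {
            if (f in sel) {
                if (nsel++) out = out (osep == "" ? sp[f - 1] : osep)
                out = out fld[f]
            }
        }
        return out
    }
    { print cut($0, flds, fs, ofs) }'
}

line='    a b    c  d'
printf '%s\n' "$line" | cut_demo "1,3"              # a    c
printf '%s\n' "$line" | cut_demo "2,4" "" ":"       # b:d
printf '%s\n' "$line" | cut_demo "1,3" "[ \t]+" "#" # #b
```

In gawk 4.0 and later, split() accepts an optional fourth array argument that receives the separator strings, which could replace the match() loop; the loop is used here only to stay portable. One behavioral difference to keep in mind: split() treats a single-character separator other than a blank literally, whereas this sketch would read "." or "|" as regex metacharacters.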
