Re: [bug-gawk] Gawk Enhancement Suggestion

From: Ed Morton
Subject: Re: [bug-gawk] Gawk Enhancement Suggestion
Date: Sun, 18 Dec 2011 11:09:08 -0600
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20111105 Thunderbird/8.0

Arnold - in case it's a useful reference, below is an awk function that implements the functionality I'd like us to have in a built-in gawk cut() function, along with a sample "body" just to show how it behaves. In practice I wouldn't expect people to pass the cut() arguments in as awk variables; they'd probably be hard-coded.


$ cat cut.awk
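function cut(inStr,tgtFldNrsStr,inSep,outSep,
             inArr,sepArr,tmpArr,tmpSepArr,tgtFldNrsArr,numFlds,fldNr,
             numTgtFldNrs,i,rangeStart,rangeEnd,errMsg,numSelFlds,outStr)
{
    # Rationalize the input separator and input string to account for expected,
    # special-case behavior when the input separator is a single blank char, " ".
    inSep = (inSep == "" ? FS : inSep)

    if (inSep == " ") {
        # Assumed body: mimic default field splitting by trimming the string
        # and splitting on runs of blanks, tabs, and newlines.
        gsub(/^[ \t\n]+|[ \t\n]+$/,"",inStr)
        inSep = "[ \t\n]+"
    }

    # Create arrays of the fields and the separators between the fields:
    numFlds = split(inStr,inArr,inSep)

    for (fldNr=1; fldNr<=numFlds; fldNr++) {
        inStr = substr(inStr,length(inArr[fldNr])+1)
        match(inStr,inSep)    # sets RLENGTH to the length of the next separator
        sepArr[fldNr] = (outSep == "" ? substr(inStr,1,RLENGTH) : outSep)
        inStr = substr(inStr,RLENGTH+1)
    }

    # Create an array of the field numbers to be selected:
    numTgtFldNrs = split(tgtFldNrsStr,tmpArr,/[,-]/)

    # Assumed step: record the "," or "-" that precedes each list entry,
    # so tmpSepArr[i+1] is the separator between tmpArr[i] and tmpArr[i+1].
    for (i=1; i<numTgtFldNrs; i++) {
        tgtFldNrsStr = substr(tgtFldNrsStr,length(tmpArr[i])+1)
        tmpSepArr[i+1] = substr(tgtFldNrsStr,1,1)
        tgtFldNrsStr = substr(tgtFldNrsStr,2)
    }

    for (i=1;i<=numTgtFldNrs;i++) {
        rangeStart = tmpArr[i]

        if (tmpSepArr[i+1] == "-") {
            if ( tmpArr[i+1] == "" ) {
                # An open-ended range such as "2-" runs to the last field.
                tmpArr[i+1] = numFlds
            }
            rangeEnd = tmpArr[++i]
        }
        else {
            rangeEnd = rangeStart
        }

        if ( (rangeStart ~ /^[[:digit:]]+$/) && (rangeEnd ~ /^[[:digit:]]+$/) ) {
            if (rangeStart > rangeEnd) {
                errMsg = "invalid decreasing range"
            }
            else if (rangeStart < 1) {
                errMsg = "fields and positions are numbered from 1"
            }
        }
        else {
            errMsg = "invalid field list"
        }

        if (errMsg) {
            print errMsg | "cat>&2"
            return ""
        }

        for (fldNr=rangeStart; fldNr<=rangeEnd; fldNr++) {
            tgtFldNrsArr[fldNr+0]    # mark this field number as selected
        }
    }

    # Form a string of the selected fields with their preceding separators
    # (after the first field):
    for (fldNr=1; fldNr<=numFlds; fldNr++) {
        if (fldNr in tgtFldNrsArr) {
            outStr = (numSelFlds++ ? outStr sepArr[fldNr-1] : "") inArr[fldNr]
        }
    }

    return outStr
}

{
    print cut($0,flds,fs,ofs)
}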

$ cat file
    a b    c  d
$ awk -v flds="2,3" -f cut.awk file
b    c
$ awk -v flds="1-3" -f cut.awk file
a b    c
$ awk -v flds="2-" -f cut.awk file
b    c  d
$  awk -v flds="1,2-4" -v fs="[[:blank:]]+" -v ofs="#" -f cut.awk file

On 12/14/2011 4:22 PM, Ed Morton wrote:
Arnold - I'll take a look and get back to you. I expect I can code up what I have in mind (which isn't quite what's in that example) as an awk function that people could copy/paste, but I was really hoping to get a new builtin string function out of this.



On 12/6/2011 2:38 PM, Aharon Robbins wrote:
Hi Ed.

Re the below. See the node "Cut Program" in the gawk manual.
Adaptation into a function should be straightforward.

A diff to redo that section as a function + code to call it will
be cheerily reviewed and most likely accepted.



Date: Fri, 02 Dec 2011 07:36:50 -0600
From: Ed Morton<address@hidden>
To: address@hidden
Subject: [bug-gawk] Gawk Enhancement Suggestion

Arnold - could we get a "cut()" function for gawk similar to the UNIX
one of the same name that just lets people select specific fields or
ranges of fields?  Below are some details (copied from something I posted
at comp.lang.awk).



A function that returns a string of selected fields is very often all we need. e.g.:

print cut($0,3)       # print from the 3rd field to the end of the record using FS as separator
print cut($0,3,",")   # print from the 3rd field to the end of the record using "," as separator
print cut($0,"3-7")   # print from the 3rd field to the 7th field
print cut($0,"3,5,7") # print the 3rd, 5th, and 7th fields
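
A minimal sketch of how a field list like the ones above could be expanded into individual field numbers. The helper name expand_fields and its second argument (the number of fields in the record) are hypothetical and not part of the proposal. It assumes plain POSIX awk, treats a bare number as that single field (unlike the first two examples above), spells an open-ended range as "3-", and does no validation:

```sh
# Hypothetical helper: expand a field list into field numbers.
# usage: expand_fields LIST NUMFIELDS
expand_fields() {
    awk -v list="$1" -v nf="$2" 'BEGIN {
        n = split(list, part, ",")
        for (i = 1; i <= n; i++) {
            lo = hi = part[i]                 # bare number: a single field
            dash = index(part[i], "-")
            if (dash) {                       # range such as 3-7 or 3-
                lo = substr(part[i], 1, dash - 1)
                hi = substr(part[i], dash + 1)
                if (hi == "") hi = nf         # open range runs to the last field
            }
            for (f = lo + 0; f <= hi + 0; f++)
                out = out (out == "" ? "" : " ") f
        }
        print out
    }'
}

expand_fields "3-7" 9      # prints: 3 4 5 6 7
expand_fields "3,5,7" 9    # prints: 3 5 7
expand_fields "3-" 5       # prints: 3 4 5
```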

The separating substring between the selected fields would just be whatever separator
preceded each field after the first. So, with this input (with leading blanks):

      a b    c  d

the following code snippets would produce the output that follows them:

print cut($0,"1,3")
a    c

print cut($0,"1,3",/[[:blank:]]+/)

print cut($0,"1-3")
a b    c

print cut($0,"2,4")
b  d

print cut($0,"1-4")
a b    c  d
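
The separator-preserving behavior in the examples above can be sketched in plain POSIX awk by recording each separator run while walking the record with match(). This is only an illustration for whitespace-separated input: the helper name keep_sep_demo is hypothetical, and the fields to print (1 and 3) are hard-coded to mirror the first example:

```sh
# Hypothetical helper: print fields 1 and 3 of each line, joined by the
# whitespace run that precedes field 3 in the input.
keep_sep_demo() {
    awk '{
        split("", fld); split("", sep)       # clear arrays from the previous line
        rest = $0
        sub(/^[ \t]+/, "", rest)             # default FS ignores leading blanks
        n = 0
        while (match(rest, /[ \t]+/)) {      # find the next separator run
            fld[++n] = substr(rest, 1, RSTART - 1)
            sep[n]   = substr(rest, RSTART, RLENGTH)   # separator after field n
            rest     = substr(rest, RSTART + RLENGTH)
        }
        if (rest != "") fld[++n] = rest      # last field has no trailing separator
        print fld[1] sep[2] fld[3]
    }'
}

printf '    a b    c  d\n' | keep_sep_demo
```

With the sample input this prints "a    c", because sep[2] holds the four blanks between b and c.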

Obviously $0 can be replaced by any string, and cut() should take an optional 4th
arg to specify the separator in its returned string:

print cut($0,"1,3",FS,OFS)
a c

print cut($0,"1,3",/[[:blank:]]+/,"#")

print cut($0,"1-3",FS,OFS)
a b c

print cut($0,"2,4",FS,":")
b:d

print cut($0,"1-4",FS,"|")
a|b|c|d
