From: Ed Morton
Subject: Re: [bug-gawk] Gawk Enhancement Suggestion
Date: Sun, 18 Dec 2011 11:09:08 -0600
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20111105 Thunderbird/8.0
Ed.

    $ cat cut.awk
    function cut(inStr,tgtFldNrsStr,inSep,outSep,
                 errMsg,numTgtFldNrs,tmpArr,tmpSepArr,sepArr,i,rangeStart,
                 rangeEnd,fldNr,tgtFldNrsArr,numFlds,inArr,outStr,numSelFlds)
    {
        # Rationalize the input separator and input string to account for expected,
        # special-case behavior when the input separator is a single blank char, " ":
        inSep = (inSep == "" ? FS : inSep)
        if (inSep == " ") {
            gsub(/(^[[:space:]]+|[[:space:]]+$)/,"",inStr)
            inSep="[[:space:]]+"
        }

        # Create arrays of the fields and the separators between the fields:
        numFlds = split(inStr,inArr,inSep)
        for (fldNr=1; fldNr<=numFlds; fldNr++) {
            inStr = substr(inStr,length(inArr[fldNr])+1)
            match(inStr,inSep)
            sepArr[fldNr] = (outSep == "" ? substr(inStr,1,RLENGTH) : outSep)
            inStr = substr(inStr,RLENGTH+1)
        }

        # Create an array of the field numbers to be selected:
        numTgtFldNrs = split(tgtFldNrsStr,tmpArr,/[,-]/)
        split(tgtFldNrsStr,tmpSepArr,/[^,-]+/)
        for (i=1;i<=numTgtFldNrs;i++) {
            rangeStart = tmpArr[i]
            if (tmpSepArr[i+1] == "-") {
                if ( tmpArr[i+1] == "" ) {
                    tmpArr[i+1] = numFlds
                }
                rangeEnd = tmpArr[++i]
            }
            else {
                rangeEnd = rangeStart
            }

            # Both ends of each range must consist of digits only:
            if ( (rangeStart ~ /^[[:digit:]]+$/) && (rangeEnd ~ /^[[:digit:]]+$/) ) {
                if (rangeStart > rangeEnd) {
                    errMsg = "invalid decreasing range"
                }
                else if (rangeStart < 1) {
                    errMsg = "fields and positions are numbered from 1"
                }
            }
            else {
                errMsg = "invalid field list"
            }
            if (errMsg) {
                print errMsg | "cat>&2"
                return ""
            }

            for (fldNr=rangeStart; fldNr<=rangeEnd; fldNr++) {
                tgtFldNrsArr[fldNr]
            }
        }

        # Form a string of the selected fields with their preceding separators
        # (after the first field):
        for (fldNr=1; fldNr<=numFlds; fldNr++) {
            if (fldNr in tgtFldNrsArr) {
                outStr = (numSelFlds++ ? outStr sepArr[fldNr-1] : "") inArr[fldNr]
            }
        }
        return outStr
    }

    { print cut($0,flds,fs,ofs) }

    $ cat file
      a b c d
    $
    $ awk -v flds="2,3" -f cut.awk file
    b c
    $ awk -v flds="1-3" -f cut.awk file
    a b c
    $ awk -v flds="2-" -f cut.awk file
    b c d
    $ awk -v flds="1,2-4" -v fs="[[:blank:]]+" -v ofs="#" -f cut.awk file
    #a#b#c

On 12/14/2011 4:22 PM, Ed Morton wrote:
Arnold - I'll take a look and get back to you. I expect I can code up what I have in mind (which isn't quite what's in that example) as an awk function that people could copy/paste, but I was really hoping to get a new builtin string function out of this.

Thanks,
Ed.

On 12/6/2011 2:38 PM, Aharon Robbins wrote:

Hi Ed.

Re the below. See the node "Cut Program" in the gawk manual. Adaptation into a function should be straightforward. A diff to redo that section as a function + code to call it will be cheerily reviewed and most likely accepted.

Thanks,
Arnold

Date: Fri, 02 Dec 2011 07:36:50 -0600
From: Ed Morton <address@hidden>
To: address@hidden
Subject: [bug-gawk] Gawk Enhancement Suggestion

Arnold - could we get a "cut()" function for gawk, similar to the UNIX utility of the same name, that just lets people select specific fields or ranges of fields? Below are some details (copied from something I posted at comp.lang.awk).

Regards,
Ed.

A function that returns a string of selected fields is very often all we need, e.g.:

    print cut($0,3)         # print from the 3rd field to the end of the record using FS as separator
    print cut($0,3,",")     # print from the 3rd field to the end of the record using "," as separator
    print cut($0,"3-7")     # print from the 3rd field to the 7th field
    print cut($0,"3,5,7")   # print the 3rd, 5th, and 7th fields

The separating substring between the fields would just be whatever separator preceded the trailing fields.
So, with this input (with leading blanks):

      a b c d

the following code snippets would produce the output that follows them:

    print cut($0,"1,3")
    a c

    print cut($0,"1,3",/[[:blank:]]+/)
    b

    print cut($0,"1-3")
    a b c

    print cut($0,"2,4")
    b d

    print cut($0,"1-4")
    a b c d

Obviously $0 can be replaced by any string, and cut() should take an optional 4th arg to specify the separator in its returned string:

    print cut($0,"1,3",FS,OFS)
    a c

    print cut($0,"1,3",/[[:blank:]]+/,"#")
    #b

    print cut($0,"1-3",FS,OFS)
    a b c

    print cut($0,"2,4",FS,":")
    b:d

    print cut($0,"1-4",FS,"|")
    a|b|c|d
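The least obvious step in the cut() function posted at the top of this thread is recovering the exact separator text that follows each field, so the selected fields can be re-joined with their original separators. Below is a minimal sketch of just that step, assuming a POSIX awk run from /bin/sh. `fields_and_seps` is a hypothetical helper name, not something from the thread, and it prints each field with the separator that follows it rather than building an output string:

```shell
#!/bin/sh
# fields_and_seps: hypothetical helper (not from the thread) that prints every
# field of each input line together with the separator text that follows it,
# using the split() + match() approach of the cut() function above.
# Usage: fields_and_seps SEP_REGEX < input
fields_and_seps() {
    awk -v sep="$1" '
    {
        rest = $0
        n = split(rest, fld, sep)
        for (i = 1; i <= n; i++) {
            # Remove field i from the front of the remaining text...
            rest = substr(rest, length(fld[i]) + 1)
            # ...so the separator that followed it now starts at position 1.
            if (match(rest, sep) == 1) {
                s = substr(rest, 1, RLENGTH)
                rest = substr(rest, RLENGTH + 1)
            } else {
                s = ""    # the last field has no separator after it
            }
            printf "%d [%s] [%s]\n", i, fld[i], s
        }
    }'
}

printf 'a,b;;c\n' | fields_and_seps '[,;]+'
# 1 [a] [,]
# 2 [b] [;;]
# 3 [c] []
```

As with cut(), a regex separator that matches at the start of the line yields an empty first field. cut() avoids that for the default single-blank separator by trimming the line first; this sketch does no such trimming.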
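The field-list handling in the cut() function posted at the top of this thread pairs two split() calls. Splitting on /[,-]/ yields the numbers, and splitting on /[^,-]+/ yields the delimiters between them, so a "-" at the matching position marks a range. The sketch below isolates that logic under a few assumptions: `expand_fields` is a hypothetical helper, the number of fields is passed in explicitly so an open range such as "2-" can be closed, and both ends of each range must be digits only:

```shell
#!/bin/sh
# expand_fields: hypothetical helper (not from the thread) that turns a
# cut-style field list such as "1,2-4" or "2-" into the selected field
# numbers, using the same pair of split() calls as the cut() function above.
# Usage: expand_fields LIST NUM_FIELDS
expand_fields() {
    awk -v list="$1" -v nf="$2" 'BEGIN {
        n = split(list, num, /[,-]/)   # numbers:    "1,2-4" -> "1" "2" "4"
        split(list, sep, /[^,-]+/)     # delimiters: "1,2-4" -> "" "," "-" ""
        for (i = 1; i <= n; i++) {
            lo = num[i]
            if (sep[i+1] == "-") {     # a "-" after this number starts a range
                if (num[i+1] == "")
                    num[i+1] = nf      # open range "N-" runs to the last field
                hi = num[++i]
            } else {
                hi = lo
            }
            if (lo !~ /^[0-9]+$/ || hi !~ /^[0-9]+$/)
                err = "invalid field list"
            else if (lo + 0 > hi + 0)
                err = "invalid decreasing range"
            else if (lo + 0 < 1)
                err = "fields and positions are numbered from 1"
            if (err != "") {
                print err | "cat 1>&2"
                exit 1
            }
            for (f = lo + 0; f <= hi + 0; f++)
                sel[f] = 1
        }
        # Emit the selected numbers in record order, whatever the list order.
        out = ""
        for (f = 1; f <= nf + 0; f++)
            if (f in sel)
                out = (out == "" ? f : out " " f)
        print out
    }'
}

expand_fields "1,2-4" 5   # prints: 1 2 3 4
expand_fields "2-" 4      # prints: 2 3 4
expand_fields "4,2" 5     # prints: 2 4
```

Because the selected numbers are collected in an array and then walked from 1 to the last field, a list such as "4,2" comes out in record order. The same holds for the posted cut() and for UNIX cut.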