Re: The best way to convert space separated text to TSV?

From: Manuel Collado
Subject: Re: The best way to convert space separated text to TSV?
Date: Tue, 11 Feb 2020 10:48:54 +0100
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0

On 11/02/2020 at 4:42, Peng Yu wrote:

Many programs (such as wc and ps) print their results as tables with one
or more spaces as separators, but the last column may itself contain
spaces. To process the output of wc, I came up with the following code
(sometimes I need to manually change the display name, such as
"file1"). But it is too verbose.

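BEGIN {
        OFS = "\t"
        # Save the command-line names to use as display names
        for(i=1;i<ARGC;++i) {
                fnames[i] = ARGV[i]
        }
        nfiles = ARGC - 1
        delete ARGV                     # read the data from stdin
}
{
        match($0, /^[ ]*/)              # skip leading spaces
        line = substr($0, RSTART+RLENGTH)
        NF = 1
        for(i=1; i<=n; ++i) {           # first n columns
                if(match(line, /[ ]+/)) {
                        $i = substr(line, 1, RSTART-1)
                        line = substr(line, RSTART+RLENGTH)
                }
        }
        # Last column: display name if one was given, else the remainder
        if(NR <= nfiles) {
                $i = fnames[NR]
        } else {
                if(line "") $i = line
        }
        print
}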

$ awk -v n=2 -f ./wc.awk file1 <<EOF
  a bb c
aa  b c
EOF

$ awk -v n=3 -f ./wc.awk <<EOF
  a bb c
EOF

What is the most succinct way to convert this kind of input to TSV
format with gawk? Thanks.

Please try the following:

--- tabular1.awk ---
{
   nf = split($0, f, " ", s)     # gawk only: s[] receives the separators
   offset = length(s[0])         # leading whitespace, if any
   for (n=1; n<numcols; n++) {   # first numcols-1 columns
      offset = offset + length(f[n]) + length(s[n])
      printf("%s\t", f[n])
   }
   print substr($0, offset + 1)  # last column, with spaces (substr is 1-based)
}

It can be invoked as:

wc xxxx | gawk -f tabular1.awk -v numcols=4
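
If gawk's four-argument split() is not available, the same idea can be
sketched with match() alone. This is only a rough variant, not the
script above: the shell function name to_tsv and the sample input line
are made up for illustration, and it assumes columns are separated by
runs of spaces or tabs.

```shell
# Hypothetical helper: to_tsv NUMCOLS
# Reads whitespace-aligned lines on stdin and writes TSV on stdout.
# Only the first NUMCOLS-1 separators are converted to tabs, so the
# last column keeps its embedded spaces.
to_tsv() {
    awk -v numcols="$1" '
    {
        line = $0
        sub(/^[ \t]+/, "", line)               # drop leading blanks
        out = ""
        for (n = 1; n < numcols; n++) {        # first numcols-1 columns
            if (!match(line, /[ \t]+/)) break  # fewer columns than expected
            out = out substr(line, 1, RSTART - 1) "\t"
            line = substr(line, RSTART + RLENGTH)
        }
        print out line                         # remainder is the last column
    }'
}

# Sample line shaped like wc output for a file name containing a space
printf '  1  2 11 my file.txt\n' | to_tsv 4
# prints: 1<TAB>2<TAB>11<TAB>my file.txt
```

Lines with fewer than numcols columns are passed through with however
many tabs could be placed, rather than being padded.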

Hope this helps.

Manuel Collado - http://lml.ls.fi.upm.es/~mcollado
