Re: Insertion of extra OFS character into output string


From: Neil R. Ormos
Subject: Re: Insertion of extra OFS character into output string
Date: Mon, 13 Mar 2023 20:58:30 -0500 (CDT)
User-agent: Alpine 2.20 (DEB 67 2015-01-07)

H wrote:
> "Neil R. Ormos" wrote:
>> H wrote:

>>> I am a newcomer to awk and have run into an
>>> issue I have not figured out yet... My
>>> platform is CentOS 7 running awk 4.0.2, the
>>> default version. [...]

>> I don't have 4.0.2 available to test, but I
>> tested with older and newer versions.

>> When I test, I get the result I think I expect
>> from the code you posted. [...]

>> It would be easier to help if you would please provide:
>> the simplest input line that reproduces the problem;
>> the output you expect; and
>> the output you are getting.

> I am not on my computer but typing this on my
> phone. With that caveat, a /minimal/ example
> would be:

> echo "Alpha,Beta,Charlie,Delta" | awk 'BEGIN{FS=","; 
> FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}'

> I would expect to see:
> Alpha<TAB>Beta<TAB>Charlie<TAB>Delta
> but instead see
> Alpha<TAB><TAB>Beta<TAB>Charlie<TAB>Delta

> If you change $1=$1 to $2=$2 you will find that the extra tab
> character then moves to the next field.

> Can anyone try this with the most recent version of awk?

I tested with four versions of Gawk:
  GNU Awk 3.1.7
  GNU Awk 4.1.1
  GNU Awk 4.1.4
  GNU Awk 5.2.0

and, among those versions, was able to reproduce the behavior that is
vexing you only in version 4.1.1.

It appears that the issue was fixed no later than version 4.1.4.

Version 5.2.0 is fairly recent but not the latest, and, in any case, does not 
exhibit the problem you have experienced.

> I believe I had also tried without the
> definition of FS with the same result.  Finally,
> note that the FPAT expression comes from the awk
> documentation and is thus expected to work.

I wasn't saying that setting FS was causing the problem, just that the
FS setting is overridden by the subsequent setting of FPAT.

========================================

gawk --version | head -1
GNU Awk 3.1.7

echo "Alpha,Beta,Charlie,Delta" | gawk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' | hexdump $hexdumparg:q
     0      0  | 41 6c 70 68 61 09 42 65 | 065 108 112 104 097 009 066 101 |   A   l   p   h   a  \t   B   e
     8      8  | 74 61 09 43 68 61 72 6c | 116 097 009 067 104 097 114 108 |   t   a  \t   C   h   a   r   l
    10     16  | 69 65 09 44 65 6c 74 61 | 105 101 009 068 101 108 116 097 |   i   e  \t   D   e   l   t   a
    18     24  | 0a                      | 010                             |  \n

========================================

gawk --version | head -1
GNU Awk 4.1.1, API: 1.1 (GNU MPFR 3.1.2-p3, GNU MP 6.0.0)

echo "Alpha,Beta,Charlie,Delta" | gawk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' | hexdump $hexdumparg:q
     0      0  | 41 6c 70 68 61 09 09 42 | 065 108 112 104 097 009 009 066 |   A   l   p   h   a  \t  \t   B
     8      8  | 65 74 61 09 43 68 61 72 | 101 116 097 009 067 104 097 114 |   e   t   a  \t   C   h   a   r
    10     16  | 6c 69 65 09 44 65 6c 74 | 108 105 101 009 068 101 108 116 |   l   i   e  \t   D   e   l   t
    18     24  | 61 0a                   | 097 010                         |   a  \n

========================================

gawk --version | head -1
GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)

echo "Alpha,Beta,Charlie,Delta" | gawk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' | hexdump $hexdumparg:q
     0      0  | 41 6c 70 68 61 09 42 65 | 065 108 112 104 097 009 066 101 |   A   l   p   h   a  \t   B   e
     8      8  | 74 61 09 43 68 61 72 6c | 116 097 009 067 104 097 114 108 |   t   a  \t   C   h   a   r   l
    10     16  | 69 65 09 44 65 6c 74 61 | 105 101 009 068 101 108 116 097 |   i   e  \t   D   e   l   t   a
    18     24  | 0a                      | 010                             |  \n

========================================

gawk --version | head -1
GNU Awk 5.2.0, API 3.2, PMA Avon 7, (GNU MPFR 3.1.5, GNU MP 6.1.2)

echo "Alpha,Beta,Charlie,Delta" | gawk 'BEGIN{FS=","; FPAT="([^,]*)|(\"[^\"]+\")"; OFS="\t"} {$1=$1; gsub(/"/, ""); print}' | hexdump $hexdumparg:q
     0      0  | 41 6c 70 68 61 09 42 65 | 065 108 112 104 097 009 066 101 |   A   l   p   h   a  \t   B   e
     8      8  | 74 61 09 43 68 61 72 6c | 116 097 009 067 104 097 114 108 |   t   a  \t   C   h   a   r   l
    10     16  | 69 65 09 44 65 6c 74 61 | 105 101 009 068 101 108 116 097 |   i   e  \t   D   e   l   t   a
    18     24  | 0a                      | 010                             |  \n

========================================


