Re: [bug-gawk] Field splitting in gawk


From: Aharon Robbins
Subject: Re: [bug-gawk] Field splitting in gawk
Date: Wed, 01 Jan 2014 22:25:31 +0200
User-agent: Heirloom mailx 12.5 6/20/10

Hello.

I'm going to address both your mails in one reply.

> Date: Mon, 30 Dec 2013 11:53:06 +0100
> From: address@hidden
> To: address@hidden
> Subject: [bug-gawk] Field splitting in gawk
>
> In some cases the splitting into fields is not necessary,

As has been pointed out, gawk does lazy field splitting.
It only splits the record as far as necessary.

> It would be nice to have a feature like setting
> FIELDWIDTHS="0"
> to mean no field splitting is done

This isn't needed.

> and
> $0==$1
> and
> NF=1

As Andy pointed out, using FS = "\n" is a portable and easy
way to achieve this.
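
For illustration, here is a minimal sketch of that approach. The sample
input line is made up. With the default RS, a record never contains a
newline, so FS never matches and the whole record lands in $1.

```shell
# Sample input is illustrative only. With the default RS (newline), a
# record never contains "\n", so FS = "\n" never matches inside it.
nf_out=$(printf 'alpha beta gamma\n' |
    awk 'BEGIN { FS = "\n" } { print NF, ($0 == $1) }')
echo "$nf_out"    # prints: 1 1
```

Note that an empty record still has NF equal to 0.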

> The purpose of this suggested feature is to improve efficiency.

In general, if you're worried that something in gawk isn't efficient,
you need to be able to provide data proving your contention. This would
be done by compiling gawk for profiling and then running it with an awk
program and data that highlight the problem. You could then submit
a bug report with the program, data, and results from gprof.
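
Roughly, the workflow could look like the sketch below. It assumes an
unpacked gawk source tree, a compiler that accepts -pg (such as gcc),
and gprof. The function name profile_gawk, its two arguments, and the
report file name are hypothetical and only there for illustration.

```shell
# Hypothetical helper: run it from the top of an unpacked gawk source tree.
# Assumes a compiler that supports -pg and that gprof is installed.
profile_gawk() {
    prog=$1    # awk program that highlights the problem
    data=$2    # input data that highlights the problem

    # Build gawk with profiling instrumentation.
    ./configure CFLAGS="-pg -O2" LDFLAGS="-pg" || return 1
    make || return 1

    # The profiled binary writes gmon.out to the current directory on exit.
    ./gawk -f "$prog" "$data" > /dev/null || return 1

    # Turn the raw profile into a readable report.
    gprof ./gawk gmon.out > gprof-report.txt
}
```

It would be called as, say, "profile_gawk prog.awk data.txt", and the
resulting gprof-report.txt is what belongs in the bug report.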

> Date: Tue, 31 Dec 2013 07:40:28 +0100
> From: address@hidden
> To: address@hidden
> Subject: [bug-gawk] Field separators in awk
>
> Hi,
>
> there is a builtin variable RT (  
> http://www.gnu.org/software/gawk/manual/gawk.html#Auto_002dset ) that  
> contains the matched text by RS, but there is no similar variable for  
> FS.

This is correct, and on purpose. The reason is that setting such an
(array) variable upon every record read would be a big performance hit.

> In GNU Awk version 4, the split() function does contain a fourth input
> argument "seps" that gives access to the matched text by FS.

Exactly to make this facility available, but only to those who wish
to actually use it, and are thus willing to incur the cost.

> However, it seems inefficient to call split() on $0 just to obtain the  
> matched text. (Since the field splitting has already been done by awk).

But the field splitting has not already been done, as explained earlier.

To sum up, I see no reason to change gawk's current behavior.

> Best regards,
> Haakon Haegland

Thanks,

Arnold
