bug#65416: Feature request: include first line of file in output
From: arnold
Subject: bug#65416: Feature request: include first line of file in output
Date: Tue, 22 Aug 2023 20:33:19 -0600
User-agent: Heirloom mailx 12.5 7/5/10
I can't speak for the grep guys, but at least I was correct that
current gawk is much faster than gawk 4.0.2: about 4.95s versus 10.22s
in your test, roughly twice as fast.
Arnold
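One detail matters when several files are searched at once. awk's `FNR` is the line number within the current file, so `FNR == 1` matches the first line of every file. Perl's `$.` is not reset between files read through `perl -n`, because the `<>` operator never explicitly closes `ARGV`. As a result, `$. == 1` only matches the first line of the first file unless the one-liner also does `close ARGV if eof`, as in `perl -ne 'print if $. == 1 || /pattern/; close ARGV if eof'`. The Python sketch below models the two counters on in-memory lists instead of running awk or Perl. The function `header_lines` and its `per_file` flag are hypothetical names used only for illustration.

```python
from collections.abc import Sequence


def header_lines(files: Sequence[Sequence[str]], per_file: bool) -> list[str]:
    """Return the lines that a `line_number == 1` test would select.

    per_file=True models awk's FNR, which restarts at 1 for each input file.
    per_file=False models a counter that keeps running across files, like
    awk's NR, or Perl's $. under `perl -n` without `close ARGV if eof`.
    """
    selected: list[str] = []
    total = 0  # cumulative line count across all files (NR-like)
    for lines in files:
        for fnr, line in enumerate(lines, start=1):  # per-file count (FNR-like)
            total += 1
            if (fnr if per_file else total) == 1:
                selected.append(line)
    return selected


if __name__ == "__main__":
    # Two hypothetical CSV "files", each starting with its own header row.
    files = [["id,name", "1,alice"], ["id,city", "2,rome"]]
    print(header_lines(files, per_file=True))   # ['id,name', 'id,city']
    print(header_lines(files, per_file=False))  # ['id,name']
```

With a single input file the two counters agree, which is why the difference does not show up in a one-file timing test.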
Daniel Green <ddgreen@gmail.com> wrote:
> I don't have access to a newer gawk where I did the initial timings, but I
> ran an almost identical test on my home machine.
>
> grep (v3.11): ~0.60s
> perl (v5.38.0): ~3.21s
> gawk (v4.0.2 built from source with `-O3 -march=native`): ~10.22s
> gawk (v5.2.2 built from source with `-O3 -march=native`): ~4.95s
>
> If grep will never add this functionality, I'll survive; it just seemed like
> it might not be too much work to implement, and would probably still be
> much faster than using awk/perl. I've never looked at the grep source code
> before, but could be tempted to try implementing it myself if there was any
> chance of the patch being accepted.
>
> Dan
>
> On Mon, Aug 21, 2023 at 2:37 PM <arnold@skeeve.com> wrote:
>
> > Gawk 4.0.2 is 11 years old. Try timing the current version,
> > I'll bet it's faster. And it solves your problem NOW,
> > instead of waiting for a feature that the grep developers
> > aren't likely to add.
> >
> > My two cents of course.
> >
> > Arnold
> >
> > Daniel Green <ddgreen@gmail.com> wrote:
> >
> > > That works, as does the Perl version I've been using:
> > >
> > > perl -ne 'print if ($. == 1 || /pattern/)'
> > >
> > > But timings for a real-life example (3GB file with ~16m lines, CentOS 7)
> > > show the problem:
> > >
> > > grep (v2.20): ~1.15s
> > > perl (v5.36.1): ~4.48s
> > > awk (v4.0.2): ~10.81s
> > >
> > > Admittedly grep is just searching in those timings, but I suspect it
> > > could accomplish the full task with a minimal decrease in speed.
> > >
> > > Dan
> > >
> > > On Mon, Aug 21, 2023 at 12:57 PM <arnold@skeeve.com> wrote:
> > >
> > > > Daniel Green <ddgreen@gmail.com> wrote:
> > > >
> > > > > I'm frequently searching CSV files with 20-30 columns, and when there's a
> > > > > hit it can be hard to know what the columns are. An option to also print
> > > > > the first line of a file (either always, or only if that file had a match
> > > > > to the pattern) in addition to any hits would be nice.
> > > > >
> > > > > Thanks,
> > > > > Dan
> > > >
> > > > It sounds like awk would be a better tool:
> > > >
> > > > awk 'FNR == 1 || /pattern/' files ...
> > > >
> > > > should do the trick.
> > > >
> > > > HTH,
> > > >
> > > > Arnold
> > > >
> >
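Dan's original request (quoted above) describes two behaviours: print each file's first line always, or only when that file has a match. The first is what `awk 'FNR == 1 || /pattern/'` does. The following is a minimal Python sketch of both, assuming Python `re` pattern syntax rather than grep's BRE/ERE. The names `grep_with_header` and `header_only_on_match` are hypothetical and do not correspond to any existing grep option. The sketch illustrates the semantics only and makes no claim about speed relative to grep, awk, or Perl.

```python
import re
import sys
from collections.abc import Iterable, Iterator


def grep_with_header(
    files: Iterable[Iterable[str]],
    pattern: str,
    header_only_on_match: bool = False,
) -> Iterator[str]:
    """Yield each file's first line plus every line matching `pattern`.

    With header_only_on_match=False this mirrors awk 'FNR == 1 || /pattern/'.
    With header_only_on_match=True the first line is held back and emitted
    just before the file's first match (or on its own if it matches itself).
    """
    regex = re.compile(pattern)
    for lines in files:
        header = ""          # first line of the current file
        header_done = False  # has the header been emitted for this file?
        for fnr, line in enumerate(lines, start=1):  # fnr: per-file line number
            if fnr == 1:
                header = line
                if not header_only_on_match or regex.search(line):
                    header_done = True
                    yield line
                continue
            if regex.search(line):
                if not header_done:
                    header_done = True
                    yield header
                yield line


def main(argv: list[str]) -> None:
    # Hypothetical usage: python grep_header.py PATTERN FILE...
    pattern, *paths = argv
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            # Lines read from a file keep their trailing newline.
            sys.stdout.writelines(grep_with_header([fh], pattern))


if __name__ == "__main__" and len(sys.argv) > 2:
    main(sys.argv[1:])
```

For example, with `a = ["id,name,city", "1,alice,paris", "2,bob,rome"]` and `b = ["id,name,city", "3,carol,oslo"]`, `list(grep_with_header([a, b], "rome"))` gives `["id,name,city", "2,bob,rome", "id,name,city"]`. Passing `header_only_on_match=True` drops the second file's header, because that file has no match. Holding back the header costs one buffered line per file, so the "only if matched" variant needs no second pass over the input.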