bug#65416: Feature request: include first line of file in output
From: Daniel Green
Subject: bug#65416: Feature request: include first line of file in output
Date: Tue, 22 Aug 2023 22:12:25 -0400
I don't have access to a newer gawk on the machine where I did the initial
timings, but I ran an almost identical test on my home machine.
grep (v3.11): ~0.60s
perl (v5.38.0): ~3.21s
gawk (v4.0.2 built from source with `-O3 -march=native`): ~10.22s
gawk (v5.2.2 built from source with `-O3 -march=native`): ~4.95s
If grep never adds this functionality I'll survive; it just seemed like
it might not be too much work to implement, and would probably still be
much faster than using awk/perl. I've never looked at the grep source code
before, but could be tempted to try implementing it myself if there was any
chance of the patch being accepted.
Dan
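
For reference, the two behaviours requested in this thread (print the first
line always, or only when the file has a match) can be sketched with standard
tools. This is a minimal sketch, not a grep feature: the function names
grep_with_header and awk_header_if_match are hypothetical, and no claim is
made about how either compares with the timings above.

```shell
#!/bin/sh
# Hypothetical helpers; neither is part of grep or gawk.

# Variant 1: always print the first line of each file, then let grep
# search the remaining lines (so a matching header is not printed twice).
grep_with_header() {
    pattern=$1
    shift
    for file in "$@"; do
        head -n 1 -- "$file"
        # grep exits with status 1 when nothing matches; that is not an
        # error here, so the status is deliberately ignored.
        tail -n +2 -- "$file" | grep -e "$pattern" || true
    done
}

# Variant 2: print the first line of a file only if a later line matches.
# Note: awk -v interprets backslash escapes in the assigned value.
awk_header_if_match() {
    pattern=$1
    shift
    awk -v pat="$pattern" '
        FNR == 1 { header = $0; shown = 0; next }
        $0 ~ pat {
            if (!shown) { print header; shown = 1 }
            print
        }
    ' "$@"
}
```

Usage would look like `grep_with_header 'pattern' a.csv b.csv` or
`awk_header_if_match 'pattern' a.csv b.csv`. The pattern is a basic regular
expression in the first variant and an extended one in the second.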
On Mon, Aug 21, 2023 at 2:37 PM <arnold@skeeve.com> wrote:
> Gawk 4.0.2 is 11 years old. Try timing the current version,
> I'll bet it's faster. And it solves your problem NOW,
> instead of waiting for a feature that the grep developers
> aren't likely to add.
>
> My two cents of course.
>
> Arnold
>
> Daniel Green <ddgreen@gmail.com> wrote:
>
> > That works, as well as the Perl version I've been using:
> >
> > perl -ne 'print if ($. == 1 || /pattern/)'
> >
> > But timings for a real-life example (3GB file with ~16m lines, CentOS 7)
> > show the problem:
> >
> > grep (v2.20): ~1.15s
> > perl (v5.36.1): ~4.48s
> > awk (v4.0.2): ~10.81s
> >
> > Admittedly grep is just searching in those timings, but I suspect it could
> > accomplish the full task with a minimal decrease in speed.
> >
> > Dan
> >
> > On Mon, Aug 21, 2023 at 12:57 PM <arnold@skeeve.com> wrote:
> >
> > > Daniel Green <ddgreen@gmail.com> wrote:
> > >
> > > > I'm frequently searching CSV files with 20-30 columns, and when there's a
> > > > hit it can be hard to know what the columns are. An option to also print
> > > > the first line of a file (either always, or only if that file had a match
> > > > to the pattern) in addition to any hits would be nice.
> > > >
> > > > Thanks,
> > > > Dan
> > >
> > > It sounds like awk would be a better tool:
> > >
> > > awk 'FNR == 1 || /pattern/' files ...
> > >
> > > should do the trick.
> > >
> > > HTH,
> > >
> > > Arnold
> > >
>