Bob, thanks (yet again) for the info. A few comments:

I'm using GNU grep 2.5.1 compiled to run on Windows - OS is XP SP2. I do
not have sed, nor do I know the syntax if I were to obtain a version. I
would expect that this grep is PCRE compatible.

The main thing, though, is that you've pointed out the problem - that the
output of the first grep has the filename. Silly me! I forgot about that.
I'm sure I can work it from there, given that and the additional good info
you've provided. (I'm not working today and just browsing email, so I
can't dig into the details of the REs you sent. I'll study the details
later.)

Regards, Mickey

From: Bob Proulx [mailto:bob@proulx.com]
Sent: Mon 3/31/2008 8:24 PM
To: Mickey Ferguson
Cc: help-gnu-utils@gnu.org
Subject: Re: help with regular expression formation

Mickey Ferguson wrote:
> I'm generating output from a grep command, which I then want to process in
> grep again, filtering out my unwanted text. In this specific example, I
> want to filter out all lines that start with zero or more white space,
> followed by the comment characters "//". Here is what I thought I would
> use:
>
> grep StopProductServices *.rul *.h | grep ^\s*[^/]

Unless you have a grep that is using PCRE (perl compatible regular
expressions) then the above has three problems. One is that \s is a
PCRE space pattern but not a normal regular expression. Two is that
the * and brackets are shell metacharacters. Those would need to be
quoted to protect them from shell expansion. Three is that the pattern
doesn't do what you want. Try this:

  grep -v "^[[:space:]]*//"

The "[[:space:]]*" is a little long but is POSIX standard making it
preferred these days. The old way was " *".

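To see the difference between the two patterns, here is a small sketch
for a POSIX shell. The three sample lines are made up for illustration,
and the first pattern is written with [[:space:]] instead of \s so that
the comparison does not depend on whether a given grep understands \s.
The point is that "[^/]" is happy to match a leading space, so an
indented comment still gets through, while "grep -v" with the comment
pattern drops every comment-only line.

```shell
# Hypothetical sample input: a comment, an indented comment, and real code.
sample=$(printf '%s\n' \
  '// FUNCTION: StopProductServices(sProduct)' \
  '    // indented comment about StopProductServices' \
  'StopProductServices(sProduct);  // trailing comment')

# Original idea: "optional whitespace, then something that is not a slash".
# The whitespace class can give back one space, which [^/] then matches,
# so the indented comment line is still printed.
naive=$(printf '%s\n' "$sample" | grep '^[[:space:]]*[^/]')

# Suggested form: invert the match and describe the comment line itself.
filtered=$(printf '%s\n' "$sample" | grep -v '^[[:space:]]*//')

printf '%s\n' '--- original pattern ---' "$naive" \
              '--- grep -v pattern ---' "$filtered"
```

With this input the original pattern lets two lines through (the
indented comment and the code line), while the "grep -v" form leaves
only the code line.
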
> The first grep obviously finds all occurrences of StopProductServices within
> all *.rul and *.h files. Then that output is piped into grep, with the

Having two greps in a pipeline works but usually the character I/O
between them is slower than combining them. This is especially true
on MS where spawning multiple processes is exceptionally slow. Plus
what you are doing is more suitable for sed than grep because sed will
report an error code if there is an error. Grep reports whether there
was a match. So in this case I would use sed and combine the
operations. Plus I would attack the comment problem differently. I
would simply remove the comments from the pattern space along with any
whitespace ahead of them. Try this:

  sed -n "s|[[:space:]]*//.*||;/StopProductServices/p" *.rul *.h

> To break it down a little, I first produced the output from the first grep,
> which is used for the pipe:

You have to be extra careful about greps into greps. Let me point
out why:

> ->grep StopProductServices *.rul *.h
> NTService.rul:// FUNCTION: StopProductServices(sProduct)
> NTService.rul:// 03/24/08 MSF - Make StopProductServices() take an

So far so good. But...

> Then I ran the full command, and you can see that the output is not at all
> what I expected:
>
> [11:06:39]: *** C:\WIP\VESTA\Installer\Script Files ***
> ->grep StopProductServices *.rul *.h | grep ^\s*[^/]
> NTService.rul:// FUNCTION: StopProductServices(sProduct)
> NTService.rul:// 03/24/08 MSF - Make StopProductServices() take an

Here we see the problem. The first grep read multiple input files.
Therefore it printed out the name of the input file as a prefix to
each matching line. The second grep's "^" will anchor on the filename
and not the original line. You would need to add -h to the first
grep's option list to suppress including the filename. Second you
would need to fix the pattern and quote it.

  grep -h StopProductServices *.rul *.h | grep -v "^[[:space:]]*//"

But I still recommend using sed. I just wanted to comment about grep
into grep including the filename.

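For anyone who wants to try the variants side by side, here is a rough
sketch for a POSIX shell (not the Windows command prompt). The scratch
files and their contents are made up for illustration; only the grep
and sed commands come from the message above. Single quotes are used
around the patterns, which is an assumption about running under sh
rather than cmd.exe.

```shell
# Scratch directory with hypothetical sample files (names and contents are made up).
dir=$(mktemp -d)
printf '%s\n' \
  '// FUNCTION: StopProductServices(sProduct)' \
  '    StopProductServices(sProduct);   // stop everything' \
  '    StartProductServices(sProduct);' > "$dir/NTService.rul"
printf '%s\n' \
  'prototype StopProductServices(STRING);' > "$dir/NTService.h"

# 1. grep into grep without -h: every line starts with "file:", so "^" never
#    lines up with the "//" and the comment line survives.
with_names=$(cd "$dir" && grep StopProductServices *.rul *.h | grep -v '^[[:space:]]*//')

# 2. Same pipeline with -h: no filename prefix, so the comment line is dropped.
without_names=$(cd "$dir" && grep -h StopProductServices *.rul *.h | grep -v '^[[:space:]]*//')

# 3. Single sed pass: strip "//" comments, then print lines that still match.
sed_out=$(cd "$dir" && sed -n 's|[[:space:]]*//.*||;/StopProductServices/p' *.rul *.h)

printf '%s\n' '--- grep | grep, no -h ---' "$with_names" \
              '--- grep -h | grep ---' "$without_names" \
              '--- sed ---' "$sed_out"

# Exit status: grep returns 1 for "no match"; sed returns 0 unless it hit an error.
printf 'abc\n' | grep -q xyz && grep_status=0 || grep_status=$?
printf 'abc\n' | sed -n '/xyz/p' && sed_status=0 || sed_status=$?
printf 'grep status: %s, sed status: %s\n' "$grep_status" "$sed_status"

rm -rf "$dir"
```

One difference worth noticing: the sed version also strips a trailing
"//" comment from a code line, while the grep pipeline prints such a
line unchanged. The sed expression would likewise cut a "//" that
happens to sit inside a string, so it is a quick filter rather than a
real comment parser.
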
Hope that helps,
Bob