RE: help with regular expression formation

Bob, thanks (yet again) for the info. A few comments:

I'm using GNU grep 2.5.1 compiled to run on Windows - OS is XP SP2. I do not have sed, nor do I know the syntax if I were to obtain a version. I would expect that this grep is PCRE compatible.

The main thing, though, is that you've pointed out the problem - that the output of the first grep has the filename. Silly me! I forgot about that. I'm sure I can work it from there, given that and the additional good info you've provided. (I'm not working today and just browsing email, so I can't dig into the details of the REs you sent. I'll study the details later.)

Regards, Mickey

From: Bob Proulx [mailto:bob@proulx.com]
Sent: Mon 3/31/2008 8:24 PM
To: Mickey Ferguson
Cc: help-gnu-utils@gnu.org
Subject: Re: help with regular expression formation

Mickey Ferguson wrote:
> I'm generating output from a grep command, which I then want to process in
> grep again, filtering out my unwanted text. In this specific example, I
> want to filter out all lines that start with zero or more white space,
> followed by the comment characters "//". Here is what I thought I would
> use:
>
> grep StopProductServices *.rul *.h | grep ^\s*[^/]

Unless you have a grep that is using PCRE (perl compatible regular
expressions) then the above has three problems. One is that \s is a
PCRE space pattern but not a normal regular expression. Two is that
the * and brackets are shell metacharacters. Those would need to be
quoted to protect them from shell expansion. Three is that the pattern
doesn't do what you want. Try this:

grep -v "^[[:space:]]*//"

The "[[:space:]]*" is a little long but is POSIX standard making it
preferred these days. The old way was " *".
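
As a minimal sketch of what that filter does, assuming a POSIX shell
rather than the Windows command prompt, and using made-up sample lines
(the variable names are only for illustration):

```shell
# Hypothetical sample input: two comment lines and one line of code.
sample=$(printf '%s\n' \
  '// top-level comment' \
  '    // indented comment' \
  '    StopProductServices(sProduct);')

# -v inverts the match, so every line that begins with optional
# whitespace followed by // is dropped and the rest are kept.
filtered=$(printf '%s\n' "$sample" | grep -v "^[[:space:]]*//")
printf '%s\n' "$filtered"
```

Only the code line should survive; both comment lines are removed.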

> The first grep obviously finds all occurrences of StopProductServices within
> all *.rul and *.h files. Then that output is piped into grep, with the

Having two greps in a pipeline works, but usually the character I/O
between them is slower than combining them. This is especially true
on MS Windows, where spawning multiple processes is exceptionally slow. Plus
what you are doing is more suitable for sed than grep because sed will
report an error code if there is an error. Grep reports whether there
was a match. So in this case I would use sed and combine the
operations. Plus I would attack the comment problem differently. I
would simply remove them from the pattern space and remove any
whitespace ahead of it. Try this:

sed -n "s|[[:space:]]*//.*||;/StopProductServices/p" *.rul *.h
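
Here is a rough illustration of that sed command on a throwaway file,
assuming a POSIX shell and a sed that understands [[:space:]] (GNU sed
does). The file contents are invented for the example and are not
taken from the real NTService.rul:

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical file: a comment-only line, a code line with a trailing
# comment, and an unrelated line.
printf '%s\n' \
  '// FUNCTION: StopProductServices(sProduct)' \
  '    StopProductServices(sProduct);   // stop everything' \
  '    StartProductServices(sProduct);' > NTService.rul

# First strip any // comment together with the whitespace in front of
# it, then print only the lines that still mention the function name.
result=$(sed -n "s|[[:space:]]*//.*||;/StopProductServices/p" *.rul)
printf '%s\n' "$result"
```

The comment-only line becomes empty and is not printed, and the code
line is printed without its trailing comment. Note that this simple
substitution would also cut a "//" that appears inside a string
literal, so it is a sketch and not a full comment parser.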

> To break it down a little, I first produced the output from the first grep,
> which is used for the pipe:

You have to be extra careful about grep's into grep's. Let me point
out why:

> ->grep StopProductServices *.rul *.h
> NTService.rul:// FUNCTION: StopProductServices(sProduct)
> NTService.rul:// 03/24/08 MSF - Make StopProductServices() take an

So far so good. But...

> Then I ran the full command, and you can see that the output is not at all
> what I expected:
>
> [11:06:39]: *** C:\WIP\VESTA\Installer\Script Files ***
> ->grep StopProductServices *.rul *.h | grep ^\s*[^/]
> NTService.rul:// FUNCTION: StopProductServices(sProduct)
> NTService.rul:// 03/24/08 MSF - Make StopProductServices() take an

Here we see the problem. The first grep read multiple input files.
Therefore it printed out the name of the input file as a prefix to
each matching line. The second grep's "^" will anchor on the filename
and not the original line. You would need to add -h to the first grep's
option list to suppress including the filename. Second, you would need
to fix the pattern and quote it.

grep -h StopProductServices *.rul *.h | grep -v "^[[:space:]]*//"

But I still recommend using sed. I just wanted to comment about grep
into grep including the filename.
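
To make the filename issue concrete, here is a small sketch, again
assuming a POSIX shell and two invented files (their contents are
hypothetical):

```shell
# Work in a scratch directory with two made-up source files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' \
  '// FUNCTION: StopProductServices(sProduct)' \
  '    StopProductServices(sProduct);' > NTService.rul
printf '%s\n' \
  '// StopProductServices prototype' \
  'prototype StopProductServices(STRING);' > NTService.h

# With more than one input file, every line is prefixed with "file:",
# so ^ anchors on the filename and no comment line is filtered out.
# -c makes the second grep report how many lines got through.
with_prefix=$(grep StopProductServices *.rul *.h | grep -vc "^[[:space:]]*//")

# -h suppresses the filename prefix, so the second grep sees the
# original lines and can drop the comments.
without_prefix=$(grep -h StopProductServices *.rul *.h | grep -v "^[[:space:]]*//")

printf 'lines kept with prefix: %s\n' "$with_prefix"
printf '%s\n' "$without_prefix"
```

In the first pipeline all four matching lines get through, comments
included. With -h only the two non-comment lines remain.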

Hope that helps,
Bob

From:	Mickey Ferguson
Subject:	RE: help with regular expression formation
Date:	Tue, 1 Apr 2008 10:29:57 -0700