


From: anonymous
Subject: [bug-grep] [bugs #12210] EGexecute() fails to find matches on (exact && match_icase)
Date: Fri, 4 Mar 2005 20:27:58 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20040914 Firefox/0.10

Follow-up Comment #6, bugs #12210 (project grep):

I'm using the latest CVS (checked out today).
For a short example of a consequence, see bug #9768.
I'm not considering wide-char issues here.

A detailed description of what I see as the root problem (design problems
involving EGexecute with exact=1) follows.

Disclaimer: I have studied the grep source quite hard, but only for a few
days, and I am still struggling with the dfa, kwset and regex internals.
Please do not shoot. This mail is long.

involved globals:
match_icase && only_matching

mainly involved functions:
grep.c::prline()
search.c::EGexecute() and called functions

command: echo Claudio | grep -io claudio 
output: claudio
expected output: Claudio

description:
After compiling the pattern (Gcompile), we reach grepfile(), then grep(),
then grepbuf().
In grepbuf() we call execute (in our case EGexecute) with the last parameter
set to 0 (non-exact) on the buffer containing "Claudio". At this point we are
looking for matching lines and do not yet care about the offset of the match
within the line.

OK, EGexecute finds the match easily, because the first strategy (kwset)
succeeds. The position of the matching line in the buffer (0) is returned.

Back in grepbuf(), since we found a match we enter prtext(), and from
prtext() we go to prline().

Now the nasty prline: in all cases involving match_icase together with either
color_option or only_matching, it needs to search the string again, this time
with exact=1, because we need the actual offset of the match in the line in
order to split it into before|match|after for output purposes.

For all cases involving match_icase, the nasty prline does this:

if (match_icase)
{
/* cut */
    for (i = 0; i < lim - beg; i++)
      ibeg[i] = tolower (beg[i]);

    /* do_stuff involving execute() passing exact = 1,
       calculating on the NEW buffer,
       but also printing from the NEW buffer */
}

Now why is the buffer being converted to lower case before calling execute()
(EGexecute()) with exact = 1?

I think the answer is that EGexecute fails to take icase into account when
called with exact=1 and the cases differ. If this is intended, it looks very
evil to me.

So the hack was written around it. It mostly works, but it helps make prline
the blob it is, and it has the side effect of garbling the output in the
only_matching case.
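
To make the only_matching damage concrete, here is a minimal sketch, not the actual grep code. It assumes single-byte characters and a plain literal pattern, and the names offset_via_lowercase_copy() and only_matching_icase() are hypothetical. It searches a lowercased copy, as the hack does, but takes the matched text from the original line. Under those assumptions the offset is the same in both buffers, so the original case survives.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of grep: find pat (already lowercase,
   patlen bytes) in line[0..len), ignoring case, by searching a
   lowercased copy of the line.  Returns the match offset, or -1 if
   there is no match or malloc fails.  Single-byte characters only. */
long
offset_via_lowercase_copy (const char *line, size_t len,
                           const char *pat, size_t patlen)
{
  long found = -1;
  char *copy;
  size_t i;

  if (patlen == 0 || patlen > len)
    return -1;
  copy = malloc (len);
  if (!copy)
    return -1;
  for (i = 0; i < len; i++)
    copy[i] = (char) tolower ((unsigned char) line[i]);
  /* Naive scan of the lowercased copy for the lowercase pattern.  */
  for (i = 0; i + patlen <= len; i++)
    if (memcmp (copy + i, pat, patlen) == 0)
      {
        found = (long) i;
        break;
      }
  free (copy);
  return found;
}

/* Copy the matched text out of the ORIGINAL line into out
   (NUL-terminated).  Returns 1 on a match, 0 otherwise.  */
int
only_matching_icase (const char *line, const char *pat,
                     char *out, size_t outsize)
{
  size_t patlen = strlen (pat);
  long off = offset_via_lowercase_copy (line, strlen (line), pat, patlen);

  if (off < 0 || patlen + 1 > outsize)
    return 0;
  memcpy (out, line + off, patlen);   /* original buffer, not the copy */
  out[patlen] = '\0';
  return 1;
}
```

Calling only_matching_icase ("Claudio", "claudio", out, sizeof out) leaves "Claudio" in out, whereas copying from the lowercased buffer would give "claudio", which is the output reported above.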

I think that, as a first step, the exact=1 case of search.c's EGexecute()
should be fixed so that it works in the match_icase situation without having
to tolower all the data before calling it.
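
As a rough illustration of what "working without tolower on all the data" could mean, here is a sketch of an exact search that folds case while comparing and so never modifies or copies the buffer. It is only an assumption-laden stand-in: icase_find() is a hypothetical name, it handles a literal pattern and single-byte characters only, and the real EGexecute() goes through kwset, dfa and regex rather than a naive loop.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for an exact search that honors match_icase:
   find pat (patlen bytes) in buf[0..len), folding case byte by byte
   during the comparison.  Returns the offset of the first match and
   stores its length in *match_len, or returns -1 if nothing matches.  */
long
icase_find (const char *buf, size_t len,
            const char *pat, size_t patlen, size_t *match_len)
{
  size_t i, j;

  if (patlen == 0 || patlen > len)
    return -1;
  for (i = 0; i + patlen <= len; i++)
    {
      for (j = 0; j < patlen; j++)
        if (tolower ((unsigned char) buf[i + j])
            != tolower ((unsigned char) pat[j]))
          break;
      if (j == patlen)
        {
          *match_len = patlen;
          return (long) i;
        }
    }
  return -1;
}
```

With a search of this shape the caller gets an offset and length that refer directly to the original buffer, so prline would have no reason to build or print from a lowercased copy.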

As a long-term goal, I think a new design for the exact case should be
found. Bug #11579, for example, is related and is very difficult to fix
cleanly with the current exact-case design, IMO.

I was hoping to help the grep community, so if I missed something obvious,
please help me understand.

Thanks

CLaudio


    _______________________________________________________

This item URL is:

  <http://savannah.gnu.org/bugs/?func=detailitem&item_id=12210>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/




