Re: [Gnu-arch-users] Preventing matches in regular expressions

From: Tom Lord
Subject: Re: [Gnu-arch-users] Preventing matches in regular expressions
Date: Wed, 11 Aug 2004 10:43:25 -0700 (PDT)

    > From: Aaron Bentley <address@hidden>

    > That bears a striking resemblance to the inventory optimization work I 
    > did with the cut operator.  This might work nicely also:

    > R1[[:cut 3:]]|R2[[:cut 1:]]|R3[[:cut 3:]]

    > As far as I can tell, reusing cut numbers is legal, and we could then 
    > assign 1 to junk, 2 to backup, 3 to precious, etc.

I don't think that that should work.  The cut operator doesn't
strictly follow Posix's leftmost-longest rules (which is a large part
of why it's so useful).

If a regexp can match in more than one way, with different matches
ending at different `cut' labels, *and* the user has not otherwise
requested information about *how* the regexp matched (i.e., the regexp
contains no backreferences and the user hasn't asked for submatch
positions), then the cut label returned isn't chosen by the
leftmost-longest rules.  Instead, under the current rule, the
smallest-magnitude cut label is returned.

In other words, if you match:

        .f[[:cut 4:]]|i.[[:cut 2:]]

against the string

        if

even though the leftmost-longest rule suggests the final label should
be 4, in fact it's 2 since that is a possible match and is the smaller
cut label.
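
To make the rule concrete, here is a minimal sketch in Python.  It is
only a model: Python's `re' module has no cut operator, so each
alternative is kept as a separate (pattern, label) pair, labels are
assumed to be positive, and the names `ALTERNATIVES',
`smallest_cut_label' and `first_alternative_label' are made up for
this illustration.  The input `if' is an assumed example string that
both alternatives match in full.

```python
import re

# Hypothetical model of  .f[[:cut 4:]]|i.[[:cut 2:]]
# Each alternative is paired with the cut label it ends in.
ALTERNATIVES = [
    (r".f", 4),
    (r"i.", 2),
]


def smallest_cut_label(alternatives, text):
    """Return the smallest cut label among the alternatives that match
    all of `text`, or None if no alternative matches."""
    labels = [label for pattern, label in alternatives
              if re.fullmatch(pattern, text)]
    return min(labels) if labels else None


def first_alternative_label(alternatives, text):
    """Return the label of the first alternative that matches all of
    `text`.  This models preferring the leftmost alternative when the
    candidate matches have the same position and length."""
    for pattern, label in alternatives:
        if re.fullmatch(pattern, text):
            return label
    return None


if __name__ == "__main__":
    print(smallest_cut_label(ALTERNATIVES, "if"))
    print(first_alternative_label(ALTERNATIVES, "if"))
```

With a fixed label per category, the same function also shows the
fixed-priority effect: whichever category owns the smaller label wins
whenever both could match, regardless of the order in which the
alternatives are written.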

If you map categories to a fixed set of cut labels, then
categorizations will always have a fixed priority (e.g., "backup"
always checked before "source", or whatever).

    > > So, for your fai feature, you could:
    > >   Propose a new alternative to =tagging-method files (or new 
    > >   syntax to go inside of them), allowing for directives 
    > >   in which the ordering of directives is significant.

    > Okay, I'll consider that.  The current semantics are already very close 
    > to what I want, because .arch-inventory tends to contain specific-file 
    > regexes, while =tagging-method tends to contain file-class regexes.

Right.  That was the idea.

    > Since I'm trying to automate things, I've been trying to look at every 
    > case, which is why I was looking for a way to handle cases when the 
    > regexes appear in the same files.

Currently the `inventory' categorization model is, in essence, a very
specific `find(1)' expression.  Reinterpreting the categorization
model as a `lex'-like lexer specification is possible, desirable, and
only a small change (measured in modified LOCs).
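
A rough sketch of what a `lex'-like reading could look like, in
Python.  This is not tla code: the rule table, the patterns and the
function name `categorize' are all hypothetical, chosen only to show
that the order of the rules becomes significant.  Since a whole file
name is matched each time, every matching rule matches the same
length, so (as in `lex') the earliest matching rule decides.

```python
import re

# Hypothetical ordered rule table: (category, pattern).
# The patterns are illustrative only and are not tla's defaults.
RULES = [
    ("junk", r",.*"),
    ("backup", r".*(~|\.bak|\.orig)"),
    ("precious", r"\+.*"),
    ("source", r"[A-Za-z0-9_][-.A-Za-z0-9_]*"),
]


def categorize(name, rules=RULES):
    """Return the category of the first rule whose pattern matches the
    whole file name, or "unrecognized" if no rule matches."""
    for category, pattern in rules:
        if re.fullmatch(pattern, name):
            return category
    return "unrecognized"


if __name__ == "__main__":
    for name in [",tmp", "main.c.bak", "+log", "main.c", "#x#"]:
        print(name, categorize(name))
```

Here `main.c.bak' matches both the "backup" and the "source" patterns;
it is categorized as "backup" only because that rule is listed first.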

    > >   Modify the inventory code to use the rx optimization. 
    > >   As a side effect, you'll probably make `tla inventory' faster.

    > Well, first we need to fix the cut operator handling.

Maybe my recollection is wrong.   I thought that the only issue,
really, was that the Posix code was conservatively deciding that 
it needed to use backtracking matching in some case where it doesn't
really have to.   I.e., wasn't it something like finding the right
line of code and changing a 0 to a 1 to turn on the optimization?

Rx is kind of hairy, to be sure.   I'm looking forward to rewriting it
in xl someday :-)   But I think this particular change is pretty

