grep-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Grep-devel] proposed new function for dfa


From: arnold
Subject: Re: [Grep-devel] proposed new function for dfa
Date: Thu, 24 Nov 2016 01:31:08 -0700
User-agent: Heirloom mailx 12.4 7/29/08

Hi Paul.

In terms of your technical questions, I don't know; I don't understand
the struct dfa so I treat it as a black box and copy code based on
pattern matching. It usually works to do so, but not always.

(Side question; why doesn't dfaalloc just memset the new struct dfa to
zero immediately?  That would be simplest, no?)

Your proposal is more or less what I need. The d1 would be kept around
for the life of the program and used over and over again.

The main thing is that gawk can compile a regex up to twice - once
ignoring case and once paying attention to it. The new dfasyntax struct
and routines need to allow for that; if the case flag only goes into
dfacomp then I haven't been saved as much trouble as I would like.

Thanks,

Arnold

Paul Eggert <address@hidden> wrote:

> On 11/22/2016 11:56 PM, address@hidden wrote:
> > I did some profiling and in this case
> > dfasyntax was using a huge percentage of CPU time. I could try to
> > reproduce it if I really need to convince you.
>
> Sounds like too much work for too little benefit. Unfortunately I'm 
> still confused. If Awk needs to reuse a DFA's syntax, presumably the use 
> case would be like this:
>
>     struct dfa *d1 = dfalloc (), *d2 = dfaalloc ();
>     dfasyntax (d1, ...);
>     for (...)
>       {
>         dfacopysyntax (d2, d1);
>         dfacomp (..., d2, ...);
>         dfaexec (d2, ...);
>       }
>     dfafree (d1); dfafree (d2); free (d1); free (d2)
>
> I don't see how this could work if dfacopysyntax doesn't clear the 
> non-syntax part of the struct dfa, and yet you indicated that it's OK to 
> remove that memset.
>
> Also, I'm not following how the copied syntax can start with the same 
> value for canychar as the original. Aren't these indexes into uncopied data?
>
> More generally, I'm worried that the copied and the original objects 
> might share pointers, and that freeing one might harm the other. Is that 
> not possible?
>
>
> Another thought. The intent is that there be a fast way to do what you 
> want, and clearly the current dfa.c doesn't do that. How about the 
> following idea instead?
>
> * Add an opaque type 'struct dfasyntax' to dfa.h.
>
> * dfacomp accepts a new 'struct dfasyntax *' arg, specifying the syntax.
>
> * dfasyntax no longer accepts a 'struct dfa *' arg.  Instead, it returns 
> a pointer of type 'struct dfasyntax *' to a newly allocated opaque 
> object O that contains all the info that dfasyntax currently computes.  
> O can be freed via 'free'; this shouldn't happen as long as anyone is 
> using a struct dfa object for which dfacomp has specified O.
>
> The goal is to give Awk a way to efficiently compute the syntax just 
> once, while better handling the abovementioned storage-allocation and 
> canychar issues. The use case would be like this:
>
>     struct dfa *d = dfalloc ();
>     struct dfasyntax *s = dfasyntax (...);
>     for (...)
>       {
>         dfacomp (..., d, s, ...);
>         dfaexec (d, ...);
>       }
>     dfafree (d); free (d); free (s);



reply via email to

[Prev in Thread] Current Thread [Next in Thread]