grep-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Grep-devel] proposed new function for dfa


From: Paul Eggert
Subject: Re: [Grep-devel] proposed new function for dfa
Date: Wed, 23 Nov 2016 13:42:25 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0

On 11/22/2016 11:56 PM, address@hidden wrote:
I did some profiling and in this case
dfasyntax was using a huge percentage of CPU time. I could try to
reproduce it if I really need to convince you.

Sounds like too much work for too little benefit. Unfortunately I'm still confused. If Awk needs to reuse a DFA's syntax, presumably the use case would be like this:

   struct dfa *d1 = dfalloc (), *d2 = dfaalloc ();
   dfasyntax (d1, ...);
   for (...)
     {
       dfacopysyntax (d2, d1);
       dfacomp (..., d2, ...);
       dfaexec (d2, ...);
     }
   dfafree (d1); dfafree (d2); free (d1); free (d2)

I don't see how this could work if dfacopysyntax doesn't clear the non-syntax part of the struct dfa, and yet you indicated that it's OK to remove that memset.

Also, I'm not following how the copied syntax can start with the same value for canychar as the original. Aren't these indexes into uncopied data?

More generally, I'm worried that the copied and the original objects might share pointers, and that freeing one might harm the other. Is that not possible?


Another thought. The intent is that there be a fast way to do what you want, and clearly the current dfa.c doesn't do that. How about the following idea instead?

* Add an opaque type 'struct dfasyntax' to dfa.h.

* dfacomp accepts a new 'struct dfasyntax *' arg, specifying the syntax.

* dfasyntax no longer accepts a 'struct dfa *' arg. Instead, it returns a pointer of type 'struct dfasyntax *' to a newly allocated opaque object O that contains all the info that dfasyntax currently computes. O can be freed via 'free'; this shouldn't happen as long as anyone is using a struct dfa object for which dfacomp has specified O.

The goal is to give Awk a way to efficiently compute the syntax just once, while better handling the abovementioned storage-allocation and canychar issues. The use case would be like this:

   struct dfa *d = dfalloc ();
   struct dfasyntax *s = dfasyntax (...);
   for (...)
     {
       dfacomp (..., d, s, ...);
       dfaexec (d, ...);
     }
   dfafree (d); free (d); free (s);




reply via email to

[Prev in Thread] Current Thread [Next in Thread]