

Re: using non-Emacs regexp syntax

From: Paul Pogonyshev
Subject: Re: using non-Emacs regexp syntax
Date: Sat, 2 Dec 2006 00:54:12 +0200
User-agent: KMail/1.7.2

Stuart D. Herring wrote:
> > If you don't mind, I'll work on it now.  Changes can be added to whatever
> > .el file in the distribution later.
> >
> > Also, does it make sense to support conversion to and from several formats?
> > For example, some formats require the plus operator to be escaped, while
> > everything else is not.  Something like this:
> >
> >     (convert-regexp :sed :emacs some-regexp)
> >                     FROM   TO   PATTERN-STRING
> >
> > Of course, it will add more complexity, but that shouldn't be much of a
> > problem for users of this function, and implementing it in Lisp should
> > still not be hard.
> I've already started on this sort of thing, writing a converter just
> between the two formats supported by GNU grep.  (These are
> "GNU-extended-basic-RE" and "extended-RE with backreferences".)  As it
> happens, that conversion can be done with one function because the formats
> are so similar.  I had planned to go on to the more general case, but for
> now I'll just provide what I have for comment and/or use.  (I have papers,
> so any use is fine.)  Paul, if you'd like, we can collaborate on this, or
> whichever one of us you choose can go on with it.
> [...]
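Stuart's point that a single function covers both GNU grep formats can be illustrated with a small sketch. It is written in Python rather than Emacs Lisp purely for illustration, and the name `toggle_bre_ere` is hypothetical. It assumes GNU basic REs, where `\(`, `\)`, `\{`, `\}`, `\|`, `\+` and `\?` are operators and the bare characters are literal, while extended REs reverse this. Swapping the escaped and bare forms of those seven characters therefore converts in either direction. Bracket expressions are copied unchanged, and context-dependent cases such as a leading `*` are ignored.

```python
# Characters whose special meaning flips between GNU basic and extended REs:
# in GNU BRE "\(" opens a group and "(" is literal; in ERE it is the reverse.
TOGGLED = set("(){}|+?")


def toggle_bre_ere(pattern: str) -> str:
    """Convert GNU BRE to ERE or vice versa (the operation is its own inverse).

    Simplifications: bracket expressions are copied verbatim, and
    context-dependent cases (e.g. a leading '*') are not handled.
    """
    out = []
    i = 0
    n = len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            if nxt in TOGGLED:
                out.append(nxt)          # escaped operator -> bare form
            else:
                out.append(c + nxt)      # keep other escapes (\1, \., ...) as-is
            i += 2
        elif c in TOGGLED:
            out.append("\\" + c)         # bare form -> escaped form
            i += 1
        elif c == "[":
            # Copy the whole bracket expression unchanged.
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1                   # a leading ']' is a literal member
            while j < n and pattern[j] != "]":
                j += 1
            out.append(pattern[i:j + 1])
            i = j + 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


print(toggle_bre_ere(r"\(foo\|bar\)*"))  # → (foo|bar)*
```

Because the mapping is symmetric, applying the function twice returns the original pattern, which is what makes the one-function approach work for this particular pair of formats.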

I will happily hand this over to you if you wish.  I had planned a more
generic implementation, which can be briefly described as follows:

* Each implemented format provides a table of associations
  construct-name -> construct-generator (some constructs, like the []
  character class, will require a parameter).  In the simplest form, a
  construct-generator can be just a fixed string, which will suffice in
  most cases.

* Each format also provides a parser that splits a regexp into a list
  of construct names.

* The entry function (or a helper for it) combines the output format's
  table with the input format's parser.  The result is the regexp
  rewritten in the output format.
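The three-part design above could be sketched roughly as follows. This is a Python illustration, not Emacs code; the construct names (`group-open`, `literal`, `backref`, ...), the format keys `"emacs"` and `"ere"`, and the helpers `bracket_end`, `make_parser`, `escaper` and `convert_regexp` are all hypothetical. Only a small subset of syntax is modeled: groups, alternation, `*`, `+`, `?`, `.`, anchors, single-digit backreferences and bracket expressions (assumed to be spelled the same in both formats). Some unsupported constructs, such as Emacs `\w` or intervals, raise an error, but others (for example Emacs shy groups `\(?:`) would be mis-parsed.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple, Union

Token = Tuple[str, Optional[str]]             # (construct-name, parameter)
Generator = Union[str, Callable[[str], str]]  # fixed string, or function of the parameter


def bracket_end(rx: str, i: int) -> int:
    """Return the index just past the bracket expression starting at rx[i] == '['."""
    j = i + 1
    if j < len(rx) and rx[j] == "^":
        j += 1
    if j < len(rx) and rx[j] == "]":  # a leading ']' is a literal member
        j += 1
    return rx.index("]", j) + 1       # raises ValueError if unterminated


def make_parser(escaped: Dict[str, str], bare: Dict[str, str],
                unsupported: Set[str]) -> Callable[[str], List[Token]]:
    """Build one format's parser: regexp string -> list of (construct-name, param)."""
    def parse(rx: str) -> List[Token]:
        tokens: List[Token] = []
        i = 0
        while i < len(rx):
            c = rx[i]
            if c == "\\" and i + 1 < len(rx):
                nxt = rx[i + 1]
                if nxt.isdigit():
                    tokens.append(("backref", nxt))
                elif nxt in escaped:
                    tokens.append((escaped[nxt], None))
                elif nxt.isalnum() or "\\" + nxt in unsupported:
                    raise ValueError("unsupported construct \\" + nxt)
                else:
                    tokens.append(("literal", nxt))  # e.g. "\." is a literal dot
                i += 2
            elif c == "[":
                end = bracket_end(rx, i)
                tokens.append(("class", rx[i:end]))
                i = end
            elif c in unsupported:
                raise ValueError("unsupported construct " + c)
            elif c in bare:
                tokens.append((bare[c], None))
                i += 1
            else:
                tokens.append(("literal", c))
                i += 1
        return tokens
    return parse


def escaper(specials: str) -> Callable[[str], str]:
    """Literal-character generator: backslash-escape characters special in a format."""
    return lambda ch: "\\" + ch if ch in specials else ch


COMMON: Dict[str, Generator] = {
    "star": "*", "plus": "+", "optional": "?", "any": ".", "bol": "^", "eol": "$",
    "backref": lambda n: "\\" + n,
    "class": lambda text: text,  # assumes bracket syntax is the same in both formats
}
SHARED_BARE = {"*": "star", "+": "plus", "?": "optional",
               ".": "any", "^": "bol", "$": "eol"}

FORMATS = {
    "emacs": {
        "parser": make_parser(
            escaped={"(": "group-open", ")": "group-close", "|": "alt"},
            bare=SHARED_BARE,
            unsupported={r"\{", r"\}", r"\<", r"\>", r"\_", r"\`", r"\'", r"\="}),
        "table": {**COMMON, "group-open": r"\(", "group-close": r"\)", "alt": r"\|",
                  "literal": escaper(".*+?[^$\\")},
    },
    "ere": {
        "parser": make_parser(
            escaped={},
            bare={**SHARED_BARE, "(": "group-open", ")": "group-close", "|": "alt"},
            unsupported={"{", r"\<", r"\>", r"\`", r"\'"}),
        "table": {**COMMON, "group-open": "(", "group-close": ")", "alt": "|",
                  "literal": escaper(".[\\()*+?{|^$")},
    },
}


def convert_regexp(from_fmt: str, to_fmt: str, rx: str) -> str:
    """Entry function: run the input format's parser, render with the output table."""
    table = FORMATS[to_fmt]["table"]
    out = []
    for name, param in FORMATS[from_fmt]["parser"](rx):
        gen = table[name]
        out.append(gen if isinstance(gen, str) else gen(param))
    return "".join(out)


print(convert_regexp("emacs", "ere", r"\(foo\|ba+r\)\.(x)\1"))  # → (foo|ba+r)\.\(x\)\1
```

Most table entries are plain strings, matching the "fixed string in most cases" remark; only literals, backreferences and bracket expressions need a generator function. Supporting another format would mean adding one parser and one table.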

Maybe this approach is too slow, though.  Then again, given that Emacs
has lived happily without this sort of function so far, it is unlikely
to end up in performance-critical code, so speed should hardly matter.
But maybe you can come up with a simpler solution.

(One more thing: it probably makes sense to add a conversion function
for replacement strings too.  For example, some formats refer to the
Nth matched group as $N, while others (like Emacs) use \N.)
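A replacement-string converter is much simpler than the regexp one. The sketch below is Python, and the function names are hypothetical. It assumes the `$` syntax of JavaScript's `String.prototype.replace` (where `$1`..`$9` name groups, `$&` is the whole match, `$$` is a literal dollar sign and a backslash is ordinary) and Emacs `replace-match` syntax (where `\1`..`\9` name groups, `\&` is the whole match and `\\` is a literal backslash). Multi-digit and named group references are not handled.

```python
GROUP_REFS = set("123456789&")  # single-digit group numbers, or "&" for the whole match


def dollar_to_emacs(rep: str) -> str:
    """Translate a JavaScript-style replacement ($1, $&, $$) into Emacs syntax."""
    out = []
    i = 0
    while i < len(rep):
        c = rep[i]
        nxt = rep[i + 1] if i + 1 < len(rep) else ""
        if c == "$" and nxt in GROUP_REFS:
            out.append("\\" + nxt)   # $N -> \N, $& -> \&
            i += 2
        elif c == "$" and nxt == "$":
            out.append("$")          # "$$" is a literal dollar sign
            i += 2
        elif c == "\\":
            out.append("\\\\")       # Emacs needs a doubled backslash for a literal one
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


def emacs_to_dollar(rep: str) -> str:
    """Translate an Emacs-style replacement into JavaScript-style syntax.

    Backslash escapes other than group references and a doubled backslash
    are copied unchanged.
    """
    out = []
    i = 0
    while i < len(rep):
        c = rep[i]
        nxt = rep[i + 1] if i + 1 < len(rep) else ""
        if c == "\\" and nxt in GROUP_REFS:
            out.append("$" + nxt)    # \N -> $N, \& -> $&
            i += 2
        elif c == "\\" and nxt == "\\":
            out.append("\\")         # doubled backslash -> one literal backslash
            i += 2
        elif c == "$":
            out.append("$$")         # a literal dollar must be doubled
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


print(dollar_to_emacs("$2-$1 costs $$5"))  # → \2-\1 costs $5
```

Both directions need their own escaping rule: a literal `$` must be doubled in the dollar style, and a literal backslash must be doubled in Emacs.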

