Re: regexp curly braces do not work

From: Eric Blake
Subject: Re: regexp curly braces do not work
Date: Thu, 30 Jul 2020 11:04:53 -0500

On 7/30/20 1:24 AM, Hyunho Cho wrote:
m4 (GNU M4) 1.4.18
Operating System: Ubuntu 20.04 LTS
Kernel: Linux 5.4.0-33-generic
Architecture: x86-64

sh$ m4 <<\@
patsubst(`hello', `ll', `XX')
patsubst(`hello', `l+', `XX')
patsubst(`hello', `l\{2\}', `XX')
patsubst(`hello', `l{2}', `XX')
@

Unfortunately, that behavior is correct for m4 1.4. The source code uses GNU regex.h and does not bother to modify re_syntax_options or call re_set_syntax(), which means m4 defaults to RE_SYNTAX_EMACS (i.e., all regex syntax bits set to 0). Among the various bits that can be set in re_syntax_options, the one that matters here, RE_INTERVALS, is documented as:

/* If this bit is set, either \{...\} or {...} defines an
     interval, depending on RE_NO_BK_BRACES.
   If not set, \{, \}, {, and } are literals.  */

Since m4 does nothing to set that bit, neither {...} nor \{...\} is treated as an interval: all four brace characters are literals, so no form of {} works.
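To see the effect of that bit outside of m4, here is a minimal sketch against the GNU regex API in glibc. It is not m4's source code; the helper names first_match() and run_interval_demo() are made up for illustration, and it assumes a glibc system where _GNU_SOURCE exposes re_set_syntax(), re_compile_pattern() and re_search().

```c
#define _GNU_SOURCE   /* expose the GNU re_* API and RE_* bits in <regex.h> */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compile `pattern` under the given GNU syntax bits
   and return the offset of the first match in `text`, -1 if there is no
   match, or -2 on a compile or internal error. */
int first_match(reg_syntax_t syntax, const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    memset(&buf, 0, sizeof buf);       /* no preallocated buffer, fastmap or translate table */

    re_set_syntax(syntax);             /* sets the global re_syntax_options */
    const char *err = re_compile_pattern(pattern, strlen(pattern), &buf);
    if (err != NULL)
        return -2;

    regoff_t len = (regoff_t) strlen(text);
    regoff_t pos = re_search(&buf, text, len, 0, len, NULL);
    regfree(&buf);
    return (int) pos;
}

/* Hypothetical demo: the same patterns as in the report, matched against
   "hello" with and without the interval bit. */
void run_interval_demo(void)
{
    /* RE_SYNTAX_EMACS is 0: braces are literals, so neither form matches. */
    assert(first_match(RE_SYNTAX_EMACS, "l\\{2\\}", "hello") == -1);
    assert(first_match(RE_SYNTAX_EMACS, "l{2}", "hello") == -1);

    /* RE_INTERVALS alone: \{...\} is an interval, bare {...} is still literal. */
    assert(first_match(RE_INTERVALS, "l\\{2\\}", "hello") == 2);
    assert(first_match(RE_INTERVALS, "l{2}", "hello") == -1);

    /* RE_INTERVALS plus RE_NO_BK_BRACES: bare {...} becomes the interval. */
    assert(first_match(RE_INTERVALS | RE_NO_BK_BRACES, "l{2}", "hello") == 2);
}
```

Calling run_interval_demo() from a main() should pass silently on glibc. The point is only that the interval syntax is present in the regex engine and is switched off by the default syntax value, which matches what patsubst shows above.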

At some point, a future m4 release will add the ability to select a regex flavor (emacs, POSIX BRE, POSIX ERE, etc.), and depending on the flavor chosen, either {} or \{\} will work. Until that release happens, you are stuck with the existing m4 and its more limited regex syntax.

