bug-gawk

From: arnold
Subject: Re: Misunderstood, bug or limitation of indexing ENVIRON with "\\1" in gensub() ?
Date: Mon, 06 Apr 2020 07:20:33 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Gawk goes well beyond POSIX in many ways, but there's no intent to
go any further.

So your answer is correct: there won't be an eval facility or
similar added.  Anyone who really needs that should use the shell,
or perl, or something else that provides it.

Thanks,

Arnold

Wolfgang Laun <address@hidden> wrote:

> Well, I'm not a developer, but I understand that gawk is not intended to go
> far beyond what POSIX has standardized.
>
> String interpolation is merely syntactic sugar, as you can (given variables
> "is" and "interpolated") even now write
>    "This " is " an " interpolated " string."
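The concatenation above runs as-is once the two variables exist. This is a minimal sketch that assumes they are simply assigned in a BEGIN block, with values made up for illustration. Any POSIX awk should accept it:

```shell
#!/bin/sh
# awk has no interpolation operator: placing strings and variables side by
# side concatenates them. The variable names "is" and "interpolated" are
# the ones from the example above; their values are illustrative only.
result=$(awk 'BEGIN {
    is = "is"
    interpolated = "interpolated"
    print "This " is " an " interpolated " string."
}')
printf '%s\n' "$result"
```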
>
> Implementing a workalike of the shell's or Perl's eval() function is
> an entirely different thing. Incremental compilation requires that the
> runtime be capable of switching arbitrarily between parsing and executing.
> Nobody will want to open this can of worms.
>
> Wolfgang
>
>
> On Mon, 6 Apr 2020 at 14:22, Vincent Férotin <address@hidden>
> wrote:
>
> > Hey Wolfgang, thank you very much for the detailed answer!
> >
> > You understood my needs perfectly, and I greatly appreciate your
> > proposed solutions. :-)
> >
> > Beyond my own rather anecdotal needs, and although I understand that awk
> > in its current state does not work as I previously expected, a minor
> > intent of my previous message to the bug-gawk mailing list was to ask
> > you, the developers, whether such a feature is (or is not) desirable
> > for a future version of awk. That is, should a future awk do string
> > interpolation and in-place evaluation, and interpret all "\\1"
> > occurrences within a g(en)sub() call rather than only in the
> > replacement string?
> >
> > Anyway, thanks again!
> >
> > V.F.
> >
> >
> >
> > On Sat, 4 Apr 2020 at 06:41, Wolfgang Laun <address@hidden> wrote:
> > >
> > >
> > > If I understand everything correctly, you are trying to replace some
> > >    %abc%
> > > in an input line by the value of the environment variable abc.
> > >
> > > This cannot be done using a single gsub() or gensub() call, because the
> > > backreference \\1 only has meaning inside the replacement text that
> > > gensub() itself receives, and ENVIRON["\\1"] is evaluated before
> > > gensub() is even called. What you need is an additional evaluation of
> > > the expression "ENVIRON[" "\\1" "]", to be inserted in place of the
> > > %-% placeholder. If awk had eval, you could write:
> > >
> > >      print gensub(/%([_A-Z]+)%/, eval("ENVIRON[\"\\1\"]"), "g")   # this is not awk
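To make the evaluation order concrete: awk evaluates every argument before calling gensub() or gsub(), so when ENVIRON is indexed the subscript "\\1" is just the two-character string backslash-one (and "&" is just an ampersand). This is a minimal sketch assuming a POSIX awk. Because gensub() is a gawk extension, the last step uses gsub() with "&", which fails in the same way:

```shell
#!/bin/sh
MY_URL="http://www.example.com"
export MY_URL

# Inside an awk string literal, "\\1" is a backslash followed by "1".
key=$(awk 'BEGIN { k = "\\1"; print length(k), k }')

# No environment variable has that name, so the lookup is empty.
val=$(awk 'BEGIN { printf "[%s]\n", ENVIRON["\\1"] }')

# The replacement argument is therefore computed as "" before gsub()
# runs; gsub() never sees an "&" (and gawk's gensub() never sees "\1").
out=$(printf '%s\n' 'repository=%MY_URL%' |
      awk '{ gsub(/%[_A-Z]+%/, ENVIRON["&"]); print }')

printf '%s\n%s\n%s\n' "$key" "$val" "$out"
```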
> > >
> > > You might use Perl, where substitution (s///) has a flag 'e', requesting
> > > that the replacement be evaluated as an expression whose result becomes
> > > the inserted text, i.e., an implied eval.
> > >
> > > echo 'repository=%MY_URL%  # %COMMENT%' | \
> > >    COMMENT="Set repository URL" MY_URL="http://www.example.com" \
> > >    perl -e 'while( <> ){s/%([_A-Z]+)%/$ENV{$1}/ge; print;}'
> > >
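> > > An awk version requires a user-defined function:
> > > # Uses gawk's three-argument match(); chunk[2] receives the name
> > > # in the leftmost %NAME% placeholder.
> > > function envsub(text,  chunk){
> > >     while( match(text, /([^%]*)%([_A-Z]+)%(.*)/, chunk) != 0 ){
> > >        # Caveat: sub() treats "&" and "\" in the value specially.
> > >        sub( "%"chunk[2]"%", ENVIRON[chunk[2]], text )
> > >     }
> > >     return text;
> > > }
> > > { print envsub($0) }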
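For comparison, here is a hypothetical variant of envsub() (not the code from the message above). It avoids both gawk's three-argument match() and sub(), and assumes only POSIX match() with RSTART/RLENGTH and substr(), so it should also run under mawk. Because the value is concatenated instead of being passed to sub(), any "&" or "\" in it is copied verbatim. The query string in MY_URL is made up to demonstrate this:

```shell
#!/bin/sh
COMMENT="Set repository URL"
MY_URL="http://www.example.com/?a=1&b=2"
export COMMENT MY_URL

result=$(printf '%s\n' 'repository=%MY_URL%  # %COMMENT%' | awk '
function envsub(text,    out, name) {
    out = ""
    # Each pass handles the leftmost %NAME% placeholder.
    while (match(text, /%[_A-Z]+%/)) {
        name = substr(text, RSTART + 1, RLENGTH - 2)    # drop the two %
        out = out substr(text, 1, RSTART - 1) ENVIRON[name]
        text = substr(text, RSTART + RLENGTH)           # resume after it
    }
    return out text
}
{ print envsub($0) }')
printf '%s\n' "$result"
```

Scanning resumes after each replaced placeholder, so a value that itself contains %NAME% is not expanded a second time.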
> > >
> > > Wolfgang
> > >
> > >
> > > On Fri, 3 Apr 2020 at 18:35, Vincent Férotin <address@hidden> wrote:
> > >>
> > >> Hi gawk maintainers!
> > >>
> > >> I'm new to awk/gawk/mawk, and I'd like to describe here what may be a
> > >> bug, or at least a limitation, that I ran into with these tools in basic
> > >> usage. Perhaps what follows is not a bug but a misunderstanding on my
> > >> part as a newbie?
> > >> Anyway, thanks in advance for reading this...
> > >>
> > >> V.F.
> > >>
> > >>
> > >> TL;DR
> > >> =====
> > >>
> > >> Using [gm]awk as a templating/macro engine, the following shell commands
> > >> do not produce the expected output:
> > >>
> > >>     $ echo "repository=%MY_URL%  # %COMMENT%" |
> > >>       COMMENT="Set repository URL" MY_URL="http://www.example.com" \
> > >>       awk '{print gensub(/%([_A-Z]+)%/, ENVIRON["\\1"], "g")}'
> > >>     repository=  #
> > >>
> > >> or, roughly equivalently:
> > >>
> > >>     $ echo "repository=MY_URL  # COMMENT" |
> > >>       COMMENT="Set repository URL" MY_URL="http://www.example.com" \
> > >>       awk '{gsub(/[_A-Z]+/, ENVIRON["&"]); print $0}'
> > >>     repository=  #
> > >>
> > >> It seems that "\\1" in gensub() (or "&" in gsub()) is not replaced with
> > >> the text captured by the regexp, at least when used to index ENVIRON.
> > >> IMHO, and as far as I understand, the expected output should be:
> > >>
> > >>     repository=http://www.example.com  # Set repository URL
> > >>
> > >>
> > >> Versions tested
> > >> ===============
> > >>
> > >> * gawk:
> > >>   - 4.1.4 (Ubuntu 18.04 Bionic)
> > >>   - 4.2.1 (Ubuntu 19.10 Eoan)
> > >>   - 5.0.1 (Ubuntu 20.04 Focal)
> > >> * mawk:
> > >>   - 3.3 (Ubuntu 18.04 Bionic & 19.10 Eoan)
> > >>   - 3.4.20200120 (Ubuntu 20.04 Focal)
> > >>
> > >>
> > >> Usage
> > >> =====
> > >>
> > >> To provision some virtual machines with Bash scripts, I used 'sed' to
> > >> replace some paths (strings) or configuration file contents, but it
> > >> failed in cases where the replacement string contains characters that
> > >> 'sed' interprets as metacharacters (such as "/").
> > >>
> > >> I then tried 'm4', with the actual values for the placeholders available
> > >> as environment variables. But the Debian/Ubuntu package seems to have
> > >> some limitations, notably disabling the '-W, --word-regexp=REGEXP' option
> > >> (which would allow setting the placeholder regexp, e.g. "%([_A-Z]+)%").
> > >> Using m4 as is, with the configuration chosen by the package maintainers,
> > >> is feasible:
> > >>
> > >>     $ printf 'changecom\nrepository=MY_URL  # COMMENT\n' |
> > >>       m4 -DMY_URL="$MY_URL" -DCOMMENT="$COMMENT"
> > >>
> > >>     repository=http://www.example.com  # Set repository URL
> > >>
> > >> but I miss being able to choose more robust placeholder delimiters
> > >> (here I started by prefixing and suffixing them with "%", but I could
> > >> also have chosen another format, such as the more common "${var}").
> > >>
> > >> This need seems to exist beyond my own naïve usage; see for example:
> > >> - https://stackoverflow.com/questions/415677/how-to-replace-placeholders-in-a-text-file
> > >> - https://stackoverflow.com/questions/2914220/bash-templating-how-to-build-configuration-files-from-templates-with-bash
> > >>
> > >> Note that, apart from a single answer (out of a total of 40: 16 + 24 at
> > >> the time of writing):
> > >> - https://stackoverflow.com/questions/2914220/bash-templating-how-to-build-configuration-files-from-templates-with-bash#answer-9590655
> > >> no other answer uses awk or one of its derivatives!
> > >> (NB: this specific answer could probably suffice for my needs...)
> > >>
> > >>
> > >> Evidence that `gensub(..., ENVIRON["\\1"])` should work
> > >> =======================================================
> > >>
> > >> Using "\\1" directly in the gensub() replacement string works:
> > >>
> > >>     $ echo "repository=%MY_URL%  # %COMMENT%" |
> > >>       awk '{print gensub(/%([_A-Z]+)%/, "( \\1 )", "g")}'
> > >>     repository=( MY_URL )  # ( COMMENT )
> > >>
> > >> Passing the desired variable name directly to ENVIRON also works:
> > >>
> > >>     $ echo "repository=%MY_URL%" |
> > >>       MY_URL="http://www.example.com" \
> > >>       awk '{print gensub(/%MY_URL%/, ENVIRON["MY_URL"], "g")}'
> > >>     repository=http://www.example.com
> > >>
> > >>
> > >> `ENVIRON` does not seem to accept other expressions as an index
> > >> ===============================================================
> > >>
> > >> Note also that rewriting the awk script from the StackOverflow answer
> > >> above, described in
> > >> https://stackoverflow.com/questions/2914220/bash-templating-how-to-build-configuration-files-from-templates-with-bash#answer-9590655
> > >> that is:
> > >>
> > >>     'match($0, "[$]{.*}") {var = substr($0, (RSTART + 2), (RLENGTH - 3)); gsub("[$]{"var"}", ENVIRON[var])}1'
> > >>
> > >> into a more condensed form adapted to my use case:
> > >>
> > >>     '{gensub(/%([_A-Z]+)%/, ENVIRON[substr("\\1", 1, (length("\\1") - 2))])}'  # gawk
> > >>     '{gsub(/%[_A-Z]+%/, ENVIRON[substr("&", 1, (length("&") - 1))]); print $0}'  # mawk
> > >>
> > >> does not work either.
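These attempts do not fail because of ENVIRON itself. ENVIRON is an ordinary array, and it accepts any string expression as a subscript. What goes wrong is that substr() and length() operate on the literal strings "\\1" and "&", never on the matched text. This is a minimal sketch, assuming a POSIX awk, that compares a working computed subscript with the two attempts above:

```shell
#!/bin/sh
MY_URL="http://www.example.com"
export MY_URL

# A computed subscript works: substr() strips the % around the placeholder.
a=$(awk 'BEGIN { p = "%MY_URL%"; print ENVIRON[substr(p, 2, length(p) - 2)] }')

# gawk attempt: "\\1" is the two-character string backslash-one, so
# length("\\1") - 2 is 0 and the subscript is the empty string.
b=$(awk 'BEGIN { printf "[%s]\n", ENVIRON[substr("\\1", 1, (length("\\1") - 2))] }')

# mawk attempt: "&" is a one-character string, so this also reduces
# to ENVIRON[""].
c=$(awk 'BEGIN { printf "[%s]\n", ENVIRON[substr("&", 1, (length("&") - 1))] }')

printf '%s\n%s\n%s\n' "$a" "$b" "$c"
```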
> > >>
> > >>
> > >> Search for existing occurrences of `gensub(..., ENVIRON["\\1"])`
> > >> ================================================================
> > >>
> > >> No occurrence of ``ENVIRON[`` with any index other than a plain
> > >> string or variable was found in:
> > >>
> > >> * `sed and awk Pocket Reference` by Arnold Robbins (O'Reilly, 2002, 2nd ed.)
> > >>     http://shop.oreilly.com/product/9780596003524.do
> > >> * `sed & awk` by Dale Dougherty & Arnold Robbins (O'Reilly, 1997, 2nd ed.)
> > >>     http://shop.oreilly.com/product/9781565922259.do
> > >> * `Effective awk Programming` by Arnold Robbins (O'Reilly, 2015, 4th ed.)
> > >>     http://shop.oreilly.com/product/0636920033820.do
> > >> * `GNU awk - awesome one-liners` by Sundeep Agarwal (version 0.7)
> > >>     https://learnbyexample.github.io/books/
> > >>     (recently featured on Hacker News:
> > >>     https://news.ycombinator.com/item?id=22758217)
> > >> * `bug-gawk` archives
> > >>     https://lists.gnu.org/archive/html/bug-gawk/
> > >>
> >


