Re: Feature to add
From: Assaf Gordon
Subject: Re: Feature to add
Date: Thu, 19 Jul 2018 05:44:06 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
(adding sed-devel@ mailing list, please use reply-all to keep the thread
public and archived).
Hello Russell,
On 19/07/18 04:18 AM, Russell Harper wrote:
> I'm not writing specifically about parsing floating point numbers or
> factoring integers, these are just examples to illustrate. You can
> substitute anything else instead.
>
> What I'm proposing is an x flag for substitutions to indicate that the
> substitution is obtained by running an executable and inserting its output.
>
>   's/<reg-exp>/<executable> <argument>*/x'
>
> Some examples:
>
>   # replaces instances of "UUID" with output from uuidgen
>   's/UUID/uuidgen/gx'
>   # replaces integers with output from factor <integer>
>   's/([0-9]+)/factor \1/gx'
>   # replaces URL with output from wget <URL>
>   's~(http://[A-Za-z.]+)~wget \1~x'
>   # custom utility to pluralize words
>   's~([a-z]+)~./pluralize \1~gix'
>
> Currently there is no easy and robust way to do this in any of the core
> utilities.
Thank you for expanding on and explaining your request.
This indeed seems like a specialized feature, perhaps a bit out of scope
for sed. GNU sed does have the "s///e" extension ("e" for "eval"),
but after the substitution it executes the entire pattern space as a
shell command, once, and not once per match as in your examples.
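To illustrate the existing behaviour, here is a minimal sketch, assuming GNU sed and coreutils' "factor" are installed. The helper name factor_line is only for illustration:

```shell
# Requires GNU sed: the "e" flag of s/// is a GNU extension.
# The substitution turns the line "230" into "factor 230"; the "e" flag
# then runs the whole pattern space as a shell command and replaces the
# pattern space with that command's output.
factor_line() {
  sed 's/^/factor /e'
}

echo 230 | factor_line
# prints: 230: 2 5 23
```

This only works because the whole line becomes the command. With an input line such as "FOO 230", the word FOO would be handed to the shell as well, which is why s///e cannot express the per-match examples above.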
However Perl can easily do exactly what you ask for (and in a robust way).
First,
Perl's regex substitution also has an "e" flag, but it is more powerful
than sed's: it evaluates the replacement as a perl expression for every match.
In the following example, every number (matching the regex /(\d+)/ )
is transformed using perl's built-in hex() function:
$ echo 230 19 FOO 40 BAR 50 | perl -np -e 's/(\d+)/hex($1)/ge'
560 25 FOO 64 BAR 80
(That is: 0x230 is 560 in decimal, 0x19 is 25 in decimal, etc.)
Similarly,
we can define our own perl function to do any transformation we'd like.
The following example increments any matched number by 1:
$ echo 230 19 FOO 40 BAR 50 \
| perl -np -e 'sub f($) { return $_[0] + 1 ; }' \
-e 's/(\d+)/f($1)/ge'
231 20 FOO 41 BAR 51
Lastly,
Perl excels at text processing and evaluating external commands,
so we modify our function to execute "factor" on any matched
number:
$ echo 230 19 FOO 40 BAR 50 \
| perl -np -e 'sub f($) { return `factor $_[0]` ; }' \
-e 's/(\d+)/f($1)/ge'
230: 2 5 23
19: 19
FOO 40: 2 2 2 5
BAR 50: 2 5 5
And an example with UUID:
$ echo UUID FOO UUID BAR UUID \
| perl -np -e 'sub f($) { $t = `uuidgen` ; chomp $t ; $t }' \
-e 's/(UUID)/f($1)/ge'
4a64a434-73b2-47f9-985f-2eff776b981d FOO
fc7f3796-cfed-4850-a363-a70edfceee1b BAR
de65fe02-96fd-436e-ae2b-66127c438702
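Of course,
when executing things like that on the shell, extra care must be taken
to ensure malicious input can't cause unintended consequences with shell
escaping tricks: backticks pass the whole command string to the shell,
so any shell metacharacters in the matched text would be interpreted.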
=======
As for adding a new feature to sed:
There is always a trade-off between adding more and more specialized
features to sed and using existing solutions, even if they are
a bit more verbose (e.g. my perl examples are much longer than the
hypothetical s///x sed feature).
I don't think we can/should modify sed's existing s///e flag (that would
break existing scripts), but we could perhaps consider adding a new
flag.
What do others think - is it worth it, or is it better to just stick with perl?
(Jim?)
The semantics of such a flag must be carefully defined, e.g.
what's the interplay with grouping, with the global flag, and with other flags?
regards,
- assaf