Re: Feature to add

sed-devel

From:	Russell Harper
Subject:	Re: Feature to add
Date:	Thu, 19 Jul 2018 08:27:17 -0400

Thank you, Assaf, for the reply.

Just one more note: the existing solutions in sed, awk, and perl all seem
to break if you want to process matches that contain $ or & characters.
Examples:

* currency converter where you want to do a live conversion of USD amounts
like $2000 to Japanese yen
* wget of URLs with & in the path
* assembly language number converter for real hexadecimal $FFFF.FF and
octal &7777.77 to base 10
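Here is a minimal sketch of the `$` problem, assuming a POSIX `sh`. When a matched value is pasted into a command string, the inner shell parses it a second time. `$2000` then becomes the unset positional parameter `$2` followed by `000`, and an unquoted `&` would likewise be read as the background operator. Passing the value as a separate argument avoids that second parse. The value `$2000` comes from the currency example above.

```shell
#!/usr/bin/env bash
# Illustrative matched value, taken from the currency example.
match='$2000'

# Unsafe: the value becomes part of the command text, so the inner sh
# expands "$2000" as "$2" (unset, hence empty) followed by "000".
unsafe=$(sh -c "printf '%s\n' $match")

# Safer: the command text is fixed and the value arrives as "$1",
# so no shell ever re-parses it.
safe=$(sh -c 'printf "%s\n" "$1"' sh "$match")

printf 'unsafe: %s\n' "$unsafe"   # prints: unsafe: 000
printf 'safe:   %s\n' "$safe"     # prints: safe:   $2000
```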

Basically, all the existing solutions need two levels of quoting even for
plain alphanumeric arguments, which leaves no room to process arguments
containing $ or &. A sed x flag for substitution would need only one level
of quoting, leaving single quotes free to protect these arguments:

"s/([$][0-9]+(\.[0-9]+)?)/latest-quote '\\1' JPY/x"
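Until something like an x flag exists, its behavior can be roughly approximated in bash. The sketch below uses a hypothetical helper, `subx`, which is not part of sed or any existing tool. It assumes bash 3.1 or later for `=~`, `BASH_REMATCH` and `+=`, and it assumes the ERE contains no anchors. For every match on each input line, it runs a command with capture group 1 appended as one argument, so `$` and `&` in the match are never re-parsed by a shell. `printf` stands in for the hypothetical `latest-quote` command.

```shell
#!/usr/bin/env bash
# subx ERE CMD [ARG...]
# Hypothetical helper (not part of sed or coreutils) that roughly emulates
# the proposed  s/ERE/CMD ARG... \1/gx  on each line of stdin:
#  - every match of ERE is replaced by the output of CMD ARG... VALUE,
#    where VALUE is capture group 1 (or the whole match if ERE has no group);
#  - VALUE is passed as a single argument and never re-parsed by a shell;
#  - trailing newlines of the command output are dropped, as $(...) does;
#  - assumes ERE has no anchors (^, $) or word-boundary operators.
subx() {
  local re=$1; shift
  local line out rest match value
  while IFS= read -r line || [[ -n $line ]]; do
    out='' rest=$line
    while [[ $rest =~ $re ]]; do
      match=${BASH_REMATCH[0]}
      [[ -z $match ]] && break              # stop on an empty match to avoid looping forever
      value=${BASH_REMATCH[1]-$match}
      out+=${rest%%"$match"*}               # text before the leftmost match
      out+=$("$@" "$value")                 # command output replaces the match
      rest=${rest#*"$match"}                # continue after the match
    done
    printf '%s\n' "$out$rest"
  done
}

# Example: printf stands in for the hypothetical latest-quote command.
printf 'Price: $2000 and $15.50\n' | subx '([$][0-9]+(\.[0-9]+)?)' printf '<%s>'
# prints: Price: <$2000> and <$15.50>
```

The loop re-runs the regex on the remaining text rather than tracking offsets. This keeps the sketch short, but it is also why anchors are excluded.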

Regards,

Russell

On Thu, Jul 19, 2018 at 7:44 AM, Assaf Gordon <address@hidden> wrote:

> (adding sed-devel@ mailing list, please use reply-all to keep the thread
> public and archived).
>
>
> Hello Russell,
>
>
> On 19/07/18 04:18 AM, Russell Harper wrote:
>
>> I'm not writing specifically about parsing floating point numbers or
>> factoring integers, these are just examples to illustrate. You can
>> substitute anything else instead.
>>
>> What I'm proposing is an x flag for substitutions to indicate that the
>> substitution is obtained by running an executable and inserting its output.
>>
>>      's/<reg-exp>/<executable> <argument>*/x'
>>
>> Some examples:
>>
>>      's/UUID/uuidgen/gx'                # replaces instances of "UUID" with output from uuidgen
>>      's/([0-9]+)/factor \1/gx'          # replaces integers with output from factor <integer>
>>      's~(http://[A-Za-z.]+)~wget \1~x'  # replaces URL with output from wget <URL>
>>      's~([a-z]+)~./pluralize \1~gix'    # custom utility to pluralize words
>>
>> Currently there is no easy and robust way to do this in any of the core
>> utilities.
>>
>
> Thank you for expanding and explaining on your request.
>
> This indeed seems like a specialized feature, perhaps a bit out of scope
> for sed. GNU sed does have the "s///e" extension ("e" for "eval"),
> but that executes the entire pattern space (after substitution) as a
> single shell command, once, and not once for every match as in your examples.
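A small illustration of that difference, assuming GNU sed (the `e` flag is a GNU extension). Even with `g`, the rewritten pattern space is run as one command line rather than once per match. Here `echo` stands in for a command like `factor`.

```shell
#!/usr/bin/env bash
# With g, both numbers are rewritten first, giving the pattern space
# "echo 1 echo 2". GNU sed then runs that single line through the shell,
# so only the first "echo" executes and the second is printed as text.
printf '1 2\n' | sed -E 's/[0-9]+/echo &/ge'    # prints: 1 echo 2
```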
>
> However Perl can easily do exactly what you ask for (and in a robust way).
>
> First,
> Perl's regex substitution also has an "e" flag, but it is more powerful
> than sed's: it evaluates the replacement as perl code (such as a function
> call) for every match.
>
> In the following example, every number (matching the regex /(\d+)/ )
> is transformed using perl's built-in hex() function:
>
>   $ echo 230 19 FOO 40 BAR 50 | perl -np -e 's/(\d+)/hex($1)/ge'
>   560 25 FOO 64 BAR 80
>
> (That is: 0x230 is 560 in decimal, 0x19 is 25 in decimal, etc.).
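These conversions can be cross-checked without perl. Bash arithmetic accepts `base#digits`, so `16#230` reads the digits in base 16, as perl's hex() does above.

```shell
#!/usr/bin/env bash
# Interpret each number as hexadecimal and print its decimal value.
for n in 230 19 40 50; do
  printf '0x%s = %d\n' "$n" "$((16#$n))"
done
# prints:
# 0x230 = 560
# 0x19 = 25
# 0x40 = 64
# 0x50 = 80
```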
>
> Similarly,
> we can define our own perl function to do any transformation we'd like.
> The following example increments any matched number by 1:
>
>   $ echo 230 19 FOO 40 BAR 50 \
>         | perl -np -e 'sub f($) { return $_[0] + 1 ; }' \
>                    -e 's/(\d+)/f($1)/ge'
>   231 20 FOO 41 BAR 51
>
>
> Lastly,
> Perl excels at text processing and evaluating external commands,
> so we modify our function to execute "factor" on any matched
> number:
>
>   $ echo 230 19 FOO 40 BAR 50 \
>         | perl -np -e 'sub f($) { return `factor $_[0]` ; }' \
>                    -e 's/(\d+)/f($1)/ge'
>   230: 2 5 23
>    19: 19
>    FOO 40: 2 2 2 5
>    BAR 50: 2 5 5
>
>
> And an example with UUID:
>
>   $ echo UUID FOO UUID BAR UUID \
>       | perl -np -e 'sub f($) { $t = `uuidgen` ; chomp $t ; $t }' \
>                  -e 's/(UUID)/f($1)/ge'
>   4a64a434-73b2-47f9-985f-2eff776b981d FOO fc7f3796-cfed-4850-a363-a70edfceee1b BAR de65fe02-96fd-436e-ae2b-66127c438702
>
>
> Of course,
> when executing things like that on the shell, extra care must be taken
> to ensure malicious input can't cause unintended consequences with shell
> escaping tricks.
>
> =======
>
> As for adding a new feature to sed:
>
> There is always a trade-off between adding more and more specialized
> features to sed, and using existing solutions even if they are
> a bit more verbose (e.g. my perl examples are much longer than the
> hypothetical s///x sed feature).
>
> I don't think we can/should modify sed's existing s///e flag (that would
> break existing scripts), but we could perhaps consider adding a new
> flag.
>
> What do others think: is it worth it, or is it better to just stick with perl?
> (Jim?)
>
> The semantics of such a flag must be carefully defined, e.g.
> what is the interplay with grouping, with the global flag, and with other flags?
>
> regards,
>  - assaf
>

Current Thread

Feature to add, Russell Harper, 2018/07/18
- Message not available
  - Message not available
    - Re: Feature to add, Assaf Gordon, 2018/07/19
    - Re: Feature to add, Russell Harper <=
    - Re: Feature to add, Assaf Gordon, 2018/07/19
