Re: [bug-gawk] feature request: expanding escape sequences


From: Ed Morton
Subject: Re: [bug-gawk] feature request: expanding escape sequences
Date: Sun, 06 Jul 2014 09:26:41 -0500
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

Arnold - I don't believe it can be done fairly easily in any way. For example, regarding your suggestion of splitting on an RE, let's say I define my escape-sequence-matching RE as '\[[:alpha:]]'. Well, that's wrong, since not every backslash followed by a letter is an escape sequence:
$ awk -v var='a\tb' 'BEGIN{print var}'
a       b

$ awk -v var='a\jb' 'BEGIN{print var}'
awk: warning: escape sequence `\j' treated as plain `j'
ajb
so I need to enumerate all possible characters that COULD be in escape sequences, e.g. '\[tn...]'.

Now, what if someone had input that contains an escaped backslash followed by a 't':
$ awk -v var='a\\tb' 'BEGIN{print var}'
a\tb
It would obviously be wrong for my RE to assume that every '\t' in my input should be translated to a tab, since it could be part of a '\\t'. So now I'd need some way of looking at the string immediately preceding my candidate "escape sequence" and, if it ends in an odd number of backslashes, not treating it as an escape sequence. Are there other cases where '\t' could appear in a context that means it's not a tab? I've no idea.
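For illustration, here is a minimal sketch of that backslash-counting check as a shell function wrapping awk. The names `is_escape` and `isEscape` are hypothetical, and it assumes the caller passes the 1-based position of the backslash in front of a candidate escape letter. The string is piped into awk rather than passed with `-v`, because `-v` would expand the very escapes being inspected, as the examples above show.

```shell
# is_escape STRING POS (hypothetical helper): prints 1 if the backslash at
# 1-based position POS in STRING starts a real escape sequence, that is,
# it is preceded by an even number (possibly zero) of backslashes; else 0.
is_escape() {
    # STRING is piped in rather than passed with -v, because -v would
    # itself expand the escapes we want to inspect.
    printf '%s\n' "$1" | awk -v pos="$2" '
    function isEscape(s, p,    n) {
        if (substr(s, p, 1) != "\\")
            return 0                  # no backslash at this position
        n = 0
        # Walk left from p, counting consecutive backslashes.
        while (p - n - 1 >= 1 && substr(s, p - n - 1, 1) == "\\")
            n++
        return (n % 2 == 0)           # even count: this backslash is live
    }
    { print isEscape($0, pos) }'
}
```

For example, `is_escape 'a\\tb' 3` should print 0, because the backslash at position 3 is itself escaped by the one at position 2, while `is_escape 'a\tb' 2` should print 1.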

Once I'm done with all of that, I still need a series of if-else statements to manually map every escape sequence to its specific character. Are there locale or other issues with that? I've no idea.

Anyway, it's becoming a fairly lengthy and complicated script for such a conceptually trivial task. Realistically, no one is going to write a function like that, or write C code for it, instead of just calling the shell's printf to expand the tabs, despite the risk that doing so could end up executing some dangerous piece of code.

I don't think this will come up often, so if you'd rather not provide a function for it on the basis that it's not a common enough problem to clutter up the language with a solution, I completely understand. But when it does come up, there is simply no good solution for it today.

Thanks for considering it either way,

    Ed.

On 7/6/2014 6:13 AM, Aharon Robbins wrote:
Hello Ed.

This is something that can be done fairly easily as a function written
in awk.  In particular, I'd consider using the four-argument version of
split(), which gives you the pieces and the separators, using a regex
that matches escape sequences. You could then convert the strings into
their corresponding values fairly easily, and rejoin all the parts into
the result string.
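A rough sketch of this approach, under a few stated assumptions. It swaps gawk's four-argument split() for a portable match()/substr() loop, so it should run in any POSIX awk; the idea is the same: separate the escape sequences from the surrounding text, convert them, and rejoin. Scanning left to right and consuming each backslash together with the character after it means '\\' is always taken as a pair, which sidesteps the odd/even backslash problem without looking backwards. A lookup array stands in for a chain of if-else tests. It handles only the single-character escapes in the table (no octal or hex), and, like gawk's treatment of '\j', turns an unknown escape into the plain character, just without the warning. `expand_escapes` and `expandEscapes` are hypothetical names.

```shell
# expand_escapes [FILE...] (hypothetical helper): prints each input line
# with its backslash escape sequences expanded.
expand_escapes() {
    awk '
    BEGIN {
        # Lookup table: character after the backslash -> what it stands for.
        esc["t"] = "\t"; esc["n"] = "\n"; esc["r"] = "\r"
        esc["a"] = "\a"; esc["b"] = "\b"; esc["f"] = "\f"
        esc["v"] = "\v"; esc["\\"] = "\\"
    }
    function expandEscapes(s,    out, c) {
        out = ""
        # /\\./ matches a backslash plus the single character after it.
        # Scanning left to right consumes "\\" as one unit, so the t in
        # "a\\tb" is never mistaken for part of a "\t".
        while (match(s, /\\./)) {
            out = out substr(s, 1, RSTART - 1)  # text before the escape
            c = substr(s, RSTART + 1, 1)        # character after the backslash
            c = (c in esc) ? esc[c] : c         # unknown escape: keep the char
            out = out c
            s = substr(s, RSTART + 2)           # resume after the escape
        }
        return out s                            # plus any trailing text
    }
    { print expandEscapes($0) }
    ' "$@"
}
```

Usage would be something like `expand_escapes file`; a lone backslash at the end of a line has nothing after it to match, so it is left as-is.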

Another alternative would be to write an extension function in C to
do this.

I don't see this as general purpose enough to require adding another
built-in function to gawk.

Thanks,

Arnold

Date: Sat, 05 Jul 2014 09:38:17 -0500
From: Ed Morton <address@hidden>
To: address@hidden
Subject: [bug-gawk] feature request: expanding escape sequences

Guys - I just tried to do something conceptually trivial and discovered there is
literally NO good way to do it in awk, so I was wondering if a gawk function
could be provided to do it?

I just want to be able to expand escape sequences in text read 
from an input file. For example, let's say I have:

     $ cat file
     a\tb
     c\td

and I want to output:

     a<tab>b
     c<tab>d

where "<tab>" is a literal control character.

The most concise solution I've come up with is 
to invoke the shell to parse the file, e.g.

     $ awk '{ system("printf \047" $0 "\n\047") }' file
     a       b
     c       d

but that has some caveats, not least of which is that if the input file contains
text like '$(ls)' (single quotes included, since the first one ends the quoting
around the printf argument) then that command will be executed!
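One way to sidestep the code-execution problem while staying in the shell, as a minimal sketch: pass each line as an argument to printf's %b conversion instead of splicing it into a command string, so the shell never parses the line's contents as code. `expand_lines` is a hypothetical name. Note that %b follows printf's escape rules rather than awk's (for example, octal escapes are written \0nnn and \c stops all further output), and a shell read loop is slow on large files.

```shell
# expand_lines: hypothetical helper that reads lines on stdin and prints
# each one with its backslash escapes expanded by printf %b.
expand_lines() {
    # IFS= and -r keep leading whitespace and backslashes intact; the
    # "|| [ -n ... ]" part also handles a final line with no newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # The line is only ever data for %b, never part of a command
        # string, so text like $(ls) or a stray quote is printed as-is.
        printf '%b\n' "$line"
    done
}
```

Usage would be `expand_lines < file`.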

The alternatives seem to be writing code to manually parse each line and convert
every escape sequence to its literal character, or using a script to read the input
file, generate another script from it, and then execute that.

I'm not looking for an `eval` function, just something that will convert escape 
sequences to their equivalent characters so I can do something like this if 
`file` contains lines of formatting text:

     awk '{ printf expandEscapes($0), "whatever" }' file

and expandEscapes() will just return its argument as a string with all escape
sequences expanded (actual function name up for grabs, of course).

What do you think?

      Ed.


    

