bug-gawk

Re: [bug-gawk] How to convert string like &#x00E3; to utf-8?


From: arnold
Subject: Re: [bug-gawk] How to convert string like &#x00E3; to utf-8?
Date: Tue, 17 May 2016 01:27:39 -0600
User-agent: Heirloom mailx 12.4 7/29/08

Hi.

Peng Yu <address@hidden> wrote:

> Hi, I have some document with something like &#x00E3;. I want to
> convert it to utf-8 in awk. Is there a way to do? Thanks.

Assuming that &#x00E3; is the right code for the character you want,
you will need to write a program that does the following:

1. Split the line using /&#[Xx][[:xdigit:]]{2,4};/ as the regexp.
Use the four-argument form of split(), which saves the separators
in a second array.

2. Write a function to parse the entity and return the binary value
as a character.

3. Loop over the array elements, replacing each saved separator with
the character it stands for, and join the parts back together.

4. Write out the reconstituted line.

It's what a professor of mine long ago called a SMOP: A Simple Matter
Of Programming.

If you want me to write the program for you, we can discuss my consulting
rates.  But I doubt that's necessary. :-)

Thanks,

Arnold
