Re: [bug-gawk] How to convert string like ã to utf-8?
From: arnold
Subject: Re: [bug-gawk] How to convert string like ã to utf-8?
Date: Tue, 17 May 2016 01:27:39 -0600
User-agent: Heirloom mailx 12.4 7/29/08
Hi.
Peng Yu <address@hidden> wrote:
> Hi, I have some document with something like &#xe3; (ã). I want to
> convert it to utf-8 in awk. Is there a way to do? Thanks.
Assuming that &#xe3; (ã) is the right code, you will need to write a
program that:
1. Splits the line using /&#[Xx][[:xdigit:]]{2,4};/ as the regexp.
   Use the four-argument form of gawk's split(), which saves the
   separators in an array.
2. Uses a function that parses each entity and returns its numeric
   value as a character.
3. Loops over the array elements, making the substitution and then
   joining the parts back together.
4. Writes out the reconstituted line.
It's what a professor of mine long ago called a SMOP: A Simple Matter
Of Programming.
If you want me to write the program for you, we can discuss my consulting
rates. But I doubt that's necessary. :-)
Thanks,
Arnold