From: Pádraig Brady
Subject: Re: [Patch] expand,unexpand multibyte support
Date: Mon, 18 Feb 2013 20:41:06 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1

On 02/18/2013 03:30 PM, Ondrej Oprala wrote:
Hi, I've been working on multibyte support for the {un,}expand utilities lately, my approach being similar to Padraig's from 2010 (http://lists.gnu.org/archive/html/coreutils/2010-09/msg00029.html). Both tools now read by lines, not bytes, and then iterate over the characters properly. Since both tools share huge amounts of code, I've created an expand-core.c file to hold it. I've also noticed that if you add libunistring to bootstrap.conf's list of modules, libcoreutils.a will have problems compiling, hence the gnulib patch. I was planning on doing cut next, if this gets accepted well.

Thanks for working on this!

The general approach looks good.

Since tabs are used for alignment, the widths of space and non-space characters are significant. For example, ideographic space is 2 wide. So augmenting the tests with something like ensuring the following is aligned would be good:

    env printf '12345678
    e\t|ascii(1)
    \u00E9\t|composed(1)
    e\u0301\t|decomposed(1)
    \u3000\t|ideo-space(2)
    \uFF0D\t|full-hypen(2)
    ' | expand

I'll try to review fully over the next while.

thanks,
Pádraig.
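
To illustrate why display width rather than byte or character count matters here, below is a rough Python sketch of width-aware tab expansion. It is not the code from the patch: `char_width`, `expand_line`, and `display_width` are hypothetical names, and the width rule (0 for combining marks, 2 for East Asian wide/fullwidth, 1 otherwise) is only an approximation of what wcwidth() would report in a UTF-8 locale.

```python
import unicodedata

def char_width(ch):
    """Approximate display width of one code point (assumption, not wcwidth())."""
    if unicodedata.combining(ch):
        return 0  # combining marks occupy no column of their own
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters take two columns
    return 1

def display_width(s):
    """Sum of the approximate column widths of all characters in s."""
    return sum(char_width(ch) for ch in s)

def expand_line(line, tabstop=8):
    """Replace each tab with spaces up to the next tab stop, counting columns."""
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            pad = tabstop - col % tabstop
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += char_width(ch)
    return "".join(out)

# The sample prefixes from the printf test above, each followed by a tab.
samples = ["e", "\u00e9", "e\u0301", "\u3000", "\uff0d"]
expanded = [expand_line(s + "\t|") for s in samples]
```

With this width rule, the `|` in every expanded sample lands at column 8, which is the alignment the suggested test is meant to check.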