[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Patch] expand,unexpand multibyte support
From: |
Pádraig Brady |
Subject: |
Re: [Patch] expand,unexpand multibyte support |
Date: |
Thu, 07 Mar 2013 03:49:04 +0000 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 |
On 02/20/2013 05:59 PM, Ondrej Oprala wrote:
> On 02/20/2013 12:21 AM, Bernhard Voelker wrote:
>> On 02/18/2013 04:30 PM, Ondrej Oprala wrote:
>>> Hi, I've been working on multibyte support for the {un,}expand utilities
>>> lately, my approach being similar to Padraig's from 2010 (
>>> http://lists.gnu.org/archive/html/coreutils/2010-09/msg00029.html ) .
>>> Both tools now read by lines, not bytes, and then iterate over the
>>> characters properly.
>> Thanks - the package maintainer of all distributions will be very happy
>> once multi-byte support is integrated to all upstream coreutils programs.
>>
>> I can't tell too much about your patch yet - I even tried to avoid
>> the umlaut-ö in my own name whenever possible my whole life ;-)
>> So here are only a few general notes.
>>
>> I had a bit problems to compile because expand-core.h needs the types
>> uintmax_t, uint32_t and uint8_t which are not yet defined.
>> Moving the #include down (in src/{expand,expand-core,unexpand}.c)
>> worked around this problem, e.g. expand.c:
>>
>> --- a/src/expand.c
>> +++ b/src/expand.c
>> @@ -46,13 +46,13 @@
>> # include <unistring/localcharset.h>
>> #endif
>>
>> -#include "expand-core.h"
>> -
>> #include "system.h"
>> #include "error.h"
>> #include "fadvise.h"
>> #include "xstrndup.h"
>>
>> +#include "expand-core.h"
>> +
>> /* The official name of this program (e.g., no 'g' prefix). */
>> #define PROGRAM_NAME "expand"
>>
>>
>> In the expand() and unexpand() functions, the variable rawline can
>> be changed from char** to char*. This makes the code better readable.
>> I'm not sure whether we should free the rawline buffer on any input
>> line anyway, because it's filled by getline() and resized as needed;
>> i.e. reusing malloc'ed space is not a bad thing.
>>
>> And instead of strdup() without checking, we should use xstrdup():
>> {
>> - line = (uint8_t *) strdup (*rawline);
>> + line = (uint8_t *) xstrdup (rawline);
>> clen = 1;
>> }
>> Hopefully, we can avoid that strdup in the final version, and just
>> use the buffer we already have.
>>
>> In the tests, please check the exit status of expand/unexpand,
>> and use our own compare function; e.g. tests/expand/mb.sh:
>>
>> -expand < in > out
>> -cmp out exp > /dev/null 2>&1 || fail=1
>> +expand < in > out || fail=1
>> +compare exp out || fail=1
>>
>> It would be good to have much more tests. I have the feeling
>> that the coverage rate of expand/unexpand wasn't too great
>> also in the existing tests - though the unexpand test looks
>> slightly better than the expand test.
>>
>> Have a nice day,
>> Berny
> Thank you both very much for looking at the patch,
> I realize the patch is a bit big and thus hard to review (for which I'm
> sorry), but
> with {un,}expand tools being so tightly coupled together, I can't imagine a
> different approach.
>
> I've made some fixes/additions.
> The patch now correctly handles characters with a display length > 1,
> The columns variable is now incremented by character unit length, not by 1
> implicitly.
> Unexpand now uses the buffer line + offset to output pending characters as
> well,
> which makes it easier to work with blanks such as ideo-space in the pending
> buffer.
> (special case when pending blanks with display length == 2 get into
> the pending_blank[] buffer is covered in tests.)
> Speaking of tests, I've added a few test cases (thank you again for the
> ideo-space
> reproducers ;) ).
> Tests now use the correct compare, header files are ordered properly and
> s/strdup/xstrdup/ in code.
The attached 5 patches fix lots of build issues.
The test passes as does make syntax-check.
I've not looked at the code yet.
thanks,
Pádraig.
expand-mb-build-fixes.patch
Description: Text Data
- Re: [Patch] expand,unexpand multibyte support,
Pádraig Brady <=