Re: [Patch] expand,unexpand multibyte support

coreutils

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Patch] expand,unexpand multibyte support

From:	Ondrej Oprala
Subject:	Re: [Patch] expand,unexpand multibyte support
Date:	Wed, 20 Feb 2013 18:59:02 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 02/20/2013 12:21 AM, Bernhard Voelker wrote:

On 02/18/2013 04:30 PM, Ondrej Oprala wrote:

Hi, I've been working on multibyte support for the {un,}expand utilities
lately, my approach being similar to Padraig's from 2010 (
http://lists.gnu.org/archive/html/coreutils/2010-09/msg00029.html ) .
Both tools now read by lines, not bytes, and then iterate over the
characters properly.

Thanks - the package maintainer of all distributions will be very happy
once multi-byte support is integrated to all upstream coreutils programs.

I can't tell too much about your patch yet - I even tried to avoid
the umlaut-ö in my own name whenever possible my whole life ;-)
So here are only a few general notes.

I had a bit problems to compile because expand-core.h needs the types
uintmax_t, uint32_t and uint8_t which are not yet defined.
Moving the #include down (in src/{expand,expand-core,unexpand}.c)
worked around this problem, e.g. expand.c:

--- a/src/expand.c
+++ b/src/expand.c
@@ -46,13 +46,13 @@
  # include <unistring/localcharset.h>
  #endif

-#include "expand-core.h"
-
  #include "system.h"
  #include "error.h"
  #include "fadvise.h"
  #include "xstrndup.h"

+#include "expand-core.h"
+
  /* The official name of this program (e.g., no 'g' prefix).  */
  #define PROGRAM_NAME "expand"


In the expand() and unexpand() functions, the variable rawline can
be changed from char** to char*. This makes the code better readable.
I'm not sure whether we should free the rawline buffer on any input
line anyway, because it's filled by getline() and resized as needed;
i.e. reusing malloc'ed space is not a bad thing.

And instead of strdup() without checking, we should use xstrdup():
        {
-        line = (uint8_t *) strdup (*rawline);
+        line = (uint8_t *) xstrdup (rawline);
          clen = 1;
        }
Hopefully, we can avoid that strdup in the final version, and just
use the buffer we already have.

In the tests, please check the exit status of expand/unexpand,
and use our own compare function; e.g. tests/expand/mb.sh:

-expand < in > out
-cmp out exp > /dev/null 2>&1 || fail=1
+expand < in > out || fail=1
+compare exp out || fail=1

It would be good to have much more tests. I have the feeling
that the coverage rate of expand/unexpand wasn't too great
also in the existing tests - though the unexpand test looks
slightly better than the expand test.

Have a nice day,
Berny

Thank you both very much for looking at the patch,

I realize the patch is a bit big and thus hard to review (for which I'msorry), butwith {un,}expand tools being so tightly coupled together, I can'timagine a different approach.


I've made some fixes/additions.
The patch now correctly handles characters with a display length > 1,

The columns variable is now incremented by character unit length, not by1 implicitly.Unexpand now uses the buffer line + offset to output pending charactersas well,which makes it easier to work with blanks such as ideo-space in thepending buffer.

(special case when pending blanks with display length == 2 get into
the pending_blank[] buffer is covered in tests.)

Speaking of tests, I've added a few test cases (thank you again for theideo-space

reproducers ;) ).

Tests now use the correct compare, header files are ordered properly ands/strdup/xstrdup/ in code.


Thanks,
Ondrej

DIFF
Description: Text document

[Prev in Thread]

Current Thread

[Next in Thread]

[Patch] expand,unexpand multibyte support, Ondrej Oprala, 2013/02/18
- Re: [Patch] expand,unexpand multibyte support, Pádraig Brady, 2013/02/19
- Re: [Patch] expand,unexpand multibyte support, Bernhard Voelker, 2013/02/19
  - Re: [Patch] expand,unexpand multibyte support, Ondrej Oprala <=

Prev by Date: [PATCH] dircolors.hin: add .cab, .alz, .lzo, .lrz, .t7z, .tzo and .lha to colorized archives
Next by Date: [PATCH]: uniq: add "--group" option
Previous by thread: Re: [Patch] expand,unexpand multibyte support
Next by thread: 8.21.2-c53c - failure on rm/deep-2 and du/long-from-unreadable
Index(es):
- Date
- Thread