Re: expand/unexpand: add tests, refactor common code


From: Pádraig Brady
Subject: Re: expand/unexpand: add tests, refactor common code
Date: Sun, 17 Jul 2016 13:49:49 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 17/07/16 02:52, Assaf Gordon wrote:
> Hello,
> 
>> On Jun 27, 2016, at 06:56, Pádraig Brady <address@hidden> wrote:
>>
>> On 27/06/16 06:17, Assaf Gordon wrote:
>>> Hello Pádraig and all,
>>>
>>>> On Jun 25, 2016, at 07:20, Pádraig Brady <address@hidden> wrote:
>>>>
>>>> As part of this, or at least before looking at multibyte changes,
>>>> it would be worth considering this proposal for changing the
>>>> unexpand algorithm: http://bugs.gnu.org/23335
>>>
>>> The above bug-report addresses this TODO item:
>>> ===
>>> unexpand: [http://www.opengroup.org/onlinepubs/007908799/xcu/unexpand.html]
>>>  printf 'x\t \t y\n'|unexpand -t 8,9 should print its input, unmodified.
>>>  printf 'x\t \t y\n'|unexpand -t 5,8 should print "x\ty\n"
>>> ===
>>
>> I think the second command is wrong there actually?
>> Surely it should print "x\t\t y\n"
> 
> Digging a bit deeper into the various 'unexpand' implementations, it seems 
> there are more differences.
> Attached is a summary of most of coreutils' unexpand tests on various systems.
> The trivial cases give the same results, but more tricky cases (e.g. the 
> 'blanks' and 'posix' tests) do differ.
> 
> The test script is here: http://files.housegordon.org/tmp/test-unexpand-2.sh
> (the last 'ff' octet for AIX can be ignored, I suspect a bug in AIX's 
> unexpand when lines are not '\n' terminated).
> 
> Example (the inputs are 'blanks-1' and 'blanks-11' from 
> <coreutils>/tests/misc/unexpand.pl):
> 
> blanks-1   AIX-1                09 62 09 09 63 09 09 09 64
> blanks-1   Darwin-14.4.0        20 62 09 20 63 09 09 20 64 
> blanks-1   FreeBSD-10.1-RELEASE 20 62 09 20 63 09 09 20 64 
> blanks-1   Linux-3.16.0-4-amd64 09 62 09 09 63 09 09 09 64
> blanks-1   SunOS-5.11           20 62 20 20 63 20 20 20 64
> 
> blanks-11  AIX-1                09 09 34
> blanks-11  Darwin-14.4.0        09 34 
> blanks-11  FreeBSD-10.1-RELEASE 09 34 
> blanks-11  Linux-3.16.0-4-amd64 09 09 34
> blanks-11  SunOS-5.11           09 20 34
> 
> 
> And so I wonder if it's best to leave unexpand's algorithm as-is, for the 
> sake of backwards compatibility (in case someone relies on coreutils' 
> current behavior),
> and then focus back on multibyte character processing in 'expand' (with or 
> without using the refactoring patches).
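
For readers comparing the octet dumps quoted above, here is a minimal shell sketch that renders such a dump as a readable string. The helper name `unhex` is hypothetical (it is not part of coreutils or of the test script linked above), and it assumes a POSIX `printf` that accepts `0x` hex constants as numeric arguments.

```shell
# Hypothetical helper: render hex octets (as in the quoted table) as a
# readable string, showing tab as \t and newline as \n.
unhex() {
  out=''
  for h in "$@"; do
    case $h in
      09) out="${out}\\t" ;;   # literal backslash-t
      0a) out="${out}\\n" ;;   # literal backslash-n
      # Any other octet: convert hex to octal, then let printf emit the byte.
      *)  out="${out}$(printf "\\$(printf '%03o' "0x$h")")" ;;
    esac
  done
  printf '%s\n' "$out"
}

unhex 09 62 09 09 63 09 09 09 64   # blanks-1 on AIX and Linux
unhex 20 62 09 20 63 09 09 20 64   # blanks-1 on Darwin and FreeBSD
unhex 20 62 20 20 63 20 20 20 64   # blanks-1 on SunOS
```

Read this way, the blanks-1 rows show three distinct behaviours: AIX and Linux emit only tabs between the letters, Darwin and FreeBSD keep some spaces, and SunOS emits no tabs at all.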

I think the existing algorithm is fine,
and the refactoring patch should go in now.

We should move the two items from TODO to tests though,
to record this investigation.

# Add a comment that this should arguably minimize translation,
# as is done on Solaris, and not modify the input; but at least
# verify that it prints "x\t\t\t y\n"
printf 'x\t \t y\n'|unexpand -t 8,9

# verify prints "x\t\t y\n"
printf 'x\t \t y\n'|unexpand -t 5,8
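
As a rough standalone sketch of those two checks (the real tests would go into tests/misc/unexpand.pl, which is Perl; the helper names `hexbytes` and `check` here are hypothetical), the output can be compared octet by octet. The expected values are simply the strings stated above, written as hex, and they assume GNU unexpand; other implementations may differ, as the quoted table shows.

```shell
# Hypothetical helper: print stdin as space-separated hex octets.
hexbytes() {
  # Unquoted on purpose: word splitting collapses od's layout whitespace.
  set -- $(od -An -tx1)
  printf '%s\n' "$*"
}

# Hypothetical helper: compare expected and actual octets.
# usage: check NAME EXPECTED ACTUAL
check() {
  if [ "$2" = "$3" ]; then
    printf 'PASS %s\n' "$1"
  else
    printf 'FAIL %s: expected <%s>, got <%s>\n' "$1" "$2" "$3"
  fi
}

if command -v unexpand >/dev/null 2>&1; then
  # Expected "x\t\t\t y\n" (the input is modified, unlike on Solaris).
  check ts-8,9 '78 09 09 09 20 79 0a' \
    "$(printf 'x\t \t y\n' | unexpand -t 8,9 | hexbytes)"

  # Expected "x\t\t y\n".
  check ts-5,8 '78 09 09 20 79 0a' \
    "$(printf 'x\t \t y\n' | unexpand -t 5,8 | hexbytes)"
fi
```

Comparing hex octets rather than raw strings keeps tabs and spaces distinguishable in any failure message.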

With that, plus the previous 'extern' patch adjustment I sent,
it's good to push.

thanks!
Pádraig


