Re: expand/unexpand: add tests, refactor common code
From: Pádraig Brady
Subject: Re: expand/unexpand: add tests, refactor common code
Date: Sun, 17 Jul 2016 13:49:49 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
On 17/07/16 02:52, Assaf Gordon wrote:
> Hello,
>
>> On Jun 27, 2016, at 06:56, Pádraig Brady <address@hidden> wrote:
>>
>> On 27/06/16 06:17, Assaf Gordon wrote:
>>> Hello Pádraig and all,
>>>
>>>> On Jun 25, 2016, at 07:20, Pádraig Brady <address@hidden> wrote:
>>>>
>>>> As part of this, or at least before looking at multibyte changes,
>>>> it would be worth considering this proposal for changing the
>>>> unexpand algorithm: http://bugs.gnu.org/23335
>>>
>>> The above bug-report addresses this TODO item:
>>> ===
>>> unexpand: [http://www.opengroup.org/onlinepubs/007908799/xcu/unexpand.html]
>>> printf 'x\t \t y\n'|unexpand -t 8,9 should print its input, unmodified.
>>> printf 'x\t \t y\n'|unexpand -t 5,8 should print "x\ty\n"
>>> ===
>>
>> I think the second command is wrong there actually?
>> Surely it should print "x\t\t y\n"
>
> Digging a bit deeper into the various 'unexpand' implementations, it seems
> there are more differences.
> Attached is a summary of most of coreutils' unexpand tests on various systems.
> The trivial cases give the same results, but more tricky cases (e.g. the
> 'blanks' and 'posix' tests) do differ.
>
> The test script is here: http://files.housegordon.org/tmp/test-unexpand-2.sh
> (the last 'ff' octet for AIX can be ignored, I suspect a bug in AIX's
> unexpand when lines are not '\n' terminated).
>
> Example (the inputs are 'blanks-1' and 'blanks-11' from
> <coreutils>/tests/misc/unexpand.pl):
>
> blanks-1 AIX-1 09 62 09 09 63 09 09 09 64
> blanks-1 Darwin-14.4.0 20 62 09 20 63 09 09 20 64
> blanks-1 FreeBSD-10.1-RELEASE 20 62 09 20 63 09 09 20 64
> blanks-1 Linux-3.16.0-4-amd64 09 62 09 09 63 09 09 09 64
> blanks-1 SunOS-5.11 20 62 20 20 63 20 20 20 64
>
> blanks-11 AIX-1 09 09 34
> blanks-11 Darwin-14.4.0 09 34
> blanks-11 FreeBSD-10.1-RELEASE 09 34
> blanks-11 Linux-3.16.0-4-amd64 09 09 34
> blanks-11 SunOS-5.11 09 20 34
>
>
> And so I wonder if it's best to leave unexpand's algorithm as-is, for the
> sake of backwards compatibility (in case someone depends on coreutils'
> current behavior),
> and then focus back on multibyte character processing in 'expand' (with or
> without using the refactoring patches).
I think the existing algorithm is fine,
and the refactoring patch should go in now.
We should move the two items from TODO to tests though,
to record this investigation.
# comment that this should arguably minimize translation
# as is done on Solaris, and not modify input, but at least
# verify prints "x\t\t\t y\n"
printf 'x\t \t y\n'|unexpand -t 8,9
# verify prints "x\t\t y\n"
printf 'x\t \t y\n'|unexpand -t 5,8
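To compare these two cases across systems in the same octet format as the table quoted above, something like the following could be used. This is only a rough sketch: `hex_octets` is a hypothetical helper name, not part of coreutils or of the linked test script, and no expected output is given for the unexpand runs because the results differ between implementations.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from coreutils): print stdin
# as space-separated hex octets on one line.
hex_octets () {
  od -An -tx1 -v | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Dump what the local unexpand emits for the two TODO inputs.
for tabs in 8,9 5,8; do
  out=$(printf 'x\t \t y\n' | unexpand -t "$tabs" | hex_octets)
  printf '%s\n' "-t $tabs: $out"
done
```

Running this on each system and comparing the two output lines shows directly where the implementations disagree.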
With those two tests added, along with the previous 'extern' patch
adjustment I sent, the refactoring patch is good to push.
thanks!
Pádraig