bug#9420: cut: --output-delimiter ignored in combination with -c
From: Jim Meyering
Subject: bug#9420: cut: --output-delimiter ignored in combination with -c
Date: Thu, 01 Sep 2011 23:19:26 +0200
Jim Meyering wrote:
> Pádraig Brady wrote:
>> On 09/01/2011 07:33 PM, Philipp Thomas wrote:
>>>
>>> Cut from older coreutils (at least until 7.1) honoured --output-delimiter in
>>> combination with -c. Newer coreutils don't, i.e. with the older cut you get
>>>
>>> $ echo 12 | cut --output-delimiter=X -c1,2
>>> 1X2
>>>
>>> And with the newer ones
>>>
>>> $ echo 12 | cut --output-delimiter=X -c1,2
>>> 12
>>>
>>> Is this a regression or was this a deliberate change that wasn't documented?
>>
>> Looks like a regression introduced with the i18n patch,
>> so I'm closing this here.
>>
>> $ echo 12 | cut --output-delimiter=X -c1,2
>> 12
>> $ echo 12 | LANG=C cut --output-delimiter=X -c1,2
>> 1X2
>
> Wondering how that could happen, given our test suite,
> I realized that we take care to set LC_ALL=C for most tests.
> At least for cut, I'm changing that. Now we'll run each test with
> LC_ALL=C, and again (when possible) with e.g., LC_ALL=fr_FR.UTF-8.
>
> From ea8295673ebe81b8b0a64bc35a497a44ea419934 Mon Sep 17 00:00:00 2001
> From: Jim Meyering <address@hidden>
> Date: Thu, 1 Sep 2011 21:30:10 +0200
> Subject: [PATCH] tests: exercise distro-added multibyte code paths in cut
>
> * tests/misc/cut: Repeat each test using a multibyte locale,
> if the configure-time test found one.
Ahem.
Running that against the cut from F15's coreutils-8.10-2.fc15.x86_64,
I get numerous segfaults, one for each of these, as well as for
each of the from-stdin variants:
:|/usr/bin/cut --output-d=: -b4567890-
:|/usr/bin/cut --output-d=: -c4567890-
:|/usr/bin/cut --output-d=: -f4567890-
Each does this:
zsh: segmentation fault (core dumped) /usr/bin/cut --output-d=: -c4567890-
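For anyone who wants to check their own distro's cut, here is a rough sketch of a loop over the three variants above. The describe_status helper and the CUT variable are my own inventions for illustration; the only thing taken from the report is the three command lines. It assumes a POSIX shell, where a command killed by signal N is reported with exit status 128+N.

```shell
#!/bin/sh
# Hypothetical helper: turn an exit status into a short description.
# A status above 128 conventionally means "killed by signal (status - 128)".
describe_status() {
  if [ "$1" -gt 128 ]; then
    echo "killed by signal $(( $1 - 128 ))"
  else
    echo "exit $1"
  fi
}

# Assumption: CUT may name the binary under test; default to cut in PATH.
cut_prog=${CUT:-cut}

for opt in -b -c -f; do
  status=0
  # Empty stdin, large unbounded range, as in the failing commands.
  : | "$cut_prog" --output-d=: "${opt}4567890-" >/dev/null 2>&1 || status=$?
  echo "$opt: $(describe_status "$status")"
done
```

A segfault would show up as "killed by signal 11"; an unaffected cut should simply report "exit 0" for all three.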
The new tests exposed another minor bug in the MB patch series.
The patched cut gives this diagnostic:
cut: invalid byte, character or field list
while the upstream version gives a more precise one:
cut: invalid decreasing range
That too causes test failures.
The "inval1" test provides one example:
cut: test inval1: stderr mismatch, comparing inval1.E (actual) and inval1.3 (expected)
*** inval1.E Thu Sep 1 23:01:36 2011
--- inval1.3 Thu Sep 1 23:01:36 2011
***************
*** 1,2 ****
! cut: invalid byte, character or field list
Try `cut --help' for more information.
--- 1,2 ----
! cut: invalid decreasing range
Try `cut --help' for more information.
inval1.r...
Notice that the offending diagnostic there mentions "character".
That's an addition from the multi-byte patch series.
To make the test suite accommodate that new diagnostic,
I had to make an additional change:
diff --git a/tests/misc/cut b/tests/misc/cut
index 7c1450b..7ed4134 100755
--- a/tests/misc/cut
+++ b/tests/misc/cut
@@ -170,6 +170,19 @@ if ($mb_locale ne 'C')
{
my @new_t = @$t;
my $test_name = shift @new_t;
+
+ # Depending on whether cut is multi-byte-patched,
+ # it emits different diagnostics:
+ # non-MB: invalid byte or field list
+ # MB: invalid byte, character or field list
+ # Adjust the expected error output accordingly.
+ if (grep {ref $_ eq 'HASH' && exists $_->{ERR} && $_->{ERR} eq $inval}
+ (@new_t))
+ {
+ my $sub = {ERR_SUBST => 's/, character//'};
+ push @new_t, $sub;
+ push @$t, $sub;
+ }
push @new, ["$test_name-mb", @new_t, {ENV => "LC_ALL=$mb_locale"}];
}
push @Tests, @new;
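To illustrate what the 's/, character//' substitution in that hunk does to the two diagnostics named in its comment, here is a small stand-alone sketch using sed in place of the Perl test driver. It assumes ERR_SUBST is applied to the actual stderr before it is compared with the expected text; the normalize_diag name is hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for the ERR_SUBST step: drop ", character"
# from a diagnostic read on stdin.
normalize_diag() {
  sed 's/, character//'
}

# The two diagnostics listed in the comment of the hunk above.
patched='cut: invalid byte, character or field list'
upstream='cut: invalid byte or field list'

if [ "$(printf '%s\n' "$patched" | normalize_diag)" = "$upstream" ]; then
  echo "diagnostics match after substitution"
else
  echo "diagnostics still differ"
fi
```

The substitution leaves the upstream message untouched, which is why the same ERR_SUBST entry can be pushed onto both the original test and its -mb copy.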
Here's the amended patch:
From 553cd6b5b39ecbaa4fe807099e754373eff9ea1e Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Thu, 1 Sep 2011 21:30:10 +0200
Subject: [PATCH] tests: exercise distro-added multibyte code paths in cut
* tests/misc/cut: Repeat each test using a multibyte locale,
if the configure-time test found such a locale.
Adjust the tests so that they also accept a slightly
different diagnostic that is specific to the MB-patched cut.
---
tests/misc/cut | 33 +++++++++++++++++++++++++++++++++
1 files changed, 33 insertions(+), 0 deletions(-)
diff --git a/tests/misc/cut b/tests/misc/cut
index c905ba9..7ed4134 100755
--- a/tests/misc/cut
+++ b/tests/misc/cut
@@ -23,6 +23,10 @@ use strict;
# Turn off localization of executable's output.
@ENV{qw(LANGUAGE LANG LC_ALL)} = ('C') x 3;
+my $mb_locale = $ENV{LOCALE_FR_UTF8};
+! defined $mb_locale || $mb_locale eq 'none'
+ and $mb_locale = 'C';
+
my $prog = 'cut';
my $try = "Try \`$prog --help' for more information.\n";
my $from_1 = "$prog: fields and positions are numbered from 1\n$try";
@@ -156,6 +160,35 @@ my @Tests =
['big-unbounded-f', '--output-d=:', '-f1234567890-', {IN=>''}, {OUT=>''}],
);
+if ($mb_locale ne 'C')
+ {
+ # Duplicate each test vector, appending "-mb" to the test name and
+ # inserting {ENV => "LC_ALL=$mb_locale"} in the copy, so that we
+ # provide coverage for the distro-added multi-byte code paths.
+ my @new;
+ foreach my $t (@Tests)
+ {
+ my @new_t = @$t;
+ my $test_name = shift @new_t;
+
+ # Depending on whether cut is multi-byte-patched,
+ # it emits different diagnostics:
+ # non-MB: invalid byte or field list
+ # MB: invalid byte, character or field list
+ # Adjust the expected error output accordingly.
+ if (grep {ref $_ eq 'HASH' && exists $_->{ERR} && $_->{ERR} eq $inval}
+ (@new_t))
+ {
+ my $sub = {ERR_SUBST => 's/, character//'};
+ push @new_t, $sub;
+ push @$t, $sub;
+ }
+ push @new, ["$test_name-mb", @new_t, {ENV => "LC_ALL=$mb_locale"}];
+ }
+ push @Tests, @new;
+ }
+
+
@Tests = triple_test \@Tests;
my $save_temps = $ENV{DEBUG};
--
1.7.7.rc0.362.g5a14
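
For a quick check outside the test suite, the locale fallback at the top of the patch and the originally reported case can be sketched in plain shell. This is only an approximation of the Perl logic: pick_mb_locale is a hypothetical helper, and LOCALE_FR_UTF8 is assumed to be exported by the caller, as it is in the test environment.

```shell
#!/bin/sh
# Hypothetical helper mirroring the Perl fallback: use C when the
# argument is absent or is the literal string "none".
pick_mb_locale() {
  case ${1-none} in
    none) echo C ;;
    *) echo "$1" ;;
  esac
}

# Pass the variable only if it is set, so "unset" maps to "no argument".
mb_locale=$(pick_mb_locale ${LOCALE_FR_UTF8+"$LOCALE_FR_UTF8"})

# Run the reported case in the C locale, then again in the
# multibyte locale if one is available.
for loc in C "$mb_locale"; do
  out=$(echo 12 | LC_ALL=$loc cut --output-delimiter=X -c1,2 2>&1) \
    || out="(cut failed)"
  echo "LC_ALL=$loc: $out"
  [ "$mb_locale" = C ] && break
done
```

If the two lines of output differ, the cut being run probably takes a locale-dependent code path of the kind discussed in this thread.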