From MAILER-DAEMON Tue Nov 10 08:27:59 2020
To: bug-datamash@gnu.org
From: Andreas Sommer
Subject: Basic calculation mistakes (e.g. mean/median)
Message-ID: <97710b78-e398-1b28-92a5-07c5d46c152f@googlemail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
X-Mailman-Approved-At: Tue, 10 Nov 2020 08:27:58 -0500
List-Id: "Questions, Discussions and bug reports for GNU Datamash"
Date: Sat, 03 Oct 2020 11:42:39 -0000
X-Original-Date: Sat, 3 Oct 2020 13:42:31 +0200

Hi,

I just started using datamash on a small dataset and noticed its
calculations are off. And indeed it doesn't calculate basics correctly. Or
it somehow depends on the type of display (see below argument combinations):

---

$ seq 1 3 | datamash mean 1 median 1
2       2

$ seq 1 3 | datamash -H mean 1 median 1
mean(1) median(1)
2.5     2.5

$ seq 1 3 | datamash -R 5 -H mean 1 median 1
mean(1) median(1)
2.50000 2.50000

---

$ seq 1 4 | datamash -H -R 2 mean 1 median 1
mean(1) median(1)
3.00    3.00

$ seq 1 4 | datamash -R 2 mean 1 median 1
2.50    2.50

$ seq 1 4 | datamash mean 1 median 1
2.5     2.5

---

Until that gets fixed, it means I can't trust the tool :(

Cheers,
 Andreas

From MAILER-DAEMON Tue Nov 10 08:28:01 2020
MIME-Version: 1.0
Reply-To: cronos586@gmail.com
From: Catalin Patulea
Subject: "Segmentation fault" when input contains embedded NUL characters
To: bug-datamash@gnu.org
Content-Type: multipart/alternative; boundary="0000000000003c400e05b261acc3"
X-Mailman-Approved-At: Tue, 10 Nov 2020 08:27:58 -0500
Date: Sat, 24 Oct 2020 02:44:18 -0000
X-Original-Date: Fri, 23 Oct 2020 22:44:04 -0400

--0000000000003c400e05b261acc3
Content-Type: text/plain; charset="UTF-8"

Hello,

$ datamash --version
datamash (GNU datamash) 1.6

$ dd if=/dev/zero bs=100 count=1 | datamash countunique 1
1+0 records in
1+0 records out
100 bytes copied, 0.000125612 s, 796 kB/s
Segmentation fault

backtrace:

(gdb) bt
#0  0x000055555555c95c in field_op_get_string_ptrs (op=0x55555557a5f0, sort_case_sensitive=sort_case_sensitive@entry=true, sort=true)
    at src/field-ops.c:278
#1  0x000055555555d194 in count_unique_values (op=<optimized out>, case_sensitive=true) at src/field-ops.c:640
#2  0x000055555555d5ec in field_op_summarize (op=0x55555557a5f0) at src/field-ops.c:963
#3  0x000055555555f5cb in summarize_field_ops () at src/datamash.c:539
#4  0x000055555555f88a in process_group (line=0x7fffffffe340) at src/datamash.c:589
#5  0x000055555555fab7 in process_file () at src/datamash.c:651
#6  0x000055555555786b in main (argc=<optimized out>, argv=0x7fffffffe5c8) at src/datamash.c:1291
(gdb) fra 1
#1  0x000055555555d194 in count_unique_values (op=<optimized out>, case_sensitive=true) at src/field-ops.c:640
640 in src/field-ops.c
(gdb) fra 0
#0  0x000055555555c95c in field_op_get_string_ptrs (op=0x55555557a5f0, sort_case_sensitive=sort_case_sensitive@entry=true, sort=true)
    at src/field-ops.c:278
278 in src/field-ops.c

Simply, field_op_get_string_ptrs, and probably datamash in general,
assumes input will not contain embedded NULs:
https://github.com/agordon/datamash/blob/v1.6/src/field-ops.c#L279

For my application, the embedded NULs are an accident, and I can resolve
that and resume using datamash. datamash does not need to support inputs
with embedded NULs. But it should not crash on such inputs, either. Perhaps
output a message warning the user that such inputs are not supported.

I was considering writing a patch. Is the github repository actively
watched for pull requests? I noticed many patches on the mailing list
awaiting review.

Thanks for a great tool!

Catalin

--0000000000003c400e05b261acc3
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--0000000000003c400e05b261acc3--

From MAILER-DAEMON Tue Nov 10 08:42:51 2020
MIME-Version: 1.0
References: <97710b78-e398-1b28-92a5-07c5d46c152f@googlemail.com>
In-Reply-To: <97710b78-e398-1b28-92a5-07c5d46c152f@googlemail.com>
From: Shawn Wagner
Date: Tue, 10 Nov 2020 05:42:22 -0800
Subject: Re: Basic calculation mistakes (e.g. mean/median)
To: Andreas Sommer
Cc: bug-datamash@gnu.org
Content-Type: multipart/alternative; boundary="000000000000eb708205b3c0d9b3"

--000000000000eb708205b3c0d9b3
Content-Type: text/plain; charset="UTF-8"

Which of those are wrong and what should it be getting for them?

On Tue, Nov 10, 2020 at 5:28 AM Andreas Sommer
<andreas.sommer87@googlemail.com> wrote:

> Hi,
>
> I just started using datamash on a small dataset and noticed its
> calculations are off. And indeed it doesn't calculate basics correctly. Or
> it somehow depends on the type of display (see below argument combinations):
>
> ---
>
> $ seq 1 3 | datamash mean 1 median 1
> 2       2
>
> $ seq 1 3 | datamash -H mean 1 median 1
> mean(1) median(1)
> 2.5     2.5
>
> $ seq 1 3 | datamash -R 5 -H mean 1 median 1
> mean(1) median(1)
> 2.50000 2.50000
>
> ---
>
> $ seq 1 4 | datamash -H -R 2 mean 1 median 1
> mean(1) median(1)
> 3.00    3.00
>
> $ seq 1 4 | datamash -R 2 mean 1 median 1
> 2.50    2.50
>
> $ seq 1 4 | datamash mean 1 median 1
> 2.5     2.5
>
> ---
>
> Until that gets fixed, it means I can't trust the tool :(
>
> Cheers,
>  Andreas
>
>

--000000000000eb708205b3c0d9b3
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--000000000000eb708205b3c0d9b3--

From MAILER-DAEMON Tue Nov 10 08:48:14 2020
References: <97710b78-e398-1b28-92a5-07c5d46c152f@googlemail.com>
From: Brandon Invergo
To: Andreas Sommer
Cc: bug-datamash@gnu.org
Subject: Re: Basic calculation mistakes (e.g. mean/median)
In-reply-to: <97710b78-e398-1b28-92a5-07c5d46c152f@googlemail.com>
Date: Tue, 10 Nov 2020 13:48:02 +0000
Message-ID: <87pn4luxv1.fsf@invergo.net>
MIME-Version: 1.0
Content-Type: text/plain

Andreas Sommer writes:

> $ seq 1 3 | datamash -H mean 1 median 1
> mean(1) median(1)
> 2.5     2.5
>
> $ seq 1 3 | datamash -R 5 -H mean 1 median 1
> mean(1) median(1)
> 2.50000 2.50000
>
> ---
>
> $ seq 1 4 | datamash -H -R 2 mean 1 median 1
> mean(1) median(1)
> 3.00    3.00
>
> Until that gets fixed, it means I can't trust the tool :(

All of those results are correct. The -H option is synonymous with
--header-in and --header-out, so the first row (containing the value 1)
is being treated as a header row not a data row.

-- 
-brandon

From MAILER-DAEMON Tue Nov 10 09:17:07 2020
MIME-Version: 1.0
From: Shawn Wagner
Date: Tue, 10 Nov 2020 06:16:48 -0800
Subject: Re: "Segmentation fault" when input contains embedded NUL characters
To: cronos586@gmail.com
Cc: bug-datamash@gnu.org
Content-Type: multipart/alternative; boundary="0000000000001547a205b3c155b9"

--0000000000001547a205b3c155b9
Content-Type: text/plain; charset="UTF-8"

Assaf hasn't replied to anything I've sent out to the mailing list since
April; I've been thinking about making another effort to reach out to him
and if I can't get a response, maybe talk to the GNU folk about what's
involved in taking up maintainership.

On Tue, Nov 10, 2020 at 5:28 AM Catalin Patulea wrote:

> Hello,
>
> $ datamash --version
> datamash (GNU datamash) 1.6
>
> $ dd if=/dev/zero bs=100 count=1 | datamash countunique 1
> 1+0 records in
> 1+0 records out
> 100 bytes copied, 0.000125612 s, 796 kB/s
> Segmentation fault
>
> backtrace:
>
> (gdb) bt
> #0  0x000055555555c95c in field_op_get_string_ptrs (op=0x55555557a5f0,
> sort_case_sensitive=sort_case_sensitive@entry=true, sort=true)
>     at src/field-ops.c:278
> #1  0x000055555555d194 in count_unique_values (op=<optimized out>,
> case_sensitive=true) at src/field-ops.c:640
> #2  0x000055555555d5ec in field_op_summarize (op=0x55555557a5f0) at
> src/field-ops.c:963
> #3  0x000055555555f5cb in summarize_field_ops () at src/datamash.c:539
> #4  0x000055555555f88a in process_group (line=0x7fffffffe340) at
> src/datamash.c:589
> #5  0x000055555555fab7 in process_file () at src/datamash.c:651
> #6  0x000055555555786b in main (argc=<optimized out>, argv=0x7fffffffe5c8)
> at src/datamash.c:1291
> (gdb) fra 1
> #1  0x000055555555d194 in count_unique_values (op=<optimized out>,
> case_sensitive=true) at src/field-ops.c:640
> 640 in src/field-ops.c
> (gdb) fra 0
> #0  0x000055555555c95c in field_op_get_string_ptrs (op=0x55555557a5f0,
> sort_case_sensitive=sort_case_sensitive@entry=true, sort=true)
>     at src/field-ops.c:278
> 278 in src/field-ops.c
>
> Simply, field_op_get_string_ptrs, and probably datamash in general,
> assumes input will not contain embedded NULs:
> https://github.com/agordon/datamash/blob/v1.6/src/field-ops.c#L279
>
> For my application, the embedded NULs are an accident, and I can resolve
> that and resume using datamash. datamash does not need to support inputs
> with embedded NULs. But it should not crash on such inputs, either. Perhaps
> output a message warning the user that such inputs are not supported.
>
> I was considering writing a patch. Is the github repository actively
> watched for pull requests? I noticed many patches on the mailing list
> awaiting review.
>
> Thanks for a great tool!
>
> Catalin
>

--0000000000001547a205b3c155b9
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--0000000000001547a205b3c155b9--

From MAILER-DAEMON Wed Nov 11 02:07:54 2020
To: Brandon Invergo
Cc: bug-datamash@gnu.org
References: <97710b78-e398-1b28-92a5-07c5d46c152f@googlemail.com> <87pn4luxv1.fsf@invergo.net>
From: Andreas Sommer
Subject: Re: Basic calculation mistakes (e.g. mean/median)
Message-ID: <293371f7-ef57-df16-bc1b-68af5747d4e3@googlemail.com>
Date: Wed, 11 Nov 2020 08:07:44 +0100
MIME-Version: 1.0
In-Reply-To: <87pn4luxv1.fsf@invergo.net>
Content-Type: text/plain; charset=utf-8; format=flowed

On 2020-11-10 14:48, Brandon Invergo wrote:
>
> Andreas Sommer writes:
>
>> $ seq 1 3 | datamash -H mean 1 median 1
>> mean(1) median(1)
>> 2.5     2.5
>>
>> $ seq 1 3 | datamash -R 5 -H mean 1 median 1
>> mean(1) median(1)
>> 2.50000 2.50000
>>
>> ---
>>
>> $ seq 1 4 | datamash -H -R 2 mean 1 median 1
>> mean(1) median(1)
>> 3.00    3.00
>>
>> Until that gets fixed, it means I can't trust the tool :(
>
> All of those results are correct. The -H option is synonymous with
> --header-in and --header-out, so the first row (containing the value 1)
> is being treated as a header row not a data row.
>

Well that explains a lot. I have strongly expected that `-H` would print
headers without side effects. Hiding `--header-out` in a long option seems
strange. Also other Unix-y tools often use uppercase as negation, e.g.
`zfs list -H` = without printing column headers. Anyway, I have the
solution now and the developers can take this as wish to disambiguate the
short options. I can guess that you don't want to change this parameter,
but the documentation should clearly hint at it. The website (e.g.
https://www.gnu.org/software/datamash/examples/) typically first shows an
example `seq ... | datamash [without -H]` and in the next paragraph `