Hello,
$ datamash --version
datamash (GNU datamash) 1.6
$ dd if=/dev/zero bs=100 count=1 | datamash countunique 1
1+0 records in
1+0 records out
100 bytes copied, 0.000125612 s, 796 kB/s
Segmentation fault
Backtrace:
(gdb) bt
#0 0x000055555555c95c in field_op_get_string_ptrs (op=0x55555557a5f0, sort_case_sensitive=sort_case_sensitive@entry=true, sort=true)
at src/field-ops.c:278
#1 0x000055555555d194 in count_unique_values (op=<optimized out>, case_sensitive=true) at src/field-ops.c:640
#2 0x000055555555d5ec in field_op_summarize (op=0x55555557a5f0) at src/field-ops.c:963
#3 0x000055555555f5cb in summarize_field_ops () at src/datamash.c:539
#4 0x000055555555f88a in process_group (line=0x7fffffffe340) at src/datamash.c:589
#5 0x000055555555fab7 in process_file () at src/datamash.c:651
#6 0x000055555555786b in main (argc=<optimized out>, argv=0x7fffffffe5c8) at src/datamash.c:1291
(gdb) fra 1
#1 0x000055555555d194 in count_unique_values (op=<optimized out>, case_sensitive=true) at src/field-ops.c:640
640 in src/field-ops.c
(gdb) fra 0
#0 0x000055555555c95c in field_op_get_string_ptrs (op=0x55555557a5f0, sort_case_sensitive=sort_case_sensitive@entry=true, sort=true)
at src/field-ops.c:278
278 in src/field-ops.c
Put simply, field_op_get_string_ptrs (and probably datamash in general) assumes that the input contains no embedded NUL bytes.
For my application, the embedded NULs are an accident; I can remove them and go back to using datamash. datamash does not need to support input with embedded NULs, but it should not crash on such input either. Perhaps it could print a message telling the user that such input is not supported.
I was considering writing a patch. Is the GitHub repository actively watched for pull requests? I noticed that many patches on the mailing list are still awaiting review.
Thanks for a great tool!
Catalin