GNU Parallel Bug Reports --dry-run prints non-ASCII characters as garbage
From: Zhiming Wang
Subject: GNU Parallel Bug Reports --dry-run prints non-ASCII characters as garbage
Date: Thu, 23 Mar 2017 22:00:12 -0400
Example:
$ sw_vers # Showing a session on macOS here, but also reproducible on Linux
ProductName: Mac OS X
ProductVersion: 10.12.3
BuildVersion: 16D32
$ echo $LC_ALL $LANG
en_US.UTF-8 en_US.UTF-8
$ parallel --version
GNU parallel 20170322
Copyright (C) 2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017
Ole Tange and Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.
Web site: http://www.gnu.org/software/parallel
When using programs that use GNU Parallel to process data for publication
please cite as described in 'parallel --citation'.
$ parallel --dry-run : ::: Ⅱ # That argument is U+2161 ROMAN NUMERAL TWO, \xe2\x85\xa1 in UTF-8
: \�\�\�
$ parallel --dry-run : ::: Ⅱ | xxd # Inspecting actual bytes printed
00000000: 3a20 5ce2 5c85 5ca1 0a : \.\.\..
$ echo : Ⅱ # Expected output
: Ⅱ
$ echo : Ⅱ | xxd # Expected bytes printed
00000000: 3a20 e285 a10a : ....
As you can see, the non-ASCII UTF-8 character is escaped in the output, and in
an unintelligible form: a backslash is inserted before each byte, while the byte
itself is left intact (instead of being replaced with its hex or octal
representation). I think it is arguable that UTF-8 should be printed as is; if
that's undesirable, actual octal or hex escape sequences would at least convey
more information than raw bytes, which are usually displayed as boxes, question
marks, or a combination thereof.
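To make the difference concrete, here is a minimal Python sketch that models
the two forms. It only reproduces the bytes observed in the session above; it
is not GNU Parallel's actual quoting code (which I have not inspected), and the
function names observed_escape and hex_escape are hypothetical.

```python
def observed_escape(data: bytes) -> bytes:
    """Model of the observed output: a backslash before each non-ASCII byte,
    with the byte itself left intact. (Assumption: only bytes >= 0x80 are
    affected; other quoting rules are not modeled here.)"""
    out = bytearray()
    for b in data:
        if b >= 0x80:
            out.append(0x5C)  # backslash
        out.append(b)
    return bytes(out)


def hex_escape(data: bytes) -> str:
    """One possible alternative: replace each non-ASCII byte with \\xNN."""
    return "".join(chr(b) if b < 0x80 else f"\\x{b:02x}" for b in data)


arg = "\u2161".encode("utf-8")      # U+2161 ROMAN NUMERAL TWO
print(observed_escape(arg).hex())   # 5ce25c855ca1, matching the xxd dump
print(hex_escape(arg))              # \xe2\x85\xa1
```

The first form is no longer valid UTF-8, which is why a terminal shows it as
replacement characters; the second is plain ASCII and stays readable.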
Of course, this is not limited to --dry-run; it is present in any situation
where the executed commands are printed, e.g., --bar.
By the way, I did check the man page and the tutorial to see if there's an
option to print non-ASCII bytes unescaped. I didn't find one, and I apologize
if I missed it.
Best,
Zhiming