bug-parallel

GNU Parallel Bug Reports --dry-run prints non-ASCII characters as garbage


From: Zhiming Wang
Subject: GNU Parallel Bug Reports --dry-run prints non-ASCII characters as garbage
Date: Thu, 23 Mar 2017 22:00:12 -0400

Example:

    $ sw_vers  # Showing a session on macOS here, but also reproducible on Linux
    ProductName:        Mac OS X
    ProductVersion:     10.12.3
    BuildVersion:       16D32
    $ echo $LC_ALL $LANG
    en_US.UTF-8 en_US.UTF-8
    $ parallel --version
    GNU parallel 20170322
    Copyright (C) 2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017
    Ole Tange and Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    GNU parallel comes with no warranty.

    Web site: http://www.gnu.org/software/parallel

    When using programs that use GNU Parallel to process data for publication
    please cite as described in 'parallel --citation'.
    $ parallel --dry-run : ::: Ⅱ  # That argument is U+2161 ROMAN NUMERAL TWO, \xe2\x85\xa1 in UTF-8
    : \�\�\�
    $ parallel --dry-run : ::: Ⅱ | xxd  # Inspecting actual bytes printed
    00000000: 3a20 5ce2 5c85 5ca1 0a                   : \.\.\..
    $ echo : Ⅱ  # Expected output
    : Ⅱ
    $ echo : Ⅱ | xxd  # Expected bytes printed
    00000000: 3a20 e285 a10a                           : ....

As you can see, the non-ASCII UTF-8 character is escaped in the output, and in
an unintelligible form: a backslash is inserted before each byte, with the byte
left intact (instead of being replaced with its hex or octal representation). I
think there is a good argument that UTF-8 should be printed as is; if that's
undesirable, actual octal or hex escape sequences would at least convey more
information than raw bytes, which are usually displayed as boxes, question
marks, or a combination thereof.
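
To illustrate the octal alternative, here is a minimal sketch in POSIX shell.
The function name escape_non_ascii is hypothetical, and this is not how GNU
Parallel is implemented; it only shows what a more informative escaped form
could look like: ASCII bytes pass through unchanged, and every byte >= 0x80
is printed as a \NNN octal escape.

```shell
# Hypothetical helper (not part of GNU Parallel): print the arguments with
# every non-ASCII byte replaced by a \NNN octal escape.
escape_non_ascii() {
    # od dumps the input as one 3-digit octal number per byte.
    for oct in $(printf '%s' "$*" | LC_ALL=C od -An -v -t o1); do
        if [ "$((0$oct))" -ge 128 ]; then
            printf '\\%s' "$oct"   # non-ASCII byte: readable octal escape
        else
            printf "\\$oct"        # ASCII byte: emit the character itself
        fi
    done
    printf '\n'
}
```

With this sketch, escape_non_ascii ': Ⅱ' prints

    : \342\205\241

which stays legible in any terminal and can be pasted back into printf to
recover the original bytes.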

Of course, this is not limited to --dry-run; it is present in any situation
where the executed commands are printed, e.g., --bar.

By the way, I did check the man page and the tutorial to see if there's an
option to print non-ASCII bytes unescaped. I didn't find one, and I apologize
if I missed it.

Best,
Zhiming
