Re: wget prints out information in unicode characters where ASCII could suffice
Tue, 24 Mar 2020 12:07:34 +0100
It's likely your locale, as the apostrophes are part of the localization.
LC_ALL=en_US.utf8 wget ...
LC_ALL=C wget ...
So for scripts make sure that you use a defined locale when you are
going to parse wget's output. This is also true for most localized
executables out there.
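
As a minimal sketch of that advice: the helper below (saved_name_from_log is a hypothetical name, not part of wget) pulls the file name out of the "saved" line. It assumes the ASCII-quoted line format shown in the quoted message below, which is what the C locale is expected to produce; other wget versions may format the line differently.

```shell
# Hypothetical helper: print the file name from wget's "saved" line.
# Assumes the ASCII-quoted format discussed in this thread:
#   2020-03-21 14:00:41 (1.43 MB/s) - 'index.html' saved [1114171/1114171]
saved_name_from_log() {
    # sed -n ... p prints only the lines where the substitution matched,
    # so lines quoted with U+2018/U+2019 produce no output at all.
    LC_ALL=C sed -n "s/^.* - '\(.*\)' saved \[.*\$/\1/p"
}

# Usage (wget writes its log to stderr, hence 2>&1):
#   LC_ALL=C wget ... 2>&1 | saved_name_from_log
```

Pinning LC_ALL=C on the wget call is what keeps the pattern stable; if the script inherits the user's locale instead, the helper simply prints nothing.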
But I see your point, and Wget2 won't use this kind of localization. But
for Wget 1.x we don't want to change the behavior because that easily
breaks existing scripts. And that would be *very* bad.
On 21.03.20 13:40, ah wrote:
> When wget gets a page successfully (consider for example: wget
> www.gnu.org), it reports something like this:
> ...output omitted...
> 2020-03-21 14:00:41 (1.43 MB/s) - ‘index.html’ saved [1114171/1114171]
> Please notice the two apostrophes enclosing the fetched filename are in
> unicode (U+2018 and U+2019, I guess?) whereas the ASCII apostrophe
> character ' is completely sufficient.
> What implications does that have, apart from polluting the terminal?
> For one, when a user tries to copy+paste the fetched filename (e.g.
> index.html) from wget's output, either the apostrophes are copied into
> the buffer and mess up further commands, or they are not copied and the
> user needs to add apostrophes manually when pasting. For example, try
> ls ‘index.html’
> it fails with
> ls: cannot access '‘index.html’': No such file or directory
> However, the single (ASCII) quotes are very important for a lot of users
> in the case where filenames contain spaces or other characters that the
> shell does not like and need escaping. So it's a good idea to have them,
> but who would have thought that the devil is idle and decided to replace
> all apostrophes in GNU software with unicode!
> So, ideally (AFAIC) wget, on successful completion, should have printed
> 2020-03-21 14:00:41 (1.43 MB/s) - 'index.html' saved [1114171/1114171]
> (notice the single ASCII apostrophe for opening AND closing the filename)
> and then the user could just copy that string and the apostrophes for
> further copy+paste.
> I understand that there is danger in copy+paste-ing information from a
> program's output. But this is not relevant here as it is none of wget's
> business to deter users from copy-pasting its output. If that's a real
> concern then consider printing the filename in hex or as an image or
> call the copy-paste police and snitch the user when he/she attempts to
> use it.
> But copy-paste is not the real issue here. There is another issue, far
> more important: shell scripts processing wget's output.
> That brings us to yet another case-in-point where this behaviour of wget
> makes our lives more difficult: using wget's output in a shell script in
> order to find out the name of the fetched file. Now, all of a sudden
> our shell scripts must deal with unicode characters too. This is a no-go
> scenario in many industrial places. A shell script may be classified as
> sub-standard if it has to deal with unicode because of the cans of worms
> that opens.
> In conclusion, my opinion is that this bug is one of the most unpleasant
> and dangerous bugs in wget as it pollutes the terminal with Unicode
> characters when ASCII characters are more than enough to convey the
> information to the user. It opens not one but a tonne of cans of worms
> and can have serious side effects on script processing in industry.
> I would therefore URGE you to reconsider the use of unicode characters
> for mere aesthetic reasons especially when ASCII characters can be used
> for the same purpose. Aesthetics is a very subjective criterion as you
> There must be serious reasons to give the KISS principle the capital
> punishment. Is this what GNU has come to?
> On a parallel note, please accept my congratulations on what is
> otherwise very good software. I am using wget daily and I thank you (I
> too have contributed to public domain software and to software under
> GNU licensing, spreading the karma of GNU).