bug-wget
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-wget] [PATCH] New option: --rename-output: modify output filena


From: Dagobert Michelsen
Subject: Re: [Bug-wget] [PATCH] New option: --rename-output: modify output filename with perl
Date: Thu, 1 Aug 2013 21:57:40 +0200

Hi Andrew,

Am 01.08.2013 um 21:22 schrieb Andrew Cady <address@hidden>:
> On Thu, Aug 01, 2013 at 10:36:00AM -0700, Micah Cowan wrote:
>> On Wed, Jul 31, 2013 at 06:32:18PM -0400, Andrew Cady wrote:
>>> By that, I assume you mean to execute the option in the shell.  So the
>>> existing usage:
>>> 
>>>  --rename-output=s/x/y/
>>> 
>>> would (almost) become:
>>> 
>>>  --rename-output='perl -lpe "BEGIN{\$|++}" -e s/x/y/'
>> 
>> It COULD, sure, but why on earth would someone choose to use that
>> instead of 'sed s/x/y/'?
> 
> With sed, you still need -u, or else there is a deadlock.  This
> knowledge should be embedded into wget because most people don't have
> it.

You are talking about GNU sed, please keep in mind that wget is portable to
systems without or just a subset of the GNU userland.

>>>  --name-filter=sed   --rename='sed code here'
>>>  --name-filter=perl  --rename='perl code here'
>>>  --name-filter=shell --rename='shell code here'
>>>  --name-filter=pcre  --rename='regex here'
>> 
>> All of the above seems like over-engineering to me, especially just
>> to avoid the occasional backslash... don't forget, the more complexity
>> you introduce, the more bugs as well (and wget already rather suffers
>> from such a scenario).
> 
> Well, I actually went ahead and implemented it already.  I even did
> pcre, although that's not done (/g does not work).
> 
> It's not complex.  The only difference between sed, perl, and shell is
> the command line that is given to execvp.  There's a simple switch() to
> decide which argument list to use.  It's basically just a simple alias
> mechanism.  Here's the relevant code:
> 
>  pipe2_t init_pipe_filter()
>  {
>    pipe2_t res;
>    char *code = opt.rename_output;
>    char *perl[] = { "perl", "perl", "-0lpe", "BEGIN{$|++}", "-e", code, 0 };
>    char *sed[] = { "sed", "sed", "-ue", code, 0 };
>    char *shprog = getshell();
>    char *shell[] = { shprog, shprog, "-c", code, 0 };
> 
>    char **cmd = 0;
>    switch (filter_type) {
>      case perl_filter:
>       cmd = perl, delim = 0;
>       break;
>      case sed_filter:
>       cmd = sed, delim = '\n';
>       break;
>      case shell_filter:
>       cmd = shell, delim = getdelimopt();
>       break;
>      default:
>       error(1, 0, "unpossible!");
>    }
> 
>    int in = -1, out = -1; /* initialize to silence warning */
>    if (open2(cmd, &in, &out, &res.pid) < 0)
>      error(1, errno, "init_name_filter: open2");
>    if ((res.out = fdopen(out, "w")) < 0)
>      error(1, errno, "init_name_filter: fdopen");
>    if ((res.in  = fdopen(in, "r")) < 0)
>      error(1, errno, "init_name_filter: fdopen");
> 
>    return res;
>  }
> 
> It's very simple, but a big win I think.

From a packagers perspective this is a nightmare because this feature
introduces weak dependencies to programs which need to be in PATH.
If PATH includes only tools shipped by the system it may be necessary
to explicitly set the path for each of the tools to have the switches
you use (for example /usr/bin/sed on Solaris does not have -u) or disable
them during configure time when the tools are not available. The exact
locations need to adjustable during configure for each of the tools you
use. PCRE support adds an additional library dependency to wget.

All this added complexity seems highly overengineered for a feature
that is not in the core functionality of the tool and that only a
fraction of the users use. Keep in mind: a good tool is one that does
a single job right.

>  I can use perl without any
> hassle.  Other people can use sed without even knowing about '-u'.
> Anybody who wants anything else can call it through the shell.  And
> platforms without pipe() will eventually get pcre.
> 
>> Sh is already well-tested...
> 
> Different systems have different shells.  When you have to try to escape
> for the system shell, you run into portability problems, and general
> confusion regarding double-escaping.  If you sit in freenode #openssh
> for a while, you can see these problems routinely, resulting from the
> fact that ssh remote commands are executed through the remote system
> shell.
> 
> You cannot impose the need to double-escape on the user in the name of
> avoiding bugs.  That is not avoiding bugs.  That is just making the user
> write the bugs for you!

As a user I would suspect you know your local shell (as opposed to the
funky remote shell on a system far away you rarely use), so the usecases
are slightly different.


Best regards

  -- Dago

-- 
"You don't become great by trying to be great, you become great by wanting to 
do something,
and then doing it so hard that you become great in the process." - xkcd #896




reply via email to

[Prev in Thread] Current Thread [Next in Thread]