Re: [Bug-wget] trouble with URL vs local file names


From: Andre Majorel
Subject: Re: [Bug-wget] trouble with URL vs local file names
Date: Thu, 18 Feb 2010 23:47:20 +0100
User-agent: Mutt/1.5.20 (2009-06-14)

On 2010-02-18 10:54 +0100, Tobias Senz wrote:

> wget -p -k www.google.de
> 
> creates locally the folders / files
> ./www.google.de
> ./www.google.de/index.html
> ./www.google.de/logos
> ./www.google.de/logos/olympics10-skeleton-hp.png
> 
> I'd prefer if that were ONLY files
> ./www.google.de@index.html
> ./www.google.de@logos@olympics10-skeleton-hp.png
> 
> Or a URL like this
> http://www.google.de/csi?v=3&s=webhp&action=&e=17259,17311,22713,23[...]
> 
> locally as file name
> www.google.de@csi@v@3@s@webhp@action@@e@17259,17311,22713,23386,23[...]

You could download with Wget as usual and then make a flattened
copy of the tree Wget created with something along these lines
(not tested):

  find tree -type f -print0 | perl -0ne 'chomp; $pathname = $_;
    $newpathname = $pathname; $newpathname =~ s{[&/=?%]}{@}g;
    symlink "$pathname", "$newpathname" or die "$newpathname: $!";'

The catch, of course, is that any links whose target contains
slashes will be broken. To fix that you would have to either
do the name translation from inside Wget (so -k knows about it)
or find a third party program to fix the links for you
afterwards.

Not sure how well symlink() works on Windows -- you might have to
do a straight copy instead.
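
For the straight-copy route, a minimal sketch in plain sh might look
like the following. The helper names flatten_name and flatten_copy are
made up for illustration, the set of replaced characters is an
assumption, and file names containing newlines are not handled.

```shell
# Sketch only: flatten_name and flatten_copy are hypothetical helpers.

flatten_name() {
    # Map one relative pathname to a flat name: drop a leading "./",
    # then replace "/", "&", "=", "?" and "%" with "@".
    # The exact character set is an assumption.
    printf '%s\n' "$1" | sed 's|^\./||; s|[/&=?%]|@|g'
}

flatten_copy() {
    # $1 = tree created by Wget, $2 = existing destination directory
    src=$1
    dest=$2
    # List files relative to the tree so flat names start at the host name.
    (cd "$src" && find . -type f) | while IFS= read -r pathname; do
        cp -- "$src/$pathname" "$dest/$(flatten_name "$pathname")"
    done
}
```

Something like "mkdir flat && flatten_copy tree flat" would then leave
the original tree untouched and put the renamed copies in ./flat.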

> And i'm also having trouble with the way files are named locally, the
> "--restrict-file-names=" thingie.
> Is there any way to also block "%" "&" "=" (and possibly others i can't
> think of right now - "+" maybe?) locally as these seem to prevent
> further processing in batch scripts? As mentioned above i'm more of a
> fan of "@" for placeholders. Rarely (never?) used in http, and does not
> seem to make any trouble when scripting.
> 
> I'm on Windows with Cygwin and mixing of both batch (cmd.exe) and shell
> (sh, tcsh ...) scripting as well as (Win)DOS and Cygwin utilities might
> happen. In other words, "these are unsafe to me", when filename is
> passed to anything via command line. (Really haven't found any way to
> escape these in some situations. Different type of quotes a-plenty,
> backslashes too, nothing helps.)

Not familiar with cmd and tcsh but sh shouldn't have too many
problems with "%" (as long as it's not the argument of a job
control built-in), "&" (as long as it's quoted or escaped) or
"=" (as long as it's preceded by a command name). What is it
that you cannot get to work with sh?
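
To illustrate the quoting rules above, here is a small sh sketch.
show_arg is a hypothetical helper and the file name is invented for
the example; the point is only that a quoted or escaped name reaches
the command as a single, unmodified argument.

```shell
# show_arg is a hypothetical helper that prints its first argument verbatim.
show_arg() {
    printf '%s\n' "$1"
}

# Single quotes keep "?", "=", "&" and "%" literal in the assignment.
name='csi?v=3&s=webhp&action=%20'

# Double-quote the expansion so the name is passed as one word.
show_arg "$name"

# An unquoted literal needs backslashes: a bare "&" would end the command
# and run it in the background, and "?" is a glob character.
show_arg csi\?v=3\&s=webhp
```

The "=" needs no escaping here because it follows a command name; at
the start of a command line, a word like v=3 would be taken as a
variable assignment instead.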

-- 
André Majorel http://www.teaser.fr/~amajorel/



