
From: Micah Cowan
Subject: [Fwd: Re: [Bug-wget] Fwd: Trying to download HTML from Google's Cache. Pls help]
Date: Wed, 12 Nov 2008 10:32:34 -0800
User-agent: Thunderbird (X11/20080925)

-------- Original Message --------
Subject: Re: [Bug-wget] Fwd: Trying to download HTML from Google's
Cache. Pls help
Date: Wed, 12 Nov 2008 10:00:34 -0800 (PST)
From: Ben Smith <address@hidden>
To: Micah Cowan <address@hidden>
References: <address@hidden>

Adding -UFirefox allows the download, so you should first fetch all the
listed results pages from Google with wget -UFirefox:

etc., up to start=570 (since there are 577 results).
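The elided URLs above follow Google's start= paging parameter. A minimal sketch of that fetch loop, assuming a placeholder query string and Google's usual 10 results per page; the commands are echoed rather than run so the loop can be inspected first:

```shell
# Sketch only: QUERY stands in for the actual search terms.
QUERY="example+query"
for start in $(seq 0 10 570); do
  # -UFirefox sets the User-Agent header so Google does not return 403.
  echo wget -UFirefox -O "results-${start}.html" \
    "http://www.google.com/search?q=${QUERY}&start=${start}"
done
```

Dropping the echo runs the downloads for real; seq 0 10 570 yields the 58 page offsets covering all 577 results.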

Then grep each of the results files to find the lines with links to the
cached pages.  You can pipe that output into sed, which you can use to
remove everything but the links to the cached pages (replace the info
before, after, and between the cache links with a space).  Then simply
pipe that to wget -UFirefox, and you should get all your files.
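The grep-and-sed step described above might look like the sketch below; the sample HTML line and the q=cache: link pattern are assumptions about what a Google results page contains, so adjust the expression to the actual markup:

```shell
# Hypothetical one-link sample of a results-page line; real pages hold
# many links, so you would run this over each results-*.html file.
sample='<a href="http://www.google.com/search?q=cache:abc123:example.com/page.html">Cached</a>'

# Keep only the cached-page URL: strip everything before and after the
# href value that contains q=cache:.
echo "$sample" | sed 's/.*href="\([^"]*q=cache:[^"]*\)".*/\1/'
```

The resulting URL list can then be piped straight into wget, e.g. `... | wget -UFirefox -i -`.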

----- Original Message ----
> From: Micah Cowan <address@hidden>
> To: Ben Smith <address@hidden>
> Cc: address@hidden
> Sent: Tuesday, November 11, 2008 3:27:05 PM
> Subject: Re: [Bug-wget] Fwd: Trying to download HTML from Google's Cache. Pls 
> help
> Ben Smith wrote:
>> Subject: Re: [Bug-wget] Re: Bug-wget Digest, Vol 1, Issue 10
>>> When replying, please edit your Subject line so it is more specific
>>>  than "Re: Contents of Bug-wget digest..."
> It's helpful if you adhere to this guideline; otherwise it's hard to
> follow threads. (I've fixed the subject in my reply.)
>> It would be theoretically possible by using grep and sed to strip out
>> the links to the cached files and piping that to wget.  However,
>> Google appears to block access to results pages and cached pages via
>> wget.  I tried to download several using wget and got a 403 Forbidden
>> response.
> http://wget.addictivecode.org/FrequentlyAskedQuestions#not-downloading
> should be helpful for such problems (using -U is the most applicable
> suggestion, but you may also run into the others). Please also consider
> adding --limit-rate or --wait.
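Combining those suggestions, the final fetch might look like this sketch; cache-urls.txt is a hypothetical file holding the extracted cache links, and the command is echoed rather than executed here:

```shell
# Sketch only: --wait and --limit-rate throttle the crawl, as suggested,
# and -i reads the URL list from the named file.
echo wget -UFirefox --wait=2 --limit-rate=50k -i cache-urls.txt
```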

Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
