bug-wget

Re: [Bug-wget] Wget cannot get same page as browser


From: Giuseppe Scrivano
Subject: Re: [Bug-wget] Wget cannot get same page as browser
Date: Wed, 22 Jun 2011 16:21:15 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)

It seems the server inspects the User-Agent header of the HTTP request
and serves different content depending on it.

Spoofing the User-Agent header seems to do the trick:

wget --user-agent="Mozilla/5.0 (X11; Linux i686; rv:2.0.1) Gecko/20110503 IceCat/4.0.1" \
    http://www.amazon.com/Vocabulary-School-Student-Norman-Levine/dp/1567651151 \
    -O 1567651151.html

You can find more information about --user-agent in the wget texinfo
manual (http://xkcd.com/912/). 
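
To check whether the spoofed request really returns the page the browser
sees, the fetch and the keyword search can be wrapped in two small shell
functions. This is only a rough sketch: the names fetch_as_browser and
count_keyword are made up for illustration, and it assumes a grep that
supports -o (GNU and BSD grep do). Only --user-agent and -O come from
the command above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of wget.

# Browser-like User-Agent string, copied from the command above.
UA='Mozilla/5.0 (X11; Linux i686; rv:2.0.1) Gecko/20110503 IceCat/4.0.1'

# Fetch URL ($1) into file ($2), sending the browser-like User-Agent.
fetch_as_browser() {
    wget --user-agent="$UA" -O "$2" "$1"
}

# Print how many times the fixed string ($1) occurs in file ($2).
# grep -o prints every match on its own line, so wc -l counts matches
# rather than matching lines; "|| true" keeps a zero-match run quiet.
count_keyword() {
    { grep -oF -e "$1" "$2" || true; } | wc -l | tr -d ' '
}
```

For example, "fetch_as_browser URL page.html" followed by
"count_keyword offer-listing page.html" prints the number of
occurrences in the saved page, which can be compared with the count
from a plain wget run or from the browser's page source.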

Cheers,
Giuseppe



Gary Yang <address@hidden> writes:

> I use wget to retrieve links. However, the page I got with “wget” is
> different than the page I got from the browser. To debug it, I copied
> and pasted the link below to the browser’s address bar. Then, I view
> the HTML source code from browser. I searched the keyword,
> offer-listing. I found nine of them.
>
> Below is one of nine keyword offer-listing I found:
>
> <div class="mbcOlpLink"><a class="buyAction" href="/gp/offer-listing/1567651151/ref=dp_olp_all_mbc?
>
> Below is the URL:
> http://www.amazon.com/Vocabulary-School-Student-Norman-Levine/dp/1567651151
>
> The command below saved result to the file, “1567651151”. But, I
> cannot find any “offer-listing” in it. The page got by wget is
> different than the browser with the same URL. What was wrong?
>
> wget http://www.amazon.com/Vocabulary-School-Student-Norman-Levine/dp/1567651151
>
>
> Thanks,
>
> Gary


