Re: [Bug-wget] Wget cannot get same page as browser
From: Giuseppe Scrivano
Subject: Re: [Bug-wget] Wget cannot get same page as browser
Date: Wed, 22 Jun 2011 16:21:15 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)
It seems the server is looking at the User-Agent header in the HTTP
request and serving a different page depending on its value.
Sending a browser-like User-Agent seems to do the trick:
wget --user-agent="Mozilla/5.0 (X11; Linux i686; rv:2.0.1) Gecko/20110503 IceCat/4.0.1" \
     http://www.amazon.com/Vocabulary-School-Student-Norman-Levine/dp/1567651151 \
     -O 1567651151.html
You can find more information about --user-agent in the wget texinfo
manual (http://xkcd.com/912/).
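
To check whether the page fetched this way actually contains the
markup you are after, something along these lines might help. It is
only a sketch: the helper names fetch_as_browser and count_matches are
made up for illustration, and it assumes wget and grep are on your
PATH. Nothing is downloaded until you call fetch_as_browser yourself.

```shell
# Browser-like User-Agent string, the same one used in the command above.
UA="Mozilla/5.0 (X11; Linux i686; rv:2.0.1) Gecko/20110503 IceCat/4.0.1"

# fetch_as_browser URL OUTFILE (hypothetical helper)
# Download URL to OUTFILE while sending the User-Agent stored in $UA.
fetch_as_browser() {
    wget --user-agent="$UA" -O "$2" "$1"
}

# count_matches KEYWORD FILE (hypothetical helper)
# Print the number of lines in FILE that contain KEYWORD.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# helper from aborting a script that runs under "set -e".
count_matches() {
    grep -c -e "$1" "$2" || true
}

# Example usage (commented out so that nothing runs by accident):
# fetch_as_browser \
#     http://www.amazon.com/Vocabulary-School-Student-Norman-Levine/dp/1567651151 \
#     1567651151.html
# count_matches offer-listing 1567651151.html
```

Note that count_matches counts matching lines, not individual
occurrences, so the number may be lower than what a browser's text
search reports if several matches share one line.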
Cheers,
Giuseppe
Gary Yang <address@hidden> writes:
> I use wget to retrieve links. However, the page I got with “wget” is
> different than the page I got from the browser. To debug it, I copied
> and pasted the link below to the browser’s address bar. Then, I view
> the HTML source code from browser. I searched the keyword,
> offer-listing. I found nine of them.
>
> Below is one of nine keyword offer-listing I found:
>
> <div class="mbcOlpLink"><a class="buyAction"
> href="/gp/offer-listing/1567651151/ref=dp_olp_all_mbc?
>
> Below is the URL:
> http://www.amazon.com/Vocabulary-School-Student-Norman-Levine/dp/1567651151
>
> The command below saved result to the file, “1567651151”. But, I
> cannot find any “offer-listing” in it. The page got by wget is
> different than the browser with the same URL. What was wrong?
>
> wget http://www.amazon.com/Vocabulary-School-Student-Norman-Levine/dp/1567651151
>
> Thanks,
>
> Gary