RE: Downloading a web page's html code: Wget vs. Chrome's "Save Page WE"
From: michel . kempeneers
Subject: RE: Downloading a web page's html code: Wget vs. Chrome's "Save Page WE" extension
Date: Tue, 7 Jan 2020 01:28:17 +0100 (CET)
Hi Taylor,
Thanks for your feedback.
- eBay:
You may have a point about eBay; I don't know.
But even in that case, it doesn't make sense to me.
- User agent:
I don't really know what this means, but yes, why not?
According to https://www.whatsmyua.info/, this would be my UA:
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36
But when I add the following user-agent option
--user-agent="Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36"
nothing much changes.
The Wget file gets a few more lines (around 2330, which is the same line count as the heavy Chrome version; I forgot to point that out in my previous message).
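For anyone who wants to try the user-agent idea outside Wget, here is a minimal sketch, assuming Python's standard urllib.request. It sends the Chrome UA string quoted above, much as Wget's --user-agent option does; the function names build_request and fetch_html are hypothetical, and like Wget it only downloads the HTML the server sends, without running any JavaScript.

```python
import urllib.request

# The UA string reported by whatsmyua.info in this message (Chrome 75 on Windows 7).
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36"
)


def build_request(url: str, user_agent: str = CHROME_UA) -> urllib.request.Request:
    """Build a plain GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, user_agent: str = CHROME_UA) -> str:
    """Download the raw HTML as sent by the server; no JavaScript is executed."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=30) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling fetch_html with the eBay page URL (not given in this thread) and counting the lines of the result should give roughly what Wget saved. If that count is still far below the browser's, the UA header is probably not what makes the difference.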
- JavaScript:
I'm not a developer, alas. At best I can find my way around batch files a bit.
So I'm bound hand and foot to whatever command-line freeware is out there.
So no real improvement, I'm afraid...
M.
From: "Taylor" <address@hidden>
To: "Michel Kempeneers" <address@hidden>
Cc: "bug-wget" <address@hidden>
Sent: Friday, 3 January 2020 14:16:05
Subject: Re: Downloading a web page's html code: Wget vs. Chrome's "Save Page WE" extension
Wget to download the html code of this eBay page
Downloading a web page's html code: Wget vs. Chrome's "Save Page WE" extension
Is there a reason why Wget only seems to find a minimal version of the code
(or maybe the correct question is: why is the HTML file saved by that
extension so much taller?)
Some ideas:
- it is probably __NOT__ Wget that drops roughly 90% of the content
- eBay may explicitly favor Chrome ("Best viewed in any
browser as long as it is the latest version of Chrome or IE")
- user agent (try to set it to Chrome's value)
- JavaScript (not much you can do here; the extension can perform
a lot of JavaScript magic that Wget obviously can't)
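One rough way to test the JavaScript theory without a browser is to look at what the Wget copy contains. Below is a minimal sketch, assuming Python's standard html.parser. The helper names ScriptCounter and summarize are hypothetical. It counts lines plus inline and external script elements in a saved HTML string. Many scripts alongside little visible markup would suggest that the page is assembled in the browser, which Wget never does. This is a heuristic, not proof.

```python
from html.parser import HTMLParser


class ScriptCounter(HTMLParser):
    """Count <script> elements, split by whether they load external code via src=."""

    def __init__(self) -> None:
        super().__init__()
        self.inline_scripts = 0
        self.external_scripts = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "script":
            if any(name == "src" for name, _ in attrs):
                self.external_scripts += 1
            else:
                self.inline_scripts += 1


def summarize(html: str) -> dict:
    """Return line and script counts for one saved HTML document."""
    counter = ScriptCounter()
    counter.feed(html)
    counter.close()
    return {
        "lines": len(html.splitlines()),
        "inline_scripts": counter.inline_scripts,
        "external_scripts": counter.external_scripts,
    }
```

Running summarize on the Wget file and on the "Save Page WE" file (each read with open(path, encoding="utf-8", errors="replace").read()) puts the two side by side. The extension saves the DOM after scripts have run, so a large gap in content alongside many scripts in the Wget copy would point at JavaScript rather than at Wget.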