Re: [Bug-wget] Bug or missing option?

From: Micah Cowan
Subject: Re: [Bug-wget] Bug or missing option?
Date: Sun, 28 Jun 2009 18:45:17 -0700
User-agent: Thunderbird (X11/20090608)

Michael Kerkhoff wrote:
> Hi,
> I am using Wget 1.11.4 under Windows.
> If I download
> http://plus7.arte.tv/de/detailPage/1697660,CmC=2714304,scheduleId=2676378.html
> (I found this page in
> http://plus7.arte.tv/de/streaming-home/1697480.html
> by clicking on a video)
> with wget
> with the Options:
> -k -v
> or with the Options:
> -k -v --output-document="test.html"
> I get, instead of the text
> "Karambolage
> Sonntag 28 Juni 2009 um 20.00
> Dauer: 12min
> Karambolage, Magazin, Sonntag, 28.06. "
> only the text
> "-->"
> (the HTML file contains
> "<div id="detailContent">--><!--[if !IE]><![endif]--></div>").
> Is this a bug in Wget, or should I use an additional option to get the
> complete web page without the missing text?

If you download the page from your browser and then open it, you'll see
exactly the same thing (though it's a little harder to see, since
without wget's nice conversion you won't get the images, either).

The source of the trouble is that, in a "live" viewing, the "-->" gets
replaced by JavaScript code that runs in the browser. Wget can't convert
links that occur inside JavaScript, so the local copy is missing part of
what that code needs in order to run the way it does on the live page.
Most of those links come in the form of <script src="..."> links, which
Wget _can_ deal with; but specifically, there's code in a function named
"startCarousel" that generates a (relative) URL dynamically, so it doesn't
point at the right place when it's evaluated from a local copy. Adding an
appropriate <base /> tag didn't seem to fix it for me :\
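
To illustrate why a dynamically generated relative URL ends up pointing
somewhere else in a saved copy, here is a rough Python sketch of ordinary
relative-URL resolution. The live page URL is the one from the report; the
local file path and the relative URL are made up for the example (I have
not reproduced what "startCarousel" actually builds), so treat it as an
illustration of the general mechanism only.

```python
from urllib.parse import urljoin

# URL of the live page, taken from the original report.
LIVE_PAGE = ("http://plus7.arte.tv/de/detailPage/"
             "1697660,CmC=2714304,scheduleId=2676378.html")

# Hypothetical location of the copy that Wget saved to disk.
LOCAL_PAGE = "file:///home/user/test.html"

# Hypothetical relative URL of the kind a script might assemble at run time.
# Wget never sees it, because it only exists once the script executes.
RELATIVE = "data/carousel.xml"


def resolve(page_url: str, relative_url: str) -> str:
    """Resolve relative_url against the page it is evaluated from."""
    return urljoin(page_url, relative_url)


live_target = resolve(LIVE_PAGE, RELATIVE)
local_target = resolve(LOCAL_PAGE, RELATIVE)

print("live: ", live_target)   # resolves to a location on the server
print("local:", local_target)  # resolves to a location on the local disk
```

The same relative string resolves against whichever document it is
evaluated from, so the saved copy asks the local filesystem for a resource
that only exists on the server. Links written directly in the HTML can be
rewritten by -k; a URL that is only assembled when the script runs cannot.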

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
Maintainer of GNU Wget and GNU Teseq
