From: Tim Ruehsen
Subject: [Bug-wget] [bug #50935] TEXTHTML not properly set if page is already downloaded
Date: Sat, 13 May 2017 11:24:19 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0

Follow-up Comment #5, bug #50935 (project wget):

> As for making a head request, how expensive is that?

Not expensive: the response contains just the HTTP headers; the body/payload
is empty. And with -p this would be just one request/response cycle.
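
To illustrate how little a HEAD request involves, here is a minimal Python
sketch (not Wget code). The helper names media_type() and head_media_type()
are made up for this example, and only the Content-Type header is looked at.

```python
from urllib.request import Request, urlopen


def media_type(content_type_header):
    """Return the bare media type from a Content-Type value, or None.

    "text/html; charset=UTF-8" -> "text/html"
    """
    if not content_type_header:
        return None
    # Parameters such as charset follow the first ';' and are dropped here.
    return content_type_header.split(";", 1)[0].strip().lower()


def head_media_type(url, timeout=10.0):
    """Send a HEAD request and return the media type the server reports.

    Hypothetical helper for illustration: the server answers with headers
    only, so no body is transferred.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return media_type(response.headers.get("Content-Type"))
```

A caller would then compare the result against "text/html" (and possibly
"application/xhtml+xml") before deciding whether to parse the local file for
links.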

> Is using a heuristic like if it begins with "<!DOCTYPE html" or an html tag
> too messy?

You can find a description of what to do here:
https://www.w3.org/TR/2011/WD-html5-20110113/parsing.html#determining-the-character-encoding
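
For comparison, the simple heuristic from the question could look roughly
like the Python sketch below. This is only the naive prefix check, not the
procedure described at the link above, and sniff_html() is a made-up name.

```python
import codecs


def sniff_html(data: bytes) -> bool:
    """Guess whether a file is HTML from its first bytes (naive heuristic).

    Assumption for this sketch: an optional UTF-8 BOM and leading whitespace
    are skipped, then the content must start with a doctype or an <html tag.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Only a short prefix is inspected; the comparison is case-insensitive.
    head = data[:512].lstrip(b" \t\r\n\x0c").lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

Documents that start with a comment, an XML declaration, or that omit both the
doctype and the html tag are missed by this check, which is part of why the
heuristic gets messy.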

Also see two related Wget2 issues that I opened due to your report here:
https://gitlab.com/gnuwget/wget2/issues/209
https://gitlab.com/gnuwget/wget2/issues/210

The 'xattr' feature would/could give us the MIME type of a downloaded
document, but it is not supported on all file systems.
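
As a rough idea of what reading such an attribute involves, here is a Python
sketch. The attribute name "user.mime_type" is an assumption for this example,
stored_mime_type() is a made-up helper, and os.getxattr() only exists on
Linux, so the function falls back to None everywhere else.

```python
import os

# Assumed attribute name, for illustration only.
MIME_ATTR = "user.mime_type"


def stored_mime_type(path, attr=MIME_ATTR):
    """Return the MIME type stored in an extended attribute, or None.

    None is returned when the platform has no os.getxattr, when the file
    system does not support extended attributes, or when the attribute is
    simply not set on this file.
    """
    getxattr = getattr(os, "getxattr", None)
    if getxattr is None:
        return None
    try:
        return getxattr(path, attr).decode("utf-8", "replace")
    except OSError:
        return None
```

Because None is a normal outcome here, a caller still needs a fallback such
as a HEAD request or content sniffing.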

> Anyways, is wget2 ready for daily use at all? Are there stable releases?

No releases yet, but it is pretty stable (automated CI testing on Debian,
CentOS, Fedora, OSX, Solaris; manual testing on Windows).
However, not all features/options from Wget1.x are implemented yet (but Wget2
already has many more features).

We badly need reports from testers, so if you can afford the time, give it
a try and open as many issues as you like on
https://gitlab.com/gnuwget/wget2/issues.

There is currently quite a lot of activity on fixing issues (three GSoC
students alone are performing very well).


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?50935>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



