[Bug-wget] just download HTML content

From:	Richard Baron Penman
Subject:	[Bug-wget] just download HTML content
Date:	Sun, 28 Jun 2009 21:31:52 +1000

hello,

When mirroring a website, how do I download only HTML content (whether
static or generated by PHP, ASP, etc.) and ignore images, CSS, JS, and
everything else?
At first I thought of creating an accept list, but I can't rely on the file
extension because many HTML pages do not have one (e.g.
http://en.wikipedia.org/wiki/Foo).
Then I thought of a reject list, but there are so many different kinds of
non-HTML content that I can't list them all.
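For context, wget's -A/--accept and -R/--reject options take comma-separated lists of file name suffixes or patterns, so they can only judge a URL by its name. The property being asked about is really the one the server reports in the HTTP Content-Type header. The Python sketch below is not a wget feature; it only contrasts the two checks. The function names and the set of MIME types counted as "HTML" are hypothetical choices made for illustration.

```python
import posixpath
import urllib.parse
import urllib.request

# MIME types treated as "HTML pages" in this sketch (an assumption, not a wget setting).
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def has_html_extension(url):
    """Hypothetical helper: the name-based test that a suffix accept list relies on."""
    path = urllib.parse.urlsplit(url).path
    ext = posixpath.splitext(path)[1].lower()
    return ext in {".html", ".htm", ".php", ".asp", ".aspx"}


def is_html_content_type(header_value):
    """Hypothetical helper: True if a Content-Type header value denotes an HTML document."""
    # Drop parameters such as "; charset=UTF-8" and normalize case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in HTML_TYPES


def fetch_content_type(url, timeout=10):
    """Ask the server for the Content-Type with a HEAD request (no body is downloaded)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", "")


if __name__ == "__main__":
    url = "http://en.wikipedia.org/wiki/Foo"
    print(has_html_extension(url))                            # False: the path has no extension
    print(is_html_content_type("text/html; charset=UTF-8"))  # True
```

Checking the header costs one extra request per URL, and some servers answer HEAD differently from GET, so this only approximates what a content-type filter built into the crawler itself would do.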

Is there a way to do this with wget?

thanks, Richard
