[Bug-wget] WARC File Creation - Scope Issues
From: McFate, Mark
Subject: [Bug-wget] WARC File Creation - Scope Issues
Date: Thu, 11 Apr 2013 15:13:57 +0000
This is not a 'bug' by any means, but I could find no better place to post this
so please forgive me...
I've used 'wget' for years but am only now discovering how powerful it is.
I recently upgraded to v1.14 so that I can take advantage of WARC file
creation, but I need to learn a lot more. In particular, I'm having trouble
controlling the scope of the content wget returns when using the --warc-file
option (or even when not using it). The --mirror option is nice, but in many
circumstances it returns far too much, and limiting the crawl with the -l
option requires trial and error because I am never sure how deep to set it.
For example, I would like to retrieve the following set of pages as a WARC, but
don't really want anything else from this domain:
https://webarchive.jira.com/wiki/display/wayback/Wayback+Installation+and+Configuration+Guide#WaybackInstallationandConfigurationGuide-URLsandWebApplications
Is it even possible using wget to capture a complete WARC containing only
this document?
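For reference, here is a minimal sketch of how such a single-page capture
might be expressed, assuming the page itself plus the files needed to render
it are all that is wanted. The output name 'wayback-guide' is a hypothetical
choice, and the script only prints the command so that it can be reviewed
before being run. Whether this scope is right for this particular wiki is
untested.

```shell
#!/bin/sh
# Sketch only: 'wayback-guide' is a hypothetical output name, not from the post.
# The "#..." fragment of the URL is dropped because fragments are never sent
# to the server.
url='https://webarchive.jira.com/wiki/display/wayback/Wayback+Installation+and+Configuration+Guide'
warc_name='wayback-guide'

# Print the wget invocation instead of running it, so it can be checked first.
warc_single_page_cmd() {
    echo wget --page-requisites --warc-file="$warc_name" "$url"
}

warc_single_page_cmd
```

Without -r or --mirror, --page-requisites asks wget for the named page and
the images, stylesheets and similar files it references, rather than
following links to other pages. Requisites served from other hosts would
additionally need --span-hosts.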
So, I'm looking for guidance that might be pertinent to using wget for WARC
retrieval. Please point me to anything you think might be helpful. Thanks.
Mark A. McFate
Digital Library Applications Developer
Burling Library, Grinnell College
Grinnell, IA 50112-1690
address@hidden