From: Marco Fioretti
Subject: [Bug-wget] how to make wget always create REALLY complete, REALLY offline archives?
Date: Wed, 02 Oct 2019 08:30:57 +0000
Greetings,
I define making a "really complete, really offline archive" of a web page as
downloading that page AND ALL the pieces a browser needs to display it as it
normally appears from the internet, even on a computer that is totally offline:
images, css, javascript, avatar icons, everything... no matter where those
other files originally were on the internet.
I have noticed several times in the past that the wget options that are supposed to
do this (--span-hosts, --mirror, -k, and so on) do not always work as advertised.
The latest case is the one I just documented here:
https://github.com/pirate/ArchiveBox/issues/276
where, for example, wget does make local copies of the javascript files linked from
the HTML page, but does NOT modify the HTML to point to them instead of
their original servers. But it seems to me that ArchiveBox uses wget with the same
options listed in the countless tutorials about "how to make offline copies
with wget", so either those tutorials are all wrong, or there are
bugs or intrinsic limits inside wget itself.
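For reference, the kind of invocation those tutorials typically recommend, assembled from wget's documented flags, looks roughly like this (the URL is only a placeholder):

```shell
# Typical "offline archive" invocation from the tutorials (URL is a placeholder).
# --mirror           : shorthand for -r -N -l inf --no-remove-listing
# --page-requisites  : also fetch the images, CSS, and JS needed to render the page
# --convert-links    : rewrite links in the saved HTML to point at the local copies
# --adjust-extension : append .html/.css suffixes so the files open correctly offline
# --span-hosts       : allow following page requisites hosted on other domains
wget --mirror --page-requisites --convert-links \
     --adjust-extension --span-hosts \
     -e robots=off --no-parent \
     https://example.com/some/page.html
```

Note that --convert-links runs only after the whole download finishes, and it only rewrites links to files wget actually retrieved; anything it skipped (for instance because of robots.txt or a failed request) keeps its original remote URL, which may explain behavior like the one in the issue above.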
What do you think is happening in that case? And, in general, how do you use
wget to ALWAYS make a "really complete, really offline archive" of a web page?
Thanks,
Marco