Hi, in src/warc.c three methods are provided to generate uuids: libuuid, uuid functions from libc, and a fallback method. At least OpenBSD, FreeBSD and NetBSD provide those uuid functions in their li
Hi, I think that's correct: Wget doesn't write the subfield length in the "extra field" section of the header. After the subfield ID "sl" it should write the length LEN (see RFC 1952 [1]), but it doe
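To illustrate the layout described in the message above, here is a minimal sketch of a gzip "extra field" holding a single subfield, following RFC 1952: XLEN, then SI1, SI2, LEN, and the subfield data, with both lengths stored as little-endian 16-bit values. The function name write_extra_field and its parameters are hypothetical and are not taken from Wget's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 16-bit value in little-endian byte order, as RFC 1952 requires. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/* Hypothetical helper (not Wget's code): writes XLEN followed by one
   subfield (SI1, SI2, LEN, data) into out.
   Returns the number of bytes written, or 0 if the data is too large
   for a 16-bit XLEN or the output buffer is too small. */
size_t write_extra_field(uint8_t *out, size_t out_size,
                         char si1, char si2,
                         const uint8_t *data, size_t data_len)
{
    size_t total;

    assert(out != NULL);
    assert(data != NULL || data_len == 0);

    if (data_len > 0xffff - 4)      /* XLEN = 4 + LEN must fit in 16 bits */
        return 0;
    total = 2 + 4 + data_len;       /* XLEN bytes + subfield header + data */
    if (out_size < total)
        return 0;

    put_le16(out, (uint16_t)(4 + data_len));   /* XLEN: size of all subfields */
    out[2] = (uint8_t)si1;                     /* subfield ID, first byte */
    out[3] = (uint8_t)si2;                     /* subfield ID, second byte */
    put_le16(out + 4, (uint16_t)data_len);     /* LEN: the length at issue */
    if (data_len > 0)
        memcpy(out + 6, data, data_len);
    return total;
}
```

With an 8-byte payload and the ID "sl", this writes 14 bytes: XLEN = 12, then 's', 'l', LEN = 8, then the payload. The 8-byte payload size is only an example value.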
Hi, as David Ryskalczyk stated, just two printf format specifiers might be causing the havoc. I think there is no need to use wgint instead of off_t. @Giuseppe: please apply the appended patches (maybe
Hello Gis, just out of curiosity: what about setting the compiler option -D _FILE_OFFSET_BITS=64 on these systems? Since off_t is used in many places for file length, there should be many more probl
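As a general illustration of the format-specifier issue raised above, the sketch below shows one portable way to print an off_t: cast it to intmax_t and use the C99 %jd conversion, so the specifier matches the argument regardless of how wide off_t is on a given system. The helper name format_offset is hypothetical, and this is not necessarily what the patches in the thread do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: formats a file offset as decimal text.
   Casting to intmax_t and using %jd avoids a mismatch between the
   specifier and the real width of off_t (32 or 64 bits).
   Returns what snprintf returns: the length the full text needs. */
int format_offset(char *buf, size_t size, off_t value)
{
    assert(buf != NULL);
    return snprintf(buf, size, "%jd", (intmax_t)value);
}
```

A mismatched specifier such as %d with a 64-bit off_t is undefined behaviour and can corrupt every later argument in the same printf call, which is why a couple of wrong specifiers can cause widespread damage.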
For what it's worth, I confirmed that Heritrix (Internet Archive's crawling tool) produces WARC files without the angle brackets for WARC-Target-URI. Best regards, William Prescott
Hello Mark, to capture a single document just execute e.g. wget --warc-file single_page 'https://webarchive.jira.com/wiki/display/wayback/Wayback+Installation+and+Configuration+Guide#WaybackInstallat
On Friday, 29 March 2013, Andy Jackson wrote: Just a very quick test (before I go to bed) shows behaviour I did not expect: $ wget -O tempname --warc-file="output" "http://example.com" results
Hi, There is a small bug in the WARC methods. The function gzdopen () is called with 'wb+9'. The '+' is ignored by zlib 1.2.3.*, but it causes an error with zlib 1.2.4. The attached patch removes the
That's good to hear. There's one other small adjustment that you may want to make, see the attached patch. One of the WARC functions uses the basename function, which causes problems on OS X. Includ
Hi Giuseppe, Thanks for your reply. I've attached a new version of the patch that includes a fallback function that generates UUIDs from rand (version 4 from RFC 4122, the UUID description). The only
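For readers unfamiliar with the approach mentioned above, here is a minimal sketch of a rand-based version 4 UUID as described in RFC 4122. It is not the code from the attached patch: the name fallback_uuid_v4 and the plain 36-character output format are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define UUID_STR_LEN 37   /* 36 characters plus the terminating NUL */

/* Hypothetical fallback (not the patch's code): fills out with a
   version 4 UUID built from rand(). The caller is expected to seed
   the generator with srand() beforehand. rand() is not a strong
   random source, so this only suits non-security identifiers. */
void fallback_uuid_v4(char out[UUID_STR_LEN])
{
    uint8_t b[16];
    int i;

    assert(out != NULL);
    for (i = 0; i < 16; i++)
        b[i] = (uint8_t)(rand() & 0xff);

    b[6] = (uint8_t)((b[6] & 0x0f) | 0x40);   /* version field = 4 */
    b[8] = (uint8_t)((b[8] & 0x3f) | 0x80);   /* variant bits = 10 */

    snprintf(out, UUID_STR_LEN,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x-%02x%02x%02x%02x%02x%02x",
             b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
             b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}
```

The two masked bytes are what make the result a valid version 4 UUID: the first digit of the third group is always 4, and the first digit of the fourth group is always 8, 9, a, or b.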
URL: <https://savannah.gnu.org/bugs/?59086> Summary: --page-requisites not always working when creating a warc file Project: GNU Wget Submitted by: thomasegense Submitted on: Wed 09 Sep 2020 08:52:02
URL: <http://savannah.gnu.org/bugs/?47281> Summary: WARC URI Headers Improperly Quoted Project: GNU Wget Submitted by: None Submitted on: Sat 27 Feb 2016 11:13:29 UTC Category: Program Logic Severity
[Please CC me directly, as I'm not subscribed to the list.] Yes, thanks. Updated patch is attached. 2015-02-14 Eli Zaretskii <address@hidden> Gisle Vanem <address@hidden> * warc.c (windows_uuid_str)
The patch I suggest is below. It uses the fallback method if Rpcrt4.dll cannot be loaded, or if the functions from that DLL fail for some reason. 2015-02-14 Eli Zaretskii <address@hidden> Gisle Vane
This is not a 'bug' by any means, but I could find no better place to post this so please forgive me... I've used 'wget' for years but am just now discovering the real power it has. Lately I have upg
Hi Giuseppe, * I've changed the configure.ac and src/Makefile.am. * I've added a ChangeLog entry. See the new version of the patch. I've also attached a patch with just the changes in these three fil
Hi, I believe I found a bug. While downloading a large file with wget, the connection failed multiple times. Wget retried with a range request until it had the entire file downloaded. In the resultin
Good morning, I'm new to wget and web archiving in general. I've been trying to use wget to mirror a couple of my websites and output WARC files; however, I am unable to view the WARCs in webarchivepl
Hello, It seems that there may be some ambiguity in the WARC standard regarding the usage of angle brackets surrounding the URI given for a WARC-Target-URI field. In short, while the BNF grammar incl