
[bug #59086] --page-requisites not always working when creating a warc f

From: Thomas Egense
Subject: [bug #59086] --page-requisites not always working when creating a warc file
Date: Wed, 9 Sep 2020 04:52:04 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0


                 Summary: --page-requisites not always working when creating a
warc file
                 Project: GNU Wget
            Submitted by: thomasegense
            Submitted on: Wed 09 Sep 2020 08:52:02 AM UTC
                Category: Program Logic
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: 
        Originator Email: 
             Open/Closed: Open
                 Release: None
         Discussion Lock: Any
        Operating System: GNU/Linux
         Reproducibility: None
           Fixed Release: None
         Planned Release: None
              Regression: None
           Work Required: None
          Patch Included: None



Url example: https://jyllands-posten.dk/

How to reproduce:
echo "https://jyllands-posten.dk/" >> url_list.txt

wget --level=1 --recursive --warc-cdx --page-requisites --warc-file=jp \
     --warc-max-size=1G -i url_list.txt

The source code for the page is downloaded into the WARC (last record), but none
of the images are downloaded and links are not followed either, despite
--recursive.

It is probably due to some HTTPS redirection, but since the source code is
downloaded correctly, it should still be possible to follow links and download
page requisites.
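
One way to confirm which URLs actually ended up in the archive is to list the
WARC-Target-URI headers of its records. The following is only a sketch: the
helper name list_warc_uris is made up for illustration, and it assumes the
WARC is gzip-compressed (the wget default) and that record headers use the
exact spelling "WARC-Target-URI:".

```shell
# Hypothetical helper: list the distinct URLs recorded in a gzip-compressed WARC.
# Usage: list_warc_uris FILE.warc.gz
list_warc_uris() {
    # gzip -dc decompresses every member of a multi-member gzip file;
    # grep -a forces binary payloads to be treated as text.
    gzip -dc "$1" \
        | grep -a '^WARC-Target-URI:' \
        | tr -d '\r' \
        | sed 's/^WARC-Target-URI: *//; s/^<//; s/>$//' \
        | sort -u
}
```

Assuming the first output file of the run above is named jp-00000.warc.gz,
running "list_warc_uris jp-00000.warc.gz" should show whether anything beyond
the front page was captured. The angle brackets are stripped because some wget
versions write the URI as <...>.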



  Message sent via Savannah
