From: anonymous
Subject: [bug #66248] Sending wget spider to the background will avoid issue with Bad file descriptor
Date: Tue, 24 Sep 2024 10:40:32 -0400 (EDT)

URL:
  <https://savannah.gnu.org/bugs/?66248>

                 Summary: Sending wget spider to the background will avoid
                          issue with Bad file descriptor
                   Group: GNU Wget
               Submitter: None
               Submitted: Tue 24 Sep 2024 02:40:29 PM UTC
                Category: Program Logic
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: freddieventura
        Originator Email: creativefreddieventura@gmail.com
             Open/Closed: Open
         Discussion Lock: Any
                 Release: trunk
        Operating System: GNU/Linux
         Reproducibility: Every Time
           Fixed Release: None
         Planned Release: None
              Regression: None
           Work Required: None
          Patch Included: None


    _______________________________________________________

Follow-up Comments:


-------------------------------------------------------
Date: Tue 24 Sep 2024 02:40:29 PM UTC By: Anonymous
Hi,

I was doing a simple web crawl, trying to gather a list of all the URLs into
a .txt file, doing a spider check first.

I am following the first answer in this thread:
https://stackoverflow.com/questions/52610592/wget-spider-a-website-to-collect-all-links
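
(For reference, the log-scraping step to build that .txt would look roughly
like this; the grep pattern is illustrative, not the linked answer verbatim:)

```
# pull every URL wget visited out of the spider log, deduplicated
grep -oE 'https?://[^ ]+' wget.log | sort -u > urls.txt
```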

But I am running this:

```
wget --spider --force-html --span-hosts -np --limit-rate=20k -e robots=off \
  --wait=3 --random-wait -r -l2 https://developers.google.com -o wget.log &
```

It works. But I wanted to run it in the foreground (not send it to the
background), so I am doing this instead:

```
wget --spider --force-html --span-hosts -np --limit-rate=20k -e robots=off \
  --wait=3 --random-wait -r -l2 https://developers.google.com -o wget.log
```

This last one does not work: `wget` exits after about 3 seconds, leaving
lines like these in the log:


```
developers.google.com: No such file or directory
developers.google.com/index.html.tmp.tmp: Bad file descriptor
Cannot write to ‘developers.google.com/index.html.tmp.tmp’ (Bad file
descriptor).
Found no broken links.
```
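
(To get more detail on the failing writes, one option is to rerun the
foreground command under strace; this is a sketch, assuming strace is
installed, and the syscall filter is illustrative:)

```
# trace file-related syscalls of the foreground run to see which
# descriptor operation fails with EBADF ("Bad file descriptor")
strace -f -e trace=openat,write,close -o wget.strace \
  wget --spider --force-html --span-hosts -np --limit-rate=20k -e robots=off \
    --wait=3 --random-wait -r -l2 https://developers.google.com -o wget.log
```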

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?66248>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
