lynx-dev

Re: [Lynx-dev] circumventing blocking sites


From: Stefan Caunter
Subject: Re: [Lynx-dev] circumventing blocking sites
Date: Sat, 4 Feb 2017 12:06:39 -0500

On Sat, Feb 4, 2017 at 11:28 AM, Nelson H. F. Beebe <address@hidden> wrote:
> For several years, I have used lynx (and also wget, and rarely, curl)
> to access publisher Web pages for new journal issues.  Recently, I
> noticed that a lynx pull of a page from Elsevier ScienceDirect would
> never complete:
>
>         % lynx -source -accept_all_cookies -cookies --trace http://www.sciencedirect.com/science/journal/00978493/62 > foo.62
>
>         parse_arg(arg_name=http://www.sciencedirect.com/science/journal/00978493/62, mask=1, count=5)
>         parse_arg startfile:http://www.sciencedirect.com/science/journal/00978493/62
>         ... no further output, and no job completion ...
>
> Similarly, I also find that wget and curl fail to complete.
>
> This new behavior suggests that the publisher site has thrown up
> http-agent-specific, rather than IP-address-specific blocks, because
> accessing the same URL in a GUI browser on the SAME machine gets an
> immediate return of the expected journal issue contents.
>
> If I add the --debug option to wget, I find that it reports
>
>         ---request begin---
>         GET /science/journal/00978493/62 HTTP/1.1
>         User-Agent: Wget/1.14 (linux-gnu)
>         Accept: */*
>         Host: www.sciencedirect.com
>         Connection: Keep-Alive
>
>         ---request end---
>
> Thus, it identifies itself as wget, and I assume that lynx probably
> identifies itself as well.
>
> Does anyone on this list have an idea how to circumvent these apparent
> blocks?
>

Put -useragent="Googlebot" or -useragent="Mozilla" on your command line:

lynx -useragent="Mozilla" -accept_all_cookies -dump http://www.sciencedirect.com/science/journal/00978493/62

That gets me a long list of links in the HTML result.
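
If you script these fetches rather than running lynx by hand, the same idea can be sketched in Python. This is only a rough illustration, assuming the block really is keyed on the User-Agent header alone; I have not checked it against the site. The names build_request and fetch are made up for the example, and the 30-second timeout is an arbitrary choice so that a stalled request fails instead of hanging forever. wget and curl have their own options for this (--user-agent and -A respectively).

```python
import urllib.request

# URL taken from the original message; everything else here is illustrative.
URL = "http://www.sciencedirect.com/science/journal/00978493/62"


def build_request(url, user_agent="Mozilla"):
    """Return a Request whose User-Agent header replaces urllib's default.

    Left alone, urllib announces itself as "Python-urllib/<version>",
    which a site filtering on the agent string could reject just as it
    apparently rejects wget.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent="Mozilla", timeout=30):
    """Fetch url and return the response body as bytes.

    The timeout (an assumed value) turns a silent stall into an error.
    """
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Show the header that would be sent, without touching the network.
    print(build_request(URL).get_header("User-agent"))
    # To actually download the page, uncomment:
    # body = fetch(URL)
```

Calling fetch(URL, "Googlebot") would try the other agent string suggested above. Whether either string keeps working is up to the publisher.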


