bug-wget

Re: [PATCH] no_proxy domain matching


From: Tomas Hozza
Subject: Re: [PATCH] no_proxy domain matching
Date: Wed, 20 Nov 2019 12:41:18 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.1.1

On 7. 11. 2019 21:30, Tim Rühsen wrote:
> On 07.11.19 15:21, Tomas Hozza wrote:
>> Hi.
>>
>> In RHEL-8, we ship a wget version that suffers from a bug fixed by [1]. The
>> fix resolved an issue with matching subdomains when the no_proxy domain
>> definition was prefixed with a dot, e.g. "no_proxy=.mit.edu". As part of
>> backporting the fix to RHEL, I wanted to create an upstream test for the
>> no_proxy functionality. However, I found that there is still one corner case
>> that is not handled by the current upstream code, and honestly I'm not sure
>> what the intended domain matching behavior is in that case. The man page is
>> also not very specific in this regard.
>>
>> The corner case is as follows:
>> - no_proxy=.mit.edu
>> - download URL is e.g. "http://mit.edu/file1"
>>
>> In this case the proxy settings are used, because the domains don't match
>> due to the leftmost dot in the no_proxy domain definition. This is either
>> intended or a corner case that was not considered. One could argue that if
>> no_proxy is set to ".mit.edu", the leftmost dot means that the proxy is
>> bypassed only for subdomains of "mit.edu", while proxy settings still apply
>> to the "mit.edu" domain itself. From my point of view, after reading the wget
>> man page, I don't think that the leftmost dot in the no_proxy definition has
>> any special meaning.
> 
> Hello Tomas,
> 
> hard to decide how to handle this. I personally would like to see a
> match with curl's behavior (see https://github.com/curl/curl/issues/1208).
> 
> Given the docs from GNU emacs, you are right. "no_proxy=.mit.edu" means
> "mit.edu and subdomains" are excluded from proxy settings.
> (see https://www.gnu.org/software/emacs/manual/html_node/url/Proxies.html)
> 
> The caveat with emacs' behavior is that you cannot exclude just all
> subdomains of mit.edu without mit.edu itself. Effectively, that creates
> a corner case that can't be handled at all. (but if curl also does it
> that way, let's go for it).
> 
> Maybe you can find out about the current no_proxy behavior of typical
> and widespread tools (regarding the leftmost dot)? Once we have that
> information, we can make a confident decision.
> 
> Regards, Tim

Hi Tim.

It took me some time to go through the current situation and to be honest, it 
is kind of a mess. While each tool handles the no_proxy env a little bit 
differently, there are some similarities. Nevertheless I was not able to find 
any standard.

curl's behavior:
- "no_proxy=.mit.edu"
  - will match the domain and subdomains e.g. "www.mit.edu" or
    "www.subdomain.mit.edu"
  - will match the host "mit.edu"
- "no_proxy=mit.edu"
  - will match the domain and subdomains e.g. "www.mit.edu" or
    "www.subdomain.mit.edu"
  - will match the host "mit.edu"
- downside: cannot match only the host; cannot match only the domain and
  subdomains

wget's current behavior:
- "no_proxy=.mit.edu"
  - will match the domain and subdomains e.g. "www.mit.edu" or
    "www.subdomain.mit.edu"
  - will NOT match the host "mit.edu"
- "no_proxy=mit.edu"
  - will match the domain and subdomains e.g. "www.mit.edu" or
    "www.subdomain.mit.edu"
  - will match the host "mit.edu"
- downside: cannot match only the host

wget's behavior with the proposed patch:
- "no_proxy=.mit.edu"
  - will match the domain and subdomains e.g. "www.mit.edu" or
    "www.subdomain.mit.edu"
  - will match the host "mit.edu"
- "no_proxy=mit.edu"
  - will match the domain and subdomains e.g. "www.mit.edu" or
    "www.subdomain.mit.edu"
  - will match the host "mit.edu"
- downside: cannot match only the host; cannot match only the domain and
  subdomains
- it would be consistent with curl's behavior

Emacs's behavior:
- "no_proxy=.mit.edu"
  - will match the domain and subdomains e.g. "www.mit.edu" or
    "www.subdomain.mit.edu"
  - will match the host "mit.edu"
- "no_proxy=mit.edu"
  - will NOT match the domain and subdomains e.g. "www.mit.edu" or
    "www.subdomain.mit.edu"
  - will match the host "mit.edu"
- downside: cannot match only subdomains

Python httplib2's behavior:
- "no_proxy=.mit.edu"
  - will match the domain and subdomains e.g. "www.mit.edu" or
    "www.subdomain.mit.edu"
  - will match the host "mit.edu"
- "no_proxy=mit.edu"
  - will NOT match the domain and subdomains e.g. "www.mit.edu" or
    "www.subdomain.mit.edu"
  - will match the host "mit.edu"
- downside: cannot match only subdomains
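
To make the comparison easier to check, here is a minimal Python sketch of the
three matching styles listed above. It is only a toy model of the observed
results, not code taken from any of the tools. The function names are
hypothetical, and it assumes host names are already lowercased and that a
pattern has at most one leading dot.

```python
"""Toy model of the no_proxy matching styles compared in this mail.

Nothing here is copied from curl, wget, Emacs or httplib2; the functions
only reproduce the match / no-match results described in the lists above.
"""


def _split_dot(pattern: str) -> tuple:
    """Return (has_leading_dot, pattern_without_that_dot)."""
    if pattern.startswith("."):
        return True, pattern[1:]
    return False, pattern


def match_curl_style(host: str, pattern: str) -> bool:
    """curl, and wget with the proposed patch: the leading dot is ignored."""
    _, bare = _split_dot(pattern)
    return host == bare or host.endswith("." + bare)


def match_wget_current(host: str, pattern: str) -> bool:
    """Current wget: '.mit.edu' covers subdomains only, 'mit.edu' covers both."""
    dotted, bare = _split_dot(pattern)
    is_subdomain = host.endswith("." + bare)
    if dotted:
        return is_subdomain
    return host == bare or is_subdomain


def match_emacs_style(host: str, pattern: str) -> bool:
    """Emacs and httplib2: '.mit.edu' covers both, 'mit.edu' covers the host only."""
    dotted, bare = _split_dot(pattern)
    if dotted:
        return host == bare or host.endswith("." + bare)
    return host == bare


if __name__ == "__main__":
    styles = {
        "curl / patched wget": match_curl_style,
        "current wget": match_wget_current,
        "Emacs / httplib2": match_emacs_style,
    }
    for label, matcher in styles.items():
        for pattern in (".mit.edu", "mit.edu"):
            for host in ("mit.edu", "www.mit.edu", "www.subdomain.mit.edu"):
                result = matcher(host, pattern)
                print(f"{label:20} no_proxy={pattern:9} {host:22} -> {result}")
```

Running the script prints one line per style, pattern and host, so the
differences around the leftmost dot can be read off directly.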

To sum it up, each approach has some downsides. With the change that I
provided, wget's behavior would be consistent with curl's behavior. However, it
would have more downsides than it currently has; specifically, it would lose
the ability to match only the domain and subdomains without also matching the
host. Emacs's behavior is similar to Python's httplib2 behavior regarding the
leftmost dot.

Honestly, I have a soft preference for keeping wget's current behavior. But
I admit that making the behavior consistent with curl's behavior makes sense.
Please let me know how you would like to proceed.

To make the behavior consistent with curl, the previously attached changes
should be OK. If you find the new conditions too complicated, I can try to
rethink them, but I already tried to keep them as simple as possible without
completely rewriting the function.

If you decide to keep the current behavior, I'll modify the test that I
added to match that behavior.

Thanks,

Regards,
Tomas

>> I think that this corner case should either be fixed, or alternatively the
>> wget man page should be made more specific about the intended behavior.
>>
>> Anyway, I'm attaching patches fixing the corner case and adding a test case
>> for no_proxy behavior. And one small fix for the test framework: the HttpTest
>> begin() function was not returning a result value, but always None.
>>
>> Please let me know if the corner case is really an intended behavior and 
>> I'll change the test case and can fix the man page instead of the code.
>>
>> [1] 
>> http://git.savannah.gnu.org/cgit/wget.git/commit/?id=fd85ac9cc623847e9d94d9f9241ab34e2c146cbf
>>
>> Thank you.
>>
>> Regards,
>> Tomas
>>
> 

-- 
Tomas Hozza
Associate Manager, Software Engineering - EMEA ENG Core Services

PGP: 1D9F3C2D
UTC+1 (CET)
Red Hat Inc.                 http://cz.redhat.com

Attachment: signature.asc
Description: OpenPGP digital signature

