Hi Tim,
Thank you for your email.
First, the current behavior of wget 1.x contradicts the URI standard
defined in RFC 3986, which allows semicolons in the userinfo segment.
Ensuring compliance with this widely accepted standard is essential for
_interoperability and correctness_.
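To make the RFC 3986 point concrete, here is a minimal sketch of the userinfo production from section 3.2.1, transcribed into a regular expression. The names USERINFO_RE and is_valid_userinfo are my own, chosen for illustration; this is not code from wget, wget2 or curl.

```python
import re

# RFC 3986, section 3.2.1:
#   userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
# sub-delims includes ";", so a semicolon is a legal userinfo character.
USERINFO_RE = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})*")


def is_valid_userinfo(userinfo):
    """Return True if the string matches the RFC 3986 userinfo production."""
    return USERINFO_RE.fullmatch(userinfo) is not None
```

Under this grammar, is_valid_userinfo("a;bc") is accepted, while a string containing "@" or a space is rejected.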
Second, the way wget 1.x handles semicolons in the
userinfo segment of a URL could lead to _security
vulnerabilities_. Here are a few ways this could happen:
*Misinterpretation of Userinfo Data:*
Authentication Details: If wget 1.x does not recognise the userinfo
segment, the credentials it carries are never used for authentication.
For example, a URL whose userinfo is user;name:pass is treated as
having no credentials at all, which could lead to failed authentication
attempts or the exposure of sensitive information.
*Phishing and Spoofing Attacks:*
Host Confusion: Attackers might craft URLs that, due to
incorrect parsing by wget 1.x, lead to connections to unintended hosts.
This can be exploited in phishing attacks where the user believes they
are connecting to a legitimate server but is in fact directed to a
malicious one.
*Man-in-the-Middle Attacks:*
Incorrect DNS Resolution: If wget 1.x misinterprets the userinfo segment
as part of the hostname, it could result in DNS queries to incorrect or
malicious domains. This can be leveraged in man-in-the-middle attacks
where the attacker intercepts and manipulates the communication.
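The host discrepancy itself can be sketched as follows. This is only a model of the observed outputs in this thread, not the actual parsing code of any tool: host_lenient stands in for a parser that accepts ";" in userinfo, and host_strict is a hypothetical parser that refuses it and therefore keeps the whole authority as the host name. Both function names are assumptions made for this example.

```python
def split_authority(authority):
    """Split 'userinfo@host' at the last '@'; userinfo is '' when absent."""
    userinfo, _, host = authority.rpartition("@")
    return userinfo, host


def host_lenient(authority):
    """Host as seen by a parser that accepts ';' in userinfo."""
    return split_authority(authority)[1]


def host_strict(authority):
    """Hypothetical parser that refuses ';' in userinfo and, as a result,
    treats the whole authority as the host name."""
    userinfo, host = split_authority(authority)
    return authority if ";" in userinfo else host
```

For the authority "a;bc@xyz", the lenient variant yields "xyz" while the strict variant yields "a;bc@xyz", which is the name that would then be handed to the resolver.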
*Data Leakage:*
Insecure Handling of Userinfo: When userinfo is misinterpreted,
sensitive information (like usernames and passwords) might be logged or
displayed in error messages, leading to unintended exposure of credentials.
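One way to limit this kind of exposure is to strip userinfo before a URL reaches a log or an error message. The following is a minimal sketch under my own assumptions (redact_userinfo is a hypothetical helper, and the "***" placeholder is arbitrary). It splits the authority at the last "@" regardless of which characters the userinfo contains, so a semicolon does not prevent redaction.

```python
def redact_userinfo(url):
    """Replace any userinfo in the authority with '***' before logging."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        # Scheme-less input such as "a;bc@xyz": treat it all as authority + tail.
        scheme, rest = "", url
    # The authority ends at the first "/", "?" or "#".
    end = len(rest)
    for delim in "/?#":
        pos = rest.find(delim)
        if pos != -1:
            end = min(end, pos)
    authority, tail = rest[:end], rest[end:]
    _userinfo, at, host = authority.rpartition("@")
    if at:
        authority = "***@" + host
    return scheme + sep + authority + tail
```

A redactor that relies on a parser which does not recognise the userinfo segment would leave the credentials in place, which is the leakage scenario described above.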
Thanks,
Bachir
On Sat, Jun 1, 2024 at 5:42 PM Tim Rühsen <tim.ruehsen@gmx.de> wrote:
Hi Bachir,
wget2 "a;bc@xyz"
wget2: Failed to resolve 'xyz' (Name or service not known)
Is there a real-life problem that requires wget 1.x to accept a
semicolon in the userinfo field?
Regards, Tim
On 5/14/24 12:42, Bachir Bendrissou wrote:
> Hi,
>
> The URL example below contains a semicolon in the userinfo segment.
>
> In the example, wget does not recognise the userinfo segment, and
> instead treats it as part of the hostname. When the semicolon is
> removed, the userinfo is recognised and is no longer processed as
> hostname. The rejection of semicolons in userinfo creates a parsing
> discrepancy with other URL parsers.
>
> curl "a;bc@xyz"
> curl: (6) Could not resolve host: xyz
>
> wget "a;bc@xyz"
> wget: unable to resolve host address ‘a;bc@xyz’
>
> wget "abc@xyz"
> wget: unable to resolve host address ‘xyz’
>
> You can replicate the above cases after disconnecting from your DNS.
>
> Thank you,
> Bachir
>