bug-gnu-emacs

bug#46342: 28.0.50; socks-send-command munges IP address bytes to UTF-8


From: Eli Zaretskii
Subject: bug#46342: 28.0.50; socks-send-command munges IP address bytes to UTF-8
Date: Fri, 12 Feb 2021 17:04:16 +0200

> From: "J.P." <jp@neverwas.me>
> Cc: 46342@debbugs.gnu.org
> Date: Fri, 12 Feb 2021 06:30:32 -0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Then they are what we call "raw bytes", and encoding them with
> > raw-text-unix should suffice.
> 
> Thanks. Unfortunately, this produces the same utf-8 encoded bytes.
> 
>   (encode-coding-char 192 'raw-text-unix)
>   ⇒ "\303\200"

192 is not a raw byte here; it's a character whose Unicode codepoint
is 192.  So you get its UTF-8 sequence.
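
To illustrate the difference between the character 192 and the raw byte
192, here is a minimal sketch.  The variable names are made up for this
example and do not appear in socks.el.

```elisp
;; Illustration only: `my-char-string' and `my-byte-string' are
;; hypothetical names, not part of socks.el.
(defvar my-char-string (string 192)
  "Multibyte string holding the character U+00C0.")
(defvar my-byte-string (unibyte-string 192)
  "Unibyte string holding the single byte #xC0.")

(multibyte-string-p my-char-string)   ; non-nil: it holds a character
(multibyte-string-p my-byte-string)   ; nil: it holds a raw byte
(string-bytes my-char-string)         ; 2, the internal UTF-8-like form
(string-bytes my-byte-string)         ; 1
```

Encoding the first string sends two bytes; the second string already
is the single byte that should go on the wire.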

> It looks like raw-text-unix is an alias for binary [1], the coding
> system already used by the network process sending the erroneous
> request.

The problem is with how the original request is generated, not how it
is encoded.

> I suppose it's always possible to strong arm it like
> 
>   (encode-coding-char (or (decode-char 'eight-bit c) c) 'raw-text-unix)
>   ⇒ "^@" ... "\377"

That's one way, yes.  But it isn't the best one.

> But what about your original latin-1 suggestion? Is that no longer in
> contention?

No, it isn't.

>   (encode-coding-char 192 'latin-1)
>   ⇒ "\300"

Not every byte above 127 is a valid character that Latin-1 can
meaningfully encode.  It is wrong to use Latin-1 for raw bytes.  What
you need is a way of generating a unibyte string from a series of raw
bytes.

> > How does the code which calls socks.el create these raw bytes?
> 
> This library has an entry-point function that's part of the url-gateway
> dispatch mechanism. I can't say for certain, but it looks like url-http
> is the only library directly using this facility. Regardless, the
> function gets called with a (possibly multibyte) host name, which in
> rare cases may be an ASCII IP address created by url-gateway.
> 
> With SOCKS4, that's kind of moot, since all names are looked up through
> socks-nslookup-host, which returns an IPv4 address as a list of fixnums.
> Its caller is an internal helper that converts this list into a
> multibyte string for socks-send-command to emit onto the wire (where
> it's then rejected by the service).
> 
> Currently, IP addresses aren't used at all for v5 connect-command
> requests. And raw-byte IP addresses do not yet appear anywhere [2]. This
> patch would introduce them, either as an argument to socks-send-command
> or as something ephemeral produced by it (the current idea).

So what is the problem with using unibyte-string for producing a
unibyte string from a list of bytes?  It sounds like it's exactly
what is needed here, and is actually used in some places in socks.el.
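
As a rough sketch of that suggestion, assuming the address arrives as a
list of fixnums like the one socks-nslookup-host is described as
returning above: the helper name below is hypothetical, and this is
not the code in socks.el.

```elisp
;; Hypothetical helper for illustration; not part of socks.el.
(defun my-socks-octets-to-bytes (octets)
  "Return a unibyte string whose bytes are the integers in OCTETS.
OCTETS is a list of integers in the range 0..255, for example the
four octets of an IPv4 address."
  (apply #'unibyte-string octets))

;; Example address as a list of fixnums.
(defvar my-socks-example-address
  (my-socks-octets-to-bytes '(192 168 0 1))
  "Unibyte string of four bytes built from an example IPv4 address.")

(multibyte-string-p my-socks-example-address)  ; nil
(string-bytes my-socks-example-address)        ; 4
```

Because the result is unibyte, no character encoding step is involved,
so byte 192 stays a single byte.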





