bug-gnu-emacs

bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request


From: Ted Zlatanov
Subject: bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
Date: Thu, 11 Aug 2016 08:57:50 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux)

On Thu, 11 Aug 2016 15:31:11 +0300 Dmitry Gutov <dgutov@yandex.ru> wrote: 

DG> On 08/11/2016 11:53 AM, Ted Zlatanov wrote:
>> Could you add to your patch the cases you've tested? There's a specific
>> place for URL parsing tests in test/lisp/url/url-parse-tests.el that
>> would help everyone.

DG> Sure, but only one of the patches affects URL parsing (and Lars prefers the
DG> other one).

Maybe the tests should be in a separate patch, then. Neither your Russian
example nor Lars' example has a parallel in the tests, AFAICS. I'd also
add the example hostname that Katsumi Yamaoka gave from the w3m source.
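Something along these lines, say (just a sketch; the Cyrillic hostname
below is a stand-in, not the exact example from this thread, and the
expected values of course depend on which patch goes in):

  (require 'ert)
  (require 'url-parse)

  (ert-deftest url-generic-parse-url/non-ascii-host ()
    "Non-ASCII host names should survive parsing unchanged."
    (let ((url (url-generic-parse-url "http://пример.рф/path")))
      (should (equal (url-host url) "пример.рф"))
      (should (equal (url-filename url) "/path"))))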

Somewhat related: it would be nice if the URL parser also listed the
non-ASCII scripts used in the domain name. Then eww and other programs
could apply one of the typical defenses: ensure that only one script is
used; allow only scripts that match the user's locale; or flag any
non-ASCII domain name. Typically they'd display such suspicious domain
names in Punycode:
https://en.wikipedia.org/wiki/IDN_homograph_attack

I bring it up since explicitly allowing non-ASCII domain names
automatically opens up these security concerns, and it's a bit hard to
collect the confusables externally:
https://elpa.gnu.org/packages/uni-confusables.html
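
To make the script idea concrete, here's a rough sketch (the function
name is made up; `char-script-table' is the built-in char-table that
maps characters to script symbols):

  (defun my-url-host-scripts (host)
    "Return the list of writing scripts used in HOST, per `char-script-table'."
    ;; Hypothetical helper: eww or another user agent could treat a
    ;; result with more than one element as a possible homograph and
    ;; fall back to displaying the host in Punycode.
    (let (scripts)
      (dolist (ch (string-to-list host))
        (let ((script (aref char-script-table ch)))
          (when (and script (not (memq script scripts)))
            (push script scripts))))
      (nreverse scripts)))

  ;; A host whose second letter is Cyrillic U+0430 rather than ASCII "a"
  ;; would report both latin and cyrillic:
  ;;   (my-url-host-scripts "exаmple.com")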

On Thu, 11 Aug 2016 13:05:12 +0200 Lars Ingebrigtsen <larsi@gnus.org> wrote: 

LI> Yes, the fix here should be in url-http-create-request, not in the URL
LI> parsing functions.  The main issue here is that the URL request buffer
LI> is a multibyte buffer and (as with all network connection buffers), it
LI> shouldn't be.  (Or, rather, that function just creates a string instead
LI> of a buffer, but the same principle applies.)

I think this is correct: the URL parsing should not care about the
provenance of the URL or whether it will be used to make an HTTP
request or something else. But maybe the URL parsing could be smart
enough to return both the IDNA version and the original domain name,
plus some parsing information like the list of scripts I suggested
above, to save user agents from doing that extra work?
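
Purely as a sketch of what I mean (the function and the plist keys are
invented, and it reuses the hypothetical script helper from above; the
actual IDNA/Punycode conversion would come from whatever library the
user agent has available):

  (defun my-url-host-info (host)
    "Return a plist describing HOST: the original name and its scripts."
    ;; Hypothetical shape for richer parse output; an :ascii (IDNA)
    ;; entry would be filled in by a Punycode library if available.
    (list :host host
          :scripts (my-url-host-scripts host)))

And for Lars' point, the principle on the request side would be
something like the following, as a naive illustration only (not the
actual change to url-http-create-request, which might instead encode
individual components or signal an error):

  (defun my-ensure-unibyte-request (request)
    "Return REQUEST as a unibyte string, encoding it as UTF-8 if needed."
    (if (multibyte-string-p request)
        (encode-coding-string request 'utf-8)
      request))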

Ted




