help-libidn

Re: Fwd: Disagreement between libidn2 and Python idna


From: Ian Eldred Pudney
Subject: Re: Fwd: Disagreement between libidn2 and Python idna
Date: Sun, 8 Nov 2020 20:55:17 -0800

Hi Tim,

Since sending that email, I've found a more serious disagreement. This time, both libraries successfully encode the domain name, but they produce different results! The configuration was UTS #46 nontransitional processing, without STD3 rules.
  • Domain name:

    a.İ᷹

  • Domain name hex codepoints:

    ['61', '2e', '130', '1df9']

  • Python idna punycode (differs from libidn2's in one digit, 708r vs. 808r):

    a.xn--i-9bb708r

  • libidn2 punycode:

    a.xn--i-9bb808r

When I decode these punycode domains, each library decodes its own output. However, Python idna rejects libidn2's punycode as not being in NFC, whereas libidn2 decodes Python idna's punycode without complaint. I don't actually know which of these libraries is behaving correctly.
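The NFC complaint can be reproduced with only the Python standard library: the built-in `punycode` codec implements RFC 3492, so it can decode both labels (with the `xn--` prefix removed), and `unicodedata.is_normalized` (Python 3.8+) tests for NFC. This is a minimal sketch, not either library's code path. It assumes the only relevant UTS #46 step is the mapping of U+0130 to U+0069 U+0307, which Python's `str.lower()` happens to reproduce; `decode_label` and `describe` are illustrative helper names.

```python
import unicodedata

# ACE labels from the report, with the "xn--" prefix removed.
PYTHON_IDNA_LABEL = b"i-9bb708r"
LIBIDN2_LABEL = b"i-9bb808r"


def decode_label(ace_body: bytes) -> str:
    """Decode a punycode label body with the stdlib RFC 3492 codec."""
    return ace_body.decode("punycode")


def describe(label: str) -> list:
    """List the code points of a string as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in label]


# Assumption: the UTS #46 mapping step turns U+0130 into U+0069 U+0307;
# Python's full lowercase mapping of U+0130 yields the same sequence.
mapped = "\u0130\u1df9".lower()
expected = unicodedata.normalize("NFC", mapped)

for name, body in (("Python idna", PYTHON_IDNA_LABEL), ("libidn2", LIBIDN2_LABEL)):
    label = decode_label(body)
    print(
        f"{name:12}",
        describe(label),
        "NFC" if unicodedata.is_normalized("NFC", label) else "not NFC",
        "== NFC(mapped input)" if label == expected else "!= NFC(mapped input)",
    )
print("NFC(mapped input):", describe(expected))
```

Decoding shows that both labels hold the same three code points in different orders: Python idna's label is U+0069 U+1DF9 U+0307, while libidn2's is U+0069 U+0307 U+1DF9. NFC's canonical ordering places U+1DF9 (combining class 220) before U+0307 (combining class 230), so only the first ordering is in NFC.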

On Sun, Nov 8, 2020 at 10:03 AM Tim Rühsen <tim.ruehsen@gmx.de> wrote:
Hi Ian,

thanks for reaching out and reporting those issues.

A differential fuzzer is a nice thing to have. I agree that different
implementations should lead to the same (correct) results.

Issue #3 indeed seems to be a matter of upgrading to Unicode 12, as we
currently use tables from Unicode 11.0.

I'll likely look into this and the other issues during the next 5-7 days.

Cheers, Tim

On 07.11.20 00:22, Ian Eldred Pudney wrote:
> Hello,
>
> I work in security at Google. I'm working on a differential fuzzer
> between libidn2 and the Python idna package. (Essentially, I've written
> a program that rapidly feeds the same inputs to libidn2 and Python idna
> and checks that both produce the same result.) I wrote this to find
> bugs in the Python idna package, but I think I've found 3 bugs in
> libidn2 instead. I'm reaching out to report these 3 bugs.
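The core of a differential fuzzer like the one described here is a comparison loop over two encoders. The sketch below is hypothetical: `run`, `differential` and `strict_encoder` are illustrative names, each encoder is assumed to be any callable that returns an ACE string or raises on rejection, and the bindings to libidn2 (for example via ctypes) and to the `idna` package are not shown. Treating "both reject" as agreement matches the report's focus on cases where one library accepts and the other refuses, or both accept with different output.

```python
from typing import Callable, Iterable, List, Tuple

Encoder = Callable[[str], str]  # returns an ACE string or raises on rejection
Outcome = Tuple[str, str]       # ("ok", ace) or ("error", exception type name)


def run(encoder: Encoder, domain: str) -> Outcome:
    """Run one encoder and turn its result or exception into a comparable value."""
    try:
        return ("ok", encoder(domain))
    except Exception as exc:  # any rejection is recorded, not propagated
        return ("error", type(exc).__name__)


def differential(a: Encoder, b: Encoder,
                 inputs: Iterable[str]) -> List[Tuple[str, Outcome, Outcome]]:
    """Return (input, outcome_a, outcome_b) for every input the encoders disagree on."""
    mismatches = []
    for domain in inputs:
        out_a, out_b = run(a, domain), run(b, domain)
        if out_a[0] == "error" and out_b[0] == "error":
            continue  # both rejected: counted as agreement, whatever the error type
        if out_a != out_b:
            mismatches.append((domain, out_a, out_b))
    return mismatches


# Toy stand-in for a real library, only to exercise the harness.
def strict_encoder(domain: str) -> str:
    if len(domain) > 5:
        raise ValueError("too long")
    return domain.lower()


print(differential(strict_encoder, str.lower, ["ABC", "ABCDEFG"]))
```

In a real fuzzer, `inputs` would come from a random or coverage-guided generator rather than a fixed list.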
>
> In all of these cases, libidn2 refuses to encode the specified domain
> name and returns an error, but Python idna encodes it fine. Also, in all
> of these cases, libidn2 will happily /decode/ the punycode generated by
> Python idna back into the same input that it refuses to encode.
>
> Issue #1: This input causes libidn2 to report the error "domain name
> longer than 255 characters". However, the punycode domain name is only
> 146 characters long.
>
>   * Domain name:
>
>     髦暩晦晦晦獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳筳獳
>     싂.퐀쓄쓄쓄쓄쓄쓄쓄쓄쓄쓄쓄쓼쓄쓄쓄쓄쓄쓄쓄쓄쓄㻄쓄쓄럄䄀싂.뼀猀獳獳
>     獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳ⱁ㩁
>
>   * Domain name hex codepoints:
>
>     ['9ae6', '66a9', '6666', '6666', '6666', '7373', '7373', '7373',
>     '7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373',
>     '7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373',
>     '7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373',
>     '7b73', '7373', 'c2c2', '2e', 'd400', 'c4c4', 'c4c4', 'c4c4',
>     'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4',
>     'c4fc', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4',
>     'c4c4', 'c4c4', '3ec4', 'c4c4', 'c4c4', 'b7c4', '4100', 'c2c2',
>     '2e', 'bf00', '7300', '7373', '7373', '7373', '7373', '7373',
>     '7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373',
>     '7373', '7373', '7373', '7373', '7373', '2c41', '3a41']
>
>   * Punycode:
>
>     xn--lkvaa9xr87caaaaaaaaaaaaaaaaaaaaaaaaaaa7968dcp2n7tvk.xn--p9mx3db62rwgjlncaaaaaaaaaaaaaaaaaaaba41m468u.xn--bfj606ben8bfnaaaaaaaaaaaaaaaaaa79563b
>
>
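RFC 1035 limits labels to 63 octets and whole names to 255 octets, and those limits concern the ACE (`xn--`) form used on the wire. The sketch below rebuilds this input from the hex code points above, encodes each label with Python's standard `punycode` codec, and compares lengths. One observation it surfaces, offered as a guess rather than a diagnosis: all 85 non-dot code points here take 3 bytes in UTF-8, so the UTF-8 form is 257 bytes, just over 255. Whether libidn2 measures that form is not established in this thread. `to_ace` is a hypothetical helper that skips the UTS #46 mapping step, on the assumption that mapping leaves these characters unchanged.

```python
# Run-length form of the hex code point list above: three labels joined by ".".
LABEL1 = ["9ae6", "66a9"] + ["6666"] * 3 + ["7373"] * 27 + ["7b73", "7373", "c2c2"]
LABEL2 = (["d400"] + ["c4c4"] * 11 + ["c4fc"] + ["c4c4"] * 9 + ["3ec4"]
          + ["c4c4"] * 2 + ["b7c4", "4100", "c2c2"])
LABEL3 = ["bf00", "7300"] + ["7373"] * 18 + ["2c41", "3a41"]


def from_hex(codepoints: list) -> str:
    """Turn a list of hex code point strings into a Python string."""
    return "".join(chr(int(cp, 16)) for cp in codepoints)


DOMAIN = ".".join(from_hex(label) for label in (LABEL1, LABEL2, LABEL3))


def to_ace(domain: str) -> str:
    """Hypothetical helper: punycode-encode each non-ASCII label (no UTS #46 mapping)."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


ace = to_ace(DOMAIN)
utf8_len = len(DOMAIN.encode("utf-8"))
print("code points:", len(DOMAIN))
print("UTF-8 bytes:", utf8_len)
print("ACE characters:", len(ace), "label lengths:", [len(l) for l in ace.split(".")])
```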
> Issue #2: This input causes libidn2 encoding to fail with the error
> "string has forbidden bi-directional properties". To determine which
> library was wrong, I implemented the Bidi rule myself, and I believe
> this domain name should be valid.
>
>   * Domain name:
>
>     ਗ਼.ÿ߽̃̃̃
>
>   * Domain name hex codepoints:
>
>     ['a17', 'a3c', '2e', 'ff', '7fd', '303', '303', '303']
>
>   * Punycode:
>
>     xn--lkvaa9xr87caaaaaaaaaaaaaaaaaaaaaaaaaaa7968dcp2n7tvk.xn--p9mx3db62rwgjlncaaaaaaaaaaaaaaaaaaaba41m468u.xn--bfj606ben8bfnaaaaaaaaaaaaaaaaaa79563b
>
>
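The Bidi rule is defined in RFC 5893, Section 2, and UTS #46 applies it (when CheckBidi is set) only to a "Bidi domain name", meaning one with at least one label containing a character of class R, AL or AN. A minimal re-implementation using the Bidi classes from Python's `unicodedata` could look like this; it is a sketch written against the RFC text, not libidn2's or Python idna's code, and its results depend on the Unicode version bundled with the interpreter. The punycode listed for this case is identical to the one listed for the first input, so the snippet also prints the ACE labels produced by the standard `punycode` codec.

```python
import unicodedata

RTL_CLASSES = {"R", "AL", "AN"}
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def bidi_classes(label: str) -> list:
    """Bidi class of each character, from Python's bundled Unicode database."""
    return [unicodedata.bidirectional(ch) for ch in label]


def label_satisfies_bidi_rule(label: str) -> bool:
    """Check the six conditions of RFC 5893, Section 2, for a single label."""
    classes = bidi_classes(label)
    # Condition 1: the first character must be L, R or AL.
    if not classes or classes[0] not in ("L", "R", "AL"):
        return False
    # Conditions 3 and 6 inspect the last character that is not an NSM.
    trimmed = list(classes)
    while trimmed[-1] == "NSM":
        trimmed.pop()
    last = trimmed[-1]
    if classes[0] in ("R", "AL"):  # RTL label
        if any(c not in RTL_ALLOWED for c in classes):  # condition 2
            return False
        if last not in ("R", "AL", "EN", "AN"):  # condition 3
            return False
        return not ("EN" in classes and "AN" in classes)  # condition 4
    # LTR label
    if any(c not in LTR_ALLOWED for c in classes):  # condition 5
        return False
    return last in ("L", "EN")  # condition 6


def domain_satisfies_bidi_rule(domain: str) -> bool:
    """Apply the rule to every label, but only if this is a Bidi domain name."""
    labels = [label for label in domain.split(".") if label]  # skip an empty root label
    if not any(RTL_CLASSES & set(bidi_classes(label)) for label in labels):
        return True  # no RTL label, so RFC 5893 places no constraint
    return all(label_satisfies_bidi_rule(label) for label in labels)


domain = "\u0a17\u0a3c.\u00ff\u07fd\u0303\u0303\u0303"  # the input above
for label in domain.split("."):
    ace = "xn--" + label.encode("punycode").decode("ascii")
    print(ace, bidi_classes(label), label_satisfies_bidi_rule(label))
print("domain passes:", domain_satisfies_bidi_rule(domain))
```

Assuming U+07FD NKO DANTAYALAN carries Bidi class NSM, as nonspacing marks normally do, every character in this input is L or NSM: the name contains no RTL label, and each label also passes the LTR conditions 5 and 6.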
> Issue #3: This input causes libidn2 to report a disallowed character.
> This appears to be not a "bug" as such, but rather out-of-date tables
> in libidn2. The offending character, U+0E90
> <https://www.fileformat.info/info/unicode/char/0e90/index.htm>, was only
> added to Unicode in 2019 (Unicode 12.0).
>
>   * Domain name:
>
>     ຐ.xyz
>
>   * Domain name hex codepoints:
>
>     ['e90', '2e', '78', '79', '7a']
>
>   * Punycode:
>
>     xn--46c.xyz
>
>
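Tim attributes issue #3 to libidn2's Unicode 11.0 tables. Python ships its own copy of the Unicode Character Database, so the interpreter can at least show which version it uses, whether U+0E90 is assigned there, and that the character punycode-encodes to the `xn--46c` label quoted above. This says nothing about libidn2's own tables. `is_assigned` and `ucd_version` are hypothetical helper names for one-line checks on the general category `Cn` (unassigned) and on `unicodedata.unidata_version`.

```python
import unicodedata


def is_assigned(ch: str) -> bool:
    """True unless Python's bundled UCD marks ch as unassigned (category Cn)."""
    return unicodedata.category(ch) != "Cn"


def ucd_version() -> tuple:
    """Version of Python's bundled Unicode database, e.g. (12, 1, 0) on Python 3.8."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


LAO_CHAR = "\u0e90"  # the character from issue #3

print("Python UCD version:", unicodedata.unidata_version)
print("U+0E90 assigned:", is_assigned(LAO_CHAR))
print("ACE label:", "xn--" + LAO_CHAR.encode("punycode").decode("ascii"))
```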
