Re: [Libcdio-devel] Vulnerable use of strcpy in iso9660_fs.c
From: Thomas Schmitt
Subject: Re: [Libcdio-devel] Vulnerable use of strcpy in iso9660_fs.c
Date: Tue, 09 Apr 2024 09:00:18 +0200
Hi,
Pete Batard wrote:
> Or maybe there's a mathematical proof that
> a UTF-8 glyph byte encoding can never be larger than 1.5 the UTF-16 glyph
> byte encoding
I thought I had already given one. Let me try again:
https://datatracker.ietf.org/doc/html/rfc3629
"In UTF-8, characters from the U+0000..U+10FFFF range (the UTF-16
accessible range) are encoded using sequences of 1 to 4 octets."
The table after this statement shows that up to 21 bits can be encoded
that way.
The older FSS-UTF proposal of 1992 had up to 6 octets for up to 31 bits,
but it was restricted to 21 bits in 2003 by the above RFC. The same
restriction is defined in ISO/IEC 10646:2014 through ISO/IEC 10646:2020.
My proof: UCS-2 encodes the Unicode code points U+0000 to U+FFFF in
2 bytes each, and UTF-8 encodes each of them in at most 3 bytes, i.e.
at most 1.5 times as many.
If the producer of the ISO uses UTF-16 instead of the older UCS-2,
then the input Unicode range is the same as with UTF-8: U+0000..U+10FFFF.
Characters which do not fit into 2 bytes (and thus not into 3 UTF-8
bytes either) get represented as 4 bytes, a surrogate pair. Given that
UTF-8 cannot exceed 4 bytes, the number of bytes for such a character
cannot grow during conversion.
(My proposal would accommodate up to 6 UTF-8 bytes for every 4 UTF-16
bytes and would thus even suffice for FSS-UTF.)
> So I'm going to stick to i_fname for length, with the expectation that we're
> unlikely to see realistic truncations outside of images designed to trigger
> one,
I try to obey specs and to avoid speculating about which of their
provisions might not matter in practice.
In my experience this pays off in the long run.
> I'm not
> sure I like the idea of trying to be too smart about or expecting specs not
> to change the deal.
My proposal, with a name allocation of 3*i_fname/2 bytes and a result
size parameter for _iso9660_recname_to_cstring(), would be as safe
against result overflow as yours.
It would additionally guarantee that all valid UCS-2 names lead to valid
and untruncated UTF-8 names.
(One would separately have to check what the character conversion in
libcdio makes of invalid UTF-16 byte sequences. Whatever it does, the
proposed size check would prevent memory corruption in
_iso9660_recname_to_cstring().)
Have a nice day :)
Thomas