UTF-16 surrogate pair handling in grep -i option
From: Corinna Vinschen
Subject: UTF-16 surrogate pair handling in grep -i option
Date: Wed, 14 Aug 2013 18:32:42 +0200
User-agent: Mutt/1.5.21 (2010-09-15)
Hi,
Two days ago we got a report on the Cygwin mailing list that grep on
Cygwin SEGVs under some circumstances. I tracked this down to grep's -i
option, which calls the function mbtolower. This function works fine on
systems with UCS-2 or UCS-4 wchar_t's, but it doesn't handle UTF-16
surrogates on UTF-16 wchar_t systems. In particular, it does not handle
the case where wcrtomb returns 0, which is what causes the SEGV.
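
To illustrate why a conversion loop that treats every wchar_t as a complete
character breaks down on a UTF-16 system, here is a minimal sketch of
surrogate pair decoding. It is not the patch and not code from grep; the
helper names (is_high_surrogate, is_low_surrogate, combine_surrogates,
decode_utf16) are hypothetical, and it works on plain uint16_t units so that
it does not depend on the platform's wchar_t or locale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not taken from grep. */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Combine a valid high/low surrogate pair into one code point. */
static uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    assert(is_high_surrogate(hi) && is_low_surrogate(lo));
    return 0x10000u + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}

/* Decode one code point from the start of units[0..len).
   Returns the number of UTF-16 units consumed (0 if len is 0) and
   stores the code point in *cp.  An unpaired surrogate is consumed
   as one unit and reported as U+FFFD. */
static size_t decode_utf16(const uint16_t *units, size_t len, uint32_t *cp)
{
    if (len == 0)
        return 0;
    if (is_high_surrogate(units[0])) {
        if (len >= 2 && is_low_surrogate(units[1])) {
            *cp = combine_surrogates(units[0], units[1]);
            return 2;           /* one character, two units */
        }
        *cp = 0xFFFD;           /* high surrogate without its partner */
        return 1;
    }
    if (is_low_surrogate(units[0])) {
        *cp = 0xFFFD;           /* stray low surrogate */
        return 1;
    }
    *cp = units[0];             /* ordinary BMP character */
    return 1;
}
```

The point of the sketch is that the high surrogate alone yields no complete
character, so a caller has to be prepared for a step that produces no output
bytes yet, which is the situation described above where wcrtomb returns 0.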
The patch below fixes this, at least for Cygwin. I don't know of any
other OS that uses UTF-16 and provides this set of functions, so I
assume that this solution is very system-specific, unless another
Newlib-based OS uses UTF-16 wchar_t as well.
I hope the patch is OK to go into mainline. I added a lot of comments
to explain what happens. Feel free to ask any questions.
Please keep me CCed, I'm not subscribed to bugs-grep.
Thanks,
Corinna
--
Corinna Vinschen
Cygwin Maintainer
Red Hat
0001-src-searchutils.c-mbtolower-Handle-UTF-16-surrogate-.patch
Description: Text document