[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: UTF-16 surrogate pair handling in grep -i option
From: |
Corinna Vinschen |
Subject: |
Re: UTF-16 surrogate pair handling in grep -i option |
Date: |
Fri, 16 Aug 2013 21:00:50 +0200 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
Hi Jim,
first of all, thanks for your review.
On Aug 16 07:42, Jim Meyering wrote:
> Hi Corina,
>
> Thanks a lot for the patcb. It is almost perfect.
>
> [ - the git one-line summary should be readable.
> - comment nit: s/ as/ as a/
> - a style issue: we want curly braces around the 1-line
> else body in the first #ifdef block
> - please attribute the reporter (or a list URL) in the commit log
> ]
I attached a new patch which hopefully fixes these points. It also
contains a suggestion for a NEWS entry.
> Do any of the existing tests trigger this malfunction?
Unfortunately not. The testsuite runs fine except for a few failures
in fmbtest, but they are not related to this problem (...but I didn't
took another look into them, either)
> If not, can you create a small example that triggers the
> problem on cygwin? Even better would be the addition of a new
> script in tests/, which is required for any bug-fix patch.
I'm not quite sure how the test script as a whole should look like, but
the test content is very simple; just create input which contains a
valid 4 byte UTF-8 character. Without the patch, grep SEGVs on Cygwin,
with the patch it runs fine:
Before:
$ printf "\xf0\x90\x90\x85\n" | ./grep -i blurb
Segmentation fault (core dumped)
$
After:
$ printf "\xf0\x90\x90\x85\n" | ./grep -i blurb
$
However, I just noticed that it's still not possible to grep for such
a character. Apparently the regex compiler does not recognize UTF-16
surrogate pairs either. Grep fails, grep -F works:
$ printf "\xf0\x90\x90\x85\n" | grep '𐐅'
$ printf "\xf0\x90\x90\x85\n" | grep -F '𐐅'
𐐅
Not sure if that's such a terrible restriction, though...
> Also, it'd be great if you would add a NEWS entry that
> describes your fix.
My suggestion for a NEWS entry is part of the attached patch.
> That said, there's no pressure.
> If you can tell me how to reproduce the failure, I'll
> make time to write both the test and NEWS addition, and
> amend them onto your patch.
>
> PS. Your timing is great. I'm planning to make a release pretty soon.
Sounds good :)
Thanks,
Corinna
--
Corinna Vinschen
Cygwin Maintainer
Red Hat
0001-fix-Cygwin-UTF-16-surrogate-pair-handling-with-i.patch
Description: Text document
pgpx26YKJtBjP.pgp
Description: PGP signature
- UTF-16 surrogate pair handling in grep -i option, Corinna Vinschen, 2013/08/14
- Re: UTF-16 surrogate pair handling in grep -i option, Jim Meyering, 2013/08/16
- Re: UTF-16 surrogate pair handling in grep -i option,
Corinna Vinschen <=
- Re: UTF-16 surrogate pair handling in grep -i option, Jim Meyering, 2013/08/18
- Re: UTF-16 surrogate pair handling in grep -i option, Corinna Vinschen, 2013/08/19
- Re: UTF-16 surrogate pair handling in grep -i option, Corinna Vinschen, 2013/08/19
- Re: UTF-16 surrogate pair handling in grep -i option, Paul Eggert, 2013/08/19
- Re: UTF-16 surrogate pair handling in grep -i option, Corinna Vinschen, 2013/08/20
- Re: UTF-16 surrogate pair handling in grep -i option, Jim Meyering, 2013/08/25
- bug#15192: UTF-16 surrogate pair handling in grep -i option, Corinna Vinschen, 2013/08/26
- bug#15192: UTF-16 surrogate pair handling in grep -i option, Jim Meyering, 2013/08/27
- bug#15192: UTF-16 surrogate pair handling in grep -i option, Corinna Vinschen, 2013/08/27
- bug#15192: UTF-16 surrogate pair handling in grep -i option, Jim Meyering, 2013/08/31