bug#60697: GNU grep mishandles \b near encoding errors

bug-grep

From:	Paul Eggert
Subject:	bug#60697: GNU grep mishandles \b near encoding errors
Date:	Mon, 9 Jan 2023 15:00:15 -0800
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.0

Here's a shell session illustrating the problem on Fedora 37, which has GNU grep 3.7. The same bug is still in bleeding-edge GNU grep.


  $ export LC_ALL=en_US.utf8
  $ printf '\300\n' | grep '\b'
  grep: (standard input): binary file matches
  $ printf '\300\n' | grep -P '\b'
  $

Plain grep finds a word boundary in the input even though the input contains no words (just an encoding error). 'grep -P' does the right thing.
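
To confirm what the test input actually contains, the bytes can be dumped in hex. The following is only a minimal sketch: the helper name hexbytes is hypothetical (not part of grep or the original report), and it assumes a POSIX od and tr. The input is the byte 0xC0 followed by a newline; 0xC0 never occurs in valid UTF-8, so the line holds no characters that could form a word.

```shell
# Hypothetical helper (not from the original report): print stdin as
# one contiguous string of hex digits, using only POSIX od and tr.
hexbytes() {
  od -An -tx1 | tr -d ' \n'
}

# The same input as in the session above: byte 0xC0, then a newline.
printf '\300\n' | hexbytes
echo
# prints: c00a
```

The dump shows only the input; it says nothing about which regex code path grep takes.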

The underlying issue is in the glibc regex code, so the fix should be in glibc / Gnulib, but I thought I'd report it here before I forgot it.

Current Thread

bug#60697: GNU grep mishandles \b near encoding errors, Paul Eggert <=
- bug#60697: GNU grep mishandles \b near encoding errors, Jim Meyering, 2023/01/12