bug#17460: new snapshot available: grep-2.18.143-b298

From: Jim Meyering
Subject: bug#17460: new snapshot available: grep-2.18.143-b298
Date: Sat, 10 May 2014 22:43:09 -0700

Here's the latest, in preparation for a grep-2.19 release.
Please give it a good work-out and let us know of any problems.

This release includes an unusually large number of bug fixes and
impressive performance improvements, thanks to a lot of work
by Norihiro Tanaka and Paul Eggert.

grep snapshot:
  http://meyering.net/grep/grep-ss.tar.xz      1.2 MB

Here are the new parts of the NEWS file, followed by git shortlog entries:

** Improvements

  Performance has improved, typically by 10% and in some cases by a
  factor of 200.  However, performance of grep -P in UTF-8 locales has
  gotten worse as part of the fix for the crashes mentioned below.

** Bug fixes

  grep no longer mishandles patterns like [a-[.z.]], and no longer
  mishandles patterns like [^a] in locales that have multicharacter
  collating sequences, where [^a] can match a string of two characters.

  grep no longer mishandles an empty pattern at the end of a pattern list.
  [bug introduced in grep-2.5]

  grep -C NUM now outputs separators consistently even when NUM is zero,
  and similarly for grep -A NUM and grep -B NUM.
  [bug present since "the beginning"]
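  As a rough illustration of the -C 0 change, here is a minimal Python
  model (not grep's code) of how context output is grouped once some
  -A/-B/-C option is given: every line inside a match's context window is
  printed, and "--" separates groups that are not contiguous.  With a
  context of 0, two matches on non-adjacent lines still get a separator.
  The function name context_lines and plain substring matching are
  assumptions made for the sketch.

```python
def context_lines(lines, pattern, before=0, after=0, sep="--"):
    """Model grep-style context output when a context option was given.

    Selects lines containing `pattern` (plain substring match), widens each
    selection by `before`/`after` lines, and inserts `sep` wherever two
    printed lines are not adjacent in the input.
    """
    matches = [i for i, line in enumerate(lines) if pattern in line]
    # Collect the indices of all lines that fall inside some context window.
    keep = set()
    for i in matches:
        lo = max(0, i - before)
        hi = min(len(lines) - 1, i + after)
        keep.update(range(lo, hi + 1))
    out = []
    prev = None
    for i in sorted(keep):
        # A gap since the previously printed line starts a new group.
        if prev is not None and i != prev + 1:
            out.append(sep)
        out.append(lines[i])
        prev = i
    return out
```

  For example, with lines "a1", "b", "a2", "a3" and pattern "a", a context
  of 0 yields "a1", "--", "a2", "a3": the gap at "b" is marked, while the
  adjacent matches "a2" and "a3" stay in one group.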

  grep -f no longer mishandles patterns containing NUL bytes.
  [bug introduced in grep-2.11]

  Plain grep, grep -E, and grep -F now treat encoding errors in patterns
  the same way the GNU regular expression matcher treats them, with respect
  to whether the errors can match parts of multibyte characters in data.
  [bug present since "the beginning"]

  grep -w no longer mishandles a potential match adjacent to a letter that
  takes up two or more bytes in a multibyte encoding.
  Similarly, the patterns '\<', '\>', '\b', and '\B' no longer
  mishandle word-boundary matches in multibyte locales.
  [bug present since "the beginning"]
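  To see why multibyte letters matter for -w, consider this hedged Python
  sketch of the documented -w test: a match counts only if it is at the
  start of the line or preceded by a non-word constituent, and likewise at
  the end or followed by one (word constituents are letters, digits and
  underscore).  The helper names are hypothetical.  Run on decoded text,
  'e' with an acute accent is one word character; run byte by byte on
  UTF-8, its lead byte looks like a non-word byte, which is the flavor of
  false match the fix above addresses.

```python
def is_word_str(ch: str) -> bool:
    # Word constituents per grep's documentation: letters, digits, underscore.
    return ch.isalnum() or ch == "_"


def is_word_bytes(b: bytes) -> bool:
    # Naive byte-level test: bytes.isalnum() only recognizes ASCII, so the
    # first byte of a multibyte UTF-8 letter is treated as a non-word byte.
    return b.isalnum() or b == b"_"


def whole_word_match(pattern, line, is_word) -> bool:
    """Return True if some occurrence of `pattern` in `line` is a whole word.

    Works for str or bytes; `is_word` receives a one-element slice of `line`.
    Like grep -w, it keeps trying later occurrences if an earlier one fails.
    """
    start = line.find(pattern)
    while start != -1:
        end = start + len(pattern)
        before_ok = start == 0 or not is_word(line[start - 1:start])
        after_ok = end == len(line) or not is_word(line[end:end + 1])
        if before_ok and after_ok:
            return True
        start = line.find(pattern, start + 1)
    return False
```

  Here whole_word_match("caf", "café au lait", is_word_str) is False, but
  the byte-level call on the UTF-8 encoding of the same line returns True.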

  grep -P now reports an error and exits when given invalid UTF-8 data.
  Previously it was unreliable, and sometimes crashed or looped.
  [bug introduced in grep-2.16]
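  The new -P behavior amounts to validating input before handing it to the
  matcher.  A minimal Python sketch of that check (an illustration only,
  not how grep implements it; the function name is hypothetical) uses the
  strict UTF-8 decoder to find the first offending line:

```python
def first_invalid_utf8_line(data: bytes):
    """Return the 1-based number of the first line that is not valid UTF-8.

    Returns None when every line decodes cleanly.  Python's "utf-8" codec is
    strict: it rejects stray continuation bytes, overlong forms and encoded
    surrogates, so a failure here means the line is genuinely invalid.
    """
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            return number
    return None
```

  A caller in the spirit of grep -P would report an error and exit when
  this returns a line number, instead of matching against the bad bytes.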

  grep -P now works with -w and -x and backreferences. Before,
  echo aa|grep -Pw '(.)\1' would fail to match, yet
  echo aa|grep -Pw '(.)\2' would match.
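  The symptom above ('\2' working where '\1' should) is what one would
  expect if the user's pattern were wrapped in an extra capturing group,
  which shifts every group number up by one.  That is an assumption about
  the old code, illustrated here with Python's re module rather than PCRE;
  a non-capturing wrapper (?:...) leaves the numbering alone.  The wrapper
  function names are hypothetical.

```python
import re


def wrap_capturing(pattern: str) -> str:
    # The added "(" becomes group 1, so the user's first group becomes group 2.
    return r"\b(" + pattern + r")\b"


def wrap_noncapturing(pattern: str) -> str:
    # "(?:" groups without capturing, so the user's group numbers are unchanged.
    return r"\b(?:" + pattern + r")\b"
```

  With the capturing wrapper, r"(.)\2" matches "aa" because \2 now names
  the user's (.) group.  With the non-capturing wrapper, r"(.)\1" matches
  as intended.  (Python's re refuses to compile the capturing-wrapper form
  of r"(.)\1" because \1 would refer to a group that is still open; PCRE
  accepts it and simply fails to match, as the NEWS entry describes.)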

  grep -Pw now works like grep -w: the matched string must be preceded
  and followed by non-word constituents or by the beginning and end of
  the line (previously, word boundaries sufficed).  Before this change,
  echo a@@a| grep -Pw @@ would match, yet
  echo a@@a| grep -w @@ would not.  Now neither matches,
  per the documentation on how grep's -w works.
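  The difference is easy to reproduce with any Perl-style regex engine.  In
  this sketch (Python's re standing in for PCRE; function names are
  hypothetical), \b only asks for a transition between a word and a
  non-word character, and there is one on each side of "@@" in "a@@a".
  Lookarounds that forbid a word character immediately before and after
  the match express the -w rule instead.  This shows one way to encode the
  rule, not necessarily the exact pattern grep builds.

```python
import re


def boundary_match(pattern: str, line: str) -> bool:
    # Old-style: \b is satisfied by any word/non-word transition.
    return re.search(r"\b(?:" + pattern + r")\b", line) is not None


def whole_word_match(pattern: str, line: str) -> bool:
    # -w style: no word character directly before or after the match.
    return re.search(r"(?<!\w)(?:" + pattern + r")(?!\w)", line) is not None
```

  boundary_match("@@", "a@@a") is True, while whole_word_match("@@",
  "a@@a") is False, matching the before-and-after behavior above.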

  grep -i no longer mishandles patterns containing titlecase characters.
  For example, in a locale containing the titlecase character
  'Lj' (U+01C8 LATIN CAPITAL LETTER L WITH SMALL LETTER J),
  'grep -i Lj' now matches both 'LJ' (U+01C7 LATIN CAPITAL LETTER LJ)
  and 'lj' (U+01C9 LATIN SMALL LETTER LJ).
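  Why titlecase is tricky: for the LJ digraphs, upper- and lowercasing a
  character do not reach all three forms.  Starting from U+01C7, lower()
  gives U+01C9 and upper() gives U+01C7 again, so U+01C8 is missed.  A
  hedged Python sketch (not grep's implementation; the function name is
  hypothetical) closes the set under lower, upper and titlecase mappings:

```python
def case_variants(ch: str) -> set:
    """Return the one-character case variants a case-insensitive match of
    `ch` should accept, closed under lower, upper and titlecase mappings.
    """
    variants = {ch}
    frontier = [ch]
    while frontier:
        c = frontier.pop()
        for v in (c.lower(), c.upper(), c.title()):
            # Skip multi-character results: some full case mappings expand
            # one character into several (for example 'ß'.upper() == 'SS').
            if len(v) == 1 and v not in variants:
                variants.add(v)
                frontier.append(v)
    return variants
```

  case_variants("\u01c8") and case_variants("\u01c7") both yield
  {"\u01c7", "\u01c8", "\u01c9"}, whereas case_variants("a") is {"a", "A"}.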

Changes in grep since v2.18:

Jim Meyering (18):
      maint: post-release administrivia
      maint: dfa: pass NULL, not 0, as 2nd arg to setlocale
      tests: make a performance-measuring test less system-sensitive
      tests: avoid false-positive failure on some AMD CPUs
      maint: fix "make dist"
      tests: placate "make syntax-check" re compare arg ordering
      build: avoid OS X 10.8.5 build failure due to lack of static_assert
      maint: avoid sc_po_check syntax-check failure (kwset.c)
      tests: detect an infloop-inducing bug in grep -P (pcre-8.35)
      dfa: avoid new NULL dereference
      maint: Revert "dfa: avoid new NULL dereference"
      build: reenable some compiler warning options
      tests: use consistent spelling for locale name, en_US.UTF-8
      grep: fix new heap write buffer overrun
      gnulib: update to latest
      maint: make ChangeLog generation more robust
      maint: mark some breakless cases with /* fallthrough */ comment
      gnulib: update submodule to latest, and bootstrap

Norihiro Tanaka (33):
      grep: don't match line-by-line for case-insensitive with grep and awk
      grep: remove trivial_case_ignore
      grep: optimization of bracket expression for non-UTF8 locales
      grep: revert removal of trivial_case_ignore
      grep: avoid to add same character to a bracket expression
      grep: optimization for fgrep with changing the matcher to grep matcher.
      grep: perform the kwset-helping DFA match in narrower range
      grep: take mbrtowc_cache into new member of struct dfa
      dfa: avoid re-building a state built previously
      grep: reuse multibyte DFA buffers in non-UTF8 locales
      grep: fix performance bug with regex in line-by-line mode
      grep: optimization with the superset of DFA
      grep: use the Galil rule for Boyer-Moore algorithm in KWSet
      grep: prefer regex to DFA for ANYCHAR in multibyte locales
      grep: no match for the empty string included in multiple patterns
      grep: open CSET and transform into uppercase when MB_CUR_MAX == 1
      dfa: speed up by checking multibyte characters on demand
      grep: speed-up for exact matching with begline and endline constraints.
      grep: may also use Boyer-Moore algorithm for case-insensitive matching
      grep: speed-up by using memchr() in Boyer-Moore searching
      grep: avoid wasting memory for large patterns in dfamust
      grep: skip checking of multibyte character boundary, reaching at eolbyte
      grep: speed up for a case to repeat failure in DFA after success in kwset
      kwset: improve performance by inlining tr
      dfa: optimize memory allocation
      grep: simplify superset
      grep: adjust timing back to kwset when dfaisfast is true
      grep: fix the bug in previous patch.
      grep: make KWset and DFA agree about invalid sequences in patterns
      dfa: speed up 'dfaisfast'
      grep: improve performance of -v when combined with -L, -l or -q
      dfa: fix inconsistency in multibyte locales
      grep: retry DFA superset after matching multiple lines

Paul Eggert (90):
      grep: fix multiple bugs with bracket expressions
      * src/dfa.c (parse_bracket_exp): Parenthesize.
      * src/dfa.c (prednames): POSIX allows [[:xdigit:]] to match multibyte chars.
      grep: remove lint
      grep: fix bugs with -i and titlecase
      grep: avoid 'inline' when it doesn't matter
      grep: minor tuning for mb_case_map_apply
      doc: describe titlecase fix better
      grep: fix some unlikely bugs in trivial_case_ignore
      grep: fix comment
      maint: remove differences from gnulib regex code
      doc: do not overpromise --ignore-case's behavior
      build: update gnulib submodule to latest
      grep: fix case-fold mismatches between DFA and regex
      fgrep: fix case-fold incompatibility with plain 'grep'
      maint: pacify 'make dist'
      dfa: port to freestanding DJGPP (Bug#17056)
      egrep, fgrep: go back to shell scripts
      grep: fix and simplify grep -iF optimization
      dfa: avoid undefined behavior
      egrep, fgrep: improve diagnostics from shell scripts
      dfa: improve port to freestanding DJGPP
      dfa: cache results of mbrtowc for speed
      dfa: avoid an indirection and port wint_t usage
      dfa: improve port to freestanding DJGPP
      grep: simplify dfa.c by having it not include mbsupport.h directly
      grep: minor improvements to previous patch
      grep: cleanup DFA superset optimization
      grep: minor cleanups for Galil speedups
      grep: simplify memory allocation in kwset
      grep: remove trivial_case_ignore
      grep: prefer bool in DFA internals
      grep: port better to hosts with nonstandard nl_langinfo
      grep: remove bool_bf
      grep: cleanup for empty-string fix
      grep: cleanup for HAS_DOS_FILE_CONTENTS issue
      grep: improvements for the open-CSET patch
      build: update gnulib submodule to latest
      dfa: clarify memory allocation and port to IRIX
      dfa: avoid unnecessary work and other initialization
      dfa: better size-overflow check
      dfa: simplify transition table allocation
      dfa: simplify range char allocation
      dfa: simplify multibyte_prop allocation
      dfa: simplify position set and element count allocation
      dfa: simplify memory allocation
      dfa: avoid duplicate strlen when allocating memory
      dfa: simplify freelist
      dfa: simplify dfmust initialization
      dfa: trans reallocation microoptimization
      dfa: minor cleanup
      dfa: fix pointer type conversion bug
      dfa: fix bug that caused NUL to be mishandled in patterns
      dfa: minor improvements to previous patch
      grep: -P now rejects invalid input sequences in UTF-8 locales
      kwset: simplify Boyer-Moore with unibyte -i
      kwset: simplify and speed up Boyer-Moore unibyte -i in some cases
      dfa: omit static variables that limited dfaexec to one struct dfa
      dfa: fix memory leak reintroduced by previous patch
      build: suppress unsafe-loop-optimizations warnings
      dfa: minor tuneup of dfamust memory savings patch
      dfa: fix incorrect comment that led to heap overrun
      dfa: simplify and be more consistent about MB_CUR_MAX
      dfa: minor simplification of dfaexec
      misc: fix doc and test bugs re grep -z
      dfa: fix recently-introduced memory leak
      dfa: fix index bug in previous patch, and simplify
      kwset: improve performance when large Boyer-Moore key doesn't match
      kwset: speed up by using memchr2
      kwset: improve performance by inlining more
      grep: simplify EGexecute further
      grep: clarify EGexecute slightly
      tests: improve coverage for prefix-of-multibyte
      grep: simplify and fix problems with KWset-DFA agreement patch
      dfa: minor simplification
      grep: fix encoding-error incompatibilities among regex, DFA, KWset
      grep: improve internal API for multibyte boundary
      grep: fix -w match next to a multibyte letter
      dfa: minor performance improvement for previous change
      dfa: clarify use of "if"
      doc: mention performance changes
      grep: simplify and clarify invert-related code
      maint: fix indenting to pacify 'prohibit_tab_based_indentation'
      dfa: don't assume unsigned int is exactly 32 bits wide
      dfa: assume C89 for CHAR_BIT
      grep: minor improvements to retry-DFA-superset patch
      grep: -A 0, -B 0, -C 0 now output a separator
      tests: add test case for -C 0 change
      dfa: fix bug with \< etc in multibyte locales
      dfa: omit double includes

Stephane Chazelas (2):
      grep -P: fix it so backreferences now work with -w and -x
      align grep -Pw with grep -w
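Several of the kwset commits above speed up Boyer-Moore searching with
memchr.  The idea can be sketched in Python with Boyer-Moore-Horspool, a
simplified relative of Boyer-Moore that does not implement the Galil rule
mentioned above and is not kwset's code: jump straight to the next
occurrence of the key's last byte with a C-level scan (bytes.find standing
in for memchr), compare the window, and on a mismatch shift by a
precomputed per-byte distance.

```python
def horspool_find(text: bytes, key: bytes) -> int:
    """Return the index of the first occurrence of `key` in `text`, or -1."""
    m = len(key)
    if m == 0:
        return 0
    # Shift table: for each byte, the distance from its last occurrence in
    # key[:-1] to the end of the key; bytes not in key[:-1] shift by m.
    shift = [m] * 256
    for i in range(m - 1):
        shift[key[i]] = m - 1 - i
    last = bytes([key[-1]])
    pos = 0  # start of the current window
    while pos + m <= len(text):
        # memchr-style skip: find the next byte equal to the key's last byte
        # at or after the end of the current window.
        j = text.find(last, pos + m - 1)
        if j == -1:
            return -1
        pos = j - (m - 1)
        if text[pos:pos + m] == key:
            return pos
        # Mismatch: shift by the table entry for the window's last byte.
        pos += shift[text[pos + m - 1]]
    return -1
```

The memchr step pays off when the key's last byte is rare in the data,
since whole stretches of text are skipped by a tight C loop instead of
byte-by-byte table lookups.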

Changes in gnulib since v2.18:

* gnulib 497f4cd...c2e80b7 (49):
  > update from texinfo
  > autoupdate
  > autoupdate
  > autoupdate
  > gitlog-to-changelog: revert inclusion of git-log-fix file
  > maint.mk: Relax the copyright check to cater for non FSF projects
  > physmem: use sysinfo if _SC_PHYS_PAGES unavailable
  > exclude: port to strict C99
  > regex: do not depend on malloc-gnu
  > autoupdate
  > expl: avoid incorrect expl(small_value) on OpenBSD 5.4
  > xalloc: allow x2nrealloc (P, PN, S) where P && !*PN
  > fts: avoid unnecessary strlen calls
  > fts: avoid unnecessary strlen calls
  > fts: avoid unnecessary strlen calls
  > autoupdate
  > autoupdate
  > obstack: Remove ancient NeXTSTEP gcc support conditional
  > obstack: merge with glibc changes
  > strftime: wrap macros in "do {...} while(0)"
  > modechange: avoid memory leaks for invalid octal modes
  > autoupdate
  > gitlog-to-changelog: include a dummy git-log-fix file
  > autoupdate
  > update from texinfo
  > gitlog-to-changelog: also include the file, git-log-fix
  > autoupdate
  > regex: port to OS X 10.8.5 en_US.UTF-8 locale
  > maint: fix ChangeLog to match commit record
  > stdint, read-file: fix missing SIZE_MAX on Android (tiny change)
  > parse-datetime: fix crash or infloop in TZ="" parsing
  > * NEWS: Recent changes are not that important.
  > savedir: new symbol for fast-read version
  > unistd: port readlink to Mac OS X 10.3.9
  > * NEWS: Document recent change to diffseq.
  > diffseq: remove TOO_EXPENSIVE heuristic
  > savedir: simplify by using stpcpy
  > spawn: fix link error on uclibc
  > m4: fix gl_TIMER_TIME() detection of threads on uClibc
  > maintainer-makefiles: provide AC_PROG_SED for older autoconf
  > exclude: add support for posix regexps
  > maintainer-makefiles: use $(SED) for syntax check
  > update from texinfo
  > savedir: add sorting arg to savedir, streamsavedir; remove fdsavedir
  > autoupdate
  > update from texinfo
  > update from texinfo
  > file-type: add support for doors and other less-common file types
  > update from texinfo
