[bug-gawk] 4.0.0 Regex Patterns Choke on Exotic Chars


From: David Millis
Subject: [bug-gawk] 4.0.0 Regex Patterns Choke on Exotic Chars
Date: Sat, 10 Sep 2011 23:13:25 -0700 (PDT)

# Possible bug in GNU AWK 4.0.0's regex handling.
# The same script worked in 3.1.6 (GnuWin32) and 3.1.7 (Jgawk?, which had |& intact).
# It cripples manipulation of mildly exotic chars,
# at least on Windows (binary: http://www.klabaster.com/freeware.htm#dl).
# I couldn't reproduce it on Debian with 4.0.0.

BEGIN {
  # Here, writing the escape behaves the same as pasting the literal char.
  badChar = "\x95";
  # This is a bullet (\x95, vim: ctrl-v+149) in the Win-1252 codepage.
  # It happens to be in the \x80-\x9f range,
  #   where Win-1252 diverges from strict Latin-1.
  # Most apps don't care, but this might be the issue...
  # However, middle dot (\xb7, vim: ctrl-v+183), which lies outside
  #   that range, shows the same behavior.

  print badChar; # Prints fine
  print gensub(/\x95/, "@", "", badChar); # Error

  # The char is accepted as the gsub/gensub replacement arg,
  # but not as the pattern, whether given as /literal/ or "string".
  # On reaching that line, gsub/gensub fail with "unbalanced )",
  # or with an "internal error" if the char is used in a character class, /[\x95]/.

  # Mundane escapes like \x22 for double-quote are fine.
}
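
Since the script above shows the character failing only when it is compiled as a regex pattern, one possible stopgap is to replace it without using a regex at all. The following is a minimal sketch, not a fix for the underlying bug and untested on the affected Windows build. The helper name replace_literal is hypothetical, and forcing LC_ALL=C is an assumption made so that awk treats the data as plain bytes. It uses only index() and substr(), and the bullet is written as the octal escape \225 (the same byte as \x95) because octal escapes are portable across awk implementations.

```sh
#!/bin/sh
# Hypothetical helper: reads lines on stdin and replaces every occurrence
# of the literal string $1 with $2, without compiling any regex.
# Note: awk interprets escape sequences in -v values, so '\225' passed as
# $1 becomes the single byte 0x95.
replace_literal() {
    # LC_ALL=C is an assumption of this sketch: work on bytes, not characters.
    LC_ALL=C awk -v find="$1" -v repl="$2" '
    function replace_literal(s, find, repl,    out, i, n) {
        n = length(find)
        if (n == 0)
            return s                    # nothing to search for
        out = ""
        # index() does a plain substring search, so no regex is involved.
        while ((i = index(s, find)) > 0) {
            out = out substr(s, 1, i - 1) repl
            s = substr(s, i + n)        # continue after the match
        }
        return out s
    }
    { print replace_literal($0, find, repl) }
    '
}
```

For example, `printf 'a\225b\225c\n' | replace_literal '\225' '@'` prints `a@b@c`. Unlike gsub/gensub, the replacement text is inserted verbatim, so `&` and backslash sequences in it get no special treatment inside the function.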


I sent this to Eli Zaretskii, who replied:
> This also happens in 3.1.8 (on Windows).
>
> Please send this bug report to address@hidden,
> I have no idea what is wrong with this character,
> and why only on Windows.


David



