[bug-gawk] 4.0.0 Regex Patterns Choke on Exotic Chars
From: David Millis
Subject: [bug-gawk] 4.0.0 Regex Patterns Choke on Exotic Chars
Date: Sat, 10 Sep 2011 23:13:25 -0700 (PDT)
# A bug in GNU AWK 4.0.0's regex handling?
# 3.1.6 (GnuWin32)/3.1.7 (Jgawk?, had |& intact) worked.
# It cripples manipulation of mildly exotic chars.
# In Windows anyway (Binary: http://www.klabaster.com/freeware.htm#dl).
# I couldn't reproduce it in Debian with 4.0.0.
BEGIN {
# For this, escaping is no different from pasting the genuine char.
badChar = "\x95";
# This is a bullet (\x95, vim: ctrl-v+149) in the Win-1252 codepage.
# It happens to be in the \x80-\x9f range
# where Win-1252 diverges from strict Latin-1.
# Most apps don't care, but this might be the issue...
# Hmm, middledot (\xb7, vim: ctrl-v+183) shows the same behavior.
print badChar; # Prints fine
print gensub(/\x95/, "@", "", badChar); # Error
# The char is acceptable as the gsub/gensub replacement arg.
# But not as the pattern: be it /literal/ or "string".
# Upon reaching the line, gsub/gensub throw "unbalanced )".
# Or an "internal error" if used in a character class /[\x95]/.
# Mundane escapes like \x22 for double-quote are fine.
}
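The codepage remarks in the script's comments can be checked outside gawk. The following is only an illustrative sketch in Python (not gawk code, and not part of the original report): it looks up what the two bytes mentioned above map to in Windows-1252 and in Latin-1. The helper name `describe` is made up for this example.

```python
# Illustrative only: compares how two single-byte codepages map the bytes
# mentioned in the report. None of this is gawk code.
import unicodedata


def describe(byte_value):
    """Return the Unicode names a byte maps to in cp1252 and latin-1."""
    raw = bytes([byte_value])
    names = {}
    for codec in ("cp1252", "latin-1"):
        ch = raw.decode(codec)
        # Control characters have no Unicode name, so fall back to a label.
        names[codec] = unicodedata.name(ch, "<control U+%04X>" % ord(ch))
    return names


bullet = describe(0x95)  # the byte used as badChar in the report
middot = describe(0xB7)  # the middledot byte that fails the same way

print("0x95:", bullet)
print("0xB7:", middot)
```

Byte 0x95 is a bullet in Windows-1252 but a C1 control character in Latin-1, while 0xB7 is a middle dot in both. Since the middledot fails too, the \x80-\x9f divergence on its own does not seem to account for the error; this says nothing about what gawk does internally.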
I sent this to Eli Zaretskii, who replied:
> This also happens in 3.1.8 (on Windows).
>
> Please send this bug report to address@hidden,
> I have no idea what is wrong with this character,
> and why only on Windows.
David