Re: commit-msg hook

From: Eli Zaretskii
Subject: Re: commit-msg hook
Date: Mon, 13 Apr 2015 18:48:44 +0300

> Date: Sat, 11 Apr 2015 13:09:12 -0700
> From: Paul Eggert <address@hidden>
> Cc: address@hidden
> Since this appears to be the real problem, I'd rather go back to the
> all-ASCII script, for portability to pre-POSIX shells (probably doesn't
> matter nowadays, but shouldn't hurt).

Actually, I've been thinking: why do we need to rely on system
libraries to implement UTF-8 and [:print:] correctly?  According to
the Unicode Standard, [:print:] should reject only a small number of
special characters:

  . control characters in the ranges 0x01 to 0x1f and 0x7f to 0x9f
  . surrogates
  . unassigned codepoints

I think we should not detect unassigned codepoints, since hundreds of
them are assigned every year, so we are likely to trigger false
positives.

As for the other two groups, it should be quite easy to detect their
UTF-8 sequences with relatively simple regular expressions, without
relying on system libraries or up-to-date Gawk/whatever, if we assume
that commit log messages must always be UTF-8 encoded.  WDYT?
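
To make the idea concrete, here is a rough sketch in Python (used only
for illustration; the same byte-level pattern should carry over to any
regex engine that can match raw bytes).  It assumes the input is
already valid UTF-8 and is checked one line at a time, so the newline
itself is never seen.  The names FORBIDDEN and has_forbidden_sequence
are made up for this example and are not part of any existing hook.

```python
import re

# Byte-level pattern, assuming the input is valid UTF-8:
#   [\x01-\x1f\x7f]            C0 controls and DEL (single bytes)
#   \xc2[\x80-\x9f]            C1 controls U+0080..U+009F
#   \xed[\xa0-\xbf][\x80-\xbf] surrogates U+D800..U+DFFF
# Unassigned codepoints are deliberately not checked.
FORBIDDEN = re.compile(
    rb"[\x01-\x1f\x7f]|\xc2[\x80-\x9f]|\xed[\xa0-\xbf][\x80-\xbf]"
)


def has_forbidden_sequence(line: bytes) -> bool:
    """Return True if one line of a commit message (without its
    trailing newline) contains a control character or a surrogate."""
    return FORBIDDEN.search(line) is not None


if __name__ == "__main__":
    samples = [
        b"Fix typo in commit-msg hook",
        "caf\u00e9".encode("utf-8"),
        b"colour \x1b[31mescape",
        b"next line\xc2\x85control",
    ]
    for sample in samples:
        print(sample, has_forbidden_sequence(sample))
```

Because 0xc2 and 0xed can only be lead bytes in valid UTF-8, the
two-byte and three-byte alternatives cannot match in the middle of
some other character, which is what makes the "assume UTF-8" premise
important here.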
