bug-gettext

Re: [bug-gettext] broken handling of unicode code point escapes in Tcl


From: Guido Berhoerster
Subject: Re: [bug-gettext] broken handling of unicode code point escapes in Tcl
Date: Wed, 26 Jun 2013 11:27:22 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

* Daiki Ueno <address@hidden> [2013-06-26 04:22]:
> Guido Berhoerster <address@hidden> writes:
> 
> > I still wonder why you're substituting \u escapes with unicode
> > characters at all, as that potentially allows unescaped control
> > sequences which make the .po file quite fragile?
> 
> I agree that interpreting \u escapes might cause confusing output for
> Unicode control characters, but I don't think it is totally unuseful.
> 
> I can think of at least a couple of benefits of the current behavior:
> 
> 1. translators are provided with decoded (human-readable) strings
> 2. strings escaped in different escaping schemes (e.g. \U in Python) can
>    be unified
> 
> Perhaps an idea might be to introduce a gettext-specific Unicode
> escaping scheme (which may only escape control characters) and add an
> option to xgettext to use it.

It can be a bit more complicated than just control characters:
certain space characters such as U+00A0, U+202F or U+2001 are
not control characters, but they are just as hard to spot in a
.po file. Maybe a better option would be to substitute only
alphanumeric and punctuation characters, rather than everything
that is not a control character. Or you could simply add an
option to not substitute \u escapes at all. That is the behavior
of the various native Tcl .msg-format extractors that float
around (e.g. those included in tkabber or coccinella), and it is
what I'd personally prefer.
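
To illustrate the alphanumeric-and-punctuation variant, here is a
rough Python sketch. It is not xgettext code; the function name
decode_safe_escapes is made up for this example, and it assumes
that every escape is written as \u followed by exactly four hex
digits and that the backslash itself is not escaped. An escape is
substituted only if the character's Unicode general category is a
letter, number or punctuation class; everything else (spaces,
controls, format characters, surrogates) stays escaped.

```python
import re
import unicodedata

# Simplifying assumption: an escape is "\u" plus exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

# Major general-category classes that are considered safe to show decoded:
# L = letters, N = numbers, P = punctuation.
_SAFE_CLASSES = ("L", "N", "P")


def decode_safe_escapes(text):
    """Replace \\uXXXX escapes with the character they name, but only
    for letters, numbers and punctuation. All other escapes are kept."""

    def substitute(match):
        char = chr(int(match.group(1), 16))
        category = unicodedata.category(char)  # e.g. "Ll", "Zs", "Cc"
        if category[0] in _SAFE_CLASSES:
            return char
        return match.group(0)  # leave the escape exactly as written

    return _ESCAPE.sub(substitute, text)


if __name__ == "__main__":
    print(decode_safe_escapes(r"caf\u00e9"))         # letter: decoded
    print(decode_safe_escapes(r"10\u00a0km"))        # no-break space: kept
    print(decode_safe_escapes(r"\u001b[0m reset"))   # control char: kept
```

With this, U+00A0, U+202F and U+2001 all remain visible as escapes
in the .po file because their category is Zs, while ordinary
accented letters become readable for translators.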
-- 
Guido Berhoerster


