Re: dealing with non-ASCII-safe encodings
From: Bruno Haible
Subject: Re: dealing with non-ASCII-safe encodings
Date: Sat, 06 Mar 2021 21:17:57 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-203-generic; KDE/5.18.0; x86_64; ; )

Paul Eggert wrote:
> However, my worry is that good support for non-ASCII-safe encodings like
> Shift-JIS is hard to do, and that any such support we'd add to
> Gnulib/coreutils/etc. would not only increase maintenance costs and
> reduce runtime performance

Shift_JIS is not the only non-ASCII-safe encoding: GB18030, BIG5, BIG5-HKSCS,
and GBK are too. Among these, GB18030 is used as a locale encoding in China.
Therefore it is important for programs to support these locale encodings.

Gnulib already has support for them:

  - It has replacement functions that operate correctly with these locale
    encodings:
      strstr, c_strstr -> mbsstr
      strchr           -> mbschr
      strrchr          -> mbsrchr
      strspn           -> mbsspn
      strcspn          -> mbscspn
      strpbrk          -> mbspbrk
      strsep           -> mbssep
      strtok_r         -> mbstok_r
  - It has warnings (through _GL_WARN_ON_USE) for uses of the functions
    that are not OK for non-ASCII-safe encodings.
  - It has modules mbchar, mbiter, mbfile for iterating through the
    multibyte characters of a string or file, which work for all locale
    encodings.
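
To illustrate why the byte-oriented functions in the list above can go wrong,
here is a small self-contained sketch. It is not Gnulib code: the helper names
sjis_is_lead_byte and sjis_find_ascii are hypothetical, and the lead-byte
ranges (0x81..0x9F and 0xE0..0xFC) are a simplified assumption about
Shift_JIS. The point is only that a trail byte of a two-byte character can
have the same value as an ASCII character such as the backslash (0x5C), so a
plain strchr reports a match in the middle of a character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified Shift_JIS lead-byte test.  Assumption for this sketch:
   lead bytes are 0x81..0x9F and 0xE0..0xFC.  */
bool
sjis_is_lead_byte (unsigned char b)
{
  return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Return a pointer to the first single-byte character equal to C in the
   Shift_JIS encoded string S, or NULL if there is none.  Trail bytes of
   two-byte characters are skipped, so they can never be reported.  */
char *
sjis_find_ascii (const char *s, char c)
{
  assert (s != NULL);
  while (*s != '\0')
    {
      if (sjis_is_lead_byte ((unsigned char) *s) && s[1] != '\0')
        s += 2;                 /* skip lead byte and trail byte together */
      else
        {
          if (*s == c)
            return (char *) s;
          s++;
        }
    }
  return NULL;
}
```

For the byte sequence 0x83 0x5C 'x' '\\' 'y' (a two-byte character whose trail
byte is 0x5C, followed by "x\y"), strchr (s, '\\') returns s + 1, which points
into the middle of the first character, while sjis_find_ascii (s, '\\')
returns s + 3, the real backslash.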

Yes, using these safer functions does reduce performance. But I have shown
in the past, through coreutils patches, how to accommodate both a "fast path"
and a "safe path" in the same binary.
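
As a rough illustration of the general idea (this is not taken from the
coreutils patches, which are not reproduced here), one binary can decide at
run time which path to take. The sketch below uses only standard C and POSIX
calls (MB_CUR_MAX, nl_langinfo, mbrlen, strnlen). The names find_char and
find_char_safe are hypothetical, and treating only unibyte locales and UTF-8
as eligible for the fast path is a deliberately conservative assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Safe path: walk S one multibyte character at a time in the current
   locale's encoding and compare only single-byte characters against C.  */
char *
find_char_safe (const char *s, char c)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);
  while (*s != '\0')
    {
      /* Never let mbrlen look past the terminating NUL.  */
      size_t len = mbrlen (s, strnlen (s, MB_CUR_MAX), &state);
      if (len == (size_t) -1 || len == (size_t) -2 || len == 0)
        {
          /* Invalid or incomplete sequence: treat this byte as one
             character and restart from the initial shift state.  */
          len = 1;
          memset (&state, 0, sizeof state);
        }
      if (len == 1 && *s == c)
        return (char *) s;
      s += len;
    }
  return c == '\0' ? (char *) s : NULL;
}

/* Dispatcher: use plain strchr when that is known to be correct,
   otherwise fall back to the multibyte-aware loop.  */
char *
find_char (const char *s, char c)
{
  assert (s != NULL);
  bool c_is_ascii = (unsigned char) c < 0x80;
  bool unibyte = MB_CUR_MAX == 1;
  bool utf8 = strcmp (nl_langinfo (CODESET), "UTF-8") == 0;

  if (unibyte || (utf8 && c_is_ascii))
    return strchr (s, c);       /* fast path */
  return find_char_safe (s, c); /* safe path */
}
```

A program would call setlocale (LC_ALL, "") once at startup and then use
find_char wherever it previously used strchr. In a Shift_JIS, GBK, BIG5 or
GB18030 locale the call takes the slower loop, while in unibyte and UTF-8
locales it costs little more than strchr itself.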
Bruno