bug-coreutils

Re: Cut not working with multi-byte UTF-8 characters


From: Eric Blake
Subject: Re: Cut not working with multi-byte UTF-8 characters
Date: Mon, 10 Jul 2006 06:21:27 -0600
User-agent: Thunderbird 1.5.0.4 (Windows/20060516)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Patrik Hirvinen on 7/9/2006 8:48 AM:
> Hi,
> 
> This bug was found on an Ubuntu 5.10 GNU/Linux x86 using cut version
> 5.2.1. Locale used was en_US.UTF-8.
> 
> When fed text that includes multi-byte characters, cut makes the
> assumption that one byte corresponds to one character, even though the
> locale would clearly suggest otherwise.

Unfortunately, no one has yet submitted a clean implementation of
multi-byte handling to upstream coreutils, so it is a known deficiency
that the bulk of coreutils' text utilities do not understand multi-byte
characters.  Would you care to help by writing a patch?  If so, use this
list as a springboard for discussion; Jim already has several requirements
for what a multi-byte implementation must do before he will incorporate it.
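
To illustrate the byte-versus-character mismatch reported above, here is a
minimal Python sketch. It is not how cut is implemented in coreutils; the
function names cut_bytes and cut_chars are hypothetical, and decoding to
Unicode code points stands in for whatever locale-aware handling a real
patch would use.

```python
def cut_bytes(line: bytes, first: int, last: int) -> bytes:
    """Select positions first..last (1-based, inclusive), treating
    every byte as one character, as described in the report."""
    return line[first - 1:last]


def cut_chars(line: bytes, first: int, last: int,
              encoding: str = "utf-8") -> bytes:
    """Select positions first..last (1-based, inclusive) after decoding,
    so a multi-byte sequence counts as a single character."""
    text = line.decode(encoding)
    return text[first - 1:last].encode(encoding)


# "\u00e4" (a with diaeresis) occupies two bytes in UTF-8: 0xC3 0xA4.
sample = "\u00e4bc".encode("utf-8")

print(cut_bytes(sample, 1, 1))  # only the first half of the character
print(cut_chars(sample, 1, 1))  # the complete two-byte character
```

Selecting "character 1" byte-wise yields a lone lead byte that is not valid
UTF-8 on its own, while the decoding variant returns the whole character.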

- --
Life is short - so eat dessert first!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEskZH84KuGfSFAYARAiVdAJ9dzP+3EBD/e8Ng03+RyBrLnUGjQQCfXlKA
fjBSqZvJwrIc99Bu2wAYkI0=
=CmWn
-----END PGP SIGNATURE-----



