coreutils


From: Eric Fischer
Subject: Re: Multibyte support for sort, uniq, join, tr, cut, paste, expand, unexpand, fmt, fold, and pr
Date: Wed, 10 Jan 2018 12:20:33 -0800

You were right that I needed to pay attention to character widths. My
changes in

  https://github.com/ericfischer/coreutils/tree/multibyte

now handle character widths everywhere that POSIX counts "column
positions" rather than characters.
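
To illustrate the distinction (a rough sketch only, not the code in the
branch above), column positions can be counted with the standard
mbrtowc() and wcwidth() functions. The name count_columns is
hypothetical, and counting an undecodable byte as one column and a
non-printing character as zero columns are assumptions made for this
example:

```c
#define _XOPEN_SOURCE 700   /* needed on some systems to declare wcwidth() */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of column positions occupied by the
   NUL-terminated string S in the current locale.  */
size_t
count_columns (const char *s)
{
  assert (s != NULL);

  mbstate_t st;
  memset (&st, 0, sizeof st);
  size_t len = strlen (s);
  size_t cols = 0;

  while (len > 0)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, s, len, &st);

      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Invalid or truncated sequence.  Assumption: the raw byte
             occupies one column.  Reset the state and skip one byte.  */
          memset (&st, 0, sizeof st);
          cols++;
          s++;
          len--;
          continue;
        }
      if (n == 0)
        n = 1;                  /* defensive: a decoded NUL is one byte */

      /* wcwidth() returns -1 for non-printing characters.  Assumption:
         count those as zero columns.  */
      int w = wcwidth (wc);
      if (w > 0)
        cols += (size_t) w;

      s += n;
      len -= n;
    }
  return cols;
}
```

A caller would normally run setlocale (LC_ALL, "") first so that the
conversion follows the user's encoding. Utilities such as fold also
give tab, backspace, and carriage return special treatment, which this
sketch leaves out.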

I have also introduced a "grapheme" abstraction to handle raw bytes
transparently when the input contains character encoding errors. Having
this structured character type has also been useful for finding a few
additional places that assumed that text was bytes.
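
As a minimal sketch of the idea (not the type used in the branch; the
names text_unit, next_unit, and count_units are hypothetical), a
structured character can record either a decoded wide character or a
single raw byte that failed to decode, so that callers never have to
treat encoding errors as a special case:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical type: one unit of input text, either a decoded
   character or a single raw byte that could not be decoded.  */
struct text_unit
{
  bool is_raw;          /* true: BYTE is meaningful; false: WC is */
  wchar_t wc;
  unsigned char byte;
  size_t length;        /* number of input bytes this unit covers */
};

/* Decode the next unit from the LEN bytes at S (LEN > 0), using and
   updating the conversion state *ST.  */
struct text_unit
next_unit (const char *s, size_t len, mbstate_t *st)
{
  assert (s != NULL && st != NULL && len > 0);

  struct text_unit u = { false, L'\0', 0, 0 };
  size_t n = mbrtowc (&u.wc, s, len, st);

  if (n == (size_t) -1 || n == (size_t) -2)
    {
      /* Encoding error or truncated sequence: pass one byte through
         unchanged and restart decoding from a clean state.  */
      memset (st, 0, sizeof *st);
      u.is_raw = true;
      u.wc = L'\0';
      u.byte = (unsigned char) *s;
      u.length = 1;
    }
  else
    u.length = n == 0 ? 1 : n;  /* mbrtowc returns 0 for a NUL character */

  return u;
}

/* Count the units (characters plus undecodable bytes) in the LEN
   bytes at S.  */
size_t
count_units (const char *s, size_t len)
{
  mbstate_t st;
  memset (&st, 0, sizeof st);
  size_t count = 0;

  while (len > 0)
    {
      struct text_unit u = next_unit (s, len, &st);
      s += u.length;
      len -= u.length;
      count++;
    }
  return count;
}
```

Because every input byte ends up in exactly one unit, writing the units
back out in order reproduces the input byte for byte, including any
invalid sequences.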

I think the only work left to do is a little more on tr, to eliminate its
need to know the largest possible wide character encoding.

I have requested and received the copyright assignment paperwork, but my
employer would prefer to dedicate my changes to the public domain, or
release them under CC0, rather than assign or disclaim copyright. Would
this be acceptable?

Eric

