bug-gnulib

Re: questions following upcoming POSIX issue 8 release - iconv


From: Eric Blake
Subject: Re: questions following upcoming POSIX issue 8 release - iconv
Date: Thu, 13 Jun 2024 13:11:35 -0500
User-agent: NeoMutt/20240425

On Wed, Jun 12, 2024 at 02:36:00AM GMT, Bruno Haible wrote:
> Eric Blake wrote:
> > - https://austingroupbugs.net/view.php?id=1635
> 
> I added a couple of comments there, to explain the problem.
> It is irritating to see that the Austin Group in defect 1007 apparently
> acted like "we don't understand the rationale of the defect and we
> don't think that GNU does worthy implementations, therefore let's
> take what Solaris did and standardize that instead".
> 
> > Questions on whether iconv() needs to be able to distinguish errors
> > on the input (an invalid multibyte sequence) from errors on the
> > output (no character available to properly represent the
> > transliteration of the recognized input).
> 
> This is also a misunderstanding of the problem. For the standard's
> purposes it would be OK to fail with EILSEQ for invalid input and
> with some other errno value for unconvertible valid input; it would
> "just" cause a problem for glibc, which cannot change its behaviour
> after hundreds of applications have seen EILSEQ come out for the last
> 23 years.

Is there a way, perhaps via the (currently non-standardized) iconvctl
[1], to opt in to which errno behavior to get?  Or maybe the addition
of new modifiers in the encoder, comparable to how //TRANSLIT is
already a modifier?

[1] Oracle: https://docs.oracle.com/cd/E36784_01/html/E36874/iconvctl-3c.html
    MacOS: https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/iconvctl.3.html
    FreeBSD: https://man.freebsd.org/cgi/man.cgi?query=iconvctl&apropos=0&sektion=3&manpath=FreeBSD+10-current&format=html

The Austin Group also noted today that we may want to add a
//NOTRANSLIT, and leave it implementation-defined whether //TRANSLIT
or //NOTRANSLIT is the default behavior when neither encoder modifier
is specified.
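
For readers unfamiliar with how such a modifier is requested: it is a
suffix on the target encoding name passed to iconv_open(). The sketch
below only illustrates that mechanism. The helper names build_tocode()
and open_from_utf8() are hypothetical, not part of any library, and
whether a given implementation accepts //TRANSLIT at all is an
assumption that must be checked at run time via the iconv_open()
return value. //NOTRANSLIT is only a proposal in this thread, so it is
deliberately not used.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write "CHARSET" or "CHARSET//TRANSLIT" into buf.
   Returns 0 on success, -1 if buf is too small. */
static int build_tocode(char *buf, size_t cap, const char *charset,
                        int translit)
{
    assert(buf != NULL && charset != NULL);
    int n = snprintf(buf, cap, "%s%s", charset,
                     translit ? "//TRANSLIT" : "");
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}

/* Hypothetical helper: open a UTF-8 -> charset conversion descriptor.
   Returns (iconv_t)-1 if the name does not fit, or if the
   implementation rejects the charset or the modifier. */
static iconv_t open_from_utf8(const char *charset, int translit)
{
    char tocode[64];
    if (build_tocode(tocode, sizeof tocode, charset, translit) != 0)
        return (iconv_t)-1;
    return iconv_open(tocode, "UTF-8");
}
```

A caller that wants transliteration would use
open_from_utf8("ASCII", 1) and fall back to open_from_utf8("ASCII", 0)
if the first call returns (iconv_t)-1.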

> 
> The problem is that the applications want to get *notified* about
> non-identical conversions, in a way that they can check the quality
> (or possibly insert replacement characters of their own choice) and
> then continue the conversion.

Indeed - a separate errno value to distinguish between "I stopped
because there are non-character bytes in the input" and "I stopped
because even though I recognize the input, I have no lossless way to
produce output" seems to be what is wanted.  Is the problem now one of
figuring out how to implement that distinction for programs that want
it without breaking back-compat for older programs not expecting it?
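
To make the ambiguity concrete, here is a minimal sketch of the
"notice, substitute, continue" loop an application might write today.
The helper name convert_with_replacement() is hypothetical. The sketch
assumes UTF-8 input and an ASCII-compatible, stateless target encoding
(so that a single '?' byte is a valid replacement), and it treats
every EILSEQ the same way, because with the behavior described above
the caller cannot tell invalid input from valid but unconvertible
input.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert UTF-8 to tocode, writing '?' at every
   position where iconv() stops with EILSEQ.  Returns the number of
   substitutions made, or -1 on any other failure (unsupported charset,
   output buffer full, truncated input). */
static long convert_with_replacement(const char *tocode,
                                     const char *src, size_t srclen,
                                     char *dst, size_t dstcap,
                                     size_t *produced)
{
    assert(src != NULL && dst != NULL && produced != NULL);

    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)src;   /* iconv() does not modify the input */
    size_t inleft = srclen;
    char *out = dst;
    size_t outleft = dstcap;
    long substitutions = 0;

    while (inleft > 0) {
        if (iconv(cd, &in, &inleft, &out, &outleft) != (size_t)-1)
            break;            /* all remaining input was converted */
        if (errno != EILSEQ || outleft == 0) {
            substitutions = -1;   /* E2BIG, EINVAL, or no room for '?' */
            break;
        }
        /* EILSEQ: either an invalid byte or (on glibc) a valid
           character with no representation in tocode.  The caller
           cannot tell which, so handle both the same way. */
        *out++ = '?';
        outleft--;
        /* Skip the lead byte plus any UTF-8 continuation bytes. */
        do {
            in++;
            inleft--;
        } while (inleft > 0 && ((unsigned char)*in & 0xC0) == 0x80);
        substitutions++;
    }

    iconv_close(cd);
    *produced = dstcap - outleft;
    return substitutions;
}
```

With a distinct errno value for the "recognized but unconvertible"
case, the EILSEQ branch could be split in two, for example rejecting
invalid input outright while still substituting for unconvertible
characters.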

> 
> Bruno
> 
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



