bug-texinfo

Re: UTF-8 conversion problem in Texinfo 6.6 with TEXINFO_XS_PARSER


From: Gavin Smith
Subject: Re: UTF-8 conversion problem in Texinfo 6.6 with TEXINFO_XS_PARSER
Date: Mon, 18 Feb 2019 20:09:33 +0000
User-agent: Mutt/1.5.23 (2014-03-12)

On Mon, Feb 18, 2019 at 05:36:00PM +0200, Eli Zaretskii wrote:
> Thanks, I think I see the problem.  It's because the code manages
> input_encoding on the input_stack.  Which means each included file
> starts up with input_encoding of zero (which happens to stand for
> latin-1), and when reading of the include file is exhausted, the code
> pops input_stack, so any @documentencoding set by an include file is
> thrown away, and any file included after @documentencoding has its
> encoding reset to latin-1.  But @documentencoding is a global setting,
> and once set, it should remain in effect for any stuff read
> thereafter, until it is changed by another @documentencoding, or until
> EOF.  I think this means input_encoding should be part of global_info,
> not of input_stack.

Okay, this makes sense.  I don't know if I'll have time to fix this in 
the next few days.
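
To make the failure mode concrete, here is a minimal C sketch of the two
designs being compared.  It is only an illustration under stated
assumptions: the identifiers (push_input, pop_input, the enum values, the
file names in the comments) are made up for this example and are not
copied from the actual XS parser source.  It models a main file that
includes one file containing @documentencoding UTF-8 and then a second
file, and reports which encoding the second file is read with.

```c
/* Hypothetical model; names are illustrative, not the real parser's. */
enum encoding { ENC_LATIN1 = 0, ENC_UTF8 = 1 };

#define MAX_DEPTH 8

/* Design A: the encoding is stored in each input-stack entry. */
struct input_entry { enum encoding input_encoding; };
static struct input_entry input_stack[MAX_DEPTH];
static int input_depth;

/* Design B: the encoding is stored once, in a global record. */
static struct { enum encoding input_encoding; } global_info;

static void push_input(void)
{
    /* A fresh entry starts at zero, which stands for Latin-1. */
    input_stack[input_depth].input_encoding = ENC_LATIN1;
    input_depth++;
}

static void pop_input(void)
{
    input_depth--;
}

/* Encoding used for body.texi when the setting lives on the stack. */
enum encoding encoding_seen_with_stack(void)
{
    enum encoding seen;

    input_depth = 0;
    push_input();                       /* main.texi */
    push_input();                       /* @include enc.texi */
    input_stack[input_depth - 1].input_encoding = ENC_UTF8;
                                        /* @documentencoding UTF-8 */
    pop_input();                        /* enc.texi exhausted: setting lost */
    push_input();                       /* @include body.texi */
    seen = input_stack[input_depth - 1].input_encoding;
    pop_input();
    pop_input();
    return seen;
}

/* Encoding used for body.texi when the setting is global. */
enum encoding encoding_seen_with_global(void)
{
    enum encoding seen;

    global_info.input_encoding = ENC_LATIN1;  /* default, no directive yet */
    input_depth = 0;
    push_input();                       /* main.texi */
    push_input();                       /* @include enc.texi */
    global_info.input_encoding = ENC_UTF8;    /* @documentencoding UTF-8 */
    pop_input();                        /* enc.texi exhausted: setting kept */
    push_input();                       /* @include body.texi */
    seen = global_info.input_encoding;
    pop_input();
    pop_input();
    return seen;
}
```

In this model encoding_seen_with_stack() returns ENC_LATIN1 and
encoding_seen_with_global() returns ENC_UTF8, which matches the
behaviour described in the quoted analysis.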

I can't remember what I was thinking when I wrote this code, but I may 
have had in mind the case of different encodings in different input
files.  I don't think that use case is worth supporting, though.

> Btw, I think there's a more general issue here.  It sounds like in the
> absence of any @documentencoding directive, the C parser assumes
> Latin-1, something that doesn't seem to be documented in the Texinfo
> manual, and perhaps isn't even the best default nowadays.  It means,
> for example, that a document with UTF-8 encoded non-ASCII characters
> but without @documentencoding will have its non-ASCII characters
> "converted" on output.  Is that the intended behavior, and is it
> consistent with what the Perl parser does?  If so, I think it should
> be prominently documented, and we should perhaps consider changing the
> default to UTF-8.

It is supposed to be consistent with what the Perl code does and what 
TeX does.  (I would have to check whether texinfo.tex actually assumes 
Latin-1.)  TeX most naturally works with single-byte encodings.

It would be good to make the default UTF-8, as long as there are not too 
many manuals out there that assume Latin-1.
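
For anyone wondering what "converted" means in practice, the following
C sketch shows what happens when UTF-8 input is treated as Latin-1 and
then re-encoded as UTF-8 on output.  The function name latin1_to_utf8 is
invented for this example and is not how the parser itself performs the
conversion; it only illustrates the effect of the wrong assumption.

```c
#include <stddef.h>

/*
 * Re-encode `len` bytes of Latin-1 input as UTF-8.
 * `out` must have room for 2 * len + 1 bytes.
 * Returns the number of bytes written, not counting the trailing NUL.
 */
size_t latin1_to_utf8(const unsigned char *in, size_t len,
                      unsigned char *out)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII is identical in both encodings. */
            out[n++] = c;
        } else {
            /* U+0080..U+00FF become a two-byte UTF-8 sequence. */
            out[n++] = (unsigned char) (0xC0 | (c >> 6));
            out[n++] = (unsigned char) (0x80 | (c & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

Feeding it the two bytes 0xC3 0xA9 (the UTF-8 encoding of e-acute)
yields the four bytes 0xC3 0x83 0xC2 0xA9, so each non-ASCII character
in a UTF-8 manual without @documentencoding would come out as two
unrelated characters.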


