help-texinfo

Re: Loss of search facility in info in newer releases of Texinfo


From: Alan Mackenzie
Subject: Re: Loss of search facility in info in newer releases of Texinfo
Date: Mon, 11 Oct 2021 20:47:40 +0000

Hello, Gavin.

On Mon, Oct 11, 2021 at 16:43:21 +0100, Gavin Smith wrote:
> On Mon, Oct 11, 2021 at 11:35:06AM +0000, Alan Mackenzie wrote:
> > If there are any other formatting characters above 0x7f inserted by
> > Texinfo, I would also like their "ASCII" equivalents to be used instead.

> I've checked with a test file and the output without @documentencoding
> is close to what you ask for.

OK, thanks.  But this appears not to be documented in the Texinfo
manual.  In fact, the effect of omitting @documentencoding is entirely
undocumented on the page @documentencoding.  This is not good.

I think it is also undocumented that texi2any puts Unicode punctuation
characters around things like @code{foo}, and that it sometimes uses ASCII
punctuation characters instead.  Which set it uses, and when, also appears
to be undocumented.  If so, that is also not good.

@documentencoding appears to be doing three jobs, which I think really
ought to be done by separate directives: (i) It specifies the encoding
used in the .texi file; (ii) It specifies the encoding to be used in the
.info file; (iii) It specifies whether to use Unicode or ASCII markers
around @code{foo}, etc.  I'm not sure it does any of these jobs well.

I think my request is really to separate out (iii) from (i) and (ii).

After spending a lot of the weekend and today on this topic I am now
thoroughly confused about character encodings in Texinfo.

> \input texinfo

> @c @documentencoding UTF-8

> @dfn{foo}

> @code{code}

> `bar'

> `hello'

> ``oompa''

> a---b

> c--d

> Herr M@"uller will Sie sprechen.

> @bye

> $ ./texi2any.pl  test.texi -c OPEN_QUOTE_SYMBOL=\` -c CLOSE_QUOTE_SYMBOL=\'
> test.texi: warning: document without nodes
> $ cat test.info
> This is test.info, produced by texi2any version 6.8dev+dev from
> test.texi.

> "foo"

>    `code'

>    'bar'

>    'hello'

>    "oompa"

>    a--b

>    c-d

>    Herr Müller will Sie sprechen.



> Tag Table:

> End Tag Table


> Local Variables:
> coding: utf-8
> End:
> $ 

Where did this "coding: utf-8" in the Local Variables: come from?  Is
UTF-8 now some sort of default in Texinfo?  This coding: setting doesn't
appear at the end of my copy of texinfo.info, even though texinfo.texi
also lacks a @documentencoding command.

What would have happened if the ü had appeared in its utf-8 encoding
0xc3, 0xbc rather than @"u, given that there's no @documentencoding
directive in the source?  This seems also to be undocumented in the
manual.  I should really try this out myself, but I'm too tired at the
moment.
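
For reference, the two bytes mentioned above can be checked with a short
Python snippet.  It only illustrates the encodings themselves and what a
Latin-1 misreading of the UTF-8 bytes would look like; it makes no claim
about how texi2any actually treats such input when @documentencoding is
absent.

```python
# U+00FC is LATIN SMALL LETTER U WITH DIAERESIS (the "ü" in "Müller").
u_umlaut = "\u00fc"

# Two bytes in UTF-8, one byte in ISO-8859-1 (Latin-1).
utf8_bytes = u_umlaut.encode("utf-8")
latin1_bytes = u_umlaut.encode("latin-1")

print(utf8_bytes.hex(" "))    # c3 bc
print(latin1_bytes.hex(" "))  # fc

# If a reader wrongly assumed Latin-1 for the UTF-8 bytes, it would see
# two characters (U+00C3 followed by U+00BC) instead of one.
misread = utf8_bytes.decode("latin-1")
print([f"U+{ord(ch):04X}" for ch in misread])
```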

> Notice the OPEN_QUOTE_SYMBOL wasn't used in some of the cases.

> With the @documentencoding line not commented out it is:

> This is test.info, produced by texi2any version 6.8dev+dev from
> test.texi.

> “foo”

>    `code'

>    ‘bar’

>    ‘hello’

>    “oompa”

>    a—b

>    c–d

>    Herr Müller will Sie sprechen.



> Tag Table:

> End Tag Table


> Local Variables:
> coding: utf-8
> End:

> again with the OPEN_QUOTE_SYMBOL and CLOSE_QUOTE_SYMBOL not affecting the
> output for ` and ' - arguably a bug.

> > > If you remove "@documentencoding UTF-8" from a file, the file is still
> > > assumed to be in UTF-8, but less Unicode is used in the output where it
> > > is not necessary.  Does that help?

It helps my understanding a bit.  It doesn't help me in running texi2any
/ makeinfo, where the .texi files are going to have @documentencoding
UTF-8 in them.  What I really need is a command line switch to tell
texi2any which sort of textual markers to use when the output encoding
is UTF-8.
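
To make the desired mapping concrete, here is a minimal Python sketch of
the substitution such a switch would amount to.  It is not part of Texinfo:
the names ASCII_EQUIVALENTS and asciify_markers are hypothetical, and the
table is simply read off the two test outputs quoted above.  It also works
on plain text only.  Applied to a real Info file it would change byte
lengths, so the offsets in the tag table would likely go stale, which is
one reason the substitution belongs in texi2any itself.

```python
# Hypothetical helper, not a Texinfo API: maps the Unicode markers seen in
# the UTF-8 test output to the ASCII forms seen in the output produced
# without @documentencoding.
ASCII_EQUIVALENTS = {
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
    "\u2014": "--",  # EM DASH (from --- in the source)
    "\u2013": "-",   # EN DASH (from -- in the source)
}

_TABLE = str.maketrans(ASCII_EQUIVALENTS)


def asciify_markers(text: str) -> str:
    """Replace Unicode quotes and dashes; leave other characters alone."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "\u201cfoo\u201d \u2018bar\u2019 a\u2014b c\u2013d M\u00fcller"
    print(asciify_markers(sample))
```

Real letters such as the ü in Müller are deliberately left untouched, since
only the formatting markers are at issue here.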

> > Not really.  I've got too many info files on my system (Gentoo
> > GNU/Linux) to remove that directive from them all each time there's a
> > new version of the file.texi.

> > So, I'm asking you to implement such an option in the next version of
> > Texinfo, or perhaps accept a patch from me which would do this.

> Yes I think it is a valid desire to have such an option, especially as such
> an output is already available by changing the use of @documentencoding.
> (That's why I made @documentencoding have this effect in the first place,
> to give the chance to avoid having unnecessary UTF-8 sequences in Info files.)
> Look at where the 'no_extra_unicode' flag is set in
> Texinfo/Convert/Plaintext.pm - any option should use the same code as this.

OK, I understand that bit of the code now, thanks!

-- 
Alan Mackenzie (Nuremberg, Germany).


