[Octave-bug-tracker] [bug #55195] "help" for core functions contains odd symbols for non-ASCII characters
Mon, 10 Dec 2018 12:48:55 -0500 (EST)
Summary: "help" for core functions contains odd symbols for non-ASCII characters
Project: GNU Octave
Submitted by: mmuetzel
Submitted on: Mon 10 Dec 2018 05:48:53 PM UTC
Severity: 4 - Important
Priority: 5 - Normal
Item Group: Regression
Assigned to: None
Discussion Lock: Any
Operating System: Any
Character encoding strikes again. Does the lexer keep track of whether .m
files are from core?
When Octave is configured to use an mfile_encoding other than UTF-8, help text
of function files that are encoded in UTF-8 is displayed with odd characters.
On Windows, this happens with Octave's default settings. Other systems aren't
affected by default (only if the user configures a different mfile_encoding).
E.g.: "help sym" displays a lot of scrambled characters. That is because that
file is encoded in UTF-8 but we assume it to be encoded in the configured
mfile_encoding. Converting it from SYSTEM (CP1252 in my case) to UTF-8 creates
these odd characters.
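
The effect can be reproduced outside Octave. The following is only a minimal
sketch in Python, not Octave's actual conversion code; the helper name
misdecode_utf8 is made up for illustration. It treats UTF-8 bytes as CP1252
and shows the resulting text:

```python
def misdecode_utf8(text: str, assumed: str = "cp1252") -> str:
    """Return what `text` looks like when its UTF-8 bytes are read as `assumed`.

    Hypothetical helper for illustration only. Note that Python's cp1252
    codec rejects the five bytes that CP1252 leaves undefined, so some
    inputs raise UnicodeDecodeError instead of producing odd characters.
    """
    raw = text.encode("utf-8")   # bytes as stored in a UTF-8 encoded .m file
    return raw.decode(assumed)   # the wrong assumption about the encoding


print(misdecode_utf8("Mützel"))  # prints "MÃ¼tzel"
```

Pure ASCII text is unaffected, which is why only non-ASCII characters in the
help text look scrambled.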
This is a regression. (Before, we didn't worry about encoding but had problems
handling string vectors from user functions or interacting with the file
system.)
That conversion is done in input.cc in function "file_reader::get_input".
Can we differentiate between .m files from the core or packages (which
probably always are UTF-8) on the one hand and user created .m files (which
could have any encoding) on the other hand at that point? Does the lexer keep
track of this?
What about texinfo settings such as "@documentencoding UTF-8"? Should we parse
for them and do the conversion only conditionally?
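
To make the "@documentencoding" idea concrete, here is a rough sketch in
Python (not Octave code). The function names declared_encoding and decode_help
are hypothetical, and the regular expression only assumes that the command
appears on its own comment line; real help text may need a more careful
parser:

```python
import re
from typing import Optional

# Matches e.g. "## @documentencoding UTF-8" or "% @documentencoding UTF-8".
_DOC_ENCODING = re.compile(
    rb"^[%# \t]*@documentencoding[ \t]+([A-Za-z0-9._-]+)", re.MULTILINE
)


def declared_encoding(raw: bytes) -> Optional[str]:
    """Return the encoding named by @documentencoding, or None if absent."""
    match = _DOC_ENCODING.search(raw)
    return match.group(1).decode("ascii") if match else None


def decode_help(raw: bytes, mfile_encoding: str) -> str:
    """Decode with the declared encoding, else fall back to mfile_encoding."""
    encoding = declared_encoding(raw) or mfile_encoding
    return raw.decode(encoding)
```

With this approach, a file that declares UTF-8 is left alone, and files
without a declaration still go through the configured mfile_encoding.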
Should we skip the help text in the conversion completely? In that case, we
might have to move the conversion elsewhere (to the lexer?).
Alternatively, we could revert the conversion in "help.m" (only if we discover
an "@documentencoding" command?) for functions from the core or from
packages.
But text in strings in functions from core Octave or from packages is
probably encoded in UTF-8 as well (independent of the current
mfile_encoding). So we shouldn't convert functions from core or packages at
all and should only do the codepage conversion on user functions.
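
One way to picture that distinction is a lookup by file location. This is a
sketch in Python under the assumption that core and package files can be
recognized by their directory; the function name encoding_for and the example
paths are hypothetical, and whether Octave has this information available at
the point of conversion is exactly the open question:

```python
from pathlib import PurePosixPath
from typing import Iterable


def encoding_for(path: str, trusted_dirs: Iterable[str],
                 mfile_encoding: str) -> str:
    """Pick the encoding used to read the .m file at `path`.

    Files below any directory in `trusted_dirs` (assumed to hold core or
    package functions) are read as UTF-8; everything else uses the
    user-configured mfile_encoding.
    """
    file_path = PurePosixPath(path)
    for directory in trusted_dirs:
        # Compare whole path components, so "/opt/octave/share2" is not
        # mistaken for a subdirectory of "/opt/octave/share".
        if PurePosixPath(directory) in file_path.parents:
            return "utf-8"
    return mfile_encoding


# Hypothetical install location, for illustration only.
core = ["/opt/octave/share"]
print(encoding_for("/opt/octave/share/m/help/help.m", core, "cp1252"))  # prints "utf-8"
print(encoding_for("/home/user/myfunc.m", core, "cp1252"))              # prints "cp1252"
```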
This might also affect how we should open function files from core Octave (or
from packages) in the embedded editor.
- [Octave-bug-tracker] [bug #55195] "help" for core functions contains odd symbols for non-ASCII characters,
Markus Mützel <=