octave-maintainers

Encoding issues : the DESCRIPTION file


From: Julien Bect
Subject: Encoding issues : the DESCRIPTION file
Date: Tue, 20 Jan 2015 14:27:47 +0100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

Hello everyone,

I have started to investigate encoding issues in the generate_html package, following this discussion:

http://octave.1599824.n4.nabble.com/generate-html-breaks-documentation-encoding-tp4668154.html

Before fixing anything, I am trying to establish the current state of affairs.

I would like to discuss the specific case of the DESCRIPTION file first (other files, such as NEWS or COPYING, raise similar issues). Reminder: the content of DESCRIPTION is used to create "overview.html".

My question is: which encoding can, or should, be used in this file? Here are a few facts.

=== BEGIN FACTS ===

1) There is no mention of a specific encoding in the documentation

https://www.gnu.org/software/octave/doc/interpreter/Creating-Packages.html

and, to the best of my knowledge, it is not currently possible to indicate which encoding is actually used in the DESCRIPTION file of a given package (or more generally in all the files of a given package).

2) Currently, the "octave-forge" style in the generate_html package assumes "iso-8859-1".

3) As far as I can tell, most packages only use US-ASCII (7-bit ASCII), which is a proper subset of both ISO-8859-1 and UTF-8.

4) Some packages already use UTF-8 in their DESCRIPTION file. For instance, there is an "ø" (UTF-8 bytes C3 B8, LATIN SMALL LETTER O WITH STROKE) in the generate_html package and an "ë" (UTF-8 bytes C3 AB, LATIN SMALL LETTER E WITH DIAERESIS) in the image package.

5) For the packages whose DESCRIPTION contains UTF-8 characters, I assume (sorry, not exactly a fact anymore) that the html produced by generate_package_html () has been manually edited to replace "charset=iso-8859-1" with "charset=utf-8". @Søren, Carnë: is that correct?

=== END FACTS ===
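To make facts 2 to 4 concrete, here is a small Python 3 illustration (not generate_html code, which is written in Octave) of how the two characters above are encoded, and of what happens when UTF-8 bytes are read as ISO-8859-1, which is what a page declaring "charset=iso-8859-1" would do with a UTF-8 DESCRIPTION:

```python
# Bytes of the two characters from fact 4 in each candidate encoding
for ch in ("ø", "ë"):
    print(ch, ch.encode("utf-8").hex().upper(), ch.encode("iso-8859-1").hex().upper())
# prints "ø C3B8 F8" and then "ë C3AB EB"

# UTF-8 bytes misread as ISO-8859-1 (the generate_html assumption of fact 2)
utf8_bytes = "ø".encode("utf-8")
print(utf8_bytes.decode("iso-8859-1"))  # prints "Ã¸"

# Pure US-ASCII text gives identical bytes in all three encodings (fact 3)
s = "Name: image"
assert s.encode("ascii") == s.encode("utf-8") == s.encode("iso-8859-1")
```

This is why packages that stick to US-ASCII never notice the problem, while a single non-ASCII character in a UTF-8 DESCRIPTION shows up garbled unless the charset is fixed by hand.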

I would like to come up with a solution that is clear and consistent for the *automatic* processing of DESCRIPTION files (no more manual editing should be needed).

Here are some options.

A) Assume US-ASCII. Raise an error if any byte > 0x7F is encountered.

A') Same as A, unless an optional ENCODING file is present, in which case DESCRIPTION (and COPYING, and NEWS) is assumed to use the encoding indicated in that file.

B) Assume ISO-8859-1. For "ø" and "ë" this wouldn't be a problem (they are F8 and EB in ISO-8859-1), but sooner or later a package manager whose name cannot be written in ISO-8859-1 will join the project...

B') Assume ISO-8859-1 with an optional ENCODING file.

C) Assume UTF-8.

C') Assume UTF-8 with an optional ENCODING file (for package managers who *really* don't want to use UTF-8).

D) In A', B' or C', use a new optional field in DESCRIPTION instead of an ENCODING file.
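As a rough sketch of how options A' and C' could behave, here is a hypothetical Python function (read_package_text is an illustrative name, not an existing generate_html or pkg function). It assumes the ENCODING file holds a single encoding name in plain ASCII, and that a strict decode error is the desired failure mode; the only difference between A' and C' is the default encoding:

```python
from pathlib import Path


def read_package_text(pkg_dir, name="DESCRIPTION", default="us-ascii"):
    """Hypothetical helper: default="us-ascii" gives option A',
    default="utf-8" gives option C'. ENCODING is the optional file
    proposed above, not an existing Octave convention."""
    pkg = Path(pkg_dir)
    encoding = default
    enc_file = pkg / "ENCODING"
    if enc_file.is_file():
        # Assumed format: one encoding name, e.g. "iso-8859-1"
        encoding = enc_file.read_text(encoding="ascii").strip()
    raw = (pkg / name).read_bytes()
    # errors="strict" raises UnicodeDecodeError on invalid input,
    # e.g. any byte > 0x7F when the encoding is US-ASCII (option A)
    return raw.decode(encoding, errors="strict"), encoding
```

Once the text is decoded this way, the generated "overview.html" could always be written out in a single encoding with a matching charset declaration, so no manual editing would be needed.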

I would vote for A' (it just requires a small number of package managers to add an ENCODING file) or C (it doesn't seem to require any additional work at all).

Any thoughts?
