[help-texinfo] xml id characters

From: Karl Berry
Date: Sun, 31 Dec 2006 19:19:10 -0600

Hello Per,

Since you've done so many improvements in the makeinfo Docbook output, I
thought I'd ask you about this.  A Texinfo user has 8-bit characters in
his node names.  They are being munged to dashes in the Docbook output.
For example:

    Here is what's in the French XML file:
    <sect1 label="" id="Pr-requis-pour-Debian">
    The accented characters are replaced by "-". It should have been:
    <sect1 label="" id="Prérequis-pour-Debian">

This is happening in the xml_id function in makeinfo/xml.c:

    { /* Check if a character is allowed in ID attributes.  This list differs
         slightly from XML specs that it doesn't contain underscores.
         See, ``9.3 Name''  */
      if (!strchr
          ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.", *p))

In the reference given, I don't see LCNMCHAR being defined, but ok, I
guess I can believe it is just good old ASCII.
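
To make the effect concrete, here is a minimal sketch of a byte-wise
allowlist filter of this kind.  It is not the actual xml_id code: the
function name munge_id_ascii and the table name are made up for
illustration, and I am assuming the node name arrives as Latin-1, so
that the accented letter is the single byte 0xE9.

```c
#include <string.h>

/* Hypothetical stand-in for the allowlist quoted above. */
static const char ascii_id_chars[] =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.";

/* Replace, in place, every byte that is not in the ASCII allowlist
   with '-'.  Any byte with the high bit set is never in the list,
   so every 8-bit character is replaced, as are spaces.  */
void
munge_id_ascii (char *id)
{
  char *p;

  for (p = id; *p; p++)
    if (!strchr (ascii_id_chars, *p))
      *p = '-';
}
```

Under those assumptions, the node name "Prérequis pour Debian" comes out
as "Pr-requis-pour-Debian", which matches the user's report.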

However, I had thought that XML, being based on Unicode, allowed more or less
anything in its ids.  E.g.,

Can you shed any light on this?  Can we just allow anything (except ")
in the Docbook/XML and Texinfo/XML id values?
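
As a starting point for discussion, here is a rough sketch of a more
conservative middle ground: keep the existing allowlist for 7-bit
characters (so spaces and " still become dashes), but pass any byte
with the high bit set through unchanged.  The name munge_id_keep_8bit
is hypothetical, and the sketch assumes the output file's declared
encoding matches the input bytes; it does not check that each
character is actually valid in an XML name.

```c
#include <string.h>

/* Hypothetical name; same ASCII allowlist as the check quoted above. */
static const char seven_bit_id_chars[] =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.";

/* Replace, in place, only those 7-bit bytes that are not in the
   allowlist.  Bytes >= 0x80 (Latin-1 letters, UTF-8 sequences) are
   left alone.  */
void
munge_id_keep_8bit (char *id)
{
  unsigned char *p;

  for (p = (unsigned char *) id; *p; p++)
    if (*p < 0x80 && !strchr (seven_bit_id_chars, *p))
      *p = '-';
}
```

With this variant the example above would keep its accented letter and
give "Prérequis-pour-Debian".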


