Re: [Texmacs-dev] XML

From: Joris van der Hoeven
Subject: Re: [Texmacs-dev] XML
Date: Tue, 3 May 2005 13:03:33 +0200
User-agent: Mutt/1.5.6+20040907i

On Mon, May 02, 2005 at 07:05:33PM +0000, Felix Breuer wrote:
> > Recall that we still should replace entities &char; by markup
> > <tmsym>char</tmsym> or <tmsym name="char"/> in the TMML format too.
> The drawback of <tmsym name="char"/> is that this cannot be easily
> transformed via XSL.

And what about <tmsym>name</tmsym>?

> Nonetheless we need to get rid of the &char; entity names,
> because the XML standard only predefines &lt; &gt; &amp; &quot; and &apos;.
> There are standardized names for many Unicode characters, but these have
> to be declared inside an XML document for it to be well-formed.

Yes, we definitely should produce completely standard XML.
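
To make the well-formedness point concrete, here is a small check in Python. It is only a sketch using the standard library's xml.etree.ElementTree, not anything from TeXmacs; the <tmml> wrapper element and the entity name alpha are placeholders chosen for illustration. A parser that sees no declaration for a named entity rejects the document, while a numeric character reference is accepted.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# &alpha; is not one of the predefined XML entities and is not declared here.
named = "<tmml>&alpha;</tmml>"
# The same character written as a hexadecimal character reference.
numeric = "<tmml>&#x3B1;</tmml>"

print(is_well_formed(named))
print(is_well_formed(numeric))
```

The first document is rejected because of the undeclared entity; the second one parses.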

> One possibility would be to produce numeric character references
> (e.g. &#xABCD;). This would yield TMML documents that are well-formed
> XML and could thus be easily transformed into other formats via XSL.
> True, this is not as readable.

There are three points here. First, now and then I add new characters,
and I may forget to update all tables. We still need a mechanism
which produces correct XML even in such situations, or which automatically
updates the necessary tables (in a backward-compatible way!).

Secondly, Unicode indeed reserves a range of code points for
application-dependent use (the Private Use Area). This is what we may
use if someone feels like completing David's Unicode tables.

Finally, keep in mind that the TeXmacs characters which are
not in Unicode are quite "exceptional". One should not bother
too much about them, except that the TeXmacs<->XML converters
should be painstakingly correct.

> An alternative would be to write a proper TMML DTD where all needed
> entity names are declared. Then we could write &char; and the doc would
> be well-formed. Another advantage would be that we could treat TeXmacs
> symbols that _are_ in Unicode and those that _are not_ identically: In
> the DTD we could declare (pseudo-syntax):
>   &somecharinunicode;    -> &#xAB34;
>   &somecharnotinunicode; -> <tmsym name="somechar"/>

I would prefer the generation of such a DTD to be automatic.
Besides, this approach has the disadvantage that the generated documents
are no longer stand-alone.
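
As a rough illustration of what automatic generation could look like, the following Python sketch builds the entity declarations from a single table. Everything in it is an assumption made for the example: the SYMBOLS table, its names, and the <tmml> root element are hypothetical and do not come from the TeXmacs sources. Embedding the declarations as an internal DTD subset is likewise just one possible choice, made here so that the example parses on its own; it repeats the table in every file.

```python
import xml.etree.ElementTree as ET

# Hypothetical symbol table: name -> Unicode code point, or None when the
# symbol has no Unicode equivalent. The names are made up for illustration.
SYMBOLS = {
    "alpha": 0x3B1,
    "leq": 0x2264,
    "somecharnotinunicode": None,
}

def entity_declarations(symbols):
    """Build one <!ENTITY ...> declaration per symbol, sorted by name."""
    lines = []
    for name in sorted(symbols):
        code = symbols[name]
        if code is None:
            # No Unicode code point: expand to explicit markup instead.
            value = "<tmsym>%s</tmsym>" % name
        else:
            value = "&#x%X;" % code
        lines.append('<!ENTITY %s "%s">' % (name, value))
    return "\n".join(lines)

def standalone_document(body, symbols):
    """Wrap body in <tmml> with the declarations as an internal DTD subset."""
    return "<!DOCTYPE tmml [\n%s\n]>\n<tmml>%s</tmml>" % (
        entity_declarations(symbols), body)

doc = standalone_document("x &leq; &alpha;", SYMBOLS)
root = ET.fromstring(doc)
print(entity_declarations(SYMBOLS))
```

Because the declarations are derived from one table, adding a character to the table is enough to keep the DTD in sync.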

On the other hand, the automatic generation of DTDs is an interesting
topic in its own right. Besides increased readability of character names,
we might want to generate a DTD for the TeXmacs tags as well.
However, I feel that this still has to be postponed for a while.

> From within TeXmacs we would always generate entity names.
> I could try to implement such a DTD, but it will take quite some time
> till I get to it.

It would probably be best to:
  1) Implement a safe <tmsym>name</tmsym> scheme to which we may fall back.
  2) Define a clean table of TeXmacs character extensions and
     map it to an application-dependent Unicode range.
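
The two steps above can be sketched as follows in Python. This is a minimal illustration under stated assumptions, not the TeXmacs converter: both tables, the symbol names, and the function symbol_to_xml are hypothetical. The only fixed fact used is that U+E000..U+F8FF is the Private Use Area of the Basic Multilingual Plane.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical tables; the names and code points are illustrative only.
UNICODE_TABLE = {"alpha": 0x3B1, "leq": 0x2264}
# Step (2): TeXmacs-only symbols mapped into the Private Use Area
# (U+E000..U+F8FF). The assignment below is an arbitrary example.
PRIVATE_USE_TABLE = {"tmextension": 0xE000}

def symbol_to_xml(name):
    """Return an XML fragment for a symbol name that is always well-formed."""
    code = UNICODE_TABLE.get(name)
    if code is None:
        code = PRIVATE_USE_TABLE.get(name)
    if code is not None:
        return "&#x%X;" % code
    # Step (1): safe fallback for names missing from every table.
    return "<tmsym>%s</tmsym>" % escape(name)

# Every fragment can be embedded in a document without any declarations.
fragments = [symbol_to_xml(n) for n in ("alpha", "tmextension", "brandnew")]
root = ET.fromstring("<tmml>%s</tmml>" % "".join(fragments))
print(fragments)
```

A character that was added without updating the tables ends up in the <tmsym> branch, so the output stays well-formed even then.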

> No, I have not forgotten about the literate programming plugin :)

Yes, I think that it would be more interesting to concentrate on that.
Maybe you could just do (1) above and start (2) if you have time.

Best wishes, Joris
