Re: Saving markup formats

From: Oliver Scholz
Subject: Re: Saving markup formats
Date: Tue, 19 Jun 2007 09:43:08 +0200
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (gnu/linux)

Juri Linkov <address@hidden> writes:

>> This is a very old project of mine, and an abandoned one, I am afraid.
>> Of course, anybody is free to make use of the codebase, but I for
>> myself am convinced that it is the wrong approach.
> Could you tell why do you think it is the wrong approach?  This would help
> someone who will do something similar to avoid mistakes you think you made.

Basically, the approach was too naive. I started like this:
"Hey, implementing RTF can't be too hard. Let's just take the RTF
spec, write a parser for it, get the text with some text properties
into a buffer and write a major mode for editing it."

Until I realised that a) if you want word processing in Emacs, it
should be designed with different target formats in mind right from
the start, and b) even for RTF alone this is not sufficient. IIRC I was
originally planning for a flat data structure: RTF's paragraph
formatting properties and character formatting properties, each stored
as lists or vectors in a separate text property. Font-lock would then
resolve the character formatting properties and apply faces;
fill-paragraph would resolve the whitespace formatting properties. This
would work for simple
cases. But it wouldn't in all cases preserve the logical structure of
the original document, if you got it from somebody using a different
word processor. This is a very bad thing; a _reliable_ word
processor---as opposed to an unreliable hack---shouldn't make any
changes to the logical structure of a document unless explicitly
ordered to do it. Also, while it is o.k. to implement only a subset of
RTF in the beginning, the design (or lack thereof) of the data
structure would eventually lead to a dead end.
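
To illustrate the point about losing logical structure, here is a small
sketch in Python (not Emacs Lisp, and not taken from the abandoned
codebase). The nested tuples and the `flatten` function are hypothetical
names invented for this example; they only mimic the idea of reducing a
document to flat runs of text plus formatting properties.

```python
# Hypothetical nested representation: a group is (props, children), where
# each child is either a string or another group.
def flatten(group, inherited=frozenset()):
    """Reduce a nested group to flat (text, properties) runs."""
    props, children = group
    active = inherited | frozenset(props)
    runs = []
    for child in children:
        if isinstance(child, str):
            runs.append((child, active))
        else:
            runs.extend(flatten(child, active))
    return runs

# Italic nested inside bold ...
nested = ((), [(("bold",), ["bold ", (("italic",), ["both"])])])
# ... versus two sibling groups that happen to look the same on screen.
siblings = ((), [(("bold",), ["bold "]), (("bold", "italic"), ["both"])])

# Two different logical structures collapse into identical flat runs.
assert nested != siblings
assert flatten(nested) == flatten(siblings)
```

Once only the flat runs are kept, there is no way to tell which of the
two structures the original document had, so writing the file back
cannot reproduce it faithfully.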

> How would you do things over if you had enough time?

I'd start by designing the data structure. I would do it with an eye on
the various specifications for XML (most notably: the XML info set and
the style properties in CSS), for the simple reason that they were
designed to cover a wide range of needs for text/data representation,
formatting, text processing etc. and that in this area they are tested
by a lot of people. So looking at XML right from the start could help
avoid shortcomings in the design that lead to dead ends or crude
kludges later. Also, for a word processing suite in Emacs, XML file
formats would be the major target formats besides RTF: XHTML, TEI XML,
DocBook, eventually Texinfo XML.

So, IMNSHO, giving thought to text representation and rendering in
Emacs is the _very first_ thing to do. Once you have a capable data
structure, parsing RTF is not too hard. You can regard an RTF document
as some sort of weird s-expression, with "{" and "}" instead of
parentheses. It is still a dreadful file format, because of its lack
of constraints, but, again, if and only if you have a well designed
data structure, you have a fighting chance to deal with those dreads.
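
The "weird s-expression" view can be sketched roughly as follows, again
in Python for illustration only. This is a deliberately simplified
reader, not a conforming RTF parser: the `TOKEN` pattern and `parse_rtf`
are names made up for this example, and it assumes input that contains
only groups, plain control words with an optional numeric parameter, and
text. Escapes such as `\{` or `\'hh`, control symbols, and binary data
are not handled.

```python
import re

# Simplified tokens: a group delimiter, a control word with an optional
# numeric parameter and one optional trailing space, or a run of text.
TOKEN = re.compile(r"[{}]|\\[a-zA-Z]+-?\d* ?|[^{}\\]+")

def parse_rtf(source):
    """Parse a simplified RTF string into nested Python lists."""
    root = []
    stack = [root]
    for token in TOKEN.findall(source):
        if token == "{":
            group = []
            stack[-1].append(group)   # the new group is a child of the current one
            stack.append(group)
        elif token == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            stack.pop()
        elif token.startswith("\\"):
            stack[-1].append(token.rstrip(" "))  # drop the delimiter space
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unbalanced '{'")
    return root

tree = parse_rtf(r"{\rtf1 Hello {\b bold} text}")
```

Here `tree` is `[['\\rtf1', 'Hello ', ['\\b', 'bold'], ' text']]`: the
braces become list nesting, just as parentheses would in Lisp. What to
do with that tree is the hard part, which is why the data structure
comes first.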

I'd start with designing a data structure that is a realisation of the
XML info set. (This has nothing to do with pointy brackets. The XML
info set is a specification of requirements for a data structure. IIRC
there is a W3C technical report out there.) This doesn't have to be
DOM. I am pretty confident that it is rather straightforward to
parse RTF into an instance of the XML info set.
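
As a rough idea of what such a structure might look like, here is a
minimal Python sketch loosely modelled on two kinds of information items
(elements and characters). The class names, the `append` helper and the
element names are all hypothetical, it covers only a tiny part of what
the info set describes, and putting style properties into attributes is
merely an assumption for the sake of the example.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Characters:
    """A run of character data with a link to its parent element."""
    text: str
    parent: Optional["Element"] = field(default=None, repr=False, compare=False)

@dataclass
class Element:
    """An element with a name, attributes, ordered children and a parent."""
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    parent: Optional["Element"] = field(default=None, repr=False, compare=False)

    def append(self, child):
        child.parent = self
        self.children.append(child)
        return child

def text_content(node):
    """Concatenate all character data below NODE in document order."""
    if isinstance(node, Characters):
        return node.text
    return "".join(text_content(child) for child in node.children)

doc = Element("document")
para = doc.append(Element("paragraph", {"margin-left": "4em"}))
para.append(Characters("Hello, "))
strong = para.append(Element("emphasis", {"font-weight": "bold"}))
strong.append(Characters("world"))
```

With this, `text_content(doc)` gives `"Hello, world"`, while the nesting
of `emphasis` inside `paragraph` stays available for writing the
document back out in any target format.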

Unfortunately, this is where the trouble starts. With the XML info set
the logical structure of the internal data structure is clear (except
for style properties). But the specifics depend on Emacs' ability to
render text on-screen. Of course, there are ways to implement a
tree-like data structure even with CDATA as text in a buffer right
now---I was experimenting with overlays, for instance. But eventually,
if you really go for word processing, you'd have to enhance the
display engine anyway to deal with certain style properties. So you
might as well design both the tree-like data structure and how the
display engine deals with it, right from the beginning---thus gaining,
possibly, maximum reliability.

Also, after I discarded the naive approach I have spent a lot of
thought on UI issues and I would advise anybody to do the
same---again, right from the beginning, before you write a single line
of code. What is word processing exactly? Current word processors are
hybrid beasts, undecided between DTP software (like Adobe InDesign,
QuarkXPress, Macromedia FreeHand, Inkscape, Scribus) and programmes
dealing with the logical structure of documents. Word processors have an
ambiguous editing model, resulting from a long history starting with
their origin as replacements for typewriters. Emacs' editing model on
the other hand is mostly about dealing with text/plain, even where it
goes beyond that. So if you come from the Emacs world alone, you are
bound to unwittingly design your UI according to your best known
editing model (especially because it is easiest to implement in Emacs)
and maybe you even make decisions for the design of the data structure
which make that mandatory. Thus, you'd add to the confusion that's
already there. The result would probably be the worst word processor
ever, not the best. In that case, it would be better to stick with
packages like Muse or emacs-wiki or something similar, which are
one-way (no reading of other file formats), for your own documents
(that's what I do); and use Abiword or OpenOffice if you receive
documents from somebody else. Word processors are good for document
exchange, that's why they are more popular than DTP software. So,
should the UI make the logical structure which would be stored in the
file explicit? What are the objects a user wants to interact with?
I.e. if there are four spaces at the right margin, and if the user can
put the cursor on those spaces, copy them etc., then those spaces are
an object the user can interact with. They are _there_. But is this
what he or she wants? Or does she just want a left margin that is 4em
wide? How do you distinguish a 4em margin from four spaces at the

If somebody is seriously going to implement WP with a good large-scale
battle plan, then please drop me a line. I might find a little time
to contribute.

1 Messidor an 215 de la Révolution
Liberté, Egalité, Fraternité!
