gnu-arch-users

[Gnu-arch-users] Re: File-tpye plug-in architecture for Arch?


From: Tom Lord
Subject: [Gnu-arch-users] Re: File-tpye plug-in architecture for Arch?
Date: Mon, 22 Dec 2003 12:46:11 -0800 (PST)


    > From: michael josenhans <address@hidden>

    >> Pick your favorite generic XML-diff/patch tool and tell me under what
    >> conditions it is 

    >> (a) guaranteed to produce an XML document valid under that DTD when
    >>     applying diffs between meta.xml versions A and B to a meta.xml
    >>     version C.

    > That is what I would expect from such tools. I think some tools
    > I have seen do this already.

    > An application must be able to cope with any valid meta.xml
    > file.

Let's start with three XML documents, A, B and C.

All three documents conform to a particular DTD -- we could be
specific and say "The DTD for meta.xml files in OpenOffice documents".

We run something like:

        % xmldiff A B > A-B.xmldiff

        % xmlpatch A-B.xmldiff C > D

Is D _guaranteed_ to be valid according to the DTD?

Why does this matter?  Because applications that process OpenOffice
files _as_ OpenOffice files are only required to read meta.xml files
that _do_ pass the DTD.  Even if they are tolerant of meta.xml files
that do not pass the DTD, they cannot do anything generally useful
with them (at least not for a non-programmer, or for a programmer who
is not expert in the meta.xml DTD).

It is _essentially_impossible_ that you have seen generic XML
diff/patch tools which guarantee that D will conform to the DTD (with
the exception of a small subset of possible DTDs) while still
actually providing diff/patch functionality.  There is not
enough information present in the DTD, A, B, and C to allow such tools
to work in general.
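To make the problem concrete, here is a minimal Python sketch built on loudly stated assumptions: a toy meta.xml-like format whose schema allows at most one <title> child, and a hypothetical generic diff (the illustrative helpers naive_diff and naive_patch) that records inserted elements by tag and knows nothing about the schema. A, B and C all satisfy the constraint, yet patching C with the A-to-B diff yields a D that does not. Real XML-diff tools are far more elaborate, but they are missing the same information.

```python
import copy
import xml.etree.ElementTree as ET

# Toy documents for a hypothetical meta.xml-like format. The rule
# "at most one <title> child" is an assumed schema constraint; a DTD can
# express it, but ElementTree does not validate against DTDs, so the
# satisfies_constraint() helper below stands in for a validator.
A = "<meta><author>ann</author></meta>"
B = "<meta><title>Draft</title><author>ann</author></meta>"
C = "<meta><title>Final</title><author>bob</author></meta>"


def naive_diff(a, b):
    """Hypothetical generic diff: children of B whose tag is absent from A
    become 'insert' operations. It sees tags and text, not the schema."""
    a_tags = {child.tag for child in a}
    return [("insert", child.tag, child.text) for child in b if child.tag not in a_tags]


def naive_patch(ops, c):
    """Apply 'insert' operations to a copy of C by appending children."""
    d = copy.deepcopy(c)
    for op, tag, text in ops:
        if op == "insert":
            ET.SubElement(d, tag).text = text
    return d


def satisfies_constraint(doc):
    """Stand-in for DTD validation: <title> may occur at most once."""
    return len(doc.findall("title")) <= 1


if __name__ == "__main__":
    a, b, c = (ET.fromstring(x) for x in (A, B, C))
    d = naive_patch(naive_diff(a, b), c)
    print(ET.tostring(d, encoding="unicode"))
    print("D valid:", satisfies_constraint(d))
```

Run as a script, this prints a D containing two <title> elements and reports it as invalid, even though each input passed the same check.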

There are only two ways such a tool can work in this case:

(1) if it knows _more_ about the meta.xml format than is present
    in the files or the DTD --- if it is not an XML-diff/patch but
    is instead a meta.xml-diff/patch.  This is in fact what I propose
    that people interested in these matters set about to build -- and
    my advice is not to leap to the conclusion that it is a simple hack
    on top of a generic XML-diff/patch tool.

(2) if the generic XML-diff/patch algorithm is standard, well
    described, and easy to reason about mathematically -- and the
    designers of the meta.xml format thought about it carefully while
    designing their DTD.
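Option (1) can be sketched in the same kind of toy setting. The assumption is explicit: the tool is told, through a hypothetical SINGLETONS table, which elements are single-valued fields to be merged by value, a piece of merge semantics that neither the files nor the DTD spell out. With that knowledge a three-way merge can report a conflict instead of emitting an invalid document. The element names and helpers (field, set_field, format_aware_patch) are illustrative only, not OpenOffice's real schema or any existing tool.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical format knowledge a generic tool does not have: which
# children of <meta> are single-valued fields. Names are illustrative.
SINGLETONS = ("title",)


def field(doc, tag):
    """Return the text of a singleton field, or None if it is absent."""
    elem = doc.find(tag)
    return None if elem is None else elem.text


def set_field(doc, tag, value):
    """Set, add, or remove a singleton field in place."""
    elem = doc.find(tag)
    if value is None:
        if elem is not None:
            doc.remove(elem)
    elif elem is None:
        new = ET.Element(tag)
        new.text = value
        doc.insert(0, new)  # assumption: singleton fields come first
    else:
        elem.text = value


def format_aware_patch(a, b, c):
    """Apply the A->B change of each singleton field to a copy of C.

    Returns (d, conflicts). If C changed a field differently from B,
    C's value is kept and the field is reported rather than duplicated.
    """
    d = copy.deepcopy(c)
    conflicts = []
    for tag in SINGLETONS:
        va, vb, vc = field(a, tag), field(b, tag), field(c, tag)
        if va == vb:
            continue  # field unchanged between A and B
        if vc in (va, vb):
            set_field(d, tag, vb)  # C agrees with A (or already with B)
        else:
            conflicts.append(tag)  # both sides changed it differently
    return d, conflicts
```

For example, if A has no title, B adds "Draft" and C already has "Final", this returns C unchanged plus a conflict on "title"; if C had no title, it would receive B's.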

The irony here is that the meta.xml format itself is sufficiently
simple that quite possibly, simply by coincidence, the outcome is the
same as if (2) had taken place for some J. Random XML-diff/patch tool.
While that _might_ be true, it is unlikely that the coincidence also
applies to other OpenOffice document components such as style.xml and
content.xml.

Finally, and _again_, mere validity of the output is probably not a
useful level of functionality.   More likely, we'd want the diff/patch
tool to introduce _entirely_new_ markup to explain to a user what it
has done.   But you at least agree on that point:


    >> (c) guaranteed to produce a meta.xml output which is not only valid 
    >>     but useful

    > Definitely not.


Regarding:

    >> (d) guaranteed to produce byte-wise identical-to-B output when
    >>     applying the diff between versions A and B to A.

    > It will generate an XML equivalent XML file.

In other words, the guarantee is not provided.   This is actually a
fairly serious drawback.
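For contrast, a plain line-oriented diff gets guarantee (d) essentially for free: unchanged lines are copied verbatim from A and changed lines verbatim from B, so applying the A-to-B delta to A reproduces B byte for byte. A minimal sketch using Python's difflib; the delta format and the helper names line_diff and line_patch are ad hoc, not GNU diff's format.

```python
import difflib


def line_diff(a: bytes, b: bytes):
    """Format-agnostic delta: each changed line range of A, paired with
    the lines of B that replace it. Unchanged ranges are not stored."""
    a_lines = a.splitlines(keepends=True)
    b_lines = b.splitlines(keepends=True)
    sm = difflib.SequenceMatcher(a=a_lines, b=b_lines, autojunk=False)
    return [
        (i1, i2, b_lines[j1:j2])
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]


def line_patch(a: bytes, ops) -> bytes:
    """Apply a line_diff delta to A, copying unchanged lines verbatim."""
    a_lines = a.splitlines(keepends=True)
    out, pos = [], 0
    for i1, i2, new_lines in ops:
        out.extend(a_lines[pos:i1])  # unchanged run before this hunk
        out.extend(new_lines)        # B's lines (empty for a deletion)
        pos = i2
    out.extend(a_lines[pos:])
    return b"".join(out)
```

This says nothing about the three-way case (applying the delta to some C); it only shows that byte identity is cheap for a tool that treats files as bytes.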


    > xml-diff(patch( A, xml-diff(A,B)), B) = empty.

Yes, but (for example):

        % xmldiff A B > A-B.xmldiff
        % xmlpatch A-B.xmldiff A | md5sum > ,x
        % md5sum < B > ,y
        % cmp ,x ,y || echo bzzzzt -- this blows
        bzzzzt -- this blows
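The gap between the two properties is easy to reproduce. The sketch below uses C14N 2.0 canonicalization (xml.etree.ElementTree.canonicalize, Python 3.8+) as one reasonable stand-in for "XML-equivalent"; an actual xml-diff tool may use a different notion of equivalence. The two byte strings and the helpers md5 and xml_equivalent are illustrative.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two serializations of the same (illustrative) document. They differ only
# in attribute quoting, whitespace inside a tag, and empty-element syntax.
B1 = b'<meta><title lang="en">Draft</title><generator/></meta>'
B2 = b"<meta><title lang='en' >Draft</title><generator></generator></meta>"


def md5(data: bytes) -> str:
    """Hex digest of the raw bytes, as md5sum would compute it."""
    return hashlib.md5(data).hexdigest()


def xml_equivalent(x: bytes, y: bytes) -> bool:
    """Compare documents after C14N 2.0 canonicalization."""
    return ET.canonicalize(x.decode("utf-8")) == ET.canonicalize(y.decode("utf-8"))


if __name__ == "__main__":
    print("md5 equal:      ", md5(B1) == md5(B2))
    print("XML-equivalent: ", xml_equivalent(B1, B2))
```

The two inputs canonicalize to the same string, so an XML-level diff between them could legitimately be empty, yet their md5 sums differ, which is exactly the failure the cmp reports.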


    >> Repeat that for each of the other XML-subset types used in OO format.

    > The statements above will apply to all diff files.

Except for the false one regarding (a).  And the statement regarding
(d) is not exactly an endorsement of the approach of using a generic
XML-diff/patch.

    >> And at _second_best_ you are backtracking to say "well, we won't use
    >> the XML-diff/patch tools for _that_" but then why would we bother with
    >> them at all when ordinary diff or xdelta would do just as well for the
    >> more restricted purposes?

    > I do not see how traditional patch tools help with XML files,
    > especially those edited with various dedicated XML or other
    > dedicated editors.

You're missing context.  In this "second best" scenario, the only use
of the diff tool is to reduce the size of changesets, and more generic
tools, which are not XML-specific, can do that roughly as well.

    > Likely every SVG editor will save an SVG file with a different layout
    > and representation.

Yes, and since these files are _both_ XML documents _and_ bytestreams,
it is important not to gloss over those differences.

Consider, for example, the new complexity you are proposing to create
regarding the otherwise simple problem "Are these two filesystem trees
identical in content?"
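With byte-wise semantics that question has a short, format-agnostic answer: hash every file and compare. A minimal Python sketch; the helper names tree_digest and trees_identical are hypothetical, and the sketch ignores empty directories and permissions, treats symlinked files like regular files and skips symlinked directories, all of which a real revision-control tool would have to handle.

```python
import hashlib
import os
import tempfile


def tree_digest(root):
    """Map each file's path, relative to root, to the SHA-256 of its bytes."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digests[os.path.relpath(path, root)] = hashlib.sha256(f.read()).hexdigest()
    return digests


def trees_identical(left, right):
    """Byte-wise content identity: same relative paths, same bytes."""
    return tree_digest(left) == tree_digest(right)


if __name__ == "__main__":
    # Same logical XML content, different bytes: not identical byte-wise.
    with tempfile.TemporaryDirectory() as left, tempfile.TemporaryDirectory() as right:
        for root, data in ((left, b'<title lang="en">Draft</title>'),
                           (right, b"<title lang='en'>Draft</title>")):
            with open(os.path.join(root, "meta.xml"), "wb") as f:
                f.write(data)
        print(trees_identical(left, right))
```

Two trees whose meta.xml files differ only in quote style are, by this definition, not identical; making them compare equal would mean plugging a per-file-type notion of equivalence into every such comparison.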

-t




