[Gzz] 14th
From: B. Fallenstein
Subject: [Gzz] 14th
Date: Sun, 14 Jul 2002 22:11:33 +0200
User-agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:0.9.9) Gecko/20020414 Debian/0.9.9-6
Wow-- three things became clear today.
1. Diffs
Thinking about something else, by chance I hit on the solution for our
problem with generating diffs. The key is to go back to last summer's
scheme of actually diffing two versions of a space, instead of merging
the changes in an undolist together. Merging the undolist can be great,
but it's incompatible with Ted's slice model (as we discussed before--
it's not *impossible* to implement the two together, but it takes the
simplicity of either approach away).
The problems with diffing the current against the last saved versions
were a) that it was too complex and b) that it was horribly slow.
Now, b) came from the terribly stupid way we saved vstreams. If we
estimate that we have only a few thousand "regular" connections in most
slices, I think we should do fine. And for a), I've found a cure.
Let's assume we are diffing some dimension and we have two maps, 'old'
and 'new,' that contain the posward connections along that dimension
(i.e., each entry is one connection, where the key is the negward cell and
the value is the posward cell in the conn). It doesn't matter whether the
maps contain String ids or Cell objects-- except that they of course
have to agree on what they use.
Now, here's the code to generate the diff:
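// 'new' is a reserved word in Java, so the two maps are called
// oldMap and newMap here.
Set connects = new HashSet(newMap.entrySet());
connects.removeAll(oldMap.entrySet());    // connections only in the new version
Set disconnects = new HashSet(oldMap.entrySet());
disconnects.removeAll(newMap.entrySet()); // connections only in the old version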
Easy, huh?
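For anyone who wants to try this out, here is a minimal self-contained sketch of the same set-difference idea, assuming the maps hold String cell ids. The class name ConnDiff, the method name added, and the sample ids are made up for illustration and are not Gzz code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; the names here are not part of Gzz.
class ConnDiff {
    // Returns the connections (negward -> posward) that are in 'to' but not in 'from'.
    // The result holds the maps' own entry objects, so the maps should not be
    // modified while the result is still in use.
    static Set<Map.Entry<String, String>> added(Map<String, String> from,
                                                 Map<String, String> to) {
        Set<Map.Entry<String, String>> result = new HashSet<>(to.entrySet());
        result.removeAll(from.entrySet());
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> oldConns = new HashMap<>();
        oldConns.put("a", "b");
        oldConns.put("b", "c");

        Map<String, String> newConns = new HashMap<>();
        newConns.put("a", "b"); // unchanged
        newConns.put("b", "d"); // b was reconnected from c to d

        System.out.println("connects:    " + added(oldConns, newConns));
        System.out.println("disconnects: " + added(newConns, oldConns));
    }
}
```

A reconnected cell shows up once in each set: the old connection as a disconnect and the new one as a connect.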
2. Extending and embedding (XML)
Today I was thinking about Gzz and XML. I've thought about that before,
but today I convinced myself of an idea that's quite radical for our
group: not only supporting an XML file format, but making all cell
contents plain XML. This would mean cells would not contain xanalogical
media content, but XML structures, most simply storable as strings.
This would allow the zz and xu-media parts of Gzz to be decoupled very
nicely, interfacing only in a standards-based way. The xu-media module
would represent xanalogical text as XML fragments, e.g.:
<gzz:span block="XXX" start="12" length="17"/>
<gzz:span block="XXX" start="183" length="8"/>
The client module would then make the zz module store the XML generated
by the xu-media module, and it would use the xu-media module to expand
the XML into actual text.
But we wouldn't be limited to this kind of XML fragment in cells. The
next step would be formatted text, for which we could simply use XHTML:
<gzz:span block="XXX" start="12" length="4"/>
<html:strong>
<gzz:span block="XXX" start="16" length="3"/>
</html:strong>
<gzz:span block="XXX" start="19" length="1"/>
This would be expanded to:
<gzz:span block="XXX" start="12" length="4">foo </gzz:span>
<html:strong>
<gzz:span block="XXX" start="16" length="3">bar</gzz:span>
</html:strong>
<gzz:span block="XXX" start="19" length="1">.</gzz:span>
Which, marking bold stuff like *this*, would be rendered as:
foo *bar*.
(Of course, we can use something other than XHTML if we want to.) Writing
the necessary XML transformation tools seems like a pretty easy task to
me, and more so if we can use Jython, and it would make the xu-media
module much more useful on its own (you could store web pages with it,
and have some server-side script that annotates the pages with links and
transclusions when serving them).
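The expansion step described above can be sketched roughly as follows. This is only an illustration under assumptions: blocks are modelled as plain strings in a map, and the class SpanExpander and its methods are hypothetical names, not part of the xu-media module.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; class and method names are not part of Gzz.
class SpanExpander {
    // Returns the text that a span (block id, start offset, length) refers to.
    static String spanText(Map<String, String> blocks, String block, int start, int length) {
        String content = blocks.get(block);
        if (content == null) {
            throw new IllegalArgumentException("unknown block: " + block);
        }
        return content.substring(start, start + length);
    }

    // Builds the expanded form: the same span element, now wrapping its text.
    static String expand(Map<String, String> blocks, String block, int start, int length) {
        String text = spanText(blocks, block, start, length);
        return "<gzz:span block=\"" + block + "\" start=\"" + start
                + "\" length=\"" + length + "\">" + escape(text) + "</gzz:span>";
    }

    // Minimal escaping of XML character data.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, String> blocks = new HashMap<>();
        // 12 characters of padding, so that "foo " starts at offset 12.
        blocks.put("XXX", "0123456789abfoo bar.");
        System.out.println(expand(blocks, "XXX", 12, 4));
        System.out.println(expand(blocks, "XXX", 16, 3));
    }
}
```

A real transformer would walk the XML tree and leave non-gzz elements such as html:strong untouched; the sketch only covers what happens to a single span.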
I like this because I'm terribly sick of making schemes for formatted
text in Gzz, and here we could just take a standard and get the
interoperability that comes with it (and all the code that already works
with it) for free. Yet, by having the transformers in the xu-media
module, we get XHTML with external markup.
(As a cell's content can be any XML, not just XHTML, this also allows an
easy unified way for putting image and other non-textual spans in cells:
just have e.g. <gzz:page-span block="XXX" page="17"/> in the cell.)
Storing just the XML on the zz side would make that module's job much
easier: it wouldn't have to worry about the media model; in fact it
could treat all content as plain strings (something that *all* known zz
implementations so far are able to do), and still be used by our client
with all the bravado of formatted xanalogical content.
For us, though, that would be only the first step; what I really want is
representing the XML in a zz structure. This would make Gzz an XML
browser, which is nice to show off Gzz's capabilities; but more
importantly, it would allow us to make arbitrary zz connections to XML
nodes, *enhancing XML with zz connectability*. Of course this becomes a
problem when the XML is edited; how can we keep our connections? My
answer is to simply put in attributes containing the cell id
corresponding to a node, if any; of course then the editing program has
to keep the attribute, which some might not do, but then it's their problem.
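To make the attribute idea concrete, here is a small sketch that re-finds the nodes belonging to cells after the XML has been through some other editor. The attribute name "gzz-cell" and the class CellIdIndex are assumptions made up for this example; the actual attribute name (and whether it lives in a namespace) is not decided here.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch; the attribute name "gzz-cell" is an assumption.
class CellIdIndex {
    static final String CELL_ATTR = "gzz-cell";

    // Parses the XML and maps every cell id found in the attribute to its element.
    static Map<String, Element> index(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
        Map<String, Element> byCell = new HashMap<>();
        NodeList all = doc.getElementsByTagName("*"); // "*" matches every element
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute(CELL_ATTR)) {
                byCell.put(e.getAttribute(CELL_ATTR), e);
            }
        }
        return byCell;
    }

    public static void main(String[] args) {
        String xml = "<p>Some <strong gzz-cell=\"cell-42\">bold</strong> text.</p>";
        Map<String, Element> byCell = index(xml);
        System.out.println(byCell.get("cell-42").getTextContent());
    }
}
```

Nodes whose attribute was dropped by the editing program simply do not appear in the map, which is the "their problem" case.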
(Of course this is only interesting as long as there are many programs
for handling XML data, but pretty much none for handling zz data. Alas,
it's going to stay that way for some time to come.)
ZZ-connecting XML data may be only mildly interesting when you consider
XHTML, but the point here is that a cell can contain anything you can
represent in XML-- for example, MathML formulas. Consider cloning a
MathML formula into the XHTML structure representing an article you're
writing, while having the rootclone in a ZZ structure where you keep
your formulas. Of course we could create our own ZZ structure
representing formulas, but this way you could readily view your article
in a standard browser. (And at some point, we can still create our own
ZZ structure and have a spacepart that shows it as MathML.)
I called this section "Extending and Embedding," because I believe that
this can be part of making Gzz usable inside something else. The point
is for Gzz to be able to interoperate with XML formats seamlessly--
instead of having converters from XML formats to Gzz structures and
back, we could simply organize XML data in Gzz cells. If you have some
data in an app of yours and you'd like to use ZZ to organize it, you
should be able to-- and since most data nowadays can be serialized as
XML, being able to put XML in cells would be a good bet here. Especially
because formats other than XML would certainly not allow us to put in
our cell ids, while XML will (if the app cooperates).
I also want us to provide an XML serialization, so that Gzz data can be
put inside an XML document. Indeed, I believe that if the above proposal
is accepted, it would make sense to have XML as our native format; given
all the tool support, it should be quite trivial to write scripts that
format it nicely (and that we can pipe the files into). On the other
hand, we could of course use an alternative serialization of the same
data ourselves, one that uses some more readable and/or less
space-consuming format.
Bah, far too much rambling and not enough time to make it shorter.
Anyways. The idea is recorded, and we can discuss it here or on IRC.
One last thing: It all seems very easy to implement to me. If it doesn't
to you, it may be that the explanation is just confusing. Please
comment, and I'll try to re-explain if I didn't make myself clear.
3. Mediaserver indexing
Okay, I'll make this one short. The connections between xu transclusions
are implicit. The connection between a xu link and a document that
overlaps with one of that link's endsets is also implicit. To resolve
both links and transclusions, the lookup we need is "which blocks
reference span X?". Then we can load those blocks, see whether they're
links or documents, and take appropriate action.
Now, we know that once we go p2p, this will be the *difficult* lookup
(because it's something current p2p systems don't do). However, it should
be relatively doable not only on a local system but also on webpages.
Currently, in a mediaserver pool we have b_xxx files that store blocks
and p_xxx files that store information about pointers. Now, we can
simply have i_xxx files that list all blocks which reference spans in
block xxx ('i' for 'index'). However, the blocks listed in the i_xxx
files would be only blocks which are in that same pool.
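As a rough sketch of what building the i_xxx data for one pool could look like: the input is assumed to be, for each block in the pool, the list of block ids whose spans it references. How that list is extracted from a block and how the i_xxx files are laid out on disk are left open; PoolIndex and its method are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; names and the in-memory representation are assumptions.
class PoolIndex {
    // Input:  block id -> ids of the blocks whose spans it references.
    // Output: block id xxx -> blocks in this pool that reference spans in xxx
    //         (the would-be contents of i_xxx).
    static Map<String, Set<String>> build(Map<String, List<String>> referencesByBlock) {
        Map<String, Set<String>> index = new HashMap<>();
        for (Map.Entry<String, List<String>> e : referencesByBlock.entrySet()) {
            for (String target : e.getValue()) {
                index.computeIfAbsent(target, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("doc1", Arrays.asList("text1", "text2")); // a document transcluding two blocks
        refs.put("link1", Arrays.asList("text1"));         // a link with an endset in text1

        Map<String, Set<String>> index = build(refs);
        System.out.println("i_text1: " + index.get("text1"));
        System.out.println("i_text2: " + index.get("text2"));
    }
}
```

Because only blocks of the pool itself are fed in, the result has the same-pool limitation described above.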
This works fine for local pools, and it also works for pools on a
webserver, as long as all links and transclusions we want to follow are
between documents on that same webserver. If that's not enough we could
augment the above scheme with a facility for a webserver to tell us
about other webservers. We could have j_xxx files (for 'jump'), which
would list URLs of pools that have blocks with references to spans from
block xxx-- then the client could go to these pools and look in their
respective i_xxx files.
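The client-side lookup with jumps might then look something like the sketch below. Pools are modelled as in-memory objects keyed by URL instead of being fetched over HTTP, jumps are followed one level only, and all names (Pool, ReferenceLookup, the example.* URLs) are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a mediaserver pool.
class Pool {
    // i_xxx: block id -> blocks in this pool that reference spans in that block.
    final Map<String, Set<String>> index = new HashMap<>();
    // j_xxx: block id -> URLs of other pools that may hold referencing blocks.
    final Map<String, List<String>> jumps = new HashMap<>();
}

class ReferenceLookup {
    // Returns pool URL -> referencing blocks listed in that pool's i_xxx data.
    static Map<String, Set<String>> referrers(String block, String startUrl,
                                              Map<String, Pool> poolsByUrl) {
        Map<String, Set<String>> found = new TreeMap<>();
        Pool start = poolsByUrl.get(startUrl);
        if (start == null) {
            return found;
        }
        collect(found, startUrl, start, block);
        // Follow the jump list of the starting pool, one level only.
        for (String url : start.jumps.getOrDefault(block, Collections.<String>emptyList())) {
            Pool other = poolsByUrl.get(url);
            if (other != null) {
                collect(found, url, other, block);
            }
        }
        return found;
    }

    private static void collect(Map<String, Set<String>> found, String url,
                                Pool pool, String block) {
        Set<String> local = pool.index.get(block);
        if (local != null && !local.isEmpty()) {
            found.computeIfAbsent(url, k -> new TreeSet<>()).addAll(local);
        }
    }

    public static void main(String[] args) {
        Pool a = new Pool();
        a.index.put("xxx", new TreeSet<>(Arrays.asList("doc1")));
        a.jumps.put("xxx", Arrays.asList("http://b.example/pool/"));
        Pool b = new Pool();
        b.index.put("xxx", new TreeSet<>(Arrays.asList("link7")));

        Map<String, Pool> pools = new HashMap<>();
        pools.put("http://a.example/pool/", a);
        pools.put("http://b.example/pool/", b);

        System.out.println(referrers("xxx", "http://a.example/pool/", pools));
    }
}
```

Pools that nobody jumps to are never visited, so the result is partial in exactly the way the next paragraph admits.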
The data obtained that way wouldn't be comprehensive, but this scheme
would already be Web+ -- i.e., be as good as the web and even better
(you can point to arbitrary other places plus you have *some* implicit
linking). If we could implement this, it would be a really good starting
point for the real p2p search later.
- Benja