[Gzz] Address meanings, not contents! (Re: Storm blocks and metadata)
From: Reto Bachmann-Gmuer
Subject: [Gzz] Address meanings, not contents! (Re: Storm blocks and metadata)
Date: Thu, 27 Mar 2003 17:10:15 +0100
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Benja
It is necessary for the interpretation of the data we get; and it's
usually easy to agree on (people won't too often assign different mime
types to the same bytes). One thing about content hashes is, when two
people put the same file into a hash-based system, they will use the
same identifier for it. With MIME types, that's still pretty much
true; with more elaborate metadata, it isn't.
I certainly wouldn't argue to put even more metadata in the URI.
Using the same identifier is important for queries like, "Which
documents include this image?" If the three documents that use the
image use three different kinds of IDs for it (because they refer to
three different kinds of metadata), you're out of luck.
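A minimal sketch of why the shared identifier matters for that query.
It assumes SHA-1 as the hash and reuses the "urn:content-hash:" prefix
from this thread; the document names and image bytes are made up.

```python
import hashlib

def content_id(data):
    """Hypothetical content-hash identifier: SHA-1 over the raw bytes."""
    return "urn:content-hash:" + hashlib.sha1(data).hexdigest()

image = b"fake image bytes, for illustration only"

# Each document lists the identifiers of the things it includes.
documents = {
    "doc-a": {content_id(image)},
    "doc-b": {content_id(image)},
    # Same image, but referenced through a different kind of identifier.
    "doc-c": {"urn:urn-5:G7Fj"},
}

def documents_including(identifier):
    """Return the names of all documents that reference this identifier."""
    return sorted(name for name, refs in documents.items() if identifier in refs)

found = documents_including(content_id(image))
```

Here found contains doc-a and doc-b only: doc-c is missed because it
names the image differently, which is the "out of luck" case.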
In the common-sense meaning of the question "Which documents include
this image?", "this image" is not defined by the sequence of bytes that
makes up a specific JPEG version of it, but rather by a specific visual
representation of a thing. Giving a URI to the image itself (in this
encoding-independent, common-sense meaning) and referencing that URI
rather than the URI of the byte sequence wherever possible allows
answering queries that are closer to our real-world understanding of
things. What is concrete for us is fairly abstract for the computer:
computers build abstractions over the raw data to get to the things
non-mathematicians can deal with, and this abstraction process has to
be pushed further to get the Semantic Web.
By the way, the MIME type isn't that unambiguous either: e.g. a text
using only a restricted set of characters may be encoded to the same
sequence of bytes using different encodings.
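A small illustration of that point, using Python's standard codecs.
The sample strings are arbitrary.

```python
# A text restricted to ASCII characters yields identical bytes under
# several encodings, so the bytes alone do not determine the charset.
text = "Ulisses"
encodings = ["ascii", "utf-8", "iso-8859-1"]
encoded = {name: text.encode(name) for name in encodings}
same = len(set(encoded.values())) == 1

# As soon as a character outside that restricted set appears,
# the encodings diverge.
accented = "caf\u00e9"
differs = accented.encode("utf-8") != accented.encode("iso-8859-1")
```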
(...)
Higher level applications should not use block-uris anyway but deal
with an abstraction representing the content (like http urls should).
You mean as in, with content negotiation applied? You use a single URI
which maps to different representations of the same resource?
You name it, the *same* resource. (But each representation is also a
resource itself).
An example to be more explicit:
<urn:urn-5:G7Fj> <DC:title> "Ulisses"
<urn:urn-5:G7Fj> <DC:description> "bla bli"
This, for example, I would not include here. :-) Firstly, it is
something I would want to be versioned independently: if I change the
description of an image, that should not create a new version of the
image.
Surely not! Where I used a literal in the examples, one could use a URI
representing the meaning of "bla bli". An attribute value of this URI
would then be a URI for the English expression of that meaning; an
attribute of that URI would be a URI representing this expression as
spoken by John; and an attribute of that URI would be a Storm block
with the MP3 encoding of it.
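The chain from meaning down to bytes could be sketched as plain
triples. Every URI and property name below except urn:urn-5:G7Fj is
invented for illustration.

```python
# (subject, predicate, value) statements; names are hypothetical.
triples = [
    ("urn:urn-5:G7Fj", "dc:description", "urn:example:meaning-1"),
    ("urn:example:meaning-1", "ex:expressedAs", "urn:example:english-text-1"),
    ("urn:example:english-text-1", "ex:spokenAs", "urn:example:spoken-by-john-1"),
    ("urn:example:spoken-by-john-1", "ex:encodedAs", "urn:content-hash:mp3-block-1"),
]

def follow(start, predicates):
    """Walk from start along the given predicates; None if a step is missing."""
    node = start
    for predicate in predicates:
        matches = [v for s, p, v in triples if s == node and p == predicate]
        if not matches:
            return None
        node = matches[0]
    return node

block = follow(
    "urn:urn-5:G7Fj",
    ["dc:description", "ex:expressedAs", "ex:spokenAs", "ex:encodedAs"],
)
```

Only the last step ends in a byte-level identifier; everything above
it names a meaning or an expression of one.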
I think you need a generic versioning system for RDF statements rather
than for the data: a later statement must have a means to put an
earlier statement out of the graph (while the older one should still be
accessible, in the style of the reification "I used to believe
(s p v)").
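One way to picture such statement-level versioning is an append-only
log in which a later entry retracts an earlier one. This is a sketch
under that assumption; the function names are invented and this is not
an existing RDF API.

```python
log = []  # append-only list of (action, (s, p, v)) entries

def assert_statement(s, p, v):
    log.append(("assert", (s, p, v)))

def retract_statement(s, p, v):
    # The earlier entry stays in the log; it only leaves the current graph.
    log.append(("retract", (s, p, v)))

def current_graph():
    """Replay the log to get the statements believed now."""
    graph = set()
    for action, triple in log:
        if action == "assert":
            graph.add(triple)
        else:
            graph.discard(triple)
    return graph

def formerly_believed():
    """Statements that were asserted once but are no longer in the graph."""
    asserted = {triple for action, triple in log if action == "assert"}
    return asserted - current_graph()

assert_statement("urn:urn-5:G7Fj", "dc:description", "bla bli")
retract_statement("urn:urn-5:G7Fj", "dc:description", "bla bli")
assert_statement("urn:urn-5:G7Fj", "dc:description", "a better description")
```

Changing the description adds log entries but never touches the image
or its identifier.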
Secondly, I don't see a reason why the URI of the image would need to
refer to this.
Me neither ;-). There must be a misunderstanding here.
Thirdly, I don't think that when a file is put into the system-- and
thus given its identifier-- is necessarily the time to create this
kind of metadata. It would seem to hold up the task at hand. Rather,
I'd like to be able to add it later on, and maybe someone else can do
that even better than me-- like a librarian who has scientific
background in giving metadata about stuff.
Of course. The application should probably add some metadata on its
own, to give the user a chance to find the data later, but there should
always be the possibility to enter a new version of the metadata.
(...)
In this example, applications should reference "urn:urn-5:G7Fj" (which
does not have a MIME type) rather than
"urn:content-hash:Dj&/fjkZRT68" (which has a MIME type in a specific
context) wherever possible; in many cases a higher abstraction,
"urn:urn-5:lG5d", can be used.
Um, using a urn-5 doesn't work since it's just a random number-- if we
use just a random number, we cannot check whether the data we may
retrieve from a p2p network is really what the person making the
reference wanted us to see. We would need to use "urn:foo:ref:[blah]",
which would be the above RDF data, from which we could then get the
specific representation.
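The check being asked for can be sketched as follows, assuming SHA-1
and the "urn:content-hash:" prefix used earlier in this thread. The
helper names are hypothetical.

```python
import hashlib

PREFIX = "urn:content-hash:"

def content_id(data):
    """Derive an identifier from the bytes themselves (SHA-1 assumed)."""
    return PREFIX + hashlib.sha1(data).hexdigest()

def verify(identifier, data):
    """True only if the retrieved bytes hash to the referenced identifier."""
    return identifier.startswith(PREFIX) and content_id(data) == identifier

original = b"the bytes the author referred to"
reference = content_id(original)

ok = verify(reference, original)
spoofed = verify(reference, b"bytes substituted by some other peer")

# A random identifier carries nothing to recompute, so this kind of
# check cannot be applied to it.
uncheckable = verify("urn:urn-5:G7Fj", original)
```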
The urn-5 URIs are intended to reference a certain
concept/idea/meaning/topic; people are free to associate attributes
with existing URIs. They may be subject to change, like terms in
natural language are. If somebody wants to use a term in a specific
sense, she has to make this explicit, maybe using digital signatures,
but more often I think a key-free trust system
(http://www.w3.org/2002/03/key-free-trust.html) is not only enough, but
better adapted to the "fuzzy" trust levels in a P2P network.
While you can only deficiently use HTTP to serve a block,
Why?
The only HTTP headers you can send back are the length and, if you put
it in the URI, the content type; most HTTP features go unused.
you could serve the URIs of both abstractions (urn:urn-5:G7Fj and
urn:urn-5:lG5d) directly using HTTP 1.1 features.
(Again, you'd have to use hashes, or you could be arbitrarily spoofed.)
(Again. No good networking without trust mechanisms ;-)
(...)
And how do you split the metadata into blocks?
Well, depends very much on the application. How do you split
metadata into files? :-)
Not at all ;-). Splitting into files is itself a rudimentary way of
representing metadata; if you use RDF, the filesystem is a legacy
application.
Um, but if you put metadata on an http server, you split it too?
My approach would be to split the data just in time. To make it
accessible over HTTP, the server could answer a standard request with
all the statements in which a specific URI occurs, or only those where
it is the subject. An extended request could specify the level of
expansion wanted.
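A rough sketch of such a just-in-time selection. The sample statements
and the ex: property names are made up, and "level of expansion" is
read here as the number of extra hops followed from the values of the
selected statements, which is only one possible interpretation.

```python
triples = [
    ("urn:urn-5:G7Fj", "dc:title", "Ulisses"),
    ("urn:urn-5:G7Fj", "ex:representation", "urn:content-hash:abc"),
    ("urn:content-hash:abc", "ex:mimeType", "image/jpeg"),
    ("urn:urn-5:lG5d", "ex:depicts", "urn:urn-5:G7Fj"),
]

def statements_for(uri, subject_only=False, expansion=0):
    """Select the statements to return for a request about uri."""
    selected, seen, frontier = [], set(), [uri]
    for _ in range(expansion + 1):
        next_frontier = []
        for node in frontier:
            if node in seen:
                continue
            seen.add(node)
            for s, p, v in triples:
                hit = s == node or (not subject_only and v == node)
                if hit and (s, p, v) not in selected:
                    selected.append((s, p, v))
                    if s == node:
                        next_frontier.append(v)  # expand through the value
        frontier = next_frontier
    return selected

everything = statements_for("urn:urn-5:G7Fj")
as_subject = statements_for("urn:urn-5:G7Fj", subject_only=True)
expanded = statements_for("urn:urn-5:G7Fj", subject_only=True, expansion=1)
```

The server never stores the metadata pre-split; each response is cut
out of the graph when the request arrives.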
(...)
Cheers,
Reto
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (Darwin)
iD8DBQE+gyJtD1pReGFYfq4RAgiFAKCEEvE6v/NwTl1ebjge5YPx9UAtqACgqXvF
RpcbVqiDuvMrGt9ReDMGZLI=
=TRAL
-----END PGP SIGNATURE-----