Re: [Gnu-arch-users] Re: arch with 'special files'


From: tomas
Subject: Re: [Gnu-arch-users] Re: arch with 'special files'
Date: Wed, 6 Apr 2005 11:20:26 +0200
User-agent: Mutt/1.5.3i

On Tue, Apr 05, 2005 at 03:34:25PM +0200, Jan Hudec wrote:
> On Tue, Apr 05, 2005 at 11:03:28 +0200, address@hidden wrote:

[...]

> > Yup. This metadata business doesn't solve everything. Especially with
> 
> The metadata business does not solve *anything*. Only thing it can do is
> prepare ground for actual solutions. And there is no point in
> implementing it until you know what the solutions will actually use.

Well put.

[...]

> Yes. But "metadata" is too broad a term to be useful. We need to tell
> more about how it should behave.

That would be the point of such discussions. The question I am pondering
is ``is it possible to provide a mechanism which is generic and
simple enough to be worth it and leave policy to the individual instances/
users/whatever? Or would we be just sliding the knot from here to there,
uselessly?´´

> Eg. many formats can be detected by some kind of magic number. And there
> a metadatum saying "files with magic numbers X should get treatment Y" is
> a lot more useful than listing those files...!

A kind of `file´ heuristics. And what do you do about changing heuristics?
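
To make the quoted idea concrete, here is a minimal sketch in Python of a
rule table keyed on magic numbers. The table, the treatment names and
treatment_for() are all hypothetical, not anything arch provides; only the
magic byte strings themselves are real. It does nothing about the question
of changing heuristics: the table would have to be versioned with the tree.

```python
# Hypothetical rule table: magic-number prefix -> name of a treatment.
# The treatment names are illustrative only, not arch features.
MAGIC_RULES = [
    (b"\x89PNG\r\n\x1a\n", "binary-exact"),   # PNG signature
    (b"\xff\xd8\xff", "binary-exact"),        # JPEG start-of-image marker
    (b"gimp xcf ", "layered-image"),          # GIMP XCF header
]
DEFAULT_TREATMENT = "line-text"


def treatment_for(head: bytes) -> str:
    """Return the treatment name for a file, given its first few bytes."""
    for magic, treatment in MAGIC_RULES:
        if head.startswith(magic):
            return treatment
    return DEFAULT_TREATMENT
```

One rule covers every file with that signature, which is the point of the
quoted remark: nobody has to list the files one by one.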

> > Inexact patching for jpeg images anyone? Or for (ugh!) XML RSS files[1]? 
> > Or...
> 
> I believe there is some kind of xml-diff, that compares the trees, not
> the text ;-). One such is built into openoffice...

As has been said elsewhere -- inexact patching won't work very satisfactorily.
The reason, I think, is that inexact patching is closely tied to semantics.
And your classical `hand-edited´ source file has (by sheer coincidence) a
semantic dimension in the fact that disjoint lines are loosely coupled.

> I fear inexact patching of jpegs won't work, because they are lossy. But
> pngs... And for eg. xcf (Gimp format), I can even imagine a _useful_
> one...

That depends on what you understand by `work´. Do the results just have
to `look similar´ (which, roughly speaking, is the equivalence relation
defining `a jpeg image´)?

> And even if it's not inexact patching, instead of two versions, you can
> store one version and a difference. And knowing the nature of the data
> can make this more efficient.

This is a completely different dimension. One of the things which confused
me at the beginning is that in Arch (or in CVS, SVN), the cute diff+patch
trick is being used for different things:

 - storage efficiency. This is the most visible, but also the
   least important. Binary diffs or whatever work here as
   well.

 - merge related but different changes, i.e. inexact patching.
   This is the really cool thing about version management, but
   it has a semantic dimension. You can't throw it at any kind
   of file. Nowadays it just kind-of-works for your good-old-
   plain-source-file.
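
A minimal sketch of the second use, in Python, under a strong simplifying
assumption: both sides only edit lines in place, so all three versions have
the same number of lines. merge3() is a hypothetical helper, not how arch
or patch(1) actually work; it only shows why loosely coupled lines make
merging possible at all.

```python
def merge3(base, ours, theirs):
    """Line-wise three-way merge.

    Returns (merged_lines, conflict_indices). Assumes in-place edits only,
    i.e. all three versions have the same number of lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t or t == b:
            merged.append(o)      # same edit on both sides, or only ours changed
        elif o == b:
            merged.append(t)      # only theirs changed
        else:
            merged.append(o)      # both changed differently: keep ours, flag it
            conflicts.append(i)
    return merged, conflicts
```

Run the same function over the bytes of a jpeg and it will happily produce
a "merged" result; whether that result means anything is the semantic
dimension mentioned above.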

[about file ids and metadata]

> Hm, it's not that easy. There will have to be a set of data types
> (content, type, permissions, ...) that will be recorded for each file,
> and a set of procedures to diff and patch them. This set should be
> extensible and might be different on each platform. However, all the
> standard mappings would have to be built in.
> 
> Note, that for security reasons arch must not run archive-provided
> scripts, so the diff algorithm specification has to be flexible enough
> to be actually useful.
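
One way to read the quoted constraint, as a minimal Python sketch: the
tool ships a fixed table of diff/patch procedures, and an archive may only
name one of them, never supply code. Every name here (HANDLERS, "scalar",
"mode", diff_attr, patch_attr) is hypothetical and chosen for illustration.

```python
# Built-in handlers only: an archive may name a handler, never ship code.

def diff_scalar(old, new):
    """Delta for a single value such as a type tag; None if unchanged."""
    return None if old == new else (old, new)


def patch_scalar(value, delta):
    """Apply a scalar delta exactly; refuse if the old value does not match."""
    if delta is None:
        return value
    old, new = delta
    if value != old:
        raise ValueError("conflict: expected %r, found %r" % (old, value))
    return new


def diff_mode(old, new):
    """Delta for permission bits: the set of bits that differ (XOR)."""
    return old ^ new


def patch_mode(value, delta):
    """Flip the recorded bits; also applies to a mode that has drifted."""
    return value ^ delta


HANDLERS = {
    "scalar": (diff_scalar, patch_scalar),
    "mode": (diff_mode, patch_mode),
}


def diff_attr(handler_name, old, new):
    diff, _ = HANDLERS[handler_name]    # KeyError for an unknown handler
    return diff(old, new)


def patch_attr(handler_name, value, delta):
    _, patch = HANDLERS[handler_name]
    return patch(value, delta)
```

The two handlers differ on purpose: "scalar" patches exactly and reports a
conflict, while "mode" patches inexactly by flipping the same bits on
whatever mode it finds.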

*This* could turn out to be a real killer. I think it'd be difficult
to get things right in the first place. As your attribute set evolves
(along the archive's life) your attribute-related algorithms might
want to evolve too.

How does one solve that?

Regards
-- tomás
