
Re: [Gnu-arch-users] revc responses


From: Eric Wong
Subject: Re: [Gnu-arch-users] revc responses
Date: Thu, 20 Oct 2005 16:57:08 -0700
User-agent: Mutt/1.5.9i

Thomas Lord <address@hidden> wrote:
>  Eric> If work were to continue on revc (and I really hope it does
>  Eric> even though I can't afford to fund it), I'd like to see it
>  Eric> moved to an archive format that:
>         
>  Eric>         1. does not rely on binary metadata formats
>  Eric>         2. uses gzip instead of raw zlib compression
> 
>  Eric> Since most files are pretty small, I highly doubt that any
>  Eric> performance gains by using raw zlib or a binary metadata format
>  Eric> would have a noticeable impact on performance.
> 
>  Eric> I (and I suspect many Arch users) really like the transparent
>  Eric> and orthogonal aspects of Arch, and would like to see Arch 2.0
>  Eric> continue that tradition.
> 
> I chose the binary metadata formats because:
> 
>   a) they cut down code size and speed performance (look ma, no
> parsing!)
>   b) they elegantly solve "whitespace in _____ names" without escaping
> 
> Note that although the metadata formats are partly binary, they
> internally use plain text wherever to do otherwise would make the 
> formats platform-specific.
> 
> Also note that, with only one exception (the format of directory
> blobs), the formats are strings that can't contain a nul character,
> separated by nul characters.   There is at least a minor convention
> in some GNU tools that recognize that format as an alternative to
> newline-separated lines.
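
For readers who haven't seen it, the nul-separated layout described in the
quote above can be sketched roughly as follows. This is only an illustration
of the general idea (the same one behind `find -print0` and `xargs -0`), not
revc's actual on-disk format; `pack_fields` and `unpack_fields` are
hypothetical names, and the choice of UTF-8 is an assumption.

```python
def pack_fields(fields):
    """Join strings into one byte string, each field terminated by a nul.

    Hypothetical layout for illustration; not revc's real format.
    """
    out = bytearray()
    for field in fields:
        data = field.encode("utf-8")  # assumption: names are UTF-8 text
        if b"\0" in data:
            raise ValueError("fields may not contain a nul character")
        out += data + b"\0"
    return bytes(out)


def unpack_fields(blob):
    """Split a nul-terminated byte string back into its fields."""
    if blob and not blob.endswith(b"\0"):
        raise ValueError("truncated record: missing final nul")
    # split() leaves one empty piece after the final nul; drop it.
    return [part.decode("utf-8") for part in blob.split(b"\0")[:-1]]
```

Because nul is the only reserved byte, names containing spaces or even
newlines survive a round trip with no escaping step.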

From hexdumping an unzipped prereqs file, I can see that it contains a
list of project-name.<commit>/<sha1>+<sha1> entries and that it's
zero-padded, but there are also a few arbitrary-looking characters
scattered inside the zero-padded area that I haven't gotten around to
understanding (perhaps you could tell me what they are).

Also, why the extra zero-padding in all those binary files for
alignment?

URL-encoding is a fairly well supported (by other
applications/libraries) way to fix the whitespace in names issue.  It
can break shell scripts, but shell scripters generally have enough sense
to not name files with spaces in them :)
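
A minimal sketch of what I mean, assuming a plain-text metadata file with
one URL-encoded name per line. The helper names `encode_names` and
`decode_names` are made up for this example and are not part of revc or
tla; only `quote` and `unquote` from Python's standard library are real.

```python
from urllib.parse import quote, unquote


def encode_names(names):
    """Write one percent-encoded name per line (hypothetical text format).

    safe="" makes quote() escape "/" as well, so every name stays on a
    single line with no whitespace in it.
    """
    return "".join(quote(name, safe="") + "\n" for name in names)


def decode_names(text):
    """Recover the original names from the line-oriented text."""
    return [unquote(line) for line in text.splitlines()]


names = ["my file.txt", "dir/with space", "odd\nname"]
text = encode_names(names)
assert decode_names(text) == names
```

The encoded file can be read with any pager or grep, at the cost of an
encode/decode pass that the nul-separated format avoids.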

Code size: The tickets already have a parser, and there's also hackerlab
in the code base.
Performance: These are small files we're talking about here.

> I'm unclear what advantage you see (or even what exactly you mean)
> about gzip vs. zlib.  Do you mean fork/exec the gzip program?  Why
> would that be an improvement?

No, there's no need to fork/exec the gzip binary in revc; you can
create gzip-readable files directly with the zlib library.

The gzip format wraps the same deflate data in a larger header and
trailer than the zlib format that revc and git are currently using (at
least 18 bytes versus 6), so it's slightly less space-efficient, but the
difference is hardly noticeable.
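
To show the difference concretely, here is a rough sketch using Python's
binding to the zlib library rather than revc's C code. The `deflate` helper
and the sample payload are invented for the example; the `wbits` values (15
for a zlib wrapper, 31 for a gzip wrapper) are the ones documented for
`zlib.compressobj`.

```python
import gzip
import zlib


def deflate(data, wbits):
    """Compress data with the zlib library, choosing the wrapper via wbits.

    wbits=15 -> zlib header and Adler-32 trailer
    wbits=31 -> gzip header and CRC-32/length trailer (16 + 15)
    """
    comp = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()


# Made-up sample data, only loosely shaped like a small metadata file.
payload = b"project-name.1/0123456789abcdef\n" * 20

as_zlib = deflate(payload, 15)
as_gzip = deflate(payload, 31)

# Both come from the same library; only the framing differs.
assert zlib.decompress(as_zlib) == payload
assert gzip.decompress(as_gzip) == payload   # what gunzip/zcat would read
assert as_gzip[:2] == b"\x1f\x8b"            # gzip magic number
print(len(as_zlib), len(as_gzip))
```

The gzip output is a few bytes larger, but the result can be inspected
with zcat or gunzip without any special tooling.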

My apologies if you feel I'm being nitpicky, but simple plain-text data
formats give me warm fuzzy feelings :)

-- 
Eric Wong
