Re: [Monotone-devel] zlib vs gzip


From: Jack Lloyd
Subject: Re: [Monotone-devel] zlib vs gzip
Date: Mon, 29 Oct 2007 11:36:36 -0400
User-agent: Mutt/1.5.11

On Fri, Oct 26, 2007 at 03:06:22PM -0700, Nathaniel Smith wrote:
> > Okay, in the meantime, I figured that gzip and zlib really are two 
> > different encapsulation formats (for the same type of compression 
> > algorithm, though). Matt's [1] mail explains that quite well.
> > 
> > Looks like zlib (> 1.2) supports both, but botan only supports zlib 
> > style headers and footers. We could (and probably should) use zlib only, 
> > instead of gzip, but that would cost a netsync flag day.
> 
> Or maybe we could flutter our eyelashes at Jack Lloyd and get gzip
> support upstream into Botan, like I guess we should have a while ago
> just this doesn't come up very often?
> 
> Jack?

I have held off on gzip mostly because the implementation options
have been: use zlib 1.2's gzip support; use something like Monotone's
gzip.cpp on top of the raw deflate interface; use zlib 1.1's support
for gzip file I/O, via trickery with fdopen and a pipe; or write my
own zlib/gzip implementation, which seems like it would be
interesting and informative, but is not ranking very high on my list
of ways to spend my time right now. The first I initially avoided due
to versioning issues, and the second two feel like hacks (though then
again zlib's support for gzip itself looks to be pretty much a hack,
so I don't know that that is too serious an objection).

Of the four options, my preference is pretty strongly to use zlib 1.2
(1.2.0 is 4.5 years old, 1.2.3 is 2+ years old and has many security
fixes vs old releases - I'd say assuming 1.2 is fairly conservative at
this point), at least for now while zlib is (nominally, at least)
being actively maintained. The raw deflate and headers-by-hand system
seems to be potentially better since zlib's support for gzip headers
is nil (but the trick is to be actually better instead of potentially
so).
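
For illustration only, here is a minimal sketch of what "raw deflate and headers-by-hand" amounts to. It uses Python's standard zlib binding rather than Botan or Monotone's gzip code, and the function name gzip_by_hand is hypothetical. It assumes the simplest possible RFC 1952 header (no file name, no mtime, no optional fields).

```python
import gzip
import struct
import zlib


def gzip_by_hand(data: bytes, level: int = 6) -> bytes:
    """Wrap a raw deflate stream in a minimal gzip member (hypothetical helper)."""
    # 10-byte header: magic 1f 8b, CM=8 (deflate), FLG=0 (no optional fields),
    # MTIME=0 (unknown), XFL=0, OS=255 (unknown).
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, 0, 0, 0, 255)

    # Negative wbits asks zlib for raw deflate: no zlib header, no Adler-32.
    raw = zlib.compressobj(level, zlib.DEFLATED, -15)
    body = raw.compress(data) + raw.flush()

    # Trailer: CRC-32 of the uncompressed data, then its length mod 2**32,
    # both little-endian.
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF,
                          len(data) & 0xFFFFFFFF)
    return header + body + trailer


# The stock gzip reader should accept the hand-built member.
assert gzip.decompress(gzip_by_hand(b"monotone")) == b"monotone"
```

The framing itself is small; the open question in the paragraph above is whether a hand-rolled version ends up better than zlib's own in practice.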

So what I would want to see in some happy ideal world: gzip.cpp/.h
(plus any other files needed for impl) in modules/comp_zlib. I know
gzip and zlib could share a lot of code, and if someone wanted to
rewrite/restructure that whole thing to make that easier I would not
object; the existing code is not great. I would have zero issues with
any/all of the code in that dir assuming zlib 1.2 (for instance, the
new-in-1.2 callback-based inflateBack is supposed to be faster,
according to the header, and both zlib and gzip could use it - and
90% of the buffering logic is going to be the same). I would be
especially happy if gzip header support was moderately functional and
a mini-gzip was written for inclusion in the examples
directory. Keeping API compatibility with the existing comp_zlib
filters would be moderately important (and easy to do, I think). I
don't care about binary compatibility between major releases though, so
a totally different internal structure is fine.

A specific concern I would have with checking the current Monotone
gzip.cc in is the large amount of duplicated code. (The duplicated
checksums, CRC and Adler, are not ideal but not a huge cost; Adler32
is pretty fast.)
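
To make the two checksums concrete, the sketch below reads them back out of the stream trailers. It is an illustration under assumptions, not code from Botan or Monotone: it uses Python's standard zlib binding, and the helper name read_trailers is hypothetical. It relies on the formats as specified in RFC 1950 (zlib ends with a big-endian Adler-32) and RFC 1952 (gzip ends with a little-endian CRC-32 followed by the input length).

```python
import struct
import zlib


def read_trailers(data: bytes):
    """Compress data both ways and return the checksums stored in each trailer."""
    # zlib container (default wbits): 2-byte header, deflate data, Adler-32.
    zlib_stream = zlib.compress(data)

    # gzip container (wbits = 16 + 15): gzip header, deflate data, CRC-32, ISIZE.
    gz = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    gzip_stream = gz.compress(data) + gz.flush()

    (adler,) = struct.unpack(">I", zlib_stream[-4:])      # big-endian
    crc, isize = struct.unpack("<II", gzip_stream[-8:])   # little-endian
    return adler, crc, isize


adler, crc, isize = read_trailers(b"monotone netsync payload")
# Each stored value matches the checksum computed directly over the input.
assert adler == zlib.adler32(b"monotone netsync payload")
assert crc == zlib.crc32(b"monotone netsync payload")
```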

So: request is noted. You may see it from me someday (but I have a lot
going on right now). If someone sends me a patch (against
net.randombit.botan) I can live with, I'd check it in.

One comment: zlib (>=1.2) (supposedly) transparently supports both
gzip and zlib decompression. So Monotone could start compressing and
decompressing using zlib now without any problems, and old gzip data
would be silently handled. And considering gzip's only 'extras' over
zlib format are a header (unused) and a slightly stronger checksum
(CRC vs Adler: since SHA-160 and/or RSA will be used anyway, it doesn't
matter much), I don't see much reason against doing that. (Keeping in
mind that I haven't tried or tested this, haven't checked every place
Monotone uses gzip, and don't understand how netsync works yet).
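
As a rough check of the "transparent" decoding described above, here is a small sketch using Python's standard zlib binding, which passes wbits through to the underlying library. Adding 32 to the window size enables automatic zlib/gzip header detection. The helper name inflate_any is hypothetical, and this says nothing about how Monotone or netsync would actually wire it in.

```python
import zlib


def inflate_any(blob: bytes) -> bytes:
    """Decompress a stream that may be in either zlib or gzip format."""
    # wbits = 32 + 15: the decoder inspects the header and accepts
    # either a zlib or a gzip wrapper around the deflate data.
    d = zlib.decompressobj(32 + zlib.MAX_WBITS)
    return d.decompress(blob) + d.flush()


payload = b"old gzip data and new zlib data decode the same way"

# "New" data written in zlib format.
new_style = zlib.compress(payload)

# "Old" data written in gzip format (wbits = 16 + 15 on the compress side).
gz = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
old_style = gz.compress(payload) + gz.flush()

assert inflate_any(new_style) == payload
assert inflate_any(old_style) == payload
```

Note that this covers only the reader. A peer that understands gzip alone still cannot decode newly written zlib-format data, which is the flag-day concern raised in the quoted message.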

-Jack



