gnu-arch-users

Re: [Gnu-arch-users] Re: [PATCH] arch speedups on big trees


From: Aaron Bentley
Subject: Re: [Gnu-arch-users] Re: [PATCH] arch speedups on big trees
Date: 09 Jan 2004 09:22:06 -0500

On Thu, 2004-01-08 at 17:51, Miles Bader wrote:

> 
> I don't know -- the current codebase doesn't seem to use
> arch_binary_files_differ before diffing; maybe your code-base does

His does, as does mine.

> Doing a binary comparison before diffing is a solution to this problem,
> but of course ends up reading the files twice.  

For hard-linked trees, my branch will avoid the read.
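
A rough sketch of the idea, in case it helps. This is not the code in
my branch or tla's arch_binary_files_differ; files_differ_sketch is a
made-up name, and the 8 KiB buffer is arbitrary. It assumes a POSIX
stat(): two paths with the same device and inode are hard links to
one file, so they can be declared identical without reading either.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper, NOT tla's arch_binary_files_differ.
   Returns 1 if the files differ, 0 if identical, -1 on error. */
int files_differ_sketch(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;

    /* Same device and inode: both names are hard links to one file,
       so the contents are identical and nothing needs to be read. */
    if (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino)
        return 0;

    /* Different sizes: the contents cannot match. */
    if (sa.st_size != sb.st_size)
        return 1;

    FILE *fa = fopen(a, "rb");
    if (fa == NULL)
        return -1;
    FILE *fb = fopen(b, "rb");
    if (fb == NULL) {
        fclose(fa);
        return -1;
    }

    /* Fall back to a block-by-block comparison of the contents. */
    char bufa[8192], bufb[8192];
    int result = 0;
    for (;;) {
        size_t na = fread(bufa, 1, sizeof bufa, fa);
        size_t nb = fread(bufb, 1, sizeof bufb, fb);
        if (na != nb || memcmp(bufa, bufb, na) != 0) {
            result = 1;
            break;
        }
        if (na == 0)
            break;  /* both files hit EOF together */
    }
    if (ferror(fa) || ferror(fb))
        result = -1;
    fclose(fa);
    fclose(fb);
    return result;
}
```

Only the last step touches file contents; the inode and size checks
cost two stat() calls.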

> This sort of thing is
> _normally_ covered up in NFS (and in linux, in local filesystems too) by
> short-term caching, _but_ I'm not really sure how confident I can be about
> this; for instance, what if there are lots of really big files, will only
> parts of them be cached, resulting in redundant reads even when very close
> in time?  

Remember that binary_files_differ is run immediately before diff, so
when diff is run, the last-read files will have been the two files
passed to binary_files_differ.  The likelihood that the files are still
in the cache is very high.  Diff won't accept two file descriptors, so
solving this problem would mean

- integrating diff into tla, or
- splitting diff into libdiff + a front-end, or
- adding a --3 argument to diff to treat file descriptor 3 as a second
stdin, or
- defining a "fast-filesystem", e.g. tmpfs or ext2, that tla writes
files to during binary_files_differ and then invokes diff on.

These are pretty drastic options, so you'd want to be certain they
solved a serious performance problem, and even then. . .

> Does the added efficiency of not invoking the diff program make
> it worthwhile anyway?  I guess the answer probably depends on what
> filesystem you're using...

Yes, and partly on how quickly your box can invoke diff.

Aaron

-- 
Aaron Bentley
Director of Technology
PanoMetrics, Inc.




