[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Pathological diff behaviour with mutually-recursive symlinks

From: Colin Watson
Subject: Pathological diff behaviour with mutually-recursive symlinks
Date: Tue, 22 Sep 2009 13:59:29 +0100
User-agent: Mutt/1.5.18 (2008-05-17)


It's quite easy to get diff to exhibit exponential behaviour in the size
of the tree. If you have a tree involving symlinks some of which point
upwards, but to siblings of an ancestor directory rather than to a
direct ancestor, then this forces diff's loop detection to compute all
permutations before it manages to break the loop. Here's a trivial
example, tested with diffutils 2.8.1-13 from Debian (built on Ubuntu) -
run this in an empty scratch directory:

  mkdir a b
  for x in `seq 1 8`; do mkdir b/$x; for y in `seq 1 8`; do ln -s ../$(($y + 
1)) b/$x/$y; done; done
  diff -Nru a b

The output starts as follows:

  diff: b/1/1/1: recursive directory loop
  diff: b/1/1/2/1: recursive directory loop
  diff: b/1/1/2/2: recursive directory loop
  diff: b/1/1/2/3/1: recursive directory loop
  diff: b/1/1/2/3/2: recursive directory loop
  diff: b/1/1/2/3/3: recursive directory loop
  diff: b/1/1/2/3/4/1: recursive directory loop
  diff: b/1/1/2/3/4/2: recursive directory loop
  diff: b/1/1/2/3/4/3: recursive directory loop
  diff: b/1/1/2/3/4/4: recursive directory loop

and emits 191801 lines of errors in total.

Sadly, this is not a theoretical case. The udev package includes a test
suite containing a number of sample Linux /sys subtrees, which tend to
include many upward-pointing symlinks like this. We noticed this bug
(https://bugs.launchpad.net/soyuz/+bug/314436) because diffing two
versions of the udev package produced 104 gigabytes of output before it
filled up the disk. For comparison, uncompressed udev tarballs are on
the order of three megabytes.

My gut feel is that the best way to fix this would be to check for
symlinks pointing up in the directory tree but not pointing to a direct
parent, and skip them if they're underneath a directory which we intend
to compare recursively anyway. The latter check is slightly hairy, as it
probably interacts oddly with -N or lack thereof and whether the
directory is present in the other tree being compared, but I think it
would only go wrong in examples even more pathological than this. The
only alternative I could think of is to keep a note of all directories
we've looked at rather than just the parent chain, and I'm definitely
not happy about the unbounded space consumption that would involve.

What do you think?


Colin Watson                                       address@hidden

reply via email to

[Prev in Thread] Current Thread [Next in Thread]