Re: ALL my tar.bz2-backups unreadable, CRC-error!
From: Hans-Bernhard Broeker
Subject: Re: ALL my tar.bz2-backups unreadable, CRC-error!
Date: 8 Aug 2002 11:06:48 GMT
Ralph Corderoy <address@hidden> wrote:
> If that was the problem, then it might be possible to speculatively turn
> the dodgy character pairs back into a single linefeed and see if bzip2
> gets further.
That's quite a can of worms you're about to open. The problem being
that, as you find a CR+LF sequence in the file, there's no way of
knowing, offhand, whether that was a lone LF in the unmutilated
original, or a CR+LF, too.
In a compressed file format, the bytes should be essentially random,
so there's a chance of 1 in 256 that a LF would be preceded by a
random CR in the original. For each 64K of original file length
(about 256 LFs in all) this would mean you'd expect 255 LFs without
a CR in front of them, and one random CR+LF coincidence in the input.
But you don't know which of the 256 CR+LFs you're left with it was.
Quite a lot of tries would be needed to find it. Not to mention there
could have been several "real" CR+LFs. If there happen to be 3 in one
64K CRC-checked block, you'd have to check 255*254*253 / 3!, i.e.
about 2.7 million cases.
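To put the search described above in concrete terms, here is a minimal
Python sketch. The names count_cases, recover and is_valid are
hypothetical, made up for illustration. is_valid stands in for
whatever check you trust (for instance "bzip2 decompresses this block
without a CRC error"); the sketch does not implement bzip2's own CRC.

```python
import math
from itertools import combinations
from typing import Callable, Optional

CR, LF = 0x0D, 0x0A

def count_cases(pairs: int, real: int) -> int:
    """Number of ways to pick which `real` of `pairs` CR+LFs were genuine."""
    return math.comb(pairs, real)

def recover(damaged: bytes, real: int,
            is_valid: Callable[[bytes], bool]) -> Optional[bytes]:
    """Brute force: assume exactly `real` CR+LF pairs were in the original.

    For every choice of pairs to keep, strip the CR from all the others
    and ask the caller-supplied (hypothetical) `is_valid` check whether
    the candidate looks right. Returns the first hit, or None.
    """
    # Offsets of every CR that is directly followed by an LF.
    pairs = [i for i in range(len(damaged) - 1)
             if damaged[i] == CR and damaged[i + 1] == LF]
    for keep in combinations(pairs, real):
        drop = set(pairs) - set(keep)
        candidate = bytes(b for i, b in enumerate(damaged) if i not in drop)
        if is_valid(candidate):
            return candidate
    return None
```

count_cases(256, 1) gives 256 for the single-coincidence case, and
count_cases(255, 3) gives 2731135, matching the 255*254*253 / 3!
figure. Since each candidate would need a full decompression attempt,
the cost grows quickly with the number of "real" pairs you allow for.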
It also depends on how "clever" the line end conversion routine was
trying to be --- some would have converted an incoming CR+LF to
CR+CR+LF, but the cleverer ones would try to avoid such doubled CRs
and return a CR+LF, causing the hard problem mentioned above.
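The difference between the two kinds of conversion routine can be
sketched as follows. This is an illustration under my own assumptions,
not the code of any particular tool; naive_to_crlf, clever_to_crlf and
undo_naive are hypothetical names.

```python
def naive_to_crlf(data: bytes) -> bytes:
    """Put a CR in front of every LF, even if one is already there."""
    return data.replace(b"\n", b"\r\n")

def undo_naive(data: bytes) -> bytes:
    """Exact inverse of naive_to_crlf: drop one CR before each LF."""
    return data.replace(b"\r\n", b"\n")

def clever_to_crlf(data: bytes) -> bytes:
    """Add a CR only before an LF that is not already preceded by one."""
    out = bytearray()
    prev = None
    for b in data:
        if b == 0x0A and prev != 0x0D:
            out.append(0x0D)
        out.append(b)
        prev = b
    return bytes(out)
```

With the naive routine, b"x\r\ny" becomes b"x\r\r\ny" and undo_naive
restores it exactly. With the clever one, both b"x\ny" and b"x\r\ny"
come out as b"x\r\ny", so the output alone cannot tell you which
input you started from.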
--
Hans-Bernhard Broeker (address@hidden)
Even if all the snow were burnt, ashes would remain.