
Re: Decorrupting a .tar file

From: Jakob Bohm
Subject: Re: Decorrupting a .tar file
Date: Wed, 11 Nov 2020 21:01:52 +0100
User-agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0

On 2020-11-11 00:39, I. Hope Nothing wrote:
> Hello all,
>
> I have a large (183 GB) .tar file that has become corrupted.  This is actually the _secondary_ backup of this data.  The primary backup (a USB HDD) was lost, so I was disappointed to find that _this_ backup isn't easily accessible.
>
> From inspection and memory, it seems that this .tar file was corrupted by a poorly invoked file transfer operation, e.g., FTP with mixed up ASCII/binary settings.  Each line ends with '^M' before the '\n', and because this tarball has a lot of binary data in it `dos2unix -f` is unlikely to restore all occurrences of mangled line endings.
>
> The first line of the .tar file is "Password:", and I can think of several possibilities as to how this could have happened.
>
> I have made a copy of the file to perform surgery on it. Unsurprisingly, the results of `dos2unix -f corrupted_tar_file.tar` crash out after only a couple of dozen entries when listing: `tar tvf corrupted_tar_file_unix_eol.tar`.
>
> There's a lot of binary data I want to keep on here.  I am willing and keen to learn how to forensically retrieve my data, and I would greatly appreciate any help pointing me in the right direction.  Thank you for reading this far already!!
>
> If you need transcripts of anything please let me know!!

Here are some simple hints for attempting a manual rescue.

1. If possible, obtain a less corrupted copy of the tar file.
  For example, if it was corrupted when extracting it from a tape
  over ssh or rlogin, try extracting it again using a binary-safe
  protocol.  Similarly, if it was corrupted after decompressing with
  gzip, bzip2 or any other such tool, try decompressing again.

2. Try to obtain a dos2unix implementation that doesn't try to be
  "smart".  Basically, you need to do a binary search-and-replace
  from \r\n to \n while leaving alone any other bytes with the value
  13.  This will still lose any \r\n sequence that was in the
  original data, but the result will probably be less corrupted than
  the file that was erroneously subjected to the opposite
  search-and-replace.

3. Look up the tar file format specifications.  It is actually a
  relatively simple file format, and you will need to understand it
  to do the manual data rescue.  In particular, you will need to
  understand the PAX and GNU extensions to the format.

4. Using a binary file viewer, look for the tar header that marks
  the start of a much-wanted file.  Then look for the tar header
  of the next file in the archive.  The bytes between the two
  headers are supposed to be your file contents (followed by zero
  padding up to a multiple of 512 bytes), and the header before the
  contents should give the number of bytes in the uncorrupted file.
  If you did step 2 above, the actual data will probably be slightly
  too short, either because too many \r characters were removed or
  because the terminal protocol also removed some other bytes.

5. Use your knowledge of the actual file format to figure out where
  a \r was probably lost, and use the correct file length from the
  tar header as a cross-check of your efforts.

6. Repeat steps 4 and 5 for each file.
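For step 2, here is a minimal sketch of such a non-"smart" converter in Python (the function name crlf_to_lf is made up for this example, it is not an existing tool).  It replaces only the two-byte sequence \r\n with \n and passes every other byte 13 through unchanged.  It works in chunks so the 183 GB file never has to fit in memory, holding back a trailing \r until the next chunk shows whether a \n follows it.

```python
import sys


def crlf_to_lf(src, dst, chunk_size=1 << 20):
    """Copy binary stream src to dst, replacing every b"\r\n" with b"\n".

    A lone b"\r" (byte 13 not followed by byte 10) is left untouched.
    """
    pending_cr = False  # True if the previous chunk ended with b"\r"
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        if pending_cr:
            chunk = b"\r" + chunk
            pending_cr = False
        if chunk.endswith(b"\r"):
            # Hold back a trailing CR until we see the byte after it.
            chunk = chunk[:-1]
            pending_cr = True
        dst.write(chunk.replace(b"\r\n", b"\n"))
    if pending_cr:
        dst.write(b"\r")  # the input ended with a lone CR: keep it


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python crlf_to_lf.py in.tar out.tar
    with open(sys.argv[1], "rb") as src, open(sys.argv[2], "wb") as dst:
        crlf_to_lf(src, dst)
```

Note that a run like \r\r\n becomes \r\n, so only one \r per line ending is removed.  Always write to a new file and keep the corrupted copy as it is.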
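To make step 3 a little more concrete, here is a rough Python sketch of decoding one 512-byte header (all names are hypothetical).  It assumes the POSIX ustar layout: name at offset 0, size as octal ASCII at 124-135, checksum at 148-155, type flag at 156, magic at 257, and a name prefix at 345 that only applies to the POSIX "ustar\0" magic.  Size fields may also use GNU's base-256 encoding for files of 8 GiB or more.  The sketch does not interpret PAX ('x', 'g') or GNU long-name ('L') entries, whose data blocks describe the following entry or entries.

```python
from dataclasses import dataclass

BLOCK = 512  # tar headers and data are stored in 512-byte blocks


def parse_octal(field: bytes) -> int:
    """Decode a numeric header field.

    Normally octal ASCII terminated by NUL or space; GNU tar stores values
    too big for that (e.g. sizes of 8 GiB or more) in base-256, flagged by
    a first byte of 0x80.
    """
    if field and field[0] == 0x80:
        return int.from_bytes(field[1:], "big")
    text = field.split(b"\0", 1)[0].strip()
    return int(text, 8) if text else 0


def checksum_ok(header: bytes) -> bool:
    """True if the stored checksum matches: the sum of all 512 bytes with
    the 8-byte checksum field (offsets 148-155) counted as ASCII spaces."""
    try:
        stored = parse_octal(header[148:156])
    except ValueError:
        return False
    rest = header[:148] + header[156:BLOCK]
    unsigned = sum(rest) + 8 * ord(" ")
    # Some historic tar programs summed the bytes as signed chars.
    signed = sum(b - 256 if b > 127 else b for b in rest) + 8 * ord(" ")
    return stored in (unsigned, signed)


@dataclass
class TarHeader:
    name: str
    size: int
    typeflag: bytes  # b"0" regular file, b"5" directory, b"x" PAX, b"L" GNU long name, ...
    magic: bytes     # b"ustar\0" (POSIX) or b"ustar " (old GNU)


def parse_header(header: bytes) -> TarHeader:
    """Decode one 512-byte header, raising ValueError if it is not valid."""
    if len(header) != BLOCK:
        raise ValueError("a tar header is exactly 512 bytes")
    if not checksum_ok(header):
        raise ValueError("header checksum mismatch")
    magic = header[257:263]
    name = header[0:100].split(b"\0", 1)[0]
    # Only POSIX ustar uses offsets 345-499 as a name prefix; old GNU
    # headers store other data there.
    prefix = header[345:500].split(b"\0", 1)[0]
    if magic == b"ustar\0" and prefix:
        name = prefix + b"/" + name
    return TarHeader(
        name=name.decode("utf-8", "replace"),
        size=parse_octal(header[124:136]),
        typeflag=header[156:157],
        magic=magic,
    )
```

The checksum is the useful part for a rescue: a header whose checksum still matches has almost certainly not lost any bytes.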
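Steps 4 and 5 can be partly automated.  Below is a rough Python sketch, assuming headers can be recognised by the "ustar" magic 257 bytes into a 512-byte block plus a valid checksum (true for both POSIX and GNU headers); the function names and report fields are made up for this example.  It lists every such header regardless of 512-byte alignment, since alignment is lost as soon as bytes go missing.  For each pair of neighbouring headers it reports how many bytes are missing compared with the size the first header declares (rounded up to whole blocks, on the assumption that the zero padding survived intact), plus how many \n bytes the body holds as candidate spots for a lost \r.

```python
MAGIC_OFFSET = 257  # offset of b"ustar" inside a 512-byte header
BLOCK = 512


def _octal(field):
    """Octal ASCII field, or GNU base-256 if the first byte is 0x80."""
    if field and field[0] == 0x80:
        return int.from_bytes(field[1:], "big")
    text = field.split(b"\0", 1)[0].strip()
    return int(text, 8) if text else 0


def _checksum_ok(h):
    """Unsigned sum of the header, checksum field counted as 8 spaces."""
    try:
        stored = _octal(h[148:156])
    except ValueError:
        return False
    return stored == sum(h[:148]) + 8 * ord(" ") + sum(h[156:BLOCK])


def find_headers(data):
    """Offsets of every plausible tar header in data (bytes or mmap)."""
    found = []
    pos = data.find(b"ustar", MAGIC_OFFSET)
    while pos != -1:
        start = pos - MAGIC_OFFSET
        header = data[start:start + BLOCK]
        if len(header) == BLOCK and _checksum_ok(header):
            found.append(start)
        pos = data.find(b"ustar", pos + 1)
    return found


def size_report(data):
    """Compare declared vs. actual sizes between consecutive headers."""
    offsets = find_headers(data)
    report = []
    for here, nxt in zip(offsets, offsets[1:]):
        header = data[here:here + BLOCK]
        size = _octal(header[124:136])
        expected_gap = -(-size // BLOCK) * BLOCK  # size rounded up to blocks
        actual_gap = nxt - (here + BLOCK)
        body = data[here + BLOCK:here + BLOCK + min(size, actual_gap)]
        report.append({
            "offset": here,
            "name": header[:100].split(b"\0", 1)[0].decode("utf-8", "replace"),
            "size": size,
            "missing_bytes": expected_gap - actual_gap,
            "newlines_in_body": body.count(b"\n"),
        })
    return report
```

For a 183 GB file, pass an mmap.mmap of the file rather than reading it into memory; the code only uses find() and slicing, which mmap objects support.  A negative missing_bytes means the body has extra bytes (for example a \r that is still there), and a value larger than newlines_in_body suggests that bytes other than \r were lost too.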

Good luck, you will need it.

Jakob Bohm, CIO, Partner, WiseMo A/S.
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
