bug-tar

Re: tar is creating corrupt archives when soft links are present


From: Timothe Litt
Subject: Re: tar is creating corrupt archives when soft links are present
Date: Thu, 1 Dec 2022 16:14:42 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.5.0

Thanks for the quick response. I think I've cut this down even further based on your suggestion. Here are the results, and some additional context. There is also a second problem with creating archives: very large files are skipped.

I've seen this on Fedora Core 4; the report is on FC 6.  (Yes, they're old.  But tar is new, built from source downloaded from ftp.gnu.org.)

The disk volume is a newly created (VirtualBox) vdi; 2 partitions,  ext3, with the root mounted on hda2.  (boot is on hda1).

The file structure was initialized on a newer Linux machine and the archive extracted.  It's been a long few days, I don't remember if it was fc34 or debian...both were involved in putting things back together.

The original reproducer is cut down from about 130G (a 34G compressed archive).  There are "only" 107 files in /bin.

Here is the information from your suggestions.

The hard link problem reproduces with this (note the two soft links turning into a soft and a hard(!) link, according to tar):

# ( cd / && ls -li bin/awk bin/bash && tar -cf - bin/awk bin/bash | tar -tvf - )
22683669 lrwxrwxrwx 1 root root  4 Nov 28 08:45 bin/awk -> gawk
22683657 lrwxrwxrwx 1 root root 21 Nov 28 08:45 bin/bash -> ../usr/local/bin/bash
lrwxrwxrwx root/root         0 2022-11-29 14:37 bin/awk -> gawk
hrwxrwxrwx root/root         0 2022-11-29 14:37 bin/bash link to bin/awk

Clearly, the bin/bash (a) is not a hard link on disk, and (b) does not link to bin/awk.
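
A self-contained variant of the same check may make it easier to compare against other builds of tar. This is only a sketch: it assumes a scratch directory from mktemp rather than the real /bin, and the variable names are made up for illustration. It creates two dangling soft links with the same targets as above and prints the type character that tar reports for each member.

```shell
#!/bin/sh
# Sketch: rebuild the two-symlink case in a scratch directory (hypothetical
# layout; nothing here touches the real /bin).
scratch=$(mktemp -d) || exit 1
mkdir "$scratch/bin"
ln -s gawk "$scratch/bin/awk"                     # dangling is fine: tar archives the link itself
ln -s ../usr/local/bin/bash "$scratch/bin/bash"

# The first character of each -tv line is the member type:
# 'l' = soft link, 'h' = hard link.
types=$(cd "$scratch" && tar -cf - bin/awk bin/bash | tar -tvf - | cut -c1 | tr -d '\n')
echo "member types: $types"
rm -rf "$scratch"
```

A tar that handles soft links correctly should report two 'l' members here; the listing above corresponds to an 'l' followed by an 'h'.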

The attached "hardlink_strace.txt" comes from a simplified command to reduce volume, but it should show the same syscalls:

 ( cd / && strace 2>hardlink_strace.txt tar -cf - bin/awk bin/bash >/dev/null )

A full ls -li is in full_ls.txt

extract_from_tar_archive_showing_extent.txt contains the first ~1900 lines of tar -tvf output from an archive that merged all the soft links to "vi" when extracted to disk.  Note that the listing (a) shows the links as hard links (they were all soft on the original disk), and (b) shows the links as pointing to "bin/ex", when in fact they were extracted as "vi".

To me, this all points to soft links being processed as if they were hard - mostly.

Going further with the toy example, we see that while tar reports the links as hard, they are extracted as soft, but with the wrong target for the second link. 

foo]# ( cd / && tar -cf - bin/awk bin/bash | tar -C /root/foo -xvf -  )
bin/awk
bin/bash
foo]# ls -li bin    # This is bin extracted from the archive
total 0
17418579 lrwxrwxrwx 2 root root 4 Dec  1 15:23 awk -> gawk
17418579 lrwxrwxrwx 2 root root 4 Dec  1 15:23 bash -> gawk
foo]# ls -li /bin/awk /bin/bash    # This is the bin that was archived
22683669 lrwxrwxrwx 1 root root  4 Nov 28 08:45 /bin/awk -> gawk
22683657 lrwxrwxrwx 1 root root 21 Nov 28 08:45 /bin/bash -> ../usr/local/bin/bash
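
The same comparison can be scripted, so that it does not depend on reading two ls listings side by side. The following is a sketch under stated assumptions: check_roundtrip is a made-up helper name, and the source and destination trees live in a mktemp scratch directory instead of / and /root/foo. It archives two soft links, extracts them elsewhere, and compares each extracted target with the original using readlink.

```shell
#!/bin/sh
# Sketch: archive two soft links, extract them elsewhere, compare the targets.
# check_roundtrip and the scratch layout are hypothetical, for illustration.
check_roundtrip() {
    src=$1; dst=$2; shift 2
    status=0
    for name in "$@"; do
        want=$(readlink "$src/$name")    # target as archived
        got=$(readlink "$dst/$name")     # target as extracted
        if [ "$want" = "$got" ]; then
            echo "ok       $name -> $got"
        else
            echo "MISMATCH $name: archived '$want', extracted '$got'"
            status=1
        fi
    done
    return $status
}

scratch=$(mktemp -d) || exit 1
mkdir -p "$scratch/src/bin" "$scratch/dst"
ln -s gawk "$scratch/src/bin/awk"
ln -s ../usr/local/bin/bash "$scratch/src/bin/bash"
( cd "$scratch/src" && tar -cf - bin/awk bin/bash ) | tar -C "$scratch/dst" -xf -
check_roundtrip "$scratch/src" "$scratch/dst" bin/awk bin/bash
result=$?
rm -rf "$scratch"
```

With the behaviour shown in the listings above, the bin/bash line would presumably be reported as a mismatch (archived '../usr/local/bin/bash', extracted 'gawk').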

To close out the shell wildcard lead: if we now use shell wildcards (which pick up a couple of extra files), note that the bash link (to ../usr/local...) is still extracted as a soft link to gawk.

Here's the modified test case:

foo]# ( cd / && tar -cf - bin/aw* bin/bas* | tar -C /root/foo -xvf -  )
bin/awk
bin/basename
bin/bash
bin/bash.old
foo]# ls -li bin
total 732
17418579 lrwxrwxrwx 2 root root      4 Dec  1 15:32 awk -> gawk
17418580 -rwxr-xr-x 1 root root  18484 Oct 31  2007 basename
17418579 lrwxrwxrwx 2 root root      4 Dec  1 15:32 bash -> gawk
17418581 -rwxr-xr-x 1 root root 722684 Jul 12  2006 bash.old

An strace of the above, saved in strace_wild.txt, was obtained as shown below (the inode numbers differ between the archived and extracted trees).

foo]# ( cd / && ls -li bin/aw* bin/bas* && strace 2>/root/strace_wild.txt tar -cf - bin/aw* bin/bas* >/dev/null  )
22683669 lrwxrwxrwx 1 root root      4 Nov 28 08:45 bin/awk -> gawk
22683748 -rwxr-xr-x 1 root root  18484 Oct 31  2007 bin/basename
22683657 lrwxrwxrwx 1 root root     21 Nov 28 08:45 bin/bash -> ../usr/local/bin/bash
22683691 -rwxr-xr-x 1 root root 722684 Jul 12  2006 bin/bash.old
foo]# ls -li bin/
total 732
17418579 lrwxrwxrwx 2 root root      4 Dec  1 15:32 awk -> gawk
17418580 -rwxr-xr-x 1 root root  18484 Oct 31  2007 basename
17418579 lrwxrwxrwx 2 root root      4 Dec  1 15:32 bash -> gawk
17418581 -rwxr-xr-x 1 root root 722684 Jul 12  2006 bash.old

Also, while I didn't keep the build directory for tar, I did keep the configure cache file, which may be helpful.

Not sure if I can recover what's left of the original disk; will try if necessary.  But I think this work has cut the problem down.

a) tar is confused about soft links.
b) It reports soft links as hard links in -t output, but extracts them as soft links.
c) The extraction uses the wrong target for the soft link: the target of the first soft link that it sees.
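
For context on why the inode numbers matter here: as I understand it, an archiver decides that a name is a hard link from the device and inode numbers returned by the stat-like calls, so bad values there could plausibly produce all three symptoms. The following is a simplified illustration of that idea only. It is not GNU tar's code, classify is a made-up name, and it assumes GNU stat (where -c takes a format and soft links are not followed by default).

```shell
#!/bin/sh
# Simplified illustration only (not GNU tar's actual code): report a name as a
# hard link when an earlier name had the same device:inode pair.
classify() {
    seen=""
    for path in "$@"; do
        key=$(stat -c '%d:%i' "$path")   # GNU stat; does not follow soft links
        case " $seen " in
            *" $key "*) echo "$path: hard link to an earlier name ($key)" ;;
            *)          echo "$path: first name for $key"
                        seen="$seen $key" ;;
        esac
    done
}

scratch=$(mktemp -d) || exit 1
echo data > "$scratch/file"
ln "$scratch/file" "$scratch/hardlink"            # genuine hard link: same inode
ln -s gawk "$scratch/awk"                         # two distinct soft links,
ln -s ../usr/local/bin/bash "$scratch/bash"       # each with its own inode
report=$(classify "$scratch/file" "$scratch/hardlink" "$scratch/awk" "$scratch/bash")
echo "$report"
rm -rf "$scratch"
```

Only the second name of the regular file should be flagged. Two soft links with different inode numbers, like bin/awk and bin/bash above, should never match each other in a check of this kind.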

# uname -a
Linux  2.6.22.14-100 #1 SMP Wed Apr 8 18:07:54 EDT 2015 i686 i686 i386 GNU/Linux

Finally, an unrelated (except that it hit this incident and prevented an easy restore) issue: tar skips some large files with

tar: root/sd/sd.tar.gz: Cannot stat: Value too large for defined data type

-rw-r--r-- 1 root root 32251081571 May  6  2007 /root/sd/sd.tar.gz
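
"Value too large for defined data type" is the message text for EOVERFLOW, which a stat call returns when a value such as the file size does not fit in the caller's struct stat. A quick arithmetic check of the size above against a signed 32-bit off_t follows. That this tar was built with a 32-bit off_t is an assumption on my part, not something I have confirmed.

```shell
#!/bin/sh
# Sketch: compare the file size from the listing above with the largest value
# a signed 32-bit off_t can hold. A 32-bit off_t in this build is an
# assumption, not a confirmed fact.
size=32251081571
max32=2147483647                      # 2^31 - 1
if [ "$size" -gt "$max32" ]; then
    verdict="does not fit"
else
    verdict="fits"
fi
echo "$size bytes $verdict in a signed 32-bit off_t (max $max32)"
```

If that assumption holds, any file of 2 GiB or more would be skipped the same way.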

Let me know if I can provide further information.  I appreciate the attention.

Thanks!

Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed. 
On 01-Dec-22 14:24, Paul Eggert wrote:
Thanks for reporting the problem. I'm not seeing the problem with GNU tar 1.34 as shipped with Ubuntu 22.10 x86-64. On this platform, the command:

  cd /
  tar -cf - bin/* | tar -tvf - >/tmp/tar.txt

outputs the attached file tar.txt, which looks OK, as it seems to match the output of the command 'cd /; ls -li bin/* >/tmp/ls-i.txt' which is attached. This is on an ext4 file system. (All the attachments are compressed with gzip.)

What would help to debug here is a smaller reproducer. Can you reproduce it with a smaller command like this?

   tar -cf - bin/awk bin/bash

In other words, make it as small as you can.

Also, even if you can't make it small, it'd be helpful to see the strace output so that we can see the information that tar is basing its decisions on. For example, I ran this command:

  strace -v --trace %%stat -o /tmp/tar-tr.txt tar -cf /dev/null bin/*

and got the attached file tar-tr.txt to see what the stat-like syscalls are yielding; can you do something similar?

Also, can you send the output of 'ls -il bin/*'? The inode numbers would be helpful for debugging, I expect.

Attachment: link_info.tar.gz
Description: GNU Zip compressed data

Attachment: OpenPGP_signature
Description: OpenPGP digital signature

