Tim
Your information on the formats really helped. It should be distributed with
the GNU tar source. After all, those are the specifications that it implements.
So, both the GNU and "old" GNU formats will output the 'u' 's' 't' 'a' 'r' space
space null sequence, correct?
It's still not clear to me from the GNU tar source code how that sequence at offset 257
is generated for the GNU and "old" GNU formats (which is probably what added to my
confusion). Can you point me to the relevant lines? If I had a debugger, I would step
through the code. I'm trying to wind down this old legacy box, not do more
with it... sigh!
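A minimal sketch of how that 8-byte sequence can come about, assuming GNU tar's own tar.h definitions: TMAGIC is "ustar" plus a NUL (6 bytes at offset 257), TVERSION is "00" with no NUL (2 bytes at offset 263), and OLDGNU_MAGIC is "ustar  " (seven characters plus a NUL). Because the two fields are contiguous, copying OLDGNU_MAGIC with its terminator fills all eight bytes at once. In tar 1.23 this appears to happen in start_header() in src/create.c for both the gnu and oldgnu formats, but that location should be confirmed against the actual source tree. The Python below only illustrates the byte layout; it is not GNU tar code, and set_magic() is a hypothetical helper.

```python
# Field layout of a 512-byte tar header (offsets per POSIX ustar).
MAGIC_OFFSET = 257    # 6-byte "magic" field
VERSION_OFFSET = 263  # 2-byte "version" field, immediately after magic

TMAGIC = b"ustar\x00"          # POSIX magic: "ustar" followed by NUL
TVERSION = b"00"               # POSIX version: "00", no NUL
OLDGNU_MAGIC = b"ustar  \x00"  # GNU magic: 7 characters plus NUL = 8 bytes


def set_magic(header: bytearray, gnu: bool) -> None:
    """Hypothetical helper: fill magic/version of a 512-byte header in place."""
    if gnu:
        # A single 8-byte copy overwrites magic AND version together,
        # mimicking a strcpy() of OLDGNU_MAGIC including its terminator.
        header[MAGIC_OFFSET:MAGIC_OFFSET + len(OLDGNU_MAGIC)] = OLDGNU_MAGIC
    else:
        # POSIX ustar writes the two fields separately.
        header[MAGIC_OFFSET:MAGIC_OFFSET + len(TMAGIC)] = TMAGIC
        header[VERSION_OFFSET:VERSION_OFFSET + len(TVERSION)] = TVERSION


if __name__ == "__main__":
    for gnu in (True, False):
        header = bytearray(512)
        set_magic(header, gnu)
        print("gnu" if gnu else "ustar", bytes(header[257:265]))
```

Run as a script, this prints b'ustar  \x00' for the GNU layout and b'ustar\x0000' (NUL followed by the digits "00") for POSIX ustar.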
I did manage to figure out that some of the files being saved had modification
times of 15 December 1942 and 20 December 1942 (according to ls -la). Their
mtime[] fields no longer contain valid octal. I suspect this was part of what 7-Zip and
WinZip were unhappy about. Your web pages warn about negative times. It's a pity
that GNU tar doesn't at least print a warning to stderr when it encounters
problems like this. Rather than encoding them in an invalid way, it should probably
just store these out-of-range times as the beginning of the epoch. Thoughts?
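For archives written with --format=gnu or --format=oldgnu, here is a sketch of how a numeric field such as the 12-byte mtime at offset 136 can be encoded and decoded. Values that fit are stored as octal digits followed by a NUL; for values that don't, including negative times, GNU tar falls back to a base-256 (binary) encoding flagged by the first byte: 0x80 for positive, 0xFF for negative (two's complement). A reader that accepts only octal digits will reject such a field, which may be what 7-Zip and WinZip were tripping over, though that is a guess. encode_numeric() and decode_numeric() are hypothetical names.

```python
from datetime import datetime, timezone


def encode_numeric(value: int, size: int = 12) -> bytes:
    """GNU-style numeric field: octal + NUL if it fits, else base-256."""
    body = size - 1
    if 0 <= value < 8 ** body:
        # Octal digits, zero-padded, terminated by NUL (POSIX ustar style).
        return format(value, "0%do" % body).encode("ascii") + b"\x00"
    limit = 1 << (8 * body)
    if 0 <= value < limit:
        # Positive base-256: flag byte 0x80, then big-endian binary.
        return b"\x80" + value.to_bytes(body, "big")
    if -limit <= value < 0:
        # Negative base-256: flag byte 0xFF, then two's complement, so the
        # whole field reads back as a signed big-endian integer.
        return b"\xff" + (value % limit).to_bytes(body, "big")
    raise OverflowError("value does not fit in a %d-byte field" % size)


def decode_numeric(field: bytes) -> int:
    """Decode an octal or GNU base-256 numeric header field."""
    if field[:1] == b"\x80":
        return int.from_bytes(field[1:], "big")
    if field[:1] == b"\xff":
        return int.from_bytes(field, "big", signed=True)
    # Octal: digits, possibly space-padded, ended by NUL or space.
    digits = field.split(b"\x00", 1)[0].strip(b" ")
    return int(digits.decode("ascii"), 8) if digits else 0


if __name__ == "__main__":
    # 15 December 1942 is before the epoch, so its timestamp is negative.
    mtime = int(datetime(1942, 12, 15, tzinfo=timezone.utc).timestamp())
    field = encode_numeric(mtime)
    print(mtime, field.hex())
    assert decode_numeric(field) == mtime
```

The negative case is why such an mtime field "is no longer valid octal": it is a binary value that only GNU-aware readers will interpret.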
Regards
Jason
-----Original Message-----
From: Tim Kientzle [mailto:address@hidden
Sent: Friday, 4 June 2010 1:12 PM
To: Armistead, Jason
Cc: Dustin J. Mitchell; address@hidden
Subject: Re: [Bug-tar] tar 1.23: Problem under Solaris 10 - incorrect GNU
header contents
If you're looking for details about tar formats,
I wrote a lengthy man page covering the
tar format variants in detail.
There are online versions at the libarchive Wiki:
http://code.google.com/p/libarchive/wiki/ManPageTar5
and at the FreeBSD project man page reference:
http://www.freebsd.org/cgi/man.cgi?query=tar&sektion=5&manpath=FreeBSD+8.0-RELEASE&format=html
The mdoc-to-HTML translations seem to have some minor problems, though.
If you don't have access to a FreeBSD system, you might find the
mdoc source to be helpful:
http://code.google.com/p/libarchive/source/browse/trunk/libarchive/tar.5
In answer to your original question, the old "GNU tar" format
violates the POSIX ustar specification in several respects.
(GNU tar came out around the same time as the first POSIX
specification.) Most obviously, it sets the 8 bytes
starting at offset 257 to:
'u' 's' 't' 'a' 'r' space space null
where POSIX ustar archives set those same 8 bytes to:
'u' 's' 't' 'a' 'r' null '0' '0'
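A small sketch that uses those 8 bytes to tell the layouts apart in a raw 512-byte header block. Note that pax archives carry the same magic as POSIX ustar, so magic alone cannot separate ustar from pax, and all-zero bytes usually indicate a pre-POSIX (v7) header. The returned labels are illustrative, not names from any library.

```python
# Bytes 257..264 of a tar header: 6-byte magic + 2-byte version.
GNU_MAGIC = b"ustar  \x00"           # "ustar", space, space, NUL
POSIX_MAGIC = b"ustar\x00" + b"00"   # "ustar", NUL, '0', '0'


def header_format(header: bytes) -> str:
    """Classify a 512-byte tar header block by its magic/version bytes."""
    if len(header) < 512:
        raise ValueError("a tar header block is 512 bytes")
    magic = bytes(header[257:265])
    if magic == GNU_MAGIC:
        return "gnu"        # GNU tar --format=gnu or --format=oldgnu
    if magic == POSIX_MAGIC:
        return "ustar/pax"  # POSIX ustar; pax archives use the same magic
    if magic == bytes(8):
        return "v7"         # no magic at all: pre-POSIX header
    return "unknown"


if __name__ == "__main__":
    block = bytearray(512)
    block[257:265] = GNU_MAGIC
    print(header_format(block))  # gnu
```

To check a real archive, pass it the first 512 bytes read from the file in binary mode.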
The GNU tar format also does not use the ustar
'prefix' field as specified in POSIX and has non-POSIX
extensions for handling long filenames, long linknames,
and sparse files. The mechanism used for sparse
files, in particular, can cause tar implementations
that don't understand this extension to lose header
synchronization.
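To make the synchronization point concrete: a tar reader normally finds the next header by skipping the header block plus ceil(size / 512) data blocks. In the old GNU sparse format (typeflag 'S'), a nonzero isextended byte at offset 482 means extra 512-byte sparse-map blocks follow the header, each with its own isextended byte at offset 504, and those blocks are not counted in size. The sketch below (hypothetical function names; offsets as laid out in GNU tar's tar.h) shows a naive walk landing in the wrong place where a GNU-aware walk does not.

```python
BLOCK = 512


def parse_octal(field: bytes) -> int:
    """Parse a NUL/space-terminated octal header field."""
    digits = field.split(b"\x00", 1)[0].strip(b" ")
    return int(digits.decode("ascii"), 8) if digits else 0


def data_blocks(size: int) -> int:
    """512-byte blocks occupied by `size` bytes of member data."""
    return (size + BLOCK - 1) // BLOCK


def naive_next_header(archive: bytes, offset: int) -> int:
    """What a reader unaware of GNU sparse extensions would compute."""
    size = parse_octal(archive[offset + 124:offset + 136])
    return offset + BLOCK + data_blocks(size) * BLOCK


def gnu_next_header(archive: bytes, offset: int) -> int:
    """Same walk, but skipping old-GNU sparse extension blocks first."""
    header = archive[offset:offset + BLOCK]
    size = parse_octal(header[124:136])
    pos = offset + BLOCK
    if header[156:157] == b"S" and header[482]:
        # Each extension block holds more sparse-map entries and its own
        # isextended flag at offset 504; keep going until it is zero.
        while True:
            ext = archive[pos:pos + BLOCK]
            pos += BLOCK
            if not ext[504]:
                break
    return pos + data_blocks(size) * BLOCK


if __name__ == "__main__":
    hdr = bytearray(BLOCK)
    hdr[124:136] = b"00000000012\x00"  # size = 10 bytes of stored data
    hdr[156] = ord("S")                # old GNU sparse member
    hdr[482] = 1                       # one extension block follows
    ext = bytearray(BLOCK)             # extension block, isextended = 0
    data = bytearray(BLOCK)            # 10 data bytes padded to 512
    archive = bytes(hdr + ext + data)
    print(naive_next_header(archive, 0), gnu_next_header(archive, 0))  # 1024 1536
```

The naive reader ends up treating the member's data block as the next header, which is exactly the loss of synchronization described above.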
More recently, GNU tar has added support for the
"pax extended format" which is specified by current
POSIX standards. You can request this format with
the --posix flag to current versions of GNU tar.
Despite the "pax" name, this is really an extended tar
format that has been broadly adopted. It was also
carefully designed so that programs that understand
the old ustar format but do not recognize the pax
extensions can still extract the files
contained in the archive (they just won't restore
any additional file metadata).
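As an illustration of the pax extended format: the extended attributes ride in their own archive member (typeflag 'x') ahead of the file they describe, so a reader that ignores that member still finds an ordinary ustar header for the file itself. The member's data is a series of records of the form "<length> <keyword>=<value>" plus a newline, where length is the decimal byte count of the whole record, including the length digits themselves. The sketch below computes that self-referential length; pax_record() is a hypothetical name.

```python
def pax_record(keyword: str, value: str) -> bytes:
    """Build one pax extended-header record: length, space, keyword=value, newline.

    The length counts every byte of the record, including its own digits,
    so it is found by iterating until the digit count stops changing.
    """
    body = (" %s=%s\n" % (keyword, value)).encode("utf-8")
    length = len(body) + 1  # first guess: a one-digit length
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


if __name__ == "__main__":
    print(pax_record("path", "a" * 10))  # b'19 path=aaaaaaaaaa\n'
```

The loop matters at digit boundaries: a 9-byte body needs the two-digit length 11, not 10.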
Hope this helps,
Tim
Armistead, Jason wrote:
Dustin wrote:
The entire original email was focused on ustar functionality, by my
read. Perhaps you can repeat your experiment, bearing in mind that
you're expecting a GNU Tar archive, and let us know what happens?
My original experiment WAS with GNU-format tar archives. Some work and
some don't. I have far larger tar files that work fine, but this one,
from a very important filesystem, does not. That is what led me to look more
closely at the bytes in the file header records.
With regard to my e-mail, I made a newbie blunder (having never looked under the hood of
tar before) and assumed that because the resulting files contained "ustar" in
the header, they must be in ustar format.
If I'm correct in my understanding, a GNU-format archive should also contain "ustar"
(followed by a null) at offset 257 and "00" at offset 263. Is this correct for GNU format
archives?
Also, 7-Zip claims to support "TAR" format, but doesn't say which
format; are you sure it's designed to support GNU tar archives? If
you create a tar file with --format=ustar, can you read it with 7-Zip?
7-Zip is decidedly vague on what sort of TAR it supports. I now have the
source code, but it still doesn't make clear which TAR format(s) are handled. Time
permitting, I'll try to instrument it to figure out where it's breaking and
which format(s) it actually supports. 7-Zip's author didn't leave many
comments in his code, and the code has no facility for conditionally enabling
debugging output. It could take me some time.
7-Zip reads many other TAR files without problems, including many I have downloaded from the Internet.
My concern is that, for whatever reason, my Solaris 10 box with GNU tar 1.14 or
1.23 produces what appear to be incorrect contents in the two fields I
mentioned.
From what I've seen of 7-Zip's source, it doesn't check that the "ustar" and
"00" fields are correct. Nevertheless, my installations of GNU tar are NOT
producing the same bytes in these fields as other TAR files I get off the Internet. This
troubles me, and makes me wonder what else is being screwed up. I don't want to discover years
from now, when my old system is dead and buried, that the TAR files it produced are worthless...
Maybe the same bug is also causing problems elsewhere. I just can't be sure!
Regards,
Jason