help-tar

Re: explicit extraction of files behind (sym)links


From: Aiyion.Prime
Subject: Re: explicit extraction of files behind (sym)links
Date: Sat, 23 Jul 2022 15:04:21 +0200

Hey Reuti,

Thanks for your input. I was afraid I'd need to use a workaround like this.
I like the idea of explicitly packing relative symlinks and using them to determine the real target. What bugs me is having to extract a file just to find out whether it is the real deal or a symlink I need to follow first.

If I understand tar correctly, the header of each entry records whether that entry is a regular file or a symlink.

I think I'll extend your recommendation with a variant of "tar tvf archive.tar targetfile". I still have to find out whether that output is intended to be machine-readable or whether there's a cleaner approach to probing the files.
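A rough sketch of that probing idea, assuming GNU tar: in its verbose listing the first character of the mode column is `l` for a symlink, `h` for a hard link and `-` for a regular file, and a symlink line ends in ` -> target`. The helper names `entry_type` and `link_target` are made up for this illustration, and the parsing would break on member names that themselves contain ` -> `.

```shell
#!/bin/sh
# Build a tiny throwaway archive: one regular file and one relative symlink to it.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir onepath anotherpath
echo "payload" > onepath/binaryblob13
ln -s ../onepath/binaryblob13 anotherpath/binaryblob13
tar -cf archive.tar onepath anotherpath

# entry_type ARCHIVE MEMBER (hypothetical helper)
# Classifies a member by the first character of its verbose listing line,
# without extracting anything.
entry_type() {
    line=$(tar -tvf "$1" "$2" | head -n 1)
    case "$line" in
        l*) echo symlink ;;
        h*) echo hardlink ;;
        -*) echo file ;;
        *)  echo other ;;
    esac
}

# link_target ARCHIVE MEMBER (hypothetical helper)
# Prints the stored target of a symlink member; prints nothing for other types.
link_target() {
    tar -tvf "$1" "$2" | head -n 1 | sed -n 's/.* -> //p'
}

entry_type archive.tar onepath/binaryblob13
entry_type archive.tar anotherpath/binaryblob13
link_target archive.tar anotherpath/binaryblob13
```

The listing format is meant for people rather than scripts, so this is best treated as a stopgap that may need adjusting for other tar implementations.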

Thanks for your help
Aiyion


On 7/22/22 17:29, Reuti wrote:
Hi Aiyion,

Am 22.07.2022 um 10:13 schrieb Aiyion.Prime <help-tar@aiyionpri.me>:


Good morning everyone,

I thought I had known my way around tar for a few years now, but yesterday evening I learned I was wrong about that:

I'm archiving a directory structure that contains large redundant files:

onepath/readme
onepath/binaryblob13
anotherpath/readme
anotherpath/binaryblob13

I don't know your complete workflow, hence I can give only a vague idea:

Assuming you are using symlinks in the above structure:

• instead of archiving the complete directories recursively, create a list of 
files to be saved for `tar`: first all symlinks (as symlinks), then all real 
files
• on extraction --occurrence=1 will stop at the first encounter
• in case it's a symlink, remove the extracted symlink file and extract the 
real file it points to with the name of the symlink file

This should speed up the processing.
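The three steps above could be sketched roughly as follows. This assumes GNU tar and GNU coreutils (`realpath -m --relative-to`), relative symlinks, and file names without newlines; the `src`, `out` and `filelist` names and the `extract_real` helper are invented for the example.

```shell
#!/bin/sh
# Throwaway tree: one real file plus a relative symlink standing in for the duplicate.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p src/onepath src/anotherpath out
echo "payload" > src/onepath/binaryblob13
ln -s ../onepath/binaryblob13 src/anotherpath/binaryblob13

# Step 1: list all symlinks first, then all regular files, and archive in that order.
(
    cd src || exit 1
    { find . -type l; find . -type f; } | sed 's|^\./||' > ../filelist
    tar -cf ../archive.tar -T ../filelist
)

# extract_real ARCHIVE MEMBER DESTDIR (hypothetical helper)
# Step 2: extract the member, stopping at its first occurrence.
# Step 3: if a symlink came out, replace it with the content of the file it points to.
extract_real() {
    archive=$1
    member=$2
    dest=$3
    tar -xf "$archive" -C "$dest" --occurrence=1 "$member" || return 1
    if [ -L "$dest/$member" ]; then
        target=$(readlink "$dest/$member")
        rm "$dest/$member"
        # Resolve the relative target against the member's own directory
        # to get the name of the real file inside the archive.
        real=$(realpath -m --relative-to=. "$(dirname "$member")/$target")
        # -O writes the member's data to stdout, so it can be stored under
        # the symlink's name.
        tar -xf "$archive" --occurrence=1 -O "$real" > "$dest/$member"
    fi
}

extract_real archive.tar anotherpath/binaryblob13 out
```

Writing through `-O` keeps only the file content; permissions and timestamps of the real file would have to be restored separately if they matter.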

-- Reuti


I cannot change the paths, as this is to be fed to a package manager that
requires them.

What I thought I could do, to not have an archive twice the size of 
`binaryblob13`, was to use sym- or hardlinks and the `-h` flag for creation.

So archiving this:

onepath/
secondpath -> onepath/

using

tar --sort=name --owner=0 --group=0 --numeric-owner -chvf normal_sized.tar \
    secondpath onepath ${mtime}

That would work like a charm if said package manager extracted the whole
tar file.

This is what it does though:

tar xf $tar_file secondpath/binaryblob13

That works fine if I extract files from the directory referenced first in
the creation command (secondpath in the case above), but it returns an error
for the directory archived later: tar tries to create a hardlink on disk
pointing to what would have been the previously extracted file. As that file
does not exist, I've got a problem.

I'd like to avoid extracting all binaryblob13 references beforehand only to 
have the link I extract point to something valid.

Is there a flag to tell tar "I don't care if you have to search the archive twice,
but extract the original file instead of creating an (invalid) hardlink"?


I realize that's unusable for actual tape records, but maybe someone has a hint
for me here.

Thanks in advance and have a nice morning,
Aiyion




