bug-tar
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Quadratic performance due to symlink processing


From: Gavin Smith
Subject: Quadratic performance due to symlink processing
Date: Sun, 24 Jul 2022 18:47:46 +0100

I wondered if the GNU tar developers were aware of the issue raised in
this article?

https://mort.coffee/home/tar/
also https://news.ycombinator.com/item?id=32206579

"The tar archive format, its extensions, and why GNU tar extracts in
quadratic time." (Martin Dørum, 2022-07-23)

I searched the mailing list archives for the word "quadratic" and did find
any results, so apologies if this has been discussed before or is well
known about.

Here's the main explanation from the article:

> When GNU tar encounters a hard link or symlink with ".." as a path
> component in its link_path, tar will create a regular file in its place
> as a placeholder, and put a note about the delayed link in a linked
> list datastructure. When it's done extracting the entire archive,
> it will go through the whole list of delayed links and replace the
> placeholders with proper links. So far, so good.
> 
> The problem comes when trying to extract a hard link which doesn't
> contain ".." as a path component in its link_path. GNU tar wants to
> create such hard links immediately if it can. But it can't create a
> hard link if the target is occupied by a placeholder file. That means,
> every time GNU tar wants to create a hard link, it first has to walk
> the entire linked list of delayed links and see if the target is a
> delayed link. If the target is a delayed link, the new link must also
> be delayed.

In theory this should be quite straightforward to fix, assuming the analysis
stated here is correct, by using a different data structure to the linked
list.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]