coreutils

Re: Pair-wise file operation (copy, link)


From: William Bader
Subject: Re: Pair-wise file operation (copy, link)
Date: Mon, 26 Aug 2024 00:47:03 +0000

>Since you were reporting 2 min, was wondering what your platform is and 
>whether there might be something else involved eating the 2 min realtime?

Shouldn't any modern operating system do enough caching of inodes and files 
(like the file with the "cp" executable) that the only difference should be the 
CPU time for "cp" to initialize and parse its command line?

Does it make a difference if you run "cp" with some large directories in $PATH 
before /bin compared to running "/bin/cp"?  Shells like bash hash paths to 
commands, but that wouldn't help here, because "xargs" launches each "cp" 
itself and searches $PATH every time.
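
One way to check the $PATH question is to run the same mapping through xargs twice, once with a bare "cp" and once with its absolute path. This is only a rough sketch: the file count, the scratch directory, and the names old*/new*/fmap are made up for illustration, and whole-second timestamps from "date +%s" are coarse, so prefix the xargs commands with your shell's "time" if you want finer numbers.

```shell
#!/bin/sh
# Sketch: compare "xargs -L1 cp" (PATH search per launch) with an absolute path.
# n, workdir, and the old*/new* names are illustrative assumptions.
n=500
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create n empty source files and a two-column mapping file.
i=0
while [ "$i" -lt "$n" ]; do
  : > "old$i"
  echo "old$i new$i"
  i=$((i + 1))
done > fmap

cp_abs=$(command -v cp)   # absolute path of the cp that PATH resolves to

t0=$(date +%s)
xargs -L1 cp < fmap            # xargs looks up "cp" in PATH for every line
t1=$(date +%s)

rm -f new*                     # start the second run from the same state

t2=$(date +%s)
xargs -L1 "$cp_abs" < fmap     # no PATH search needed
t3=$(date +%s)

echo "via PATH: $((t1 - t0))s, via $cp_abs: $((t3 - t2))s"
```

To make any difference visible, put a few large directories at the front of $PATH before running it.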

Does it make a difference if the source or destination files are absolute or 
relative paths?

Does it make a difference if one of the path components is a network mount that 
can't be cached and requires sending requests to a remote server?

What is the locale? On my Fedora 40 laptop, strace shows that "cp" with 
LANG=en_US.UTF-8 opens /usr/lib/locale/locale-archive , which is 229,754,784 
bytes, although it then uses mmap and probably reads just a few bytes.
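
A sketch of how to repeat that check, assuming strace is installed and allowed to trace processes. The locale names are just the two I would compare; on older systems the call may show up as "open" rather than "openat".

```shell
#!/bin/sh
# Sketch: count locale-related file opens made by cp under two locales.
# Assumes strace is available; the locale list is an illustrative choice.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch abc

if command -v strace >/dev/null 2>&1; then
  for loc in C en_US.UTF-8; do
    # Write the trace of file opens to trace.<locale>.
    LC_ALL=$loc strace -e trace=openat -o "trace.$loc" cp abc "def.$loc"
    printf '%s: %s locale-related opens\n' "$loc" \
      "$(grep -c locale "trace.$loc" 2>/dev/null)"
  done
else
  echo "strace not available; skipping" >&2
fi
```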

On my Fedora 40 i7-12800H laptop, "cd /tmp && touch abc && time /bin/cp abc 
def" shows "real    0m0.004s". At that rate, 2500 copies would take about 10s 
(2500 x 0.004s).
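
To measure the per-launch cost directly instead of extrapolating from one run, something like the following could be used. It is a minimal sketch: N matches the 2500 files mentioned in the thread, while the scratch directory and the def* names are my own choices, and "date +%s" only resolves whole seconds.

```shell
#!/bin/sh
# Sketch: launch /bin/cp N times on an empty file and report the wall time.
# workdir and the def* target names are illustrative assumptions.
N=2500
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch abc

start=$(date +%s)
i=0
while [ "$i" -lt "$N" ]; do
  /bin/cp abc "def$i"    # one fork+exec of cp per file, as with xargs -L1
  i=$((i + 1))
done
end=$(date +%s)

echo "$N cp invocations took about $((end - start))s"
```

If this finishes in around 10s but the real job takes minutes, the extra time is probably coming from something other than launching the command.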

Does the person with the problem have a file system that gets slow when several 
thousand files are in a directory?

________________________________
From: coreutils-bounces+williambader=hotmail.com@gnu.org 
<coreutils-bounces+williambader=hotmail.com@gnu.org> on behalf of Glenn Golden 
<gdg@zplane.com>
Sent: Sunday, August 25, 2024 7:34 PM
To: Yair Lenga <yair.lenga@gmail.com>
Cc: Pádraig Brady <P@draigbrady.com>; Coreutils <coreutils@gnu.org>
Subject: Re: Pair-wise file operation (copy, link)

Yair Lenga <yair.lenga@gmail.com> [1970-01-01 00:00:00 +0000]:
>
> In my case, I have to bulk-move about 2500 files. This is part of a
> recurring sync job that has to mirror an existing hierarchy into a new
> hierarchy with different naming rules.
>
> It takes no time to create the mapping (even in bash script, case
> statement). When I "pipe" the mapping into "ln" (with xargs) it takes >2
> min to create the symlinks. Practically, all the time is spent on
> launching "ln". With a custom perl script - it's 3 seconds.
>

2c observation:

Years ago I had a similar weekly need at work, except for an even larger
number of files (10k - 20k or so iirc), and always used a one-liner xargs
script to do the copy.  My recollection is that it would complete in "a few"
seconds (maybe 10s or so). I couldn't find that script, but I just tried
it now manually: Created 2500 randomly named files, each comprising 4kB
random data, and then copied them to new names like this

    $ cat fmap | xargs -L1 cp

where fmap is the name-mapping file, comprising 2500 lines like

        oldname0   newname0
        oldname1   newname1
        oldname2   newname2
           .          .
           .          .
           .          .

It took under 4 seconds, plus another 1-2 seconds for the sync.  This was on
a commodity x86_64 laptop. The target filesystem was the same as original.
The device is a slow 20-year-old HDD.

Since you were reporting 2 min, was wondering what your platform is and
whether there might be something else involved eating the 2 min realtime?

Glenn


