Re: Pair-wise file operation (copy, link)
From: William Bader
Subject: Re: Pair-wise file operation (copy, link)
Date: Mon, 26 Aug 2024 00:47:03 +0000
>Since you were reporting 2 min, was wondering what your platform is and
>whether there might be something else involved eating the 2 min realtime?
Shouldn't any modern operating system do enough caching of inodes and files
(like the file with the "cp" executable) that the only difference should be the
CPU time for "cp" to initialize and parse its command line?
Does it make a difference if you run "cp" with some large directories in $PATH
before /bin, compared to running "/bin/cp"? Shells like bash hash paths to
commands, but that wouldn't help if each "cp" runs from a fresh shell from
"xargs".
Does it make a difference if the source or destination files are absolute or
relative paths?
Does it make a difference if one of the path components is a network mount that
can't be cached and requires sending requests to a remote server?
What is the locale? On my Fedora 40 laptop, strace shows that "cp" with
LANG=en_US.UTF-8 opens /usr/lib/locale/locale-archive, which is 229,754,784
bytes, although it then uses mmap and probably reads just a few bytes.
On my Fedora 40 i7-12800H laptop, "cd /tmp && touch abc && time /bin/cp abc
def" shows "real 0m0.004s". 2500 copies would scale to 10s.
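A rough way to reproduce this per-invocation estimate on another machine is to time a batch of separate "cp" processes and extrapolate to 2500. This is only a sketch, assuming a POSIX system with "cp" on $PATH and Python 3; the function name per_exec_seconds is hypothetical, and the numbers will vary with the machine, file system and locale.

```python
import os
import shutil
import subprocess
import tempfile
import time


def per_exec_seconds(n: int = 50) -> float:
    """Average wall-clock time to run `cp src dst` as a separate process.

    Hypothetical helper for illustration; assumes `cp` is on PATH.
    """
    cp = shutil.which("cp")  # resolve the path once, similar to a shell's hash table
    if cp is None:
        raise RuntimeError("cp not found on PATH")
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "abc")
        open(src, "wb").close()  # empty file, like `touch abc`
        start = time.perf_counter()
        for i in range(n):
            # One fork/exec per copy, which is what `xargs -L1 cp` does.
            subprocess.run([cp, src, os.path.join(d, f"def{i}")], check=True)
        return (time.perf_counter() - start) / n


if __name__ == "__main__":
    t = per_exec_seconds()
    print(f"{t * 1000:.2f} ms per cp; 2500 copies ~ {t * 2500:.1f} s")
```

Extrapolating linearly ignores directory growth and cache effects, so it is a lower bound rather than a prediction.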
Does the person with the problem have a file system that gets slow when several
thousand files are in a directory?
________________________________
From: coreutils-bounces+williambader=hotmail.com@gnu.org
<coreutils-bounces+williambader=hotmail.com@gnu.org> on behalf of Glenn Golden
<gdg@zplane.com>
Sent: Sunday, August 25, 2024 7:34 PM
To: Yair Lenga <yair.lenga@gmail.com>
Cc: Pádraig Brady <P@draigbrady.com>; Coreutils <coreutils@gnu.org>
Subject: Re: Pair-wise file operation (copy, link)
Yair Lenga <yair.lenga@gmail.com> wrote:
>
> In my case, I have to bulk-move about 2500 files. This is part of a
> recurring sync job that has to mirror an existing hierarchy into a new
> hierarchy with different naming rules.
>
> It takes no time to create the mapping (even in bash script, case
> statement). When I "pipe" the mapping into "ln" (with xargs) it takes >2
> min to create the symlinks. Practically, all the time is spent on launching
> "ln". With a custom perl script - it's 3 seconds.
>
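For comparison with running one "ln" process per pair, here is a minimal single-process sketch in Python. It is not the Perl script mentioned above (which isn't shown); it just reads the same kind of "old new" mapping and calls os.symlink() for each pair, so there is no per-pair fork/exec. It assumes one pair per line and no whitespace inside names; link_pairs and linkpairs.py are hypothetical names.

```python
import os
import sys


def link_pairs(lines, symbolic=True):
    """Create one link per 'old new' line without spawning a process per pair."""
    count = 0
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2:
            raise ValueError(f"expected 'old new', got: {line!r}")
        old, new = parts
        if symbolic:
            os.symlink(old, new)  # same effect as `ln -s old new`
        else:
            os.link(old, new)  # same effect as `ln old new`
        count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical file name): python3 linkpairs.py fmap
    with open(sys.argv[1]) as f:
        print(f"created {link_pairs(f)} links", file=sys.stderr)
```

Unlike "xargs", this does no quote processing, so a mapping with quoted names containing spaces would need a different parser.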
2c observation:
Years ago I had a similar weekly need at work, except for an even larger
number of files (10k - 20k or so iirc), and always used a one-liner xargs
script to do the copy. My recollection is that it would complete in "a few"
seconds (maybe 10s or so). I couldn't find that script, but I just tried
it now manually: Created 2500 randomly named files, each comprising 4kB
random data, and then copied them to new names like this:
$ cat fmap | xargs -L1 cp
where fmap is the name-mapping file, comprising 2500 lines like
oldname0 newname0
oldname1 newname1
oldname2 newname2
. .
. .
. .
It took under 4 seconds, plus another 1-2 seconds for the sync. This was on
a commodity x86_64 laptop. The target filesystem was the same as the original
one, on a slow 20-year-old HDD.
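To repeat this experiment, a sketch along these lines builds a comparable fixture (random hex names, 4kB of random data each, plus an fmap file with absolute paths) and also performs the same copies in one process with shutil.copyfile(), so its timing can be set against "xargs -L1 cp". The function names make_fixture and copy_pairs and the newnameN target names are assumptions for illustration only.

```python
import os
import secrets
import shutil


def make_fixture(directory, count=2500, size=4096):
    """Create `count` randomly named files of `size` random bytes plus an fmap file.

    Hypothetical helper; target names are newname0, newname1, ...
    """
    os.makedirs(directory, exist_ok=True)
    lines = []
    for i in range(count):
        old = os.path.join(directory, secrets.token_hex(8))  # random 16-hex-digit name
        new = os.path.join(directory, f"newname{i}")
        with open(old, "wb") as f:
            f.write(os.urandom(size))
        lines.append(f"{old} {new}\n")
    fmap = os.path.join(directory, "fmap")
    with open(fmap, "w") as f:
        f.writelines(lines)
    return fmap


def copy_pairs(fmap):
    """Copy every 'old new' pair listed in fmap within a single process."""
    n = 0
    with open(fmap) as f:
        for line in f:
            old, new = line.split()
            shutil.copyfile(old, new)  # data only, no metadata, like plain `cp`
            n += 1
    return n
```

For example, after make_fixture("/tmp/bench"), "time xargs -L1 cp < /tmp/bench/fmap" in bash times the per-process variant, and timing copy_pairs("/tmp/bench/fmap") gives the in-process figure on the same files.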
Since you were reporting 2 min, was wondering what your platform is and
whether there might be something else involved eating the 2 min realtime?
Glenn