From: Rasmus Borup Hansen
Subject: Re: My experience with using cp to copy a lot of files (432 millions, 39 TB)
Date: Thu, 21 Aug 2014 14:13:33 +0200

On 21 Aug 2014, at 11:31, Pádraig Brady wrote:
> The amount of files rather than the amount of data is pertinent here.
> So 17G/432M is about 40 bytes per entry which is about right.
>
> cheers,
> Pádraig.
I don't have the exact file system available anymore, but I do have the output
of "ls -laR", so I made a small Perl script that counted the number of plain
files and symlinks (lines starting with "-" or "l") and computed the number of
inodes by considering the link count of each file. It also computed the average
length of the full paths for plain files and symlinks (157 bytes). It appears
that I had 27,067,739 inodes corresponding to 365,721,810 directory entries for
plain files/symlinks. The 432M I mentioned in my original post also included
directories (67,087,195) as it was the number of lines in the output from "cp
-v". A memory usage of 17 GB corresponds to more than 600 bytes per inode if
you're only counting inodes for plain files or symbolic links. I haven't looked
at the code since my first post, but if inodes for directories are also stored
in the hash table we end up with around 180 bytes per inode which sounds
reasonable. I don't know if hard links to directories are supported by cp, but
if not, then not storing the directories' inodes in the hash table could save a
lot of memory in my case – provided they're not needed for something else that
I don't know about.
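
The figures above can be re-checked with a short sketch. It is written in
Python, not the Perl script mentioned above, which is not shown in this
message. The function name count_entries, the sample "ls -l" lines, and the
reading of 17 GB as 17 * 10^9 bytes are assumptions made for illustration. It
also assumes every hard link to a file appears somewhere in the listing;
otherwise the inode estimate comes out too low.

```python
from collections import Counter

def count_entries(ls_lines):
    """Count plain-file/symlink entries and estimate their inode count.

    A file with link count n appears n times in the listing, so each of
    its entries contributes 1/n of an inode (assumes all links are listed).
    """
    by_nlink = Counter()
    for line in ls_lines:
        # Only plain files ("-") and symlinks ("l") are of interest.
        if not line.startswith(("-", "l")):
            continue
        fields = line.split(None, 2)
        # Skip lines that merely look similar, e.g. a "lib:" directory header.
        if len(fields) < 3 or not fields[1].isdigit() or int(fields[1]) < 1:
            continue
        by_nlink[int(fields[1])] += 1
    entries = sum(by_nlink.values())
    inodes = sum(count / nlink for nlink, count in by_nlink.items())
    return entries, inodes

# Hypothetical sample: two hard links to one file, one symlink, one directory.
sample = [
    "total 8",
    "-rw-r--r-- 2 user group 10 Aug 21 2014 a",
    "-rw-r--r-- 2 user group 10 Aug 21 2014 b",
    "lrwxrwxrwx 1 user group 1 Aug 21 2014 c -> a",
    "drwxr-xr-x 2 user group 4096 Aug 21 2014 d",
]
sample_entries, sample_inodes = count_entries(sample)

# Figures quoted in the message.
MEMORY_BYTES = 17 * 10**9      # assumption: 17 GB taken as decimal gigabytes
FILE_INODES = 27_067_739       # inodes for plain files and symlinks
FILE_ENTRIES = 365_721_810     # directory entries for plain files and symlinks
DIRECTORIES = 67_087_195       # assumed to be one inode each

total_lines = FILE_ENTRIES + DIRECTORIES
bytes_per_file_inode = MEMORY_BYTES / FILE_INODES
bytes_per_inode_with_dirs = MEMORY_BYTES / (FILE_INODES + DIRECTORIES)

print(f"lines in cp -v output: {total_lines:,}")
print(f"bytes per inode, files and symlinks only: {bytes_per_file_inode:.0f}")
print(f"bytes per inode, including directories: {bytes_per_inode_with_dirs:.0f}")
```

Grouping entries by link count keeps the memory use of the sketch small even
for a listing with hundreds of millions of lines, since only one counter per
distinct link count is stored.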
Also, thanks for the feedback.
Best,
Rasmus