My experience with using cp to copy a lot of files (432 millions, 39 TB)
From: chrjae
Subject: My experience with using cp to copy a lot of files (432 millions, 39 TB)
Date: Thu, 11 Sep 2014 20:42:32 -0700 (PDT)
This post has made it to Hacker News[1].
We have discussed optimization possibilities a bit, and I suggested
replacing cp's use of a hash table with sorting a list.
For example: walk the source tree and write a list of ino/dev/path
entries to a temporary file, then sort that file by ino/dev (e.g. using
GNU sort, which I seem to remember already uses a memory-efficient
external algorithm, i.e. it works well with files much bigger than
RAM), then parse the file back, copy the first path of every group with
the same ino/dev, and hard-link the rest.
The assumption is that sorting a list needs much less RAM than the hash
table to stay efficient. (I can't find my copy of TAOCP right now, but
I think it describes solutions for external sorting.)
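
A rough, untested sketch of what I mean, in Python rather than C just
to illustrate the idea (the name copy_tree_with_hardlinks and the temp
file handling are my own invention, it only handles regular files, and
error handling is left out):

    #!/usr/bin/env python3
    # Sketch: copy a tree, recreating hard links, without an in-core
    # hash table keyed on dev/ino.
    # 1. walk the source tree and dump "dev ino relpath" records to a
    #    temporary file,
    # 2. sort that file by dev/ino with GNU sort (its external merge
    #    sort keeps memory use bounded even for huge lists),
    # 3. read the sorted list back, copy the first path of each dev/ino
    #    group, and hard-link the remaining paths to that first copy.
    # Assumes file names contain no newlines.

    import os
    import shutil
    import subprocess
    import sys
    import tempfile

    def copy_tree_with_hardlinks(src, dst):
        src = os.path.abspath(src)
        os.makedirs(dst, exist_ok=True)

        # Step 1: one "dev ino relpath" line per regular file.
        with tempfile.NamedTemporaryFile('w', delete=False) as listing:
            for dirpath, dirnames, filenames in os.walk(src):
                rel_dir = os.path.relpath(dirpath, src)
                os.makedirs(os.path.join(dst, rel_dir), exist_ok=True)
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    st = os.lstat(path)
                    rel = os.path.relpath(path, src)
                    listing.write('%d %d %s\n' % (st.st_dev, st.st_ino, rel))
            list_path = listing.name

        # Step 2: sort by dev then ino; GNU sort spills to disk as needed.
        sorted_path = list_path + '.sorted'
        subprocess.check_call(['sort', '-n', '-k1,1', '-k2,2',
                               '-o', sorted_path, list_path])

        # Step 3: copy the first path of each dev/ino group, link the rest.
        prev_key = None
        first_dst = None
        with open(sorted_path) as sorted_list:
            for line in sorted_list:
                dev, ino, rel = line.rstrip('\n').split(' ', 2)
                key = (dev, ino)
                target = os.path.join(dst, rel)
                if key != prev_key:
                    shutil.copy2(os.path.join(src, rel), target)
                    prev_key, first_dst = key, target
                else:
                    os.link(first_dst, target)

        os.unlink(list_path)
        os.unlink(sorted_path)

    if __name__ == '__main__':
        copy_tree_with_hardlinks(sys.argv[1], sys.argv[2])

The point is that step 2 delegates the memory problem to GNU sort's
external merge sort, so the copy pass itself only has to remember the
previous dev/ino pair instead of keeping every inode seen so far in a
hash table.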
Christian.
[1] https://news.ycombinator.com/item?id=8305283