My experience with using cp to copy a lot of files (432 millions, 39 TB)
From: chrjae
Subject: My experience with using cp to copy a lot of files (432 millions, 39 TB)
Date: Thu, 11 Sep 2014 20:42:32 -0700 (PDT)
This post has made it to Hacker News[1].
We have discussed optimization possibilities a bit, and I suggested
replacing cp's use of a hash table with sorting a list.
For example: walk the source tree and write a list of ino/dev/path
entries to a temporary file, then sort that file by ino/dev (e.g. using
GNU sort, which I seem to remember already uses a memory-efficient
external algorithm, i.e. it works well with files much bigger than
RAM), then parse the file back, copy the first path of every group with
the same ino/dev, and hard-link the rest.
The assumption is that sorting a list needs much less RAM than the hash
table to stay efficient. (I can't find my copy of TAOCP right now, but
I think it describes solutions for external sorting.)
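
A rough, untested sketch of what I mean, in Python rather than C just
to illustrate the idea (the name copy_tree_with_hardlinks and the temp
file handling are my own invention, it only handles regular files, and
error handling is left out):

    #!/usr/bin/env python3
    # Sketch: copy a tree, recreating hard links, without an in-core
    # hash table keyed on dev/ino.
    # 1. walk the source tree and dump "dev ino relpath" records to a
    #    temporary file,
    # 2. sort that file by dev/ino with GNU sort (its external merge
    #    sort keeps memory use bounded even for huge lists),
    # 3. read the sorted list back, copy the first path of each dev/ino
    #    group, and hard-link the remaining paths to that first copy.
    # Assumes file names contain no newlines.

    import os
    import shutil
    import subprocess
    import sys
    import tempfile

    def copy_tree_with_hardlinks(src, dst):
        src = os.path.abspath(src)
        os.makedirs(dst, exist_ok=True)

        # Step 1: one "dev ino relpath" line per regular file.
        with tempfile.NamedTemporaryFile('w', delete=False) as listing:
            for dirpath, dirnames, filenames in os.walk(src):
                rel_dir = os.path.relpath(dirpath, src)
                os.makedirs(os.path.join(dst, rel_dir), exist_ok=True)
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    st = os.lstat(path)
                    rel = os.path.relpath(path, src)
                    listing.write('%d %d %s\n' % (st.st_dev, st.st_ino, rel))
            list_path = listing.name

        # Step 2: sort by dev then ino; GNU sort spills to disk as needed.
        sorted_path = list_path + '.sorted'
        subprocess.check_call(['sort', '-n', '-k1,1', '-k2,2',
                               '-o', sorted_path, list_path])

        # Step 3: copy the first path of each dev/ino group, link the rest.
        prev_key = None
        first_dst = None
        with open(sorted_path) as sorted_list:
            for line in sorted_list:
                dev, ino, rel = line.rstrip('\n').split(' ', 2)
                key = (dev, ino)
                target = os.path.join(dst, rel)
                if key != prev_key:
                    shutil.copy2(os.path.join(src, rel), target)
                    prev_key, first_dst = key, target
                else:
                    os.link(first_dst, target)

        os.unlink(list_path)
        os.unlink(sorted_path)

    if __name__ == '__main__':
        copy_tree_with_hardlinks(sys.argv[1], sys.argv[2])

The point is that step 2 delegates the memory problem to GNU sort's
external merge sort, so the copy pass itself only has to remember the
previous dev/ino pair instead of keeping every inode seen so far in a
hash table.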
Christian.
[1] https://news.ycombinator.com/item?id=8305283