libchop-devel

tar | chop-archiver


From: Ludovic Courtès
Subject: tar | chop-archiver
Date: Mon, 16 Aug 2010 17:58:02 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)

Hello!

It is now possible to feed data to ‘chop-archiver’ on stdin.  So for
instance you can pipe the output of ‘tar’ to ‘chop-archiver’, as Simon
Josefsson rightly suggested at the GHM:

--8<---------------cut here---------------start------------->8---
$ time chop-archiver -f ,,t -S fs_block_store -r $(tar cf - ~/pix/1/ | 
chop-archiver -s -A -S fs_block_store -f ,,t) | tar tf - > /dev/null
tar: Removing leading `/' from member names
stats: * store `data-store'
stats:   blocks written:          61375
stats:   bytes written:         251392000
stats:   virgin blocks:           61335 ( 99.9%)
stats:   virgin bytes:          251228160 ( 99.9%)
stats:   average block size:     4096.00
stats:   min block size:           4096
stats:   max block size:           4096
stats: * store `meta-data-store'
stats:   blocks written:            628
stats:   bytes written:         1742336
stats:   virgin blocks:             628 ( 100.0%)
stats:   virgin bytes:          1742336 ( 100.0%)
stats:   average block size:     2774.42
stats:   min block size:            206
stats:   max block size:           2782

real    0m26.090s
user    0m5.832s
sys     0m7.274s

$ time chop-archiver -f ,,t -S fs_block_store -r $(tar cf - ~/pix/1/ | 
chop-archiver -s -A -S fs_block_store -f ,,t) | tar tf - > /dev/null
tar: Removing leading `/' from member names
stats: * store `data-store'
stats:   blocks written:          61375
stats:   bytes written:         251392000
stats:   virgin blocks:               0 ( 0.0%)
stats:   virgin bytes:                0 ( 0.0%)
stats:   average block size:     4096.00
stats:   min block size:           4096
stats:   max block size:           4096
stats: * store `meta-data-store'
stats:   blocks written:            628
stats:   bytes written:         1742336
stats:   virgin blocks:               0 ( 0.0%)
stats:   virgin bytes:                0 ( 0.0%)
stats:   average block size:     2774.42
stats:   min block size:            206
stats:   max block size:           2782

real    0m7.463s   
user    0m5.270s   
sys     0m2.489s   

$ du -ms ~/pix/1/
241     /home/ludo/pix/1/
--8<---------------cut here---------------end--------------->8---

Note that this is really I/O-bound, as shown by the difference between
the first command above (which populates the store: 26.1 s real, 7.3 s
sys) and the second one (which doesn’t write any new block: 7.5 s real,
2.5 s sys); user time is about the same in both cases.

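Now, there’s the problem that, with the fixed-size chopper, adding new
directories before ~/pix/1/ in the ‘tar’ command above leads to different
blocks, thereby breaking single-instance storage: the extra data shifts
all subsequent 4096-byte block boundaries (unless its size happens to be
a multiple of 4096), so the blocks of ~/pix/1/ no longer match those
already in the store.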
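The boundary-shift problem can be reproduced with a minimal Python sketch (not libchop code). `fixed_size_blocks` and `shared_fraction` are hypothetical helpers, random bytes stand in for the tar stream, and blocks are identified by their SHA-1 as a content-addressed store might do (an assumption made for the sketch). Prepending three 512-byte records, e.g. a directory header plus one small file, moves every 4096-byte boundary.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # the fixed block size reported in the stats above

def fixed_size_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut DATA into consecutive SIZE-byte blocks; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def shared_fraction(old: bytes, new: bytes, chop) -> float:
    """Fraction of NEW's blocks whose SHA-1 is already among OLD's blocks,
    i.e. blocks a content-addressed, single-instance store would not rewrite."""
    stored = {hashlib.sha1(b).digest() for b in chop(old)}
    new_blocks = chop(new)
    hits = sum(hashlib.sha1(b).digest() in stored for b in new_blocks)
    return hits / len(new_blocks)

if __name__ == "__main__":
    rng = random.Random(42)
    archive = rng.randbytes(1 << 20)     # stand-in for `tar cf - ~/pix/1/`
    new_member = rng.randbytes(3 * 512)  # stand-in for a small directory placed first
    print(f"{shared_fraction(archive, new_member + archive, fixed_size_blocks):.1%}")
```

With random stand-in data, essentially none of the new blocks match the stored ones, even though the whole original stream is still present byte for byte.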

This can be remedied by using the ‘anchor_based_chopper’, which picks
block boundaries based on the data itself rather than at fixed offsets:

--8<---------------cut here---------------start------------->8---
$ rm -rf ,,t

$ time chop-archiver -f ,,t -S fs_block_store -r $(tar cf - ~/pix/1/ |
chop-archiver -s -A -C anchor_based_chopper -S fs_block_store -f ,,t) | tar tf - > /dev/null
tar: Removing leading `/' from member names
stats: * store `data-store'
stats:   blocks written:          39029
stats:   bytes written:         251392000
stats:   virgin blocks:           35221 ( 90.2%)
stats:   virgin bytes:          250761974 ( 99.7%)
stats:   average block size:     6441.09
stats:   min block size:             40
stats:   max block size:         329017
stats: * store `meta-data-store'
stats:   blocks written:            400
stats:   bytes written:         1107984
stats:   virgin blocks:             400 ( 100.0%)
stats:   virgin bytes:          1107984 ( 100.0%)
stats:   average block size:     2769.96
stats:   min block size:            122
stats:   max block size:           2782

real    0m48.792s
user    0m25.638s
sys     0m8.010s

$ time chop-archiver -f ,,t -S fs_block_store -r $(tar cf - ~/pix/2/ ~/pix/1/ |
chop-archiver -s -A -C anchor_based_chopper -S fs_block_store -f ,,t) | tar tf - > /dev/null
tar: Removing leading `/' from member names
stats: * store `data-store'
stats:   blocks written:          39904
stats:   bytes written:         257320960
stats:   virgin blocks:             721 ( 1.8%)
stats:   virgin bytes:          5921359 ( 2.3%)
stats:   average block size:     6448.46
stats:   min block size:             40
stats:   max block size:         329017
stats: * store `meta-data-store'
stats:   blocks written:            410
stats:   bytes written:         1132864
stats:   virgin blocks:             409 ( 99.8%)
stats:   virgin bytes:          1130082 ( 99.8%)
stats:   average block size:     2763.08
stats:   min block size:            150
stats:   max block size:           2782

real    0m31.723s
user    0m26.113s
sys     0m5.172s

$ time chop-archiver -f ,,t -S fs_block_store -r $(tar cf - ~/pix/2/ ~/pix/1/ |
chop-archiver -s -A -C anchor_based_chopper -S fs_block_store -f ,,t) | tar tf - > /dev/null
tar: Removing leading `/' from member names
stats: * store `data-store'
stats:   blocks written:          39904
stats:   bytes written:         257320960
stats:   virgin blocks:               0 ( 0.0%)
stats:   virgin bytes:                0 ( 0.0%)
stats:   average block size:     6448.46
stats:   min block size:             40
stats:   max block size:         329017
stats: * store `meta-data-store'
stats:   blocks written:            410
stats:   bytes written:         1132864
stats:   virgin blocks:               0 ( 0.0%)
stats:   virgin bytes:                0 ( 0.0%)
stats:   average block size:     2763.08
stats:   min block size:            150
stats:   max block size:           2782

real    0m30.381s
user    0m25.910s
sys     0m4.875s

$ du -ms ~/pix/2/
6       /home/ludo/pix/2/
--8<---------------cut here---------------end--------------->8---

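However, as can be guessed from the last run, the bottleneck has become
CPU time: although it writes no new block, it still takes 30.4 s, 25.9 s
of which is user time (compared to about 5.3 s of user time with the
fixed-size chopper).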
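With the anchor-based chopper, adding ~/pix/2/ in front only produced 721 new data blocks (1.8%). The resynchronization behind that, and its per-byte cost, can be sketched in Python with content-defined chunking. This sketch assumes a "gear" rolling hash (the technique used by FastCDC), swapped in for illustration; it is not necessarily what ‘anchor_based_chopper’ does, and `content_defined_blocks`, `MASK` and `GEAR` are hypothetical. A cut decision depends only on the last 32 bytes, so after an insertion the cut points realign and later blocks match again. The hash is also updated once per input byte, whereas fixed-size chopping merely slices, which is one plausible contributor to the higher user time.

```python
import hashlib
import random

# Cut when these 12 high hash bits are zero: roughly one boundary per
# 4096 bytes on random-looking input (an assumed parameter, not libchop's).
MASK = 0xFFF00000

# Hypothetical per-byte table for the gear hash, seeded for reproducibility.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]

def content_defined_blocks(data: bytes) -> list[bytes]:
    """Cut DATA right after each byte where the rolling hash matches MASK."""
    blocks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Older contributions move one bit left per step; after 32 steps a
        # byte has left the 32-bit state, so h depends only on the last 32 bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if (h & MASK) == 0:
            blocks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        blocks.append(data[start:])  # trailing block without a boundary
    return blocks

def shared_fraction(old: bytes, new: bytes, chop) -> float:
    """Fraction of NEW's blocks whose SHA-1 is already among OLD's blocks."""
    stored = {hashlib.sha1(b).digest() for b in chop(old)}
    new_blocks = chop(new)
    hits = sum(hashlib.sha1(b).digest() in stored for b in new_blocks)
    return hits / len(new_blocks)

if __name__ == "__main__":
    rng = random.Random(42)
    archive = rng.randbytes(1 << 18)     # stand-in for the tar stream of ~/pix/1/
    new_member = rng.randbytes(3 * 512)  # stand-in for a directory placed in front
    print(f"{shared_fraction(archive, new_member + archive, content_defined_blocks):.1%}")
```

Only the block that straddles the insertion changes, so the reported fraction should be close to 100%. Real choppers usually also bound the block size, which this sketch omits to keep every boundary purely content-defined.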

One solution would be to have a tar-aware chopper, which would break at
tar header and file boundaries and somehow hand file contents to a
different chopper.
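Here is a rough Python sketch of that tar-aware idea; `tar_segments`, `tar_aware_chop` and `make_tar` are hypothetical names, not libchop API. It relies only on the ustar layout: a 512-byte header whose size field is 12 octal bytes at offset 124, member data padded to a multiple of 512 bytes, and a zero record marking the end of the archive. GNU base-256 sizes are rejected, and pax or long-name entries are not special-cased (their payloads are simply handed over as content).

```python
import io
import random
import tarfile

RECORD = 512  # tar headers and member data are laid out in 512-byte units

def parse_size(header: bytes) -> int:
    """Decode the ustar size field: 12 octal ASCII bytes at offset 124."""
    field = header[124:136]
    if field[0] & 0x80:
        raise ValueError("GNU base-256 size field: not handled in this sketch")
    digits = field.split(b"\0", 1)[0].strip()
    return int(digits, 8) if digits else 0

def tar_segments(stream: bytes):
    """Yield ("meta", bytes) for headers, padding and trailer, and
    ("content", bytes) for each member's data, covering STREAM in order."""
    pos = 0
    while pos + RECORD <= len(stream):
        header = stream[pos:pos + RECORD]
        if header == bytes(RECORD):  # zero record: end-of-archive marker
            break
        size = parse_size(header)
        yield "meta", header
        pos += RECORD
        if size:
            yield "content", stream[pos:pos + size]
            padded = -(-size // RECORD) * RECORD  # round up to a 512-byte multiple
            if padded > size:
                yield "meta", stream[pos + size:pos + padded]
            pos += padded
    if pos < len(stream):
        yield "meta", stream[pos:]  # end-of-archive zeros and record padding

def tar_aware_chop(stream: bytes, chop_content) -> list[bytes]:
    """Keep each meta segment as one block; hand member data to CHOP_CONTENT."""
    blocks = []
    for kind, segment in tar_segments(stream):
        blocks.extend(chop_content(segment) if kind == "content" else [segment])
    return blocks

def make_tar(files: dict[str, bytes]) -> bytes:
    """Build an in-memory ustar archive (demo helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

if __name__ == "__main__":
    fixed = lambda d: [d[i:i + 4096] for i in range(0, len(d), 4096)]
    photo = random.Random(1).randbytes(10240)  # stand-in for a picture file
    before = set(tar_aware_chop(make_tar({"pix/1/a.jpg": photo}), fixed))
    after = tar_aware_chop(make_tar({"pix/2/b.txt": b"hello\n", "pix/1/a.jpg": photo}), fixed)
    print(sum(block in before for block in after), "of", len(after), "blocks already stored")
```

Because each member's data starts a fresh chopping run, even the plain fixed-size chopper yields the same blocks for a.jpg regardless of what precedes it in the archive; the content chopper could just as well be a content-defined one.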

Food for thought!

Ludo’.
