[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: auto merging extents
From: |
Jim Meyering |
Subject: |
Re: auto merging extents |
Date: |
Wed, 09 Mar 2011 19:23:51 +0100 |
Pádraig Brady wrote:
> I was wondering about adding fallocate() to cp,
> to efficiently allocate the destination before writing.
> With that in mind, I think it would be beneficial
> to auto merge extents, so that fragments in the
> source were not propagated to the dest?
>
> This should also be more efficient even without fallocate()
> as demonstrated by running the attached on my ext3 file system:
>
> $ dd if=/dev/zero count=50 bs=1000000 of=file.fa
>
> $ strace -c -e read,write cp-old file.fa file.fa.cp
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 86.67 0.017686 7 2363 write
> 13.33 0.002721 1 2372 read
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 0.020407 4735 total
>
> $ strace -c -e read,write cp-new file.fa file.fa.cp
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 85.76 0.019382 13 1535 write
> 14.24 0.003218 2 1544 read
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 0.022600 3079 total
>
> Hmm, I wonder should we get extent_scan_read() to loop until
> all are read, rather than requiring loops outside?
> As well as simplifying users, it would allow us to merge
> extents whose info spans the 4K buffer provided?
Nice.
> Subject: [PATCH] copy: merge similar extents before processing
>
> * src/extent-scan.c (extent_scan_read): Merge extents that vary
> only in size, so that we may process them more efficiently.
> This will be especially useful when we introduce fallocate()
> to allocate extents in the destination.
> ---
> src/extent-scan.c | 26 ++++++++++++++++++++------
> 1 files changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/src/extent-scan.c b/src/extent-scan.c
> index 1ba59db..c416586 100644
> --- a/src/extent-scan.c
> +++ b/src/extent-scan.c
> @@ -85,23 +85,37 @@ extent_scan_read (struct extent_scan *scan)
> scan->ei_count = fiemap->fm_mapped_extents;
> scan->ext_info = xnmalloc (scan->ei_count, sizeof (struct extent_info));
>
> - unsigned int i;
> + unsigned int i, si = 0;
Technically, the GCS says to put each variable declaration on
its own line, but in this case, I think it's ok as is, since
they're both indices, and I would do the same if I were putting
the declarations in the for (....
> for (i = 0; i < scan->ei_count; i++)
> {
> assert (fm_extents[i].fe_logical <= OFF_T_MAX);
>
> - scan->ext_info[i].ext_logical = fm_extents[i].fe_logical;
> - scan->ext_info[i].ext_length = fm_extents[i].fe_length;
> - scan->ext_info[i].ext_flags = fm_extents[i].fe_flags;
> + if (si && scan->ext_info[si-1].ext_flags == fm_extents[i].fe_flags
> + && (scan->ext_info[si-1].ext_logical +
> scan->ext_info[si-1].ext_length
> + == fm_extents[i].fe_logical))
> + {
> + /* Merge previous with last. */
> + scan->ext_info[si-1].ext_length += fm_extents[i].fe_length;
> + }
> + else
> + {
> + scan->ext_info[si].ext_logical = fm_extents[i].fe_logical;
> + scan->ext_info[si].ext_length = fm_extents[i].fe_length;
> + scan->ext_info[si].ext_flags = fm_extents[i].fe_flags;
> + si++;
> + }
> }
>
> - i--;
> - if (scan->ext_info[i].ext_flags & FIEMAP_EXTENT_LAST)
> + scan->ei_count = si;
> +
> + si--;
> + if (scan->ext_info[si].ext_flags & FIEMAP_EXTENT_LAST)
> {
> scan->hit_final_extent = true;
> return true;
> }
>
> + i--;
> scan->scan_start = fm_extents[i].fe_logical + fm_extents[i].fe_length;
The above is all fine, but unless you know that scan->ei_count
is always positive, seeing "i" and "si" used as indices right
after being decremented may make you think there's a risk
of accessing some_buffer[-1].
What do you think about adding an assertion, either on scan->ei_count
before the loop, or on i and/or si after it?
Thanks!