bug-coreutils


Re: [PATCH] sort: use posix_fadvise to announce access patterns on files


From: Pádraig Brady
Subject: Re: [PATCH] sort: use posix_fadvise to announce access patterns on files opened for reading
Date: Tue, 02 Mar 2010 01:04:53 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.7) Gecko/20100111 Thunderbird/3.0.1

On 01/03/10 09:36, Pádraig Brady wrote:
> On 28/02/10 00:46, Pádraig Brady wrote:
>> On 27/02/10 22:16, Joey Degges wrote:
>>> 2010/2/27 Pádraig Brady <address@hidden>
>>>
>>>> Well as long as the reads are big enough,
>>>> read-process-read-process-... won't be significantly different than
>>>> read-read-process-process-... However in the case where the
>>>> storage and CPU are contended for by other processes the former
>>>> is more desirable. You could test that by spinning the CPU with
>>>> another process while doing the test. It still seems to me
>>>> that WILLNEED is not appropriate for the sequential case.
>>>
>>> Here are the results of sorting when two other processes are running at
>>> 100% each (dual core) for the duration of the sort:
>>> NORMAL : 279.03s, 12.0% CPU
>>> SEQUENTIAL: 230.92s, 14.5% CPU
>>> WILLNEED : 186.26s, 18.5% CPU
>>>
>>> I suspect this trend may also repeat in situations where other processes
>>> are contending for both CPU and storage. Perhaps the priority of the
>>> WILLNEED read ahead is much higher than that of typical reads? Just a
>>> guess.
>>
>> Interesting results, thanks!
>> I'll play around with this with different buffer
>> sizes and concurrent loads to see how it behaves
>> on my single core system.
>
> Will check WILLNEED v SEQUENTIAL this evening.

TBH I couldn't see why you got such good results for WILLNEED,
especially since sort (with enough RAM) will read all the
data up front anyway. In my testing I did not get a benefit
from using WILLNEED over SEQUENTIAL. I still think SEQUENTIAL
is more appropriate and will allow sort to better share the
system resources.
  I have confirmed that WILLNEED reads the whole file synchronously,
which is surprising to me as I thought it should indicate to the
system to cache the file when possible rather than immediately.
I'll ask the kernel guys to see whether one also needs a "background"
tunable to get it running asynchronously. I'm also thinking that
the synchronous reading may not be desired in merge mode for example.
  Perhaps your WILLNEED was giving better CPU affinity
characteristics than the other modes on your dual core system?

My system details and results (wall clock seconds, with user CPU
seconds in parentheses) are:

CPU:    Pentium M single core 1.7GHz
RAM:    2G
kernel: 2.6.30.10-105.2.23.fc11.i586
cmd:    time LANG=C sort < file > /dev/null

88MB file of random numbers. mechanical disk @ 34 MB/s
 NORMAL      31.6 (26.7 user)
 SEQUENTIAL  31.6
 WILLNEED    31.6

88MB file of random numbers. flash key @ 21 MB/s
 NORMAL      31.6 (26.8 user)
 SEQUENTIAL  27.7
 WILLNEED    27.7

88MB file of random numbers. flash key @ 21 MB/s + spinning process
 NORMAL      59.5 (27.0 user)
 SEQUENTIAL  59.5
 WILLNEED    59.5

32MB file of random numbers on 800KB/s compact flash
 NORMAL      45.6 (7.8 user)
 SEQUENTIAL  45.2
 WILLNEED    45.2

32MB file of random numbers on 800KB/s compact flash + spinning process
 NORMAL      53.7 (7.8 user)
 SEQUENTIAL  53.7
 WILLNEED    54.9

So the only benefit under the conditions above comes from the
increased readahead on the faster flash device, where wall clock
time was reduced by 17% for you and by 12% on my flash key.
That's significant, and worth enabling SEQUENTIAL mode for.
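
The percentages can be checked with a small C helper. This is only a sketch: pct_reduction is a hypothetical name, and it assumes the 17% figure refers to the NORMAL and SEQUENTIAL timings quoted earlier in this message (279.03s and 230.92s), with the 12% figure taken from the 21 MB/s flash key run (31.6s and 27.7s).

```c
/* Hypothetical helper: percentage by which AFTER is lower than BEFORE.  */
double
pct_reduction (double before, double after)
{
  return 100.0 * (before - after) / before;
}
```

With those inputs, pct_reduction (279.03, 230.92) is about 17.2 and pct_reduction (31.6, 27.7) is about 12.3.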

I'll push a version with just SEQUENTIAL enabled soon,
unless I see results to the contrary (with an explanation :))

cheers,
Pádraig.



