
Re: overly aggressive memory usage by sort.c


From: Pádraig Brady
Subject: Re: overly aggressive memory usage by sort.c
Date: Fri, 08 Jun 2012 01:11:08 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0

On 06/07/2012 09:12 PM, Jeff Janes wrote:
> In commit a507ed6ede5064b8f15c979e54e6de3bb478d73e, first appearing in
> v8.16, the default memory usage was changed to take all of available
> memory, rather than half of it.  I think this is too aggressive.
> 
> If I start a very large file sort on a previously idle server, it will
> report almost all physical memory as being available, and so sort will
> take all of it.  But as soon as the heavy IO (reading the data to be
> sorted, writing temp files) starts up, the kernel needs more memory
> for buffering in order to make the IO efficient.  The kernel and the
> sort start competing for memory, a little bit of paging/swapping
> starts, time in iowait increases, and the overall sort performance
> drops by roughly a factor of 2.
> 
> I don't know what the correct proportion of available memory to take
> would be, but I think it is >0.5 and <1.0.  Maybe 0.75.  But I think
> that just going back to 0.5 would be better than the status quo.  Or
> perhaps the upper limit clamp could be based on physical memory
> instead of available, so rather than:
> 
> mem = MAX (avail, total / 8);
> 
> maybe:
> 
> mem = MIN (total / 4 * 3, MAX (avail, total / 8));

I have to agree. In general, patches like this shouldn't
go in without extensive performance testing.
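
For reference, the expression in question is in sort.c's
default_sort_size(), which gets avail and total from gnulib's
physmem module, so the change would amount to something like
this (a sketch; the surrounding resource-limit clamping that
default_sort_size() also does is omitted):

    double avail = physmem_available ();
    double total = physmem_total ();

    /* current: all available memory, or 1/8 of physical if larger */
    double mem = MAX (avail, total / 8);

    /* proposed: also cap at 3/4 of physical memory, leaving
       headroom for the kernel's I/O buffering */
    mem = MIN (total / 4 * 3, MAX (avail, total / 8));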

The thread discussing the patch is here:
http://bugs.gnu.org/10877

There are other things we might consider for sort's external
temporary files:

- ensure they're written to disk rather than RAM,
i.e. avoid /tmp when it's tmpfs, which is becoming more common
on systems

- use posix_fadvise, as is done in dd, to tell the kernel the
external files aren't worth caching, since the only reason you'd
be using them is that you don't have enough RAM anyway
(a rough sketch of both ideas follows the list)
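
Something like this, say (Linux-specific; the helper names here
are made up for illustration, and error handling is minimal):

    #include <fcntl.h>
    #include <stdbool.h>
    #include <sys/vfs.h>

    #ifndef TMPFS_MAGIC
    # define TMPFS_MAGIC 0x01021994
    #endif

    /* Return true if DIR is RAM-backed (tmpfs), in which case
       spilling temporary files there defeats their purpose.  */
    static bool
    dir_is_tmpfs (char const *dir)
    {
      struct statfs buf;
      return statfs (dir, &buf) == 0 && buf.f_type == TMPFS_MAGIC;
    }

    /* Advise the kernel that FD's pages won't be reused, so it
       can drop them from the page cache (as dd does for its
       nocache flag).  Purely advisory, so failure is ignored.  */
    static void
    drop_temp_cache (int fd)
    {
      posix_fadvise (fd, 0, 0, POSIX_FADV_DONTNEED);
    }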

The above considers a two-level memory hierarchy (RAM and disk).
Increasingly though, the "memory wall" is an issue,
so a three-level hierarchy that takes the increasingly large
CPU caches into account should be considered too.
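
One small step in that direction would be sizing the initial
in-memory runs to the last-level cache, which glibc exposes
through sysconf (a sketch using glibc-specific sysconf names;
whether this actually wins for sort's workloads would need
measuring):

    #include <unistd.h>

    /* Best guess at the last-level data cache size, falling
       back to a conservative 256 KiB when unavailable.  */
    static long
    last_level_cache_size (void)
    {
      long size = -1;
    #ifdef _SC_LEVEL3_CACHE_SIZE
      size = sysconf (_SC_LEVEL3_CACHE_SIZE);
    #endif
    #ifdef _SC_LEVEL2_CACHE_SIZE
      if (size <= 0)
        size = sysconf (_SC_LEVEL2_CACHE_SIZE);
    #endif
      return size > 0 ? size : 256 * 1024;
    }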

cheers,
Pádraig.


