Re: [coreutils] [PATCH] sort: -R now uses less memory on long lines with
From: Paul Eggert
Subject: Re: [coreutils] [PATCH] sort: -R now uses less memory on long lines with internal NULs
Date: Thu, 12 Aug 2010 01:32:38 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.11) Gecko/20100713 Thunderbird/3.0.6

On 08/12/10 00:49, Pádraig Brady wrote:
> Is it uint32_t for alignment (speed)?
From sort's point of view, it's uint32_t because that's what
the md5 library specifies. I haven't looked into md5 and don't
know if it could be sped up by assuming 64-bit integers.
> Would it be worth doing a memcmp() first and only doing
> the rest if the bytes differ. Depends on the data I know,
> but given caching the up front memcmp may be worth it?
I had the same idea, but have not bothered to investigate it.
You're right that it depends on the data. For actual uses
of -R I suspect that the memcmp won't help performance much,
as the items typically won't contain duplicates.
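
To make the memcmp idea concrete, here is a rough sketch of what an
up-front byte comparison could look like. It is not the code in sort.c:
toy_hash() is a hypothetical seeded FNV-1a stand-in for the salted md5
digest, and compare_random_sketch() is a made-up name. The point is only
that byte-identical lines return 0 before any hashing is done.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the salted md5 digest: seeded 64-bit FNV-1a.
   This is NOT the hash that sort uses; it only keeps the sketch small.  */
static uint64_t
toy_hash (const char *s, size_t len, uint64_t seed)
{
  uint64_t h = 14695981039346656037ULL ^ seed;
  for (size_t i = 0; i < len; i++)
    {
      h ^= (unsigned char) s[i];
      h *= 1099511628211ULL;
    }
  return h;
}

/* Compare two lines, which may contain internal NULs, in hash order.  */
static int
compare_random_sketch (const char *a, size_t alen,
                       const char *b, size_t blen, uint64_t seed)
{
  /* Fast path: byte-identical lines need no hashing at all.  */
  if (alen == blen && memcmp (a, b, alen) == 0)
    return 0;

  uint64_t ha = toy_hash (a, alen, seed);
  uint64_t hb = toy_hash (b, blen, seed);
  if (ha != hb)
    return ha < hb ? -1 : 1;

  /* Hash collision on different data: fall back to byte order so the
     result is still deterministic and antisymmetric.  */
  size_t n = alen < blen ? alen : blen;
  int c = memcmp (a, b, n);
  if (c != 0)
    return (c > 0) - (c < 0);
  return (alen > blen) - (alen < blen);
}
```

With few duplicates in the input the fast path rarely fires, so the
extra memcmp is mostly pure overhead, which is the concern above.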
> Also I wonder would caching the xfrm() data in the line buffers
> be worth it, as the number of comparisons increases?
Yes, absolutely, and that's been on my list of things to try
for a couple of months now. I want to try something else first,
though, that will avoid the need for strxfrm entirely during
almost all calls to compare(). If this works, the benefit
of caching will be so little that I expect it's not worth
the bother.
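
For reference, the caching idea discussed above could look roughly like
the sketch below. It is only an illustration under simplifying
assumptions: struct cached_line, line_xfrm() and cached_coll_compare()
are hypothetical names, not sort's line buffers, and the lines are
assumed to be NUL-terminated with no internal NULs (real input would
need strxfrm applied piece by piece).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical line record: the text plus a lazily filled cache of its
   strxfrm() image.  Assumes TEXT has no internal NULs.  */
struct cached_line
{
  const char *text;
  char *xfrm;   /* NULL until the first comparison needs it */
};

/* Return the transformed form of L, computing it at most once.  */
static const char *
line_xfrm (struct cached_line *l)
{
  if (!l->xfrm)
    {
      /* With a size of 0, strxfrm only reports the length required.  */
      size_t need = strxfrm (NULL, l->text, 0) + 1;
      l->xfrm = malloc (need);
      if (!l->xfrm)
        abort ();
      strxfrm (l->xfrm, l->text, need);
    }
  return l->xfrm;
}

/* Locale-aware comparison that reuses the cached transforms, so each
   line is transformed once no matter how often it is compared.  */
static int
cached_coll_compare (struct cached_line *a, struct cached_line *b)
{
  return strcmp (line_xfrm (a), line_xfrm (b));
}
```

The cost is extra memory per line for the transformed copy, which is
why the cache only pays off if strxfrm is still being called during
most comparisons.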