
Re: Support in sort for human-readable numbers

From: Vitali Lovich
Subject: Re: Support in sort for human-readable numbers
Date: Wed, 7 Jan 2009 04:21:15 -0500

On Wed, Jan 7, 2009 at 2:52 AM, Paul Eggert <address@hidden> wrote:
> I looked at the patch
> <http://launchpadlibrarian.net/20858726/sort_human_v3.diff> and have a
> few comments:
> * There's no documentation, either in the .texi file or in the --help
>  string.  That's often the hardest part to get right.
I agree - that's why I'm waiting for the implementation to stabilize
before trying to write it (along with the necessary test cases).  That
will be done, but there's no point documenting things right now since the
behaviour hasn't been finalized.

> * The patch slows down "sort -n" a bit, and (worse) complicates the code
>  for "sort -n".  There are a lot of gotos.
Oh no - scary gotos.  Seriously - gotos have their place, and there is
appropriate usage, especially when it comes to minimizing code
duplication within a function (don't believe me?  go look at the Linux
kernel or any significant piece of C code).  In fact, gotos are
actually quite fast (if you know CPUs, you'll realize an unconditional
goto is a simple jmp instruction which does nothing to the CPU
pipeline - conditional jumps are far more problematic).  Some of those
gotos are unnecessary (not removing code duplication), but I think
they make the code easier to read, and I'm quite positive that any
decent compiler will optimize away any unnecessary jumps.  And code
readability is far more important than that last 0.1% improvement.

I challenge you to show me one benchmark where the behaviour is
demonstrably slower by even a significant percentage (let's say 3%)
and is not caused by the branching of which comparison to use.

> Better, I think, would be
>  to (1) search for a trailing suffix character, and compare based on
>  that, and then (2) fall back on numeric comparison only if the
>  trailing suffix characters match.  This will certainly make plain
>  "sort -n" faster and will result in a less-intrusive change to the
>  existing code; arguably it will make the new code faster in many
>  cases, too.
Did you read my explanation above for your exact "optimization"?
Please explain to me, in a somewhat detailed manner, how this improves
behaviour because I'm failing to see the algorithm.

Here's what I think you're suggesting:

for (each character c in string a)
  if (c is not a digit)
    break

for (each character d in string b)
  if (d is not a digit)
    break

if (suffix starting at c != suffix starting at d)
   return which one is greater

perform the regular sort -n comparison between a & b

You must be thinking that that last step is somehow magical and free,
because I see it as once again looping over the string.  You may be
thinking to yourself - yeah, but there's no expensive comparison
between the two strings.  OK - I'll grant you that.  So you've sped up
the best case, where the suffixes don't match (more on this later).
Now let's look at the worst case - the suffixes match.  Then you have
to scan over the string again, this time performing your numeric
comparison.  So you've made your worst case slower.  I have a feeling
that this worst case is also likely to be the average case.

Also, let's grant that you have managed to speed up the best case.
But have you really?  What have you saved - the cost of a subtraction
n times (n is the length of the number) over a span of x comparisons.
Do you honestly believe that n * x will be sufficiently big in any
reasonable test case that this saving will actually be realized?  x,
by the way, will be approximately m log m (where m is the number of
strings).  So your cost savings over the course of your call to sort
will be n * m log m, for m strings of length n.  Let's take a
reasonable n = 3 (after all, higher n on average means you will be
using a different suffix anyway).  Let's say you are sorting a
million strings.  So you've saved about 3,000,000 * log 3,000,000 =
19,431,363 subtractions.  Let's be generous here and say that a
subtraction takes 20 clock cycles (I don't feel like digging up my
FPGA work to figure out how many clock cycles my implementation took,
let alone the far better ones used in modern processors).  On a modern
2.0 GHz machine, that's about 10 ns each.  So for a million strings,
you've managed to save a whopping 194,313,630 nanoseconds.  Wait a
minute - that's about 0.2 seconds.  So every million strings, your
optimization, in the best case, will shave off 0.2 seconds.
Excellent.  Oh - and m has a much larger impact, so sorting more
strings will be more of a factor than sorting longer strings.

Now repeat the above analysis for the worst case, and I'm quite
confident you'll see a much larger hit (although probably still not
significant at all) in the worst case (suffixes equal).

Hopefully I didn't embarrass myself with a glaring mistake - it's really late.

> * Please don't take the address of an often-used local variable; this
>  hurts optimization, since it often inhibits optimizing that variable
>  into a register.
Could you please provide me with context about where this is happening
(I can't find it in the code)?  A line number in the patch would be helpful.

> * I suggest ignoring any suffix after the 'k' or 'M' or 'G' or whatever;
>  it's simpler.
And extremely incorrect - please read the discussion above.
Essentially, you then run into the problem of malformed input -
2Klingons would be treated the same as 2K.  Granted, this is
a contrived example, but I hope you see my point.

> * I also suggest allowing only the upper-case version (e.g., 'P', not
>  'p'), as this will allow later support for ISO prefixes like 'p' (for
>  "pico") better.  (It's OK to allow both 'k' and 'K' to mean kilo,
>  though.)
Agreed, although in this context they are numeric suffixes (pico is
the unit prefix, but units play no role in this discussion).  The
problem then becomes that I suddenly have to worry about which
suffixes can be either case and which must be one or the other.
Additionally, you've got problems like mu for micro, which I'm not
even sure can be represented in ASCII.  Also, h & da are hardly ever
used, I think (& even better, da breaks the pattern of one-character
suffixes).  So supporting the full set seems questionable.

> * exponent_order should be a simple indexed lookup like "table[a]";
>  there shouldn't be a need for a loop checking for "inverse_table[i] ==
>  a".
Is the memory-speed tradeoff really worth it?  The table isn't that
big to begin with.  So you're trading off 256 bytes for (in the best
case) a savings of, like what, 5 subtractions per suffix comparison?
I'd probably concede your point on this tomorrow after a night's sleep.
> Hope this helps.
