Re: Another patch, for discussion tho
From: Bruce Korb
Subject: Re: Another patch, for discussion tho
Date: Sat, 21 Apr 2012 10:14:31 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120328 Thunderbird/11.0.1
So after futzing with timing a bit, I figured out the following:
1. These pre-computed tables _can_ outperform "strpbrk", but
only when the number of skipped-over characters is somewhere
around a dozen or two. Beyond that, strpbrk's single-instruction
testing and hand-crafted assembly code win out. (The generated
tables require a load, mask and test per character instead of
just a load and test.)
2. The setup cost for a single-character strpbrk break-on string
is *MUCH* larger than the setup cost for a string of two or more
characters. Presumably someone tried to optimize that setup, but
the general setup is already efficient enough that the
optimization pessimizes instead.
3. It was never about efficiency of execution anyway. Time-critical
code is quite unlikely to be scanning over strings in the first
place; if it must, it should use strpbrk/strcspn. For really
critical scanning code, perhaps variants of those could split
the interface into setup_strpbrk and run_strpbrk.
In retrospect, I suppose I could do the same thing and
achieve the same efficiency: "SETUP_whatever_SCAN()" would
populate an array of bytes that merely need to be tested
for "true" or "false" instead of masked. Entirely doable,
but not today.

This whole thing _is_ about efficiency -- but efficiency of
expression, and also flexibility. (Change the characters
in a classification and the main code accepts the new
character set without alteration. E.g., add '$' to the set
of name characters for "C" and you are now VMS compatible.)
So where would the right place be for a beast like this?