
Re: [bug-gperf] Add machine-readable output to gperf


From: Bruno Haible
Subject: Re: [bug-gperf] Add machine-readable output to gperf
Date: Mon, 21 Nov 2016 19:08:20 +0100
User-agent: KMail/4.8.5 (Linux/3.8.0-44-generic; KDE/4.8.5; x86_64; ; )

Hi Sei Lisa,

> > In which language is the project written?
> 
> Pascal.

OK, that's similar enough to C and C++ in the sense that
  - the assumptions about the relative costs of statements (e.g. that array
    accesses are more expensive than bit arithmetic on variables) hold true,
  - hash tables are not part of the standard library of the language,
  - it's not easy to generate executable code for this language on-the-fly
    (unlike, say, Lisp, Java, or C#).

> If you're asking in regards to whether it allows binding to a C library, the
> answer is probably yes (not sure about the portability issues that would need
> to be dealt with, though).

It depends on the binding costs. If the binding costs at runtime are too high,
I would consider postprocessing the gperf output with a C -> Pascal translator.

> the aim is to avoid any recompilation every time there's a change in the
> keywords, which is often. Just a gperf run should suffice.

Well, if you want to avoid recompilation, you also want to avoid a gperf run.
Note that gperf's running time is not polynomially bounded. For input sets
of 20 to 100 keywords, it can take seconds or minutes. For input sets of 1000
keywords, it may take days. For 10000 keywords, don't even try - it won't finish
in your lifetime. There are some options that can tune the performance,
depending on the shape of the keyword set - but this requires some human
understanding and therefore manual intervention.

Both in your current approach (parsing the C output of gperf) and your proposed
approach (parsing some XML output file), you will still be *interpreting* the
data ('function HashValue'). This interpretation typically costs a factor of 2
in performance.

In such a situation, my recommendation is to go with a normal implementation of
hash tables: use
  - a fixed hash function for all strings, independent of the keyword set
    (but beware of a common pitfall found in textbooks, see
    http://www.haible.de/bruno/hashfunc.html),
  - a hash table structure that can deal with hash code collisions in
    average O(1) time. See the textbooks on data structures for this.

If you follow this path, the use of hash tables becomes a no-brainer.
And hash tables are useful for so many other types of keys, not just strings.

Bruno
