[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Bash associative arrays are slow, don't scale

From: David Kuehling
Subject: Bash associative arrays are slow, don't scale
Date: Wed, 04 May 2011 16:57:05 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)


I'm using a bash script (bash 4.1) to parse debian package indices (kind
of a mirroring script).  This stores all package fields for every
package into one big hash table.

That's about 300.000 hash array elements for Debian sid's "Sources"
index file.  Unfortunately performance doesn't scale at all.  Needs
about 2m30 for parsing that Sources file.  Needs exactly the same time
after completely re-writing the parser using a sed preprocessor script.

Looking at the hashlib.c source-code it's somewhat apparant why it's
that slow: bash's hashtables implement open hashing with 64 hash chains.
So performance for search and insert is linear in number N of stored
elements, as soon as N is much larger then 64.  Especially insert is
extremely slow as it has to scan the full chain from start to end.

Are there any plans to change that?  Would be helpful if the man page
stated that bash arrays are O(N) in performance for large N.  Had I
known that, I wouldn't have relied on using bash at all for such kind of


GnuPG public key: http://user.cs.tu-berlin.de/~dvdkhlng/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40

Attachment: pgp9pV8Nz6oBl.pgp
Description: PGP signature

reply via email to

[Prev in Thread] Current Thread [Next in Thread]