bug-recutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[bug-recutils] Index design


From: Michał Masłowski
Subject: [bug-recutils] Index design
Date: Mon, 16 Jul 2012 17:41:24 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.1 (gnu/linux)

Hello.

Unless there are significant design problems in index files and lazy
rset support, I think I can begin designing and implementing the changes
needed to use an index of records.

The index would contain the number of the indexed rset, the key type, a
tree mapping keys of the type to record positions and the depth of the
tree.

The index key type is a list of field names and value types indexed.
The value types will be currently only strings, other ones could be
added later to support case-insensitive comparisons or more efficient
indexing with numeric keys.

The tree will be mmapped for searches and built offline.  Each search
will be for records with field values specified by the index key in
specific ranges on each field listed in the key type.

If a record has multiple fields of the same name, it will be indexed for
all of them.  There will be no specific support for selection
expressions like "tag[0] = 'foo' && tag[1] = 'bar'" (only one part of
the expression would be used for the query).

B+ trees seem appropriate for this use, I haven't noticed earlier that
most searches will probably find many records so linking the leaves of
the tree will be useful.  Since the tree will be built offline, the
leaves could be stored in an array with an empty trailing node instead
of using a linked list.

This will need at least these new interfaces:

- a record iterator with implementation-specific data

  It will allow adding new index types without changing rec_db_query.
  Using rset objects for this could be possible, but it would make
  memory management more complex (a direct rset scan without an index
  shouldn't copy the rset twice).

- a query planner replacing parts of rec_db_query

  It would find any relevant index and return a record iterator to be
  processed by rec_db_query.  To implement other parts, it could find
  any index on a field name mentioned in the selection expression, then
  the planner could be rewritten to find better indexes or support more
  complex expressions.

- index key type objects

- index reader getting the key type and doing searches

- index builder

  Would do an offline build using the specified key type and database.

Aggregates and joins can be ignored now, like some operations needing
different index types.

I'll write a binary format for the tree later.  All texts about B-trees
that I have seen assumed constant-length keys, the difference probably
isn't significant for performance assuming approximately constant node
length.

Attachment: pgpA3D4PgLWjn.pgp
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]