[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[bug-recutils] Index design
From: |
Michał Masłowski |
Subject: |
[bug-recutils] Index design |
Date: |
Mon, 16 Jul 2012 17:41:24 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/24.1 (gnu/linux) |
Hello.
Unless there are significant design problems in index files and lazy
rset support, I think I can begin designing and implementing the changes
needed to use an index of records.
The index would contain the number of the indexed rset, the key type, a
tree mapping keys of the type to record positions and the depth of the
tree.
The index key type is a list of field names and value types indexed.
The value types will be currently only strings, other ones could be
added later to support case-insensitive comparisons or more efficient
indexing with numeric keys.
The tree will be mmapped for searches and built offline. Each search
will be for records with field values specified by the index key in
specific ranges on each field listed in the key type.
If a record has multiple fields of the same name, it will be indexed for
all of them. There will be no specific support for selection
expressions like "tag[0] = 'foo' && tag[1] = 'bar'" (only one part of
the expression would be used for the query).
B+ trees seem appropriate for this use, I haven't noticed earlier that
most searches will probably find many records so linking the leaves of
the tree will be useful. Since the tree will be built offline, the
leaves could be stored in an array with an empty trailing node instead
of using a linked list.
This will need at least these new interfaces:
- a record iterator with implementation-specific data
It will allow adding new index types without changing rec_db_query.
Using rset objects for this could be possible, but it would make
memory management more complex (a direct rset scan without an index
shouldn't copy the rset twice).
- a query planner replacing parts of rec_db_query
It would find any relevant index and return a record iterator to be
processed by rec_db_query. To implement other parts, it could find
any index on a field name mentioned in the selection expression, then
the planner could be rewritten to find better indexes or support more
complex expressions.
- index key type objects
- index reader getting the key type and doing searches
- index builder
Would do an offline build using the specified key type and database.
Aggregates and joins can be ignored now, like some operations needing
different index types.
I'll write a binary format for the tree later. All texts about B-trees
that I have seen assumed constant-length keys, the difference probably
isn't significant for performance assuming approximately constant node
length.
pgpA3D4PgLWjn.pgp
Description: PGP signature
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [bug-recutils] Index design,
Michał Masłowski <=