Re: [bug-recutils] GSoC: Ideas for Recutils
From: Jose E. Marchesi
Subject: Re: [bug-recutils] GSoC: Ideas for Recutils
Date: Tue, 27 Mar 2012 20:06:48 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.92 (gnu/linux)

Hi.

> For complex queries there are many ways to use indices, and tree and
> hash indices offer different performance benefits depending on the
> data. Maybe the index could be built in a way optimized for previously
> run queries, without any manual specification of what to store
> there.
>
> Since any write practically requires rewriting the database (indices are
> optional), maybe index formats which need a complete rebuild on change
> wouldn't be too slow for use with recutils, although they aren't used in
> traditional database systems.

We can assume that changing the recfile in any way will require a
complete rebuild of the corresponding index file. The operation will be
performed by recfix, and it must be considered an "offline"
operation. This implies that generating the index file will be a slow
operation, but recutils users will probably use indexes only on files
which are rarely updated.

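As a rough illustration of such an offline rebuild, here is a minimal Python sketch, not recfix or librec code. It assumes a simplified recfile (records separated by blank lines, one-line `Field: value` entries, `#` comments, no continuation lines), and the names `build_index` and `read_record` are hypothetical. The index maps each value of one field to the byte offsets of the records that contain it, so a lookup can jump to a record instead of parsing the whole file.

```python
import io

def build_index(data: bytes, field: str) -> dict:
    """Map each value of `field` to the byte offsets of the records holding it."""
    index = {}
    prefix = field.encode() + b":"
    offset = 0            # byte offset of the current line
    record_start = None   # byte offset where the current record begins
    for line in io.BytesIO(data):
        if not line.strip():
            record_start = None          # a blank line ends the record
        elif not line.startswith(b"#"):  # comment lines are skipped
            if record_start is None:
                record_start = offset
            if line.startswith(prefix):
                value = line[len(prefix):].strip().decode()
                index.setdefault(value, []).append(record_start)
        offset += len(line)
    return index

def read_record(data: bytes, offset: int) -> str:
    """Return the record that starts at `offset`, up to the next blank line."""
    end = data.find(b"\n\n", offset)
    if end == -1:
        end = len(data)
    return data[offset:end].decode().strip()

# Made-up sample data; any change to it means calling build_index again.
SAMPLE = b"""Name: Ada
Email: ada@example.org

Name: Bob
Email: bob@example.org
"""

index = build_index(SAMPLE, "Name")
print(read_record(SAMPLE, index["Bob"][0]))
```

With a real file the lookup would seek to the stored offset rather than slice an in-memory buffer. The whole index is rebuilt from scratch each time, which matches the "offline" model described above.
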
> Writing good performance tests, which might approximate what a real
> useful program does with a big database, is probably necessary for this
> task. I don't know of existing uses of recutils with database sizes for
> which this task would be significant.

Yes, it would be nice to have realistic performance tests. There are some
simple performance tests for recsel in torture/utils/p-recsel.sh, but
they cannot be considered "realistic".

> The only problem which I have found so far is that the database is
> completely read and parsed for use. Changing this would be needed to
> make indices useful with recsel. I don't expect this to be more
> difficult than other parts of the task.

That will require changes in the internal design of librec, which must
be carefully studied.
This will basically require a change in the rec_rset_t ADT in order to

> The ideas page mentions determining if the index is up to date. I don't
> see other practical solutions than using filesystem metadata of the
> database file (checksumming the file contents should be much slower than
> doing a simple query using a tree index).

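The filesystem-metadata check can be sketched in a few lines of Python. This is only an illustration: the choice of size plus modification time as the "signature", and the names `file_signature` and `index_is_fresh`, are assumptions, not anything defined by recutils. The idea is that the index stores the signature of the recfile it was built from and is treated as stale when the current signature differs.

```python
import os
import tempfile

def file_signature(path: str) -> tuple:
    """Cheap fingerprint of a file: (size in bytes, mtime in nanoseconds)."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)

def index_is_fresh(recfile_path: str, stored_signature: tuple) -> bool:
    """True if the recfile still matches the signature saved with the index."""
    return file_signature(recfile_path) == tuple(stored_signature)

# Demonstration on a temporary file with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "contacts.rec")
    with open(path, "wb") as f:
        f.write(b"Name: Ada\n")
    stored = file_signature(path)          # saved when the index is built
    fresh_before = index_is_fresh(path, stored)
    with open(path, "ab") as f:            # the recfile changes afterwards
        f.write(b"\nName: Bob\n")
    fresh_after = index_is_fresh(path, stored)

print(fresh_before, fresh_after)
```

The check costs a single `stat` call and never reads the file contents, but it can be fooled by tools that restore timestamps or by a same-size edit on a filesystem with coarse timestamp resolution.
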
We could have a "checksum" comment at the end of the rec file, which
would be generated by recfix when creating the index file. The problem
with this approach is that the creation of the index won't be completely
decoupled from the recfile itself, but that may not really be a problem.

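A trailing checksum comment could look something like the following Python sketch. Everything specific here is an assumption made for illustration: the `# checksum: ` comment format, the use of SHA-256, and the helper names are hypothetical, since the message does not fix any of them. The checksum covers every byte of the file before the final comment line.

```python
import hashlib

MARKER = b"# checksum: "   # hypothetical comment format, not a recutils one

def strip_checksum(data: bytes) -> tuple:
    """Split data into (body, stored hex digest or None)."""
    body, sep, last = data.rstrip(b"\n").rpartition(b"\n")
    if last.startswith(MARKER):
        return body + sep, last[len(MARKER):].decode()
    return data, None

def add_checksum(data: bytes) -> bytes:
    """Return data with a fresh checksum comment as its last line."""
    body, _ = strip_checksum(data)
    if body and not body.endswith(b"\n"):
        body += b"\n"
    digest = hashlib.sha256(body).hexdigest()
    return body + MARKER + digest.encode() + b"\n"

def checksum_matches(data: bytes) -> bool:
    """True if the trailing checksum comment matches the rest of the file."""
    body, stored = strip_checksum(data)
    return stored is not None and hashlib.sha256(body).hexdigest() == stored

signed = add_checksum(b"Name: Ada\nEmail: ada@example.org\n")
edited = signed.replace(b"Ada", b"Eve")   # simulates a later manual edit

print(checksum_matches(signed), checksum_matches(edited))
```

Because the comment lives inside the recfile, any tool that rewrites the file without updating it invalidates the index, which is the coupling mentioned above. Verifying it also requires reading the whole file.
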

> (I'm writing this as a student interested in implementing this. I don't
> have practical experience in implementing databases, but I know C and I
> can implement structures useful for indices.)

More than enough. The analysis you just did proves that you could do
the task if you wanted to :)
--
Jose E. Marchesi http://www.jemarch.net
GNU Project http://www.gnu.org