[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [help-GIFT] Re: Clarification on inverted file
From: |
Wolfgang Mueller |
Subject: |
Re: [help-GIFT] Re: Clarification on inverted file |
Date: |
Mon, 20 Aug 2001 11:28:32 +0200 |
On Friday 17 August 2001 08:56, David Squire wrote:
> suryani lim wrote:
> > I am trying out GIFT at the moment, to see if I can use it and to test
> > your published algorithms on my database. I had some problems indexing
> > my images when I first used it. It's looking good now but I am unsure
> > which combination of algorithms are implemented in Classical IDF. You
> > have published several weight functions and pruning algorithms.
> >
> > For pruning, it seems that only sorted weight and lossy pruning is
> > implemented, with the pruning features set at 70%? Is that right?
>
OK. The 70% is a parameter which can be changed during runtime. It means the
program will read the feature list for the 70% query features with the
highest term frequency. Note that this term frequency is the effective term
frequency calculated from the feedback given.
The pruning is to be found in
gift/libGIFTQuInvertedFile/cc/CQInvertedFile.cc
in the method
CQInvertedFile::init() the parameters coming from the config file and/or the
interface will be analyzed.
in the method
...::keepScorePruning() they will be used (all these conditions in the
for-loop). Henning implemented also pruning based on a maximum time, and
cutting off the scoreboard at a certain point within query processing (this
limits the number of hash accesses). In the current version it is untested if
my linking it to the rest of the GIFT is correct. The feature-list based
pruning is used a lot, though so you can rely on that.
--
Why only lossy pruning?
With our current collection sizes we can assume, that the seek time (seeking
the location of the feature list on the hard disk) is more important than the
actual reading time. So the pruning has to be based on entire feature lists.
The other point is, that in the current feature set of Viper, for the
overwhelming majority of the features the term frequency is either one or
zero. Implementing more sophisticated methods that read only parts of feature
lists thus would not make sense.
--
The weighting is to be found in CWeightingFunction.cc
The function of interest is
CWeightingFunction::getTermFrequency();
So we use a Rocchio-inspired weighting formula for obtaining the
tfrequencies. What we don't do is cutting off the negative ones, i.e. there
*will* be features with negative term frequency. Please note that which
weighting function is actually used is also decided during runtime. The
weighting functions are encapsulated in classes that inherit from
CWeightingFunction ("Strategy" pattern).
> Few people say that they are using inverted files, but quite a few
> techniques use the same idea, which is to index images by features, rather
> than vice versa. Smith and Chang's colour sets is a good example (see
> http://www.ctr.columbia.edu/~jrsmith/html/pubs/PAMI/pami_final_1.html).
> There are other examples in the literature; sometimes the term "hashing" is
> used.
If I may add to that, Smeulders et al. report in their recent overview
article some work done in 1996 that was about indexing images in inverted
files using salient points. Chad Carson told me, that he stored the blobs of
BlobWorld in an inverted file. MARS is strongly inspired by text retrieval,
but modifies the retrieval scheme, basing the weighting not on the document
frequency but on the standard deviation of the term frequency.
After what I have seen, David's approach that was implemented in Viper is
maybe the most radical approach to adapting image retrieval methods to text
retrieval indexing, because it does not try to achieve semantically
meaningful representation (e.g. segments) before indexing the image in an
inverted file. So much so, that this inverted files are not even mentioned as
a possible general purpose indexing structure in the said Smeulders et al.
article. Please correct me, David, if I missed something.
More questions welcome...
Cheers,
Wolfgang
--
Wolfgang Müller,
assistant-doctorant == PhD student (2001), teaching assistant
Personal page: http://cui.unige.ch/~vision/members/WolfgangMueller.html
Maintainer, GNU Image Finding Tool (http://www.gnu.org/software/gift)