[help-GIFT] Adding text features to Viper/GIFT

From: David Squire
Subject: [help-GIFT] Adding text features to Viper/GIFT
Date: Tue, 15 May 2001 11:14:57 +1000

Hi all,

I am just about to spend a few hours integrating my text indexing code with the 
feature extraction code for Viper/GIFT. One of the fundamental issues here (as 
has been discussed earlier) is that the number and nature of the features (word 
stems) which will be encountered in indexing a collection are not known in 
advance.

The currently suggested solution is to maintain a file with each collection 
which maps words to feature IDs - feature IDs would not correspond directly 
between collections (whereas they do now).

My current (quick and dirty) text indexing software accepts *all* the .txt 
files to index as command line arguments. Term frequency statistics are then 
gathered for each document (in fact, at present, paragraph by paragraph) and 
for the collection as a whole. The advantage of this 
is that a single hash mapping terms to their IDs and collection frequencies can 
be maintained throughout the entire process.
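
As an illustration only, the single-pass approach described above might look 
like the following Python sketch. It is not the actual indexing code: the 
function names are hypothetical, and plain lowercased tokens stand in for real 
word stems.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercased alphabetic tokens stand in for real word stems.
    return re.findall(r"[a-z]+", text.lower())


def index_collection(documents):
    """Index every document in one pass.

    documents maps a file name to its text. Returns the term -> feature ID
    hash, the collection frequency of each feature ID, and the per-paragraph
    term frequencies of each document.
    """
    term_ids = {}                # one hash shared by the whole run
    collection_freq = Counter()  # feature ID -> occurrences in the collection
    paragraph_freqs = {}         # file name -> list of per-paragraph Counters

    for name, text in documents.items():
        per_paragraph = []
        # Paragraphs are taken to be separated by blank lines.
        for paragraph in re.split(r"\n\s*\n", text):
            if not paragraph.strip():
                continue
            counts = Counter()
            for term in tokenize(paragraph):
                # A new term gets the next unused ID.
                feature_id = term_ids.setdefault(term, len(term_ids))
                counts[feature_id] += 1
                collection_freq[feature_id] += 1
            per_paragraph.append(counts)
        paragraph_freqs[name] = per_paragraph

    return term_ids, collection_freq, paragraph_freqs
```

Because the hash lives in memory for the whole run, nothing needs to be 
written to disk until every file has been seen.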

If this were to be changed to work on a file by file basis, as the image 
indexing currently works, then a file storing this hash would have to be 
loaded, updated and then saved each time features were extracted for a given 
.txt file.
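
To make the cost of the file-by-file variant concrete, here is a rough Python 
sketch of the load/update/save cycle. The JSON table format, the function 
names and the crude tokenizer are all assumptions made for illustration, not 
part of Viper/GIFT.

```python
import json
import os
import re
from collections import Counter


def load_table(path):
    # The table maps term -> {"id": feature ID, "cf": collection frequency}.
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def save_table(table, path):
    # Write to a temporary file first so an interrupted run cannot leave a
    # half-written table behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(table, handle)
    os.replace(tmp_path, path)


def extract_features(text, table_path):
    """Index one document, loading and saving the shared table around it."""
    table = load_table(table_path)
    counts = Counter()
    # Assumption: lowercased alphabetic tokens stand in for real word stems.
    for term in re.findall(r"[a-z]+", text.lower()):
        entry = table.setdefault(term, {"id": len(table), "cf": 0})
        entry["cf"] += 1
        counts[entry["id"]] += 1
    save_table(table, table_path)
    return dict(counts)
```

Every call reads and rewrites the whole table, so the I/O cost grows with the 
vocabulary of the collection rather than with the size of the file being 
indexed.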

I am planning a work-around where an initial text indexing phase will index all 
.txt files in a collection, and write a summary file containing term ID and 
term document frequency information for each .txt file. These can then be read 
when the individual images are indexed. I think that this will work quite 
well, but we should consider how it should be handled in the 
gift-extract-features / gift-generate-inverted-file framework.
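
A minimal Python sketch of such a two-phase scheme follows. The ".terms" file 
suffix, the "ID tf df" line format and the function names are hypothetical, 
and it assumes that each summary records both the frequency of a term within 
its .txt file and the number of files in which the term occurs.

```python
import re
from collections import Counter
from pathlib import Path


def write_summaries(txt_paths):
    """Phase 1: index all .txt files, then write one summary per file."""
    term_ids = {}         # term -> feature ID, shared across the collection
    doc_freq = Counter()  # feature ID -> number of files containing it
    per_file = {}         # path -> Counter of feature ID -> term frequency

    for path in txt_paths:
        text = Path(path).read_text(encoding="utf-8").lower()
        counts = Counter()
        # Assumption: lowercased alphabetic tokens stand in for word stems.
        for term in re.findall(r"[a-z]+", text):
            counts[term_ids.setdefault(term, len(term_ids))] += 1
        per_file[path] = counts
        doc_freq.update(counts.keys())

    # Summaries are written last, once document frequencies are final.
    for path, counts in per_file.items():
        lines = [f"{fid} {tf} {doc_freq[fid]}"
                 for fid, tf in sorted(counts.items())]
        Path(str(path) + ".terms").write_text(
            "\n".join(lines) + "\n", encoding="utf-8")
    return term_ids


def read_summary(txt_path):
    """Phase 2: read the summary while indexing the associated image."""
    features = {}
    summary = Path(str(txt_path) + ".terms").read_text(encoding="utf-8")
    for line in summary.splitlines():
        if not line.strip():
            continue
        fid, tf, df = map(int, line.split())
        features[fid] = (tf, df)
    return features
```

With this split, the per-image step only reads a small file and never touches 
the shared hash.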

Any thoughts much appreciated.



Dr. David McG. Squire
Computer Science and Software Engineering, Monash University, Australia
Do/Don't want HTML mail? Let me know.
