[Varnamproject-discuss] Frequency calculation

From: Kevin Martin
Subject: [Varnamproject-discuss] Frequency calculation
Date: Thu, 24 Apr 2014 23:30:38 +0530

I want to get more familiar with the code base and was hoping to work on this issue:

https://savannah.nongnu.org/bugs/?40401

A simple but inefficient solution would be to store the frequency as a float instead of an int and increment it by 0.001 instead of 1. I suspect that would make the whole program slower, since working with floats tends to have more overhead.

I believe that we are only interested in the relative frequencies here. We could set a frequency threshold of, say, 1000: if the frequency of the most used word exceeds that of the word with the second-highest (or third-highest, or whatever) frequency by 1000 or more, we apply a normalization function. Rarely used words would be reset to a frequency of 0 (or 1), and the frequencies of the other words would be scaled down proportionally. It is sort of like a percentile system, except that it keeps resetting.
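To make the idea concrete, here is a rough sketch in Python. It is not varnam code. The function name, the `rank` and `target_max` parameters, and the choice to scale the top word down to `target_max` are all my own assumptions for illustration; the actual normalization function is still open.

```python
def normalize_if_needed(freqs, threshold=1000, rank=2, target_max=1000):
    """Return rescaled word frequencies when the top word leads too far.

    Hypothetical sketch: `freqs` maps word -> integer frequency.
    Normalization is triggered when the highest frequency exceeds the
    frequency at position `rank` (2 = second highest) by `threshold` or more.
    """
    if len(freqs) < rank:
        return dict(freqs)  # not enough words to compare against

    ordered = sorted(freqs.values(), reverse=True)
    top, reference = ordered[0], ordered[rank - 1]
    if top - reference < threshold:
        return dict(freqs)  # gap is still small, leave the counts alone

    # Scale every count so the top word lands on target_max.
    # Integer arithmetic only; rarely used words bottom out at 1.
    return {
        word: max(1, count * target_max // top)
        for word, count in freqs.items()
    }


counts = {"a": 5000, "b": 3000, "c": 40, "d": 2}
print(normalize_if_needed(counts))
# {'a': 1000, 'b': 600, 'c': 8, 'd': 1}
```

Everything stays an int, the relative order of the words is preserved, and the rarely used word "d" ends up at 1 rather than disappearing.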
