Re: Proceeding with the GSoC Project

From:

Sudeepam Pandey

Subject:

Date:

Fri, 27 Apr 2018 05:03:21 +0530

On 27 Apr 2018 4:48 a.m., "Nicholas Jankowski" <address@hidden> wrote:

On Thu, Apr 26, 2018 at 7:12 PM, Sudeepam Pandey <address@hidden> wrote:

Because the input is ASCII-encoded, something like this will happen. Say we have two proper words, 'amuse' and 'abuse', and someone types 'atuse'. In that case the network would by default output 'amuse', because the ASCII code of 't' (116) is closer to that of 'm' (109) than to that of 'b' (98). I did not consider this to be a problem, because the network really is giving us 'the closest match'. Even if we do consider it a problem, considering all the classes within a probability range, instead of only the class with the highest probability, would solve it.
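To make the 'atuse' example concrete, here is a minimal Python sketch that compares words by the sum of absolute ASCII code differences. This is only an illustration of why 't' counts as nearer to 'm' than to 'b' under ASCII encoding; it is not the neural network itself, and the function names `ascii_distance` and `closest_by_ascii` are hypothetical.

```python
def ascii_distance(a: str, b: str) -> int:
    """Sum of absolute ASCII code differences between two equal-length words."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(abs(ord(x) - ord(y)) for x, y in zip(a, b))


def closest_by_ascii(typo: str, candidates: list[str]) -> str:
    """Return the candidate with the smallest ASCII distance to the typo."""
    return min(candidates, key=lambda word: ascii_distance(typo, word))


print(ascii_distance("atuse", "amuse"))  # 7  (|116 - 109|)
print(ascii_distance("atuse", "abuse"))  # 18 (|116 - 98|)
print(closest_by_ascii("atuse", ["abuse", "amuse"]))  # amuse
```

A trained network will not reproduce this metric exactly, but the same ordering of raw input values is what biases it toward 'amuse'.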

I would suggest QWERTY keyboard distance if we're trying to catch the majority of fumble-finger typos. Then again, that would exclude people like me typing on Dvorak...

Is there a unique QWERTY distance algorithm? As for my implementation, QWERTY distance is something that I have already used in my neural network's test data. If the training examples contain manually created fumble-finger typos mapped to the correct spelling, the neural network would identify them without a problem.
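For reference, one possible way to define a QWERTY distance is the Euclidean distance between key positions on a staggered grid. The sketch below is written under that assumption: the row offsets in `ROW_OFFSETS` are rough guesses at the physical stagger, and all names are hypothetical. Other definitions (for example, counting adjacent-key hops) are equally plausible.

```python
import math

# Letter rows of a standard QWERTY layout.
QWERTY_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")
# Assumed horizontal stagger of each row, in key widths (an approximation).
ROW_OFFSETS = (0.0, 0.25, 0.75)

# Map each letter to an (x, y) position on the keyboard grid.
KEY_POS = {
    ch: (col + ROW_OFFSETS[row], float(row))
    for row, letters in enumerate(QWERTY_ROWS)
    for col, ch in enumerate(letters)
}


def key_distance(a: str, b: str) -> float:
    """Euclidean distance between two letter keys, in key widths."""
    (x1, y1), (x2, y2) = KEY_POS[a.lower()], KEY_POS[b.lower()]
    return math.hypot(x1 - x2, y1 - y2)


def qwerty_distance(typo: str, word: str) -> float:
    """Sum of per-letter key distances between two equal-length words."""
    if len(typo) != len(word):
        raise ValueError("words must have the same length")
    return sum(key_distance(x, y) for x, y in zip(typo, word))


for candidate in ("amuse", "abuse"):
    print(candidate, round(qwerty_distance("atuse", candidate), 2))
```

Under this particular definition 'atuse' comes out closer to 'abuse' than to 'amuse', because 't' sits nearer to 'b' than to 'm' on the keyboard, which is the opposite of the ASCII ordering.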