octave-maintainers

Re: Proceeding with the GSoC Project


From: Bradley Kennedy
Subject: Re: Proceeding with the GSoC Project
Date: Thu, 26 Apr 2018 20:08:38 -0400


On 2018-04-26, at 19:12, Sudeepam Pandey <address@hidden> wrote:



On Fri, Apr 27, 2018 at 3:23 AM, Bradley Kennedy <address@hidden> wrote:

On 2018-04-26, at 17:11, Doug Stewart <address@hidden> wrote:



On Thu, Apr 26, 2018 at 4:47 PM, Sudeepam Pandey <address@hidden> wrote:


On Fri, Apr 27, 2018 at 1:12 AM, Doug Stewart <address@hidden> wrote:


On Thu, Apr 26, 2018 at 3:26 PM, Sudeepam Pandey <address@hidden> wrote:
So I have set up my blog at [1] and a public repository at [2]. 

My public repo at Bitbucket contains a branch named "Did_you_mean", where I plan to push the small incremental changes I make to my project. A larger change-set can then be pushed to the default branch.

a) Please take a look at these and inform me about anything that you'd like changed.
b) Kindly also inform me about anything that needs to be done after setting up the repo and the blog.

Other than that, can anyone direct me to a link where all the existing functions of GNU Octave can be found as a plain list? I know about this [3] link, but there a description of each function is included with the function name. I essentially need only the function names, so if I copy anything from that page I'll have to clean it up first. If such a list exists, that's good; otherwise I'll just proceed with the comprehensive list on that page.


Getting a list of all Octave commands is one step toward your goal. Getting a list of all package commands is another step, and so on.

Would you like me to make any changes to the bitbucket repository or my blog? 

I don't see any ideas as to how to handle packages! How are you going to handle the situation where the user has a package loaded and is typing in a command word from that package?

Yeah Doug,

As a person who does research in machine learning, I see many issues with using this for Octave. An immediate thought is "are we trying to adapt a problem to the solution?" Neural networks have issues like fixed input/output size (which you already ran into), model capacity, bias (as in mathematical bias), and catastrophic forgetting (with continual learning).

Doug brings up a great point: the only way an NN would be feasible with dynamic loading of packages would be either continual learning (which might not work with one-hot encodings?) or retraining the model on package load. But at some point we are also going to have to increase model capacity to accommodate new function names in that case. Also, you appear to be encoding your inputs as ASCII, which is probably less than ideal, given that it assumes inputs in your feature space are closer when their ASCII distance is smaller. In natural language processing we usually use word embeddings, but since function names aren't really natural language, those probably wouldn't apply. You could use LSTMs or convolutions, which may help with your input-length problem, but those are just going to slow you down and make this less feasible, since you're doing all of it in Octave.

A smart implementation of edit distance would likely serve Octave better. Using assumptions about the errors users commonly make would let you shrink your search space, and since reasonable implementations of edit distance are \theta(mn), with m and n being the lengths of the two strings, we only have to multiply that by the number of functions, or perhaps names, in Octave. With some clever tricks you can reduce the number of functions you have to check in this scheme (for instance, most people know the first character of the function they want, which might be sufficient on its own, or you could only look at words of similar length; these are things you'd want to research first anyway, and I'm sure it is a solved problem).
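For concreteness, a minimal sketch of the \theta(mn) dynamic-programming edit distance with the two pruning tricks mentioned above (first-character match and similar length) might look like this. This is an illustration in Python, not the eventual Octave implementation, and the length cutoff is an arbitrary assumption:

```python
def edit_distance(a, b):
    """Classic Wagner-Fischer dynamic programming: Theta(m*n) time,
    O(n) space by keeping only the previous row of the table."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))              # distances from "" to prefixes of b
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution (free on match)
        prev = cur
    return prev[n]

def suggest(typo, names, max_len_diff=3):
    """Prune the candidate set by first character and length before
    running the DP; max_len_diff=3 is an assumed, tunable cutoff."""
    candidates = [n for n in names
                  if n and n[0] == typo[0]
                  and abs(len(n) - len(typo)) <= max_len_diff]
    return min(candidates, key=lambda n: edit_distance(typo, n), default=None)
```

The pruning means most names never reach the quadratic DP at all, which is exactly the "reduce the number of functions you have to check" idea.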

As model capacity increases, we also have to look at the computational complexity of the model. Retraining the network with more examples will take longer, and as we discovered, I don't think we can use a pretrained network.

So in summary the problems are:
- Scalability
  - Performance
  - Fixed input/output size
  - Training time
- Dynamic names
- Simpler solutions exist
- Model encoding is poor

I think the most common issue is going to be dynamic names; given that we are using training examples from an external source, they would have to be packaged somehow with the distro.

Some mixed method might be interesting, but again I think a well-designed edit distance would serve better.

Cheers,
Brad Kennedy

Edit distance is indeed an excellent algorithm for this job, but I disregarded it only because a 'smart' implementation would require some really strong assumptions, for instance that the user will always type the first character correctly.
Like I said, you’d want to do research into this first to see what previous literature is doing.
I have also gone through the problems you raised about neural networks. Most of these can likewise be solved with some reasonable assumptions, and I have concluded that the final decision between neural nets and edit distance ultimately comes down to which assumptions we consider more practical.

Examples of solutions to the problems you stated...

For the input-size problem, it is reasonable to assume that the user will at worst misspell a word with no more than 3 extra characters, so we can set the input layer size to 3 plus the length of the longest function name in Octave.
I think this fails to account for user packages that aren't included with Octave or Octave Forge. From my previous work with Octave in neuropsychology, they use a lot of Matlab libraries that Octave is, for the most part, compatible with.

Due to ASCII encoding, something like this will happen: say we have two proper words, 'amuse' and 'abuse', and someone types in 'atuse'. In that case the network would by default output 'amuse', because the ASCII code of 't' is closer to 'm' than to 'b'. I did not consider this a problem, because essentially the network really is giving us 'the closest match'; however, even if we do consider it a problem, taking all classes within a probability range instead of only the class with the highest probability would solve it.
Normally in neural networks you would fix this by encoding each character as 6 binary input nodes instead of a raw ASCII value, i.e.
x_1, x_2, x_3, x_4, x_5, x_6 together represent one alphanumeric character (base 36, so a few of the 64 bit patterns go unused).
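This per-character binary encoding could be sketched as follows. It is a hypothetical illustration in Python (the project itself targets Octave), with the 6-bit width and base-36 alphabet taken from the description above; characters like '_' that also appear in function names would need their own codes:

```python
def char_bits(ch, width=6):
    """Map one alphanumeric character to its base-36 index (0-35),
    then expand that index into `width` binary input nodes.
    Assumes [0-9a-zA-Z] only; '_' etc. are not handled in this sketch."""
    idx = int(ch, 36)  # '0'-'9' -> 0-9, 'a'-'z' (or 'A'-'Z') -> 10-35
    return [(idx >> k) & 1 for k in reversed(range(width))]

def encode_name(word, max_len=10):
    """Concatenate the per-character bits and zero-pad to a fixed-size
    vector, as a fixed input layer requires (max_len is an assumption)."""
    bits = [b for ch in word for b in char_bits(ch)]
    return bits + [0] * (6 * max_len - len(bits))
```

With this scheme, each input node is just "bit k of the character's index", so two characters are no longer "close" merely because their ASCII codes are.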

Just as we would need to add new function names to the edit-distance check list with each new Octave release, we would need to retrain a neural network to incorporate the new functions with each release. That is doable, since we will be the ones making these changes at every release, and the user won't be affected. Training time is a concern, but it won't be so large as to delay a release, so it's doable; using GPU services is obviously also an option. The increase in model complexity as capacity grows is real, but likewise the edit-distance algorithm will face a similar increase, because it will have more words to look at.
The edit-distance algorithm only adds \theta(mn) worst-case work per new function. A 3-layer neural network adds O(n^2) /multiplications/ per layer due to the matrix products, and then you apply your nonlinearity, which uses exponentiation (based on my cursory look; consider switching to ReLU). Remember that m and n here are small, but the n in O(n^2) has to be large enough to support the longest function name, and the output vector has to be the number of possible functions.

The only real problem with neural networks, in my view, is dynamic package loading. That, as I said, 'could be solved' by using one large neural network that incorporates all the existing functions of Octave (Core + Forge). I accept that it may not be the optimal solution, though.
I'm concerned about the output size in this case, as well as the number of weights between layers (memory usage is O(m^2) per layer, where m is the largest layer size).
I think it would be best if we work out a solution with both approaches. Then we can look at both and mutually decide which method rests on more practical assumptions and which is better optimized.
I agree; however, as a baseline I'd suggest working on edit distance first, as it seems more likely to land within GSoC. The trivial, dumb implementation (match against all possible functions) is probably reasonably fast anyway, since I'm guessing we only trigger this computation on a name error. I don't know how many functions are available in Octave, but I'm assuming there is some kind of API to fetch the list at runtime (autocomplete works against this API?).
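The trivial flow, trigger on an unknown name and match against the full function list, might look like this sketch. It is Python for illustration only, using the standard library's difflib as a stand-in for an edit-distance matcher; the function list here is a hypothetical placeholder, since the real list would come from whatever API Octave's completion machinery exposes at runtime:

```python
import difflib

# Hypothetical stand-in for the list Octave would provide at runtime
# (core functions plus those of currently loaded packages).
KNOWN_FUNCTIONS = ["strsplit", "strcat", "cellfun", "arrayfun", "interp1"]

def did_you_mean(name, known=KNOWN_FUNCTIONS, n=3, cutoff=0.6):
    """Return up to n names similar to `name`; cutoff is an assumed
    similarity threshold (0-1) below which candidates are dropped."""
    return difflib.get_close_matches(name, known, n=n, cutoff=cutoff)

def lookup(name):
    """Only on a name error do we pay for the similarity search."""
    if name in KNOWN_FUNCTIONS:
        return name
    hints = did_you_mean(name)
    msg = f"error: '{name}' undefined"
    if hints:
        msg += "; did you mean: " + ", ".join(hints) + "?"
    raise NameError(msg)
```

Because the matcher only runs on the error path, even a linear scan over every known name is unlikely to be noticeable to the user.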

I'd like to point out that, even with some of these assumptions, it may still work for what we need it to do (the most common misspellings).

I'm being critical only because NNs are seen as a golden hammer, and I want to be careful not to use them to solve problems that already have simpler, more stable algorithms.

My suggestion to your mentors, and to the maintainers, is to have you do a short literature review of both methods to see what exists, then implement both, if you like, and compare performance. If the simplest version of edit distance is performant enough, we might as well use it, as it is more dynamic, stabler, and likely easier to maintain going forward.

On Thu, Apr 26, 2018 at 5:27 AM, Nicholas Jankowski <address@hidden> wrote:


On Wed, Apr 25, 2018, 17:56 Doug Stewart <address@hidden> wrote:


On Wed, Apr 25, 2018 at 5:49 PM, Sudeepam Pandey <address@hidden> wrote:
Thank you, Doug. I have gone through these links. I'll inform both of you after I finish the following tasks...

1) Initialize a public repository on Bitbucket for my project.
2) Setup a blog to report the weekly work, write an initial post, and link it to the Bitbucket repository.

It would be really helpful for me if both of you could share your preferred mode and time of communication with me.

Additionally, I would like to inform both of you and the entire Octave community in general, that Shane from octave-online had shared a little more than 100,000 user sessions, and a list of 1000 common misspellings with me via email previously.

I do have some technical doubts, some propositions, and some discussion points. Should we proceed with them here or take them back to bug #46881 [1]?

[1]: https://savannah.gnu.org/bugs/?46881


I think that we should talk here.

Probably best. I'm UTC-4, so most days of the week async communication will be most practical, and email is as good as anything for that.





-- 
DAS






