From: Bradley Kennedy
Subject: Re: Proceeding with the GSoC Project
Date: Thu, 26 Apr 2018 17:53:07 -0400
Yeah Doug, as a person who researches in the area of machine learning, I see many issues with using this for Octave. An immediate thought: are we trying to adapt a problem to the solution? Neural networks have issues like fixed input/output size (which you already ran into), model capacity, bias (in the mathematical sense), and catastrophic forgetting (with continual learning).

Doug brings up a great point: the only way a NN would be feasible with dynamic loading of packages is to either use continual learning (which might not work with one-hot encodings?) or retrain the model on package loading. But at some point we are also going to have to increase model capacity to accommodate new function names. Also, you appear to be encoding your inputs as ASCII, which is probably less than ideal, given that it assumes inputs in your feature space are closer if their ASCII distance is smaller. In natural language processing we usually use word embeddings, but since function names aren't really natural language, that probably wouldn't apply. You could use an LSTM or convolutions, which may help with your input-length problem, but those are just going to slow you down and make it less feasible, since you're doing all of this in Octave.

A smart implementation of edit distance would likely serve Octave better. Using assumptions about the errors users commonly make would let you shrink the search space, and since reasonable implementations of edit distance run in Θ(mn), with m and n being the lengths of the two strings, we only have to multiply that by the number of functions (or names) in Octave. With some clever tricks you can reduce the number of functions to check in this scheme: for instance, most people know the first character of the function they want, or you could only look at names of similar length. These are things you'd want to research first anyway; I'm sure it is a solved problem.
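To make the edit-distance idea concrete, here is a minimal sketch (in Python rather than Octave, just for illustration; function names like `suggest` are my own, not anything in Octave). It uses the standard Θ(mn) dynamic program for Levenshtein distance, plus the length-difference prune mentioned above: since |len(a) − len(b)| is a lower bound on the distance, candidates whose length differs too much can be skipped without running the DP at all.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b via the Theta(m*n) DP."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))  # distances from a[:0] to each prefix of b
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution / match
        prev = curr
    return prev[n]

def suggest(typo, names, max_dist=2):
    """Return known names within max_dist edits of the typo, nearest first.

    Candidates whose length differs from the typo by more than max_dist
    are pruned before the full DP, since the length gap is a lower
    bound on the edit distance.
    """
    hits = [n for n in names
            if abs(len(n) - len(typo)) <= max_dist
            and edit_distance(typo, n) <= max_dist]
    return sorted(hits, key=lambda n: edit_distance(typo, n))
```

For example, `suggest("pritnf", ["printf", "sprintf", "plot"])` returns `["printf"]`: the transposed `tn` costs two substitutions, while `sprintf` and `plot` fall outside the distance budget. Lowering the search space further (e.g. bucketing names by first character) would be a straightforward extension.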
As model capacity increases, we also have to look at the computational complexity of the model. Retraining the network with more examples will take longer, and as we discovered, I don't think we can use a pretrained network. So, in summary, the problems are:

- Scalability
- Performance
- Fixed input/output size
- Training time
- Dynamic names
- Poor model encoding
- Simpler solutions exist

I think the biggest issue is going to be dynamic names: given that we are using training examples from an external source, they would have to be packaged somehow with the distro. Some mixed method might be interesting, but again I think a well-designed edit distance would serve better.

Cheers,
Brad Kennedy