neurostat-develop

Re: [Neurostat-develop] first ideas...


From: Joseph
Subject: Re: [Neurostat-develop] first ideas...
Date: Thu, 20 Dec 2001 09:34:24 +0100

Fabrice Rossi wrote
> My post was not very clear, so I'll rephrase. I completely agree with most of
> what you've answered, basically because I don't think having both
> representations AT THE SAME TIME is a good idea. My idea was simply to assume
> that when the network is dense, it uses a dense representation, whereas it
> uses a sparse representation when it is not dense. The idea is simply to have
> a union (or something similar). At the beginning of the (back)propagation
> function, you switch on the type of the MLP to call either a sparse
> calculation or a dense one. You have only one representation at a given time.
> When you prune the network, the representation can change, but you NEVER use
> a double representation. My view is that pruning is useful, but soft pruning
> (regularization) is also useful. So I think we should provide room for
> optimized sparse MLPs as well as for optimized dense MLPs.


OK, it seems reasonable to keep room for both, as long as we never have
both representations at the same time.
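A minimal sketch of the "one representation at a time" idea, assuming a single weight layer that is described either as dense or in compressed sparse row (CSR) form. The names `layer`, `layer_kind` and `layer_propagate` are hypothetical. In line with the rest of the thread, the struct only describes which connections exist, and the weight values arrive in a separate vector `w`.

```c
#include <assert.h>

/* Dense layer: all n_out x n_in connections exist (row-major weights). */
typedef struct {
    int n_in, n_out;
} dense_layer;

/* Sparse layer in CSR form: only the surviving connections are listed. */
typedef struct {
    int n_in, n_out;
    const int *row_start;  /* n_out + 1 offsets into col and into w */
    const int *col;        /* input index of each surviving connection */
} sparse_layer;

typedef enum { LAYER_DENSE, LAYER_SPARSE } layer_kind;

/* Exactly one representation is active at any time. */
typedef struct {
    layer_kind kind;
    union {
        dense_layer dense;
        sparse_layer sparse;
    } u;
} layer;

/* out = W * in, where the values of W come from the external vector w.
   The switch on the representation happens once, at the start. */
void layer_propagate(const layer *l, const double *w,
                     const double *in, double *out)
{
    switch (l->kind) {
    case LAYER_DENSE: {
        const dense_layer *d = &l->u.dense;
        for (int i = 0; i < d->n_out; i++) {
            double s = 0.0;
            for (int j = 0; j < d->n_in; j++)
                s += w[i * d->n_in + j] * in[j];
            out[i] = s;
        }
        break;
    }
    case LAYER_SPARSE: {
        const sparse_layer *sp = &l->u.sparse;
        for (int i = 0; i < sp->n_out; i++) {
            double s = 0.0;
            for (int k = sp->row_start[i]; k < sp->row_start[i + 1]; k++)
                s += w[k] * in[sp->col[k]];
            out[i] = s;
        }
        break;
    }
    default:
        assert(!"unknown layer kind");
    }
}
```

When pruning removes enough connections, the caller would rebuild the `layer` as `LAYER_SPARSE` and hand over a shorter `w`. The two representations never coexist.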



> I don't think so. I completely agree that for MLP-related functions, there is
> a strong interaction between architecture and weights. But this is not the
> case during gradient descent. If we want to be able to use normal
> multidimensional minimizers for training, I think we should keep things
> separated. Indeed, the classical way to represent a function in C is to have a
> struct with an eval function pointer and a void * params. The evaluation type
> is something like this:
> double (*f)(const double *x, const int n, void *params)
> The void * is a placeholder for any parameters needed by the function. The
> rationale of this kind of representation is that the function does not deal
> with memory-related issues. It is given an input vector (i.e., const
> double *x, const int n) and returns a double value.
> 
> The easiest way to translate an MLP into this kind of representation is to use
> the MLP struct (as well as the training data and the error function) as the
> parameters. The input vector (which is at this point the parameter vector of
> the MLP) is not included in the params. If you put the parameter vector
> inside the MLP, you run into endless memory management problems. When do you
> decide whether to trust the pointer? If it has been submitted by a minimization
> algorithm, how long does it remain valid? If you cannot keep it (because it
> might be freed by its owner), what is the point in storing it inside the MLP?
> 
> I think that the training of an MLP should use the following algorithm:
> 
> 1) create an MLP and an initial weight vector w
> 2) reduce the modelling error with a gradient descent algorithm starting
> at w
> 3) modify the MLP architecture using w_opt, the result of the gradient descent
> -> you obtain a new w (possibly smaller, maybe sparse)
> 4) go back to 2
> 
> During step 2, you don't care about the MLP architecture; you don't even know
> you are working with an MLP. The gradient descent algorithm does
> whatever is needed with w (allocation, freeing, etc.).
> 
> At the end of step 2, you end up with a new optimized parameter vector, which
> can be used by step 3. There is no reason for step 3 to keep this vector. It
> can be replaced by a sparse one, a smaller one, etc.
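The four steps can be sketched as follows. To keep the example short and runnable, a linear model over a subset of inputs stands in for the MLP. Step 2 is a fixed-step gradient descent, and step 3 is magnitude pruning (dropping weights below a threshold). All of these choices, and every name in the code, are assumptions for illustration. `gradient_descent` sees only a gradient callback and a `void *`. It returns a freshly allocated vector that step 3 is free to replace.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Gradient callback in the same style as the eval signature (hypothetical). */
typedef void (*grad_fn)(const double *w, const int n, void *params, double *g);

/* Step 2: fixed-step gradient descent. It knows nothing about MLPs, manages
   its own memory and returns a new vector that the caller must free. */
double *gradient_descent(grad_fn grad, void *params, const double *w0,
                         int n, double step, int iters)
{
    assert(n > 0);
    double *w = malloc(n * sizeof *w);
    double *g = malloc(n * sizeof *g);
    if (w == NULL || g == NULL) { free(w); free(g); return NULL; }
    memcpy(w, w0, n * sizeof *w);
    for (int it = 0; it < iters; it++) {
        grad(w, n, params, g);
        for (int i = 0; i < n; i++)
            w[i] -= step * g[i];
    }
    free(g);
    return w;
}

/* Stand-in architecture: a linear model using only the listed inputs. */
typedef struct {
    int n_inputs, n_active;
    int active[8];            /* arbitrary fixed maximum for the sketch */
} linear_arch;

typedef struct {
    const linear_arch *arch;
    const double *x;          /* n_patterns x arch->n_inputs, row-major */
    const double *t;
    int n_patterns;
} problem;

/* Gradient of E(w) = sum_k (sum_i w_i x[k][active_i] - t_k)^2. */
static void linear_grad(const double *w, const int n, void *params, double *g)
{
    const problem *p = params;
    for (int i = 0; i < n; i++) g[i] = 0.0;
    for (int k = 0; k < p->n_patterns; k++) {
        const double *xk = p->x + k * p->arch->n_inputs;
        double r = -p->t[k];
        for (int i = 0; i < n; i++) r += w[i] * xk[p->arch->active[i]];
        for (int i = 0; i < n; i++) g[i] += 2.0 * r * xk[p->arch->active[i]];
    }
}

/* Step 3: drop connections with |w_i| <= threshold. Rewrites the
   architecture and returns a new, possibly smaller, weight vector. */
static double *prune(linear_arch *arch, const double *w, double threshold,
                     int *removed)
{
    double *nw = malloc(arch->n_active * sizeof *nw);
    int m = 0;
    for (int i = 0; i < arch->n_active; i++) {
        if (w[i] > threshold || w[i] < -threshold) {
            arch->active[m] = arch->active[i];
            nw[m++] = w[i];
        }
    }
    *removed = arch->n_active - m;
    arch->n_active = m;
    return nw;
}

/* Steps 2-4: alternate training and pruning until nothing is removed.
   Takes ownership of w and returns the final vector (caller frees). */
double *train(linear_arch *arch, problem *p, double *w)
{
    for (;;) {
        double *w_opt = gradient_descent(linear_grad, p, w, arch->n_active,
                                         0.25, 200);
        free(w);
        int removed;
        w = prune(arch, w_opt, 1e-3, &removed);
        free(w_opt);
        if (removed == 0)
            return w;
    }
}
```

The minimizer never holds on to a pointer into the architecture's memory. Each round starts from whatever vector step 3 produced, which matches the separation argued for above.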
> 
> I'm not saying that I don't want to use specifically designed MLP training
> algorithms (in fact I don't, but I don't want to stop other people from using
> such things), but I don't see any problem with separating architecture and
> numerical parameters, whereas I do see problems with a mixed representation.



I totally agree with what you have written. I never intended to use an
MLP-specific training algorithm, nor to use architecture information for
the optimization task.
I was just thinking that the optimization algorithm could reach the
parameters through MLP->parameter (a double *),
and reach the derivative of the MLP function (or of the cost function)
through MLP->gradient (another double *).
Hence, when you store the MLP in a file, the file can hold both the
architecture and the weights of the MLP.



I think that we never change the size of this vector in an optimization
algorithm, so we can always trust the values in the vector, even if the
algorithm changes them.

However, I understand that we can consider the weights as the input
and the architecture (and data) as the parameters, since in practice we
optimize a function of the MLP's weights and not a function of the
data.

Finally, I agree to keep the parameter vector, and therefore the
derivative with respect to the parameters, outside of the MLP struct.
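When the weights are kept outside the MLP struct, saving a network means writing the architecture and the externally owned weight vector together. The sketch below assumes a hypothetical fully connected one-hidden-layer `mlp` struct (biases included) and a plain text format. It uses `%.17g` so that doubles read back unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical architecture-only struct: no weight or gradient vectors. */
typedef struct {
    int n_in, n_hidden, n_out;
} mlp;

/* Weights implied by the architecture, biases included. */
int mlp_n_weights(const mlp *m)
{
    return m->n_hidden * (m->n_in + 1) + m->n_out * (m->n_hidden + 1);
}

/* Writes the architecture first, then the externally owned weights. */
int mlp_save(FILE *f, const mlp *m, const double *w)
{
    assert(f != NULL && m != NULL);
    int n = mlp_n_weights(m);
    if (fprintf(f, "%d %d %d\n", m->n_in, m->n_hidden, m->n_out) < 0)
        return -1;
    for (int i = 0; i < n; i++)
        if (fprintf(f, "%.17g\n", w[i]) < 0)
            return -1;
    return 0;
}

/* Reads both back. The weight vector is allocated here and returned to
   the caller, who owns it from then on; returns NULL on failure. */
double *mlp_load(FILE *f, mlp *m)
{
    if (fscanf(f, "%d %d %d", &m->n_in, &m->n_hidden, &m->n_out) != 3)
        return NULL;
    int n = mlp_n_weights(m);
    double *w = malloc(n * sizeof *w);
    if (w == NULL)
        return NULL;
    for (int i = 0; i < n; i++) {
        if (fscanf(f, "%lf", &w[i]) != 1) {
            free(w);
            return NULL;
        }
    }
    return w;
}
```

The file format still contains both parts, as suggested above. Only the in-memory ownership differs.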







> 
> > Finally, the sizes can be kept, and we can check that they match
> > for debugging purposes
> > (with #ifdef DEBUG ... #endif).
> 
> Right, but I still wonder if it's needed. I mean that sizes are already
> specified in the MLP struct...

In fact, I think we can omit the dimensions, since such errors are
easy to track down with a debugger.
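For comparison, the debug-only check mentioned in the quote could look like the sketch below. The architecture already determines how many weights there are, so the check only compares `n` against that count. The names are hypothetical, and the check compiles to nothing unless `DEBUG` is defined.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical architecture struct: the sizes already live here. */
typedef struct {
    int n_in, n_hidden, n_out;
} mlp;

int mlp_n_weights(const mlp *m)
{
    return m->n_hidden * (m->n_in + 1) + m->n_out * (m->n_hidden + 1);
}

/* Meant to be called at the top of any function receiving (w, n).
   Only active when compiled with -DDEBUG. */
static void check_size(const mlp *m, int n)
{
    assert(m != NULL);
#ifdef DEBUG
    if (n != mlp_n_weights(m)) {
        fprintf(stderr, "weight vector has %d entries, architecture expects %d\n",
                n, mlp_n_weights(m));
        abort();
    }
#else
    (void)n;
#endif
}
```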


Joseph


