octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

First plans on a profiler


From: John W. Eaton
Subject: First plans on a profiler
Date: Tue, 24 May 2011 14:01:33 -0400

On 21-May-2011, Daniel Kraft wrote:

| here are some first thoughts on implementing the profiler; so far I've 
| not yet tried to actually hack together something based on them, but 
| welcome any comments!  (As you of course know the internal working of 
| Octave by far better than I do.)
| 
| At first, I would like to experiment with exact profiling on the level 
| of functions.  For this, I would like to measure the time spent in each 
| individual function -- possibly by doing something like (just as kind of 
| "pseudo-code" of course):
| 
| function bar ()
|    time = 0;
|    start = cputime ();
| 
|    ... some code ...
| 
|    end = cputime ();
|    time = time + (end - start);
|    bar ();
|    start = cputime ();
| 
|    ... some more code ...
| 
|    end = cputime ();
|    time = time + (end - start);
| end
| 
| And keeping track of "own" time for each function this way.  We'll see 
| then by experiment whether this gives acceptable performance and whether 
| the resolution of the timer is well enough to make this feasible.
| 
| Note that we probably even have to do the timing with unwind_protect (or 
| rather some equivalent handling directly in the interpreter of course) 
| in order to ensure correct working even in the presence of errors thrown.
| 
| I tried a quick search of the octave's pt-* code but wasn't able to find 
| where function calls are made or which structures represent functions 
| defined in .m files.  (I found fct-handle, but am not sure whether this 
| is actually really involved when calling any function or only for stuff 
| like anonymous functions or doing foo = @bar?)
| 
| Do you think we should store timing information directly in the function 
| object in the syntax tree?  I thought so first, but then it came to my 
| mind that with that, it would possibly be harder to iterate over all 
| functions called during the last profiling session to get the output 
| together and clear the times when necessary...  So an alternative would 
| be to have a seperate "profiler" class store all the data in some kind 
| of map which includes all functions as keys (either hashed on the 
| function name or the pointer to the function object).  What do you think 
| there?

The evaluator is in pt-eval.cc, but you won't find a place where
functions are called directly there.  Instead, dispatching to function
calls happens as part of evaluating expressions at the line

  octave_value tmp_result = expr->rvalue1 (0);

in the tree_evaluator::visit_statement function.  Things are done this
way because the parser does not know whether an expression like

  x

refers to a variable or is a function call until it actually tries to
do the evaluation, which then looks up the symbol value.  I think this
is unavoidable given the possibility for things like

  function x ()
    "you have reached the function x"
  end
  function foo (s)
    eval (s);
    x
  end
  foo ("disp ('hi there, I do not define x as a variable');")
  foo ("disp ('hi there, I define x as a variable'); x = 13;")

Similarly, expressions like

  x(a)

could be either function calls or array references since both use the
same syntax.

So, the statement

  octave_value tmp_result = expr->rvalue1 (0);

dispatches via the expr class hierarchy to find the right type of
expression.  Function calls can be made from either the
tree_identifier class (functions called without (possibly empty)
argument lists) or the tree_index_expression class.  But even there,
you don't know precisely what type of object you have until you ask
for its value, so I think you will have to place timing calls inside
the functions that actually evaluate the function objects themselves.
So you will most likely have to modify the functions

  octave_user_script::do_multi_index_op
  octave_user_function::do_multi_index_op

in ov-usr-fcn.cc for user-defined scripts and functions,

  octave_builtin::do_multi_index_op

in ov-builtin.cc for built-in functions and functions defined in .oct
files (they are both handled as "built-in" functions because they are
called in the same way even though .oct files are dynamically loaded),
and

  octave_mex_function::do_multi_index_op

in ov-mex-fcn.cc for functions defined in MEX files (these are handled
separately because the calling conventions are different from built-in
functions and functions defined in .oct files).

I don't thin you have to do anything special for function handles
since you want to time the functions they refer to.

Anonymous functions are just user-defined functions, so they will be
handled by the octave_user_function class, but you will need some way
of distinguishing each anonymous function so you can properly report
the timings for each one separately.  For example, you'll probably
want to be able to report the timing for "anonymous function defined
in FILE at line L, column C".

I don't know how best to record the timings, but here is an possible
way:

  * Define a class profile_data_accumulator.  This class should
    probably be a singleton object since we likely only want one
    profiler at a time to be active and using a singleton helps to
    avoid initialization problems.  There are other singleton objects
    in Octave and they all use a similar style, so it would be best to
    follow the style of the others.

  * The profile_data_accumulator class should have a function that
    checks the CPU timer and stores timestamp date somehow.  I think
    something like

      void
      profile_data_accumulator::do_timestamp (octave_function& fcn)

    should work because all the octave_X function class objects listed
    above are derived from the octave_function class, and so you
    should be able to get all the info about a function object that is
    required to uniquely identify it from the octave_function object
    (or through virtual functions that will dispatch to the derived
    objects to get you specific information about the individual
    classes).

    The do_timestamp function is called, it can get the current CPU
    time info and it should know the previous value as well.  This
    chunk of time should be applied to the currently profiled function
    and then the function which was given as an argument will become
    the current function and the current CPU time info can be
    updated.  I think the results can be stored in a simple map that
    allows lookup from a full function name (complete file name for
    .m, .oct, .mex files, something like builtin:NAME for actual
    built-in functions, command-line:NAME for functions defined on the
    command line, and perhaps anonymous:LINE:COLUMN for anonymous
    functions).

You'll also need to deal with initialization (on the first call) and
clearing of the map that contains the data, and (I think, but I'm not
sure) a way to stop accumulating data for a given function when
control returns to the Octave prompt.

By having the profile_data_accumulator::do_timestamp function handle
all these details, I think you should just need to add something like

  if (tree_evaluator::profiling)
    profile_data_accumulator::do_timestamp (*this);

to each of the four do_multi_index_op functions listed above to
collect data.

As for displaying the data and controlling the profiler (turning it on
or off or whatever), I think we should have a "profile" function that
is compatible with the one in Matlab since that is what people using
Octave will expect.  It would be nice, but not essential, to have a
profsave function that does something functionally equivalent, but I
don't think it is necessary to generate precisely the same HTML...

One thing I'm not sure about is what to do with recursive function
calls.  How do profilers normally handle that?  Is the time for all
recursive calls (to the same function) lumped together, or are they
stored separately somehow?

Another is how to handle the call graph, if that is information that
needs to be stored.

Whatever we do, I guess we'll need to store enough information to be
able to generate the structures that can be returned by calling the
profile function with "info" and "stats" arguments.

In any case, does the above outline of a way to attack this problem
sound like it would work?  I'm not sure of all the details and have
not given it a lot of thought, so please follow up here if there is
something above that is not clear, seems wrong, or appears to be
missing.

jwe


reply via email to

[Prev in Thread] Current Thread [Next in Thread]