Re: [Watt-dev] A cross-language spec for WATT
Tue, 11 Oct 2005 14:34:56 -0700
On Sun, 2005-09-04 at 07:36, Daniel P. Berrange wrote:
> There now follows a modest proposal expanding the scope of WATT to
> provide a general specification for application tracing toolkits,
> across any language. There are some attached files giving examples
> of the file formats, and a couple of diagrams. Finally I've also
> attached a proof-of-concept implementation for Perl.
This is, indeed, an excellent idea. Splitting data collection from
reporting should help make the reporting tools much better than the
one-off tools that watt provides right now.
> The concept of an operation
> can be easily extended to cover any interaction with external systems,
> such as remote procedure calls, or interactions with messaging services
> like IBM MQ.
In full generality, an operation will likely be a point in the execution
of the program, best represented by a stacktrace, plus additional
metadata describing the interaction.
> For developers of a web application, live pages may be
> desired to provide just in time view of the data collected. For a
> production support team, low detail, but long term aggregation of
> operation statistics may be desired to identify potential trouble
> spots, or abnormal runtime behaviour.
That I think is the main value of a 'developer support' type system:
collecting highly selective data about the execution of a program. In
some ways, this is similar to what a profiler does, except that
profiling data is for many purposes way too fine grained. The value of
developer support comes from having an easy way to instrument the
program for a specific purpose. In addition to the data a profiler
collects, developer support collects metadata that is, again, specific
to the purpose of instrumenting.
> For testers operating an
> integration test harness, reports on the data may be desired to
> provide a qualitative view on the system wrt to a previous baseline.
Interesting. Can you elaborate on that a little ?
> Object model
Looks excellent; I would call the HTTPProcess an HTTPRequest, though.
Instead of distinguishing between a process and a script, why not just
add the additional attributes from script to the process ? They are easy
enough to gather from /proc/<pid>. It might be useful at some point to
harvest the whole /proc/<pid> dir for a process. Before we do that
though we should have some reporting use cases that clearly demonstrate
the need for it.
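As a sketch of what harvesting such data might look like (the file
layout is the standard Linux /proc one, but the parsing rules and
function name here are my own assumptions):

```python
def parse_proc_status(text):
    """Parse the 'Key:\tvalue' lines of a /proc/<pid>/status file
    into a dict of attribute strings."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if key and sep:
            attrs[key.strip()] = value.strip()
    return attrs

# Example using a snippet of /proc/<pid>/status content, so the
# sketch does not depend on actually running on Linux:
sample = "Name:\tperl\nPid:\t4242\nVmRSS:\t  10240 kB\n"
attrs = parse_proc_status(sample)
print(attrs["Name"], attrs["VmRSS"])
```

In practice the same function would be fed the contents of
/proc/<pid>/status read at collection time.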
With context attributes, it might be cleaner to let people subclass the
standard object model with their special-purpose classes. I don't think
a process is different from a stage in that respect. In a way, a
transaction is a stage with special context attributes added. But all
this points to the problem of extensibility of the object model, which I
would just punt on for now.
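To illustrate the subclassing idea (class and attribute names here are
hypothetical, not from the proposal), a transaction as "a stage with
special context attributes" might look like:

```python
class Stage:
    """A named phase of execution carrying free-form context
    attributes supplied by the instrumenting code."""
    def __init__(self, name, **context):
        self.name = name
        self.context = dict(context)

class Transaction(Stage):
    """A transaction is just a stage whose context includes some
    well-known attributes, here a transaction id and outcome."""
    def __init__(self, name, txid, outcome="unknown", **context):
        super().__init__(name, txid=txid, outcome=outcome, **context)

t = Transaction("checkout", txid="abc123", outcome="commit")
print(t.name, t.context["txid"])
```

Reporting tools that only understand the base Stage class still see the
extra attributes as ordinary context data.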
> Database Operation
> - row count. (8 bytes)
This might be tricky to get at if we want to avoid having the trace
tool issue database queries of its own.
> Messaging Operation
> RPC Operation
Do you know whether it is possible to hook into the appropriate
subsystem to collect the metrics similar to how the DB driver is
instrumented ?
> Depending on the circumstances of the deployment, it can be
> desirable to record different levels of detail.
Another very good point. I think there are two options: either make it
possible to easily add/remove points of instrumentation, or ignore
points of instrumentation that are always 'on'. Your logging proposal
does the latter.
For the Java part, I have been thinking of using BC annotation to define
stages, so that adding a stage could be done simply through stating in a
config file something like

    stage foo = com.example.SomeClass.someMethod

which says 'Stage foo is entered (exited) whenever someMethod in
com.example.SomeClass is entered (exited)'. It has the advantage that no
code changes are needed to instrument the app, and the downside that the
scope of a stage has to coincide with a method call.
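The same idea can be sketched in a dynamic language without any
bytecode rewriting; here is a Python analogue (the decorator name and
record format are my own assumptions, not part of the proposal):

```python
import functools
import time

def stage(name, log):
    """Enter the named stage when the decorated function is entered,
    exit it when the function returns; append (name, elapsed) to
    'log'. The stage scope coincides with the function call."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.time()
            try:
                return fn(*args, **kwargs)
            finally:
                log.append((name, time.time() - start))
        return inner
    return wrap

records = []

@stage("render", records)
def render_page():
    return "<html/>"

render_page()
print(records)
```

As with the bytecode approach, no changes inside the instrumented
function are needed, and the same limitation applies: a stage must
span exactly one call.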
> A basic set of logging levels would thus comprise
> - off - no information collected
> - summary - only summary information
> - detail - summary, operations, stages & exceptions
> - complete - all information
If we go with the log-level approach, we might be able to get some
mileage out of log4j.
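Whatever library handles configuration, the gating itself is cheap.
A minimal sketch using the proposal's level names (the Collector API
here is hypothetical):

```python
# Level ordering from the proposal: off < summary < detail < complete.
LEVELS = {"off": 0, "summary": 1, "detail": 2, "complete": 3}

class Collector:
    """Keep a record only when the configured level is at least the
    record's minimum level; otherwise drop it cheaply."""
    def __init__(self, level="summary"):
        self.level = LEVELS[level]
        self.records = []

    def record(self, kind, data, min_level):
        if self.level >= LEVELS[min_level]:
            self.records.append((kind, data))

c = Collector("detail")
c.record("summary", {"requests": 1}, "summary")
c.record("operation", {"sql": "SELECT 1"}, "detail")
c.record("parameters", {"arg": "x"}, "complete")  # dropped at 'detail'
print(len(c.records))
```

Points of instrumentation stay in the code permanently; only the level
decides whether they produce data.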
> Storage mechanism
> - Database - persist records to a fully normalized database.
> This is high overhead on insert, and fairly space
> inefficient, but lends itself very well to bulk
> data analysis
I would assume that this is mainly useful when the data needed for a
report would consume a significant amount of memory. It might be better
to leave the details of database storage up to the reporting tool, and
just focus on the plain file storage mechanism, i.e., something that
allows the instrumented process to write its data out as fast as
possible.
> File Storage
> When storing stats for a process, the bucket
> is chosen pseudo-randomly, for example by taking modulus of
> the UNIX PID wrt to 'n'. ie, getpid % n. The choice of value
> for 'n' is thus a factor of the ratio between the time for a
> single process, and the time to store the process's stats.
Why not use a simple per-process counter, where the counter is either
unbounded or bounded ? This would also make it easy to clean up unwanted
stats in a FIFO manner.
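For comparison, the two bucket-selection schemes side by side (the
function names are mine):

```python
def bucket_for_pid(pid, n):
    """The proposal's pseudo-random choice: getpid() % n."""
    return pid % n

def bucket_from_counter(counter, n):
    """Alternative: a bounded per-process counter, which reuses
    buckets in FIFO order and so ages out old stats predictably."""
    return counter % n

print(bucket_for_pid(4242, 16))       # which bucket PID 4242 lands in
print(bucket_from_counter(17, 16))    # 18th write reuses bucket 1
```

The counter variant trades the statistical spread of PID-based hashing
for a deterministic, oldest-first overwrite order.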
> Within each bucket, a ring buffer storage mechanism is used.
> Each bucket is intended to store 't' sets of stats, so, when
> the 't'th entry is created, it wraps around and overwrites the 1st
> entry again.
I am not sure I understand the reason for the two-level hash mechanism.
Is this just to keep directories to a reasonable size ?
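Whatever the directory layout, the per-bucket ring buffer described
above amounts to something like this (the class name is mine):

```python
class RingBufferStore:
    """Hold at most 't' sets of stats; the t+1'th store wraps around
    and overwrites the 1st slot again, as in the proposal."""
    def __init__(self, t):
        self.slots = [None] * t
        self.count = 0

    def store(self, stats):
        slot = self.count % len(self.slots)
        self.slots[slot] = stats
        self.count += 1
        return slot

ring = RingBufferStore(3)
for i in range(5):
    ring.store("stats-%d" % i)
print(ring.slots)
```

After five stores into a three-slot ring, the two oldest entries have
been overwritten and the third survives.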
> Index files
> The index.txt file within each sub-bucket contains one line
> per detail file.
Shouldn't the generation of the index file be left up to the reporting
tool ? It seems that the information that is useful in an index file
highly depends on the report that is to be generated.
> Detail files
Sounds good. We need to put some more thought into how the XML files are
written for long-running processes so that reporting on them can be done
while the process is still running. These files should be the main
interface between the instrumented process and the reporting tool(s).
That now strikes me as one of the main design deficiencies of the
current watt: it relies on in-memory communication between the two,
which is awkward and leads to ClassLoader strangeness.
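One option, sketched here under the assumption that each record is
written as a self-contained XML fragment (the element names and helper
are hypothetical): the writer never needs to emit a closing root tag,
so a reporting tool can parse whatever has been flushed so far while
the process is still running.

```python
import io
import xml.etree.ElementTree as ET

def append_record(stream, kind, attrs):
    # One self-contained fragment per record: no enclosing root tag,
    # so a reader never waits on a still-running writer to close it.
    elem = ET.Element(kind, {k: str(v) for k, v in attrs.items()})
    stream.write(ET.tostring(elem, encoding="unicode") + "\n")

buf = io.StringIO()
append_record(buf, "stage", {"name": "render", "elapsed": 0.012})
append_record(buf, "operation", {"type": "sql", "rows": 8})

# The reporting tool wraps the fragments in a synthetic root at
# parse time:
root = ET.fromstring("<watt>%s</watt>" % buf.getvalue())
print(len(root))
```

The same file then works as the interface between the two sides with
no shared memory and no ClassLoader entanglement.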