[Monotone-devel] [RFC] M.T. phone home


From: Nathaniel Smith
Subject: [Monotone-devel] [RFC] M.T. phone home
Date: Thu, 8 Jun 2006 01:45:07 -0700
User-agent: Mutt/1.5.11

Hey all,
I've been thinking some more about a feature that might be
controversial, so I want to consult with the community before going
forward.  There are some old and relatively incoherent notes here:
  http://venge.net/monotone/wiki/CarrotAndStick

But the basic problem is this: it would be really useful if we had a
way to get more metrics on how people use monotone in real life.  For
instance:
  -- what commands do people run most often?
     (maybe they should have the shortest names, and appear earliest
     in the docs, and get the most optimization effort)
  -- are there commands that people often run in quick succession?
     (maybe there should be sugar to make that more convenient)
  -- what percentage of merges involve conflicts?
  -- what percentage of merges involve messy tree-rearrangement
     conflicts?
  -- do selectors actually get used?
  -- what are the real-world statistics for trees?  E.g., for
     benchmarking, it can matter a lot how many files are in a tree,
     how deep the directory structure is, how many files are changed
     per commit, etc., and we don't know what numbers are actually
     representative.
...and so on.
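
To make the first two questions concrete, here is a rough sketch of the
kind of analysis a command log would allow.  Everything in it is
hypothetical: the (timestamp, command) event shape, the function names,
and the 60-second cutoff for "quick succession" are assumptions made for
illustration, not anything monotone has.

```python
from collections import Counter


def command_frequencies(events):
    """Count how often each command appears.

    events is a list of (timestamp_seconds, command_name) tuples
    (a hypothetical log format).
    """
    return Counter(command for _, command in events)


def quick_succession_pairs(events, max_gap_seconds=60):
    """Count ordered pairs of consecutive commands run close together.

    Two commands form a pair when the second starts at most
    max_gap_seconds after the first (the cutoff is an assumed value).
    """
    ordered = sorted(events)  # sort by timestamp
    pairs = Counter()
    for (t1, c1), (t2, c2) in zip(ordered, ordered[1:]):
        if t2 - t1 <= max_gap_seconds:
            pairs[(c1, c2)] += 1
    return pairs
```

A pair that dominates the second count would be a candidate for the kind
of sugar mentioned above.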

The obvious solution is to, well, start measuring them.  It's
straightforward enough for monotone to start collecting data like this
(spooling it to a file in ~/.monotone, say).  It's even
straightforward enough to make sending these files in to add to our
statistics a painless operation.  But any app that records
information about what you do and then phones home risks being
controversial!  So my question here is what people think about this,
and what sort of precautions need to be taken if we do this.

My current thought is that firstly, we never ever record any user
names, host names, key names, branch names, tag names, file names, or,
of course, file contents.  Given this, the only way I can think of
that we could identify what was actually being worked on would be if
we, say, recorded commit times or tree sizes and compared them to
public project histories --- but presumably people working on public
projects wouldn't care about that, because, well, they're public
anyway?  My thought is that the people who are more likely to be
worried about this are people using monotone commercially.
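
One way to enforce the "never record names" rule is an allowlist: only
fields known to be harmless are ever written to the spool, so a new
field cannot leak by accident.  The sketch below assumes a
JSON-lines spool file and invents all field and function names; none of
this is existing monotone behaviour.

```python
import json

# Hypothetical allowlist: only these keys may ever reach the spool file.
# Names of users, hosts, keys, branches, tags and files are simply absent.
ALLOWED_FIELDS = {"time", "command", "exit_status", "files_in_tree", "files_changed"}


def scrub(event):
    """Return a copy of event containing only allowlisted fields."""
    return {key: value for key, value in event.items() if key in ALLOWED_FIELDS}


def append_to_spool(spool_path, event):
    """Append one scrubbed event to the spool as a single JSON line."""
    with open(spool_path, "a", encoding="utf-8") as spool:
        spool.write(json.dumps(scrub(event), sort_keys=True) + "\n")
```

Note that an allowlist does nothing about the correlation risk described
above: commit times and tree sizes pass through it unchanged.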

It would be useful to record the time each command is run at, so we
can look at histories of use as well as simple frequencies (e.g., for
the "which commands are run in sequence?" question mentioned above).
To make this data even more effective, it would be useful to include a
persistent random "cookie" with each bundle of data, so that multiple
bundles from the same person could be knitted together into a single
history.  This may be controversial, though, even though it does not
involve personally identifiable data!  What do people think?
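
For the sake of discussion, the cookie idea could look roughly like
this.  The cookie is just random bytes with no input from the machine
or user; the file location, the 16-byte length and both function names
are assumptions for illustration only.

```python
import secrets
from collections import defaultdict


def load_or_create_cookie(cookie_path):
    """Return the persistent random cookie, creating it on first use."""
    try:
        with open(cookie_path, encoding="utf-8") as f:
            cookie = f.read().strip()
        if cookie:
            return cookie
    except FileNotFoundError:
        pass
    # 16 random bytes, hex encoded; derived from nothing on the machine.
    cookie = secrets.token_hex(16)
    with open(cookie_path, "w", encoding="utf-8") as f:
        f.write(cookie)
    return cookie


def knit_histories(bundles):
    """Merge bundles that share a cookie into one time-ordered history.

    bundles is an iterable of (cookie, [(timestamp, command), ...]).
    """
    histories = defaultdict(list)
    for cookie, events in bundles:
        histories[cookie].extend(events)
    return {cookie: sorted(events) for cookie, events in histories.items()}
```

Deleting the cookie file would start a fresh, unlinked history, which
might be a reasonable escape hatch to offer.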

The next question is how this is presented to the user.  Obviously,
there would be some way to disable the functionality entirely.  (This
is needed for technical reasons as well; see the NTP kiss-of-death
packet for comparison.)  What the UI should look like when it is not
disabled is less clear -- the most conservative option would be for it
to ship disabled, and only be enabled by people who came across a
description of it hidden in the manual somewhere, and then took action
to turn it on.  If we make it _this_ obscure, though, then most people
are not going to even bother, which could make the whole exercise
pointless.  An intermediate version would be something like:
  -- by default, records data but doesn't do anything with it
  -- after recording X bytes, starts (occasionally or always?) giving
     the user a little hint "hey, I have some data, maybe decide if I
     should send it"
  -- after recording, say, 2X bytes without any response, disable the
     functionality and delete the log file (so as not to waste their
     hard-drive space).
If someone _does_ decide to enable the functionality, then how about
this: whenever they run push/pull/sync, if we have at least X bytes of
data, we post it back to us.  The biggest problem here is making sure
that this doesn't interfere with use otherwise -- like, if they're not
actually connected to the network, we don't want to freeze trying to
resolve venge.net!  Perhaps the answer is to perform the actual
operation first, and then print a message saying what we're trying to
do, and that if it freezes or they're just feeling ornery, they can
just hit C-c to skip it (and "mtn <whatever>" to disable it in the
future).
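
The intermediate scheme is small enough to write down as a single
decision function.  This is only a sketch of the rules as described
above: the names are invented, "threshold" stands for X, and treating
an explicit "no" as "also delete the log" is an assumption on top of
the text.

```python
from enum import Enum


class Action(Enum):
    RECORD = "keep recording silently"
    HINT = "keep recording and hint that data is waiting"
    DISABLE_AND_DELETE = "stop recording and delete the log"
    SEND_ON_SYNC = "send the log after the next push/pull/sync"


def next_action(bytes_recorded, threshold, user_choice=None):
    """Decide what to do given the log size and the user's answer.

    user_choice is None while the user has not responded, True once
    they have enabled sending, and False once they have disabled it.
    """
    if user_choice is False:
        # Assumption: an explicit "no" also removes what was recorded.
        return Action.DISABLE_AND_DELETE
    if user_choice is True:
        if bytes_recorded >= threshold:
            return Action.SEND_ON_SYNC
        return Action.RECORD
    # No response yet: record, then hint at X, then give up at 2X.
    if bytes_recorded >= 2 * threshold:
        return Action.DISABLE_AND_DELETE
    if bytes_recorded >= threshold:
        return Action.HINT
    return Action.RECORD
```

Writing it this way makes one property easy to check: no path reaches
SEND_ON_SYNC unless the user has said yes.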

Obviously, we would never send in any data without the user having
explicitly taken some action to allow it.
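
A sketch of how the "operation first, report second, C-c to skip" idea
and the consent rule might fit together.  The function name, the
status strings and the injected send callable are all hypothetical; a
real version would need an actual network call with a timeout in place
of send.

```python
def run_then_report(operation, pending_bytes, threshold, opted_in, send):
    """Run the user's real operation, then maybe upload statistics.

    operation and send are callables supplied by the caller.  send is
    only ever invoked when opted_in is true, and never before the real
    operation has finished.
    """
    result = operation()  # the push/pull/sync the user actually asked for
    if not opted_in or pending_bytes < threshold:
        return result, "skipped"
    print("sending usage statistics; hit C-c to skip")
    try:
        send()
    except KeyboardInterrupt:
        # The user got impatient; their real operation already succeeded.
        return result, "interrupted"
    except OSError:
        # Covers failed DNS lookups and refused connections, since
        # socket.gaierror and ConnectionError are OSError subclasses.
        return result, "failed"
    return result, "sent"
```

Because the upload runs strictly after the real work, a hang or an
interrupt can only cost the statistics, never the push or pull itself.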

Finally, once we have the data... I would very much like to make all
the data simply available publicly; keeping track of who exactly was
allowed to access it would be a big pain, would reduce its usefulness,
and would mean that e.g. people working on other VCSes or studying FOSS
generally couldn't use it.  So that's one of the reasons that the list
above is so careful about not gathering personally identifiable
information.  This is yet another thing I'd like feedback on, though.

If you made it this far, thanks for reading :-).  I'll probably start
implementing this in the next few weeks (assuming that the response to
this isn't an overwhelming "this would be a horrible violation and
can't be done at all!"), but really want to make sure that we get the
details right so that people don't feel spied-on or otherwise
uncomfortable.  So, comments?

-- Nathaniel

-- 
The Universe may  /  Be as large as they say
But it wouldn't be missed  /  If it didn't exist.
  -- Piet Hein



