
Re: [Monotone-devel] url schemes


From: Derek Scherger
Subject: Re: [Monotone-devel] url schemes
Date: Sun, 23 Mar 2008 20:43:28 -0600
User-agent: Thunderbird 2.0.0.12 (X11/20080303)

Markus Schiltknecht wrote:
Hello Derek,

first of all: nice work in nuskool! Thanks for ripping out my silly code, which re-implemented a kind of toposort. Dunno what I was thinking there...

Haha, I remember looking at that and thinking, "there must be a simpler way" and toposort was it.

[ A small side note: I'd have had an easier life reading your cool patches if you had committed whitespace changes separately. ]

Yeah, sorry about that. Emacs cleaned up a bunch of things I didn't notice until I had already made some changes, and I didn't take the time to commit it as two separate changes.

Xxdiff does work reasonably well for looking over whitespace-polluted diffs if you turn off the display of whitespace. ;)

Too verbose, maybe. But also very simple to understand.

Indeed. It did turn out to be very simple. The multiplicity of encode/decode request/response things just seems a bit over the top.

On the bright side, I have managed to pull files and revisions from my monotone database using the nuskool branch (which doesn't yet pull certs or keys or care about branch epochs but does basically seem to work). It is rather slow at the moment (71 minutes vs 25 minutes with netsync, which *does* pull certs, keys etc.). I haven't done any profiling yet but I would expect two things to show up.

Uh.. that is the time to pull the complete net.venge.monotone repository, right? While that certainly sounds awful, let me point out

Correct.

that that's not the case where nuskool is supposed to be the winner.

I'm assuming that if this does work out it will replace netsync, and it just can't be slower and still be successful, imho.

It's rather optimized for subsequent pulls and it's already faster than netsync there:

Yeah, the revision refinement phase is really quick. Side note: I'm not 100% sure it's correct yet. I do recall seeing a push report X outbound revs while a pull with the databases reversed reported some other number of inbound revs. We need to double-check this.

# time ./mtn gsync -d ../test.db http://nabagan.bluegap.ch:8080/monotone/
mtn: 13,850 common revisions
mtn: 130 frontier revisions
mtn: 0 outbound revisions
mtn: 0 inbound revisions

Oh, another note here. I purposely set things up in run_gsync_protocol so that the client knows exactly which revisions are inbound and outbound, thinking that we really want something like push/pull/sync --check to list (but not transfer) the revisions that would be transferred. The Mercurial equivalents are the incoming/outgoing commands.

This may require a bit more information coming back in the descendants response, including author/date/changelog/branch certs for example. The thought of combining author/date/changelog/branch into one commit cert crossed my mind here again. The current certs don't allow us to tie the correct things together. Maybe we should start another branch to combine these certs into a single commit cert.
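To make the idea concrete, a combined cert might look roughly like this (sketched as a Python dict dumped to JSON; the field names and values are made up for illustration, not a proposed format):

import json

# Purely illustrative: one signed "commit" cert in place of the four
# separate author/date/changelog/branch certs. Field names are made up.
commit_cert = {
    "name": "commit",
    "revision": "<revision id>",
    "author": "someone@example.com",
    "date": "2008-03-23T20:43:28",
    "branch": "net.venge.monotone.nuskool",
    "changelog": "first cut at pulling revisions over the http channel",
    "signature": "<one signature covering all of the fields above>",
}

print(json.dumps(commit_cert, indent=2))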

./mtn gsync -d ../test.db http://nabagan.bluegap.ch:8080/monotone/ 1.48s user 0.13s system 38% cpu 4.172 total

(Avg ping time from here to nabagan.bluegap.ch is ~60 ms)

(Agreed, that's not a fair comparison either, because gsync doesn't pull certs.)

Yeah, but it is encouraging, nonetheless.

(1) Printing/parsing basic_io has come up in the past, and nuskool adds very similar json_io printing/parsing, so it will probably double the printing/parsing time.

That applies to the current HTTP channel. Other channels might or might not use JSON. Or maybe we even want to offer different content-types for HTTP, i.e. return JSON or raw binary depending on the HTTP Accept header.

Yeah, both ideas have crossed my mind as well.
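For what it's worth, the Accept-header idea could look roughly like this on the server side (a Python sketch with made-up payloads, not how the nuskool handler is actually written):

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class GsyncHandler(BaseHTTPRequestHandler):
    # Sketch only: serve the same resource as JSON or as raw binary,
    # chosen by the client's Accept header.
    def do_GET(self):
        accept = self.headers.get("Accept", "")
        if "application/json" in accept:
            body = json.dumps({"revision": "<revision data>"}).encode("utf-8")
            ctype = "application/json"
        else:
            body = b"<raw binary encoding of the same revision>"
            ctype = "application/octet-stream"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), GsyncHandler).serve_forever()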

(2) It's currently very granular: request one revision, receive one revision, then for each file changed in the revision request one file data or delta and receive it, and so on until all the content for the revision has been received, then move on to the next revision. The latency of the request/response round trips is probably a big factor.

Agreed. However, merging multiple get requests, each for a single resource, into one multiplexed request is just one option for solving that problem. Another one would be running multiple queries in parallel. Dunno how feasible that is, though.

I may just try having get_revision include all of the file data/delta details as well, and see how big these get in the monotone database. If we didn't first encode the json object as a string and subsequently write it to the network, we could just start writing bytes until we were done and wouldn't have to hold them all in memory; however, that causes problems with setting the Content-Length header. I'm not sure what to think of issuing several requests in parallel (one for each file data/delta in a revision, perhaps up to some limit). Actually, I don't think it would help, because afaict the server can only handle one request at a time, or there will be multiple scgi processes running and we'll hit database lock issues.
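One possible way around the Content-Length problem, at least for plain HTTP, would be HTTP/1.1 chunked transfer encoding, which lets the sender stream a body of unknown length; whether that plays nicely with the scgi setup is another question. A minimal sketch of the wire format:

def write_chunked_body(sock, chunks):
    # Stream an HTTP/1.1 body using "Transfer-Encoding: chunked" (sent in
    # place of Content-Length): each chunk goes out as its length in hex,
    # CRLF, the data, CRLF, and a zero-length chunk terminates the body.
    # The sender never needs to know the total size up front, so nothing
    # has to be buffered in memory first.
    for chunk in chunks:
        if chunk:
            sock.sendall(b"%x\r\n" % len(chunk))
            sock.sendall(chunk)
            sock.sendall(b"\r\n")
    sock.sendall(b"0\r\n\r\n")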

Probably doing a bit of profiling first would be the best idea!

(Using threads could also help with hash calculation... considering that commodity hardware is getting more and more cores per box, that might be worth it in the long run.)

So would a hand-optimized sha1 implementation. Would someone just write one of these already! ;)
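For the threads-for-hashing aside, the shape would be roughly this (sketched in Python with hashlib just for brevity; monotone's own SHA1 code is C++ and not shown here):

import hashlib
from concurrent.futures import ThreadPoolExecutor

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def hash_all(file_contents):
    # Hash a batch of file contents on a pool of worker threads so that
    # several cores can be busy at once instead of hashing serially.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(sha1_hex, file_contents))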

Plus: having that simplicity would allow us to handle dumb servers pretty equally.

I went with the fine-grained get/put request/response pairs so that neither side would end up having to hold too many files in memory at any one time. If we instead requested all file data/deltas for one rev, the number of round trips would be reduced, but we'd end up having to hold at least one copy (probably more) of the whole works in memory, which didn't seem so good. I'm open to suggestions. ;)

I don't think files necessarily need to be put together by revision - that would be a rather useless collection for small changes. Instead, we should be able to collect any number of files together - and defer writing the revision until we have all of them.

I'm not really sure where you're going with this.
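If I read the suggestion right, it might mean something like the following: batch outstanding file requests across revisions up to some cap, and only write a revision once all of its files have arrived. The helpers (fetch_files, write_revision, rev.file_ids) and the cap are made up for illustration:

MAX_BATCH = 64  # arbitrary cap on files fetched per round trip

def pull(revs, fetch_files, write_revision):
    # Sketch of one reading of the idea above; fetch_files and
    # write_revision stand in for the real network/database calls.
    needed = []                              # file ids still to fetch
    for rev in revs:
        for fid in rev.file_ids:
            if fid not in needed:
                needed.append(fid)

    received = {}                            # file id -> content
    waiting = list(revs)                     # revisions not yet written
    while needed:
        batch, needed = needed[:MAX_BATCH], needed[MAX_BATCH:]
        received.update(fetch_files(batch))  # one round trip per batch

        still_waiting = []
        for rev in waiting:
            if all(fid in received for fid in rev.file_ids):
                write_revision(rev, received)   # all files present: write now
            else:
                still_waiting.append(rev)       # defer until a later batch
        waiting = still_waiting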

I certainly think of JSON as a good exchange format. It doesn't only help JavaScript; it strikes a good balance between heavily structured data (think XML) and raw binary data. It provides some structure, but it's not overly verbose. And it's easily usable from pretty much any scripting language.

Agreed; however, I'm wondering how popular or useful scripted pushing/pulling is going to be. When I first saw the JSON format I thought that it might have been nice to have that rather than basic_io, but it probably didn't exist at the time basic_io was invented.

However, one of the downsides of JSON is that it cannot encode binary data. Or more precisely: strings are interpreted as UTF-8 encoded, so you'd better not put binary data in there.

Yeah, the base64 encoding/decoding of file content is another extra step that shouldn't really be needed.
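Just to spell out the extra step: because JSON strings have to be valid UTF-8, binary file content has to be wrapped (e.g. in base64) on the way in and unwrapped on the way out, roughly like this (the file name is made up):

import base64
import json

raw = open("some_file.bin", "rb").read()   # any binary file content

# base64 adds roughly a third to the size, plus an encode pass here and
# a decode pass on the other end, on top of the JSON handling itself.
wire = json.dumps({"file_data": base64.b64encode(raw).decode("ascii")})

assert base64.b64decode(json.loads(wire)["file_data"]) == raw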

Thus, JSON and binary encoding for revs don't seem to mix well here. As much as I like binary encoded stuff for internal things, I also like to be able to read the revision's contents.

Once again, this makes me think about using the revisions solely for synchronization, and not storing them in the database, but use (binary) rosters instead.

Or we could store the revisions in the database as binary rather than text, but I guess we don't actually use the revisions themselves that much, do we? Seems like a reasonable idea.

In general, I think it would be great if we had a few people working together on all of these things, rather than one poor lonely soul on each of them. You and Zack seem to have been doing a bit of this on the compaction and encapsulation branches and I'm sure it's more fun and produces better results that way.

Cheers,
Derek




