monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] database compaction


From: Markus Schiltknecht
Subject: Re: [Monotone-devel] database compaction
Date: Thu, 18 Oct 2007 08:33:35 +0200
User-agent: Icedove 1.5.0.10 (X11/20070328)

Hi,

Zack Weinberg wrote:
Paul Crowley had a suggestion for an encoding that would work and
might be simple enough to be worth implementing, even though (as you
say) the space savings are quite small relative to the total database
size.

I remember having seen some bit encoding. Do you have any pointers to Paul's suggestion? A real, optimized Huffman thing would be, troublesome because it's bit based, so the resulting bit string could be 11 bits long and thus not end on byte boundaries. Can sqlite store bit strings of a certain length efficiently?

Anyway, one could also come up with a simpler, byte based encoding. Something like that:

  height           encoded
  values           representation

  0 - 127          0xxxxxxx
  128 - 16511      10xxxxxx xxxxxxxx
  16512 - 2113664  110xxxxx xxxxxxxx xxxxxxxx
  ...

FYI: I've copied a sample height value distribution list below.


Another idea which just popped up: do we need to store the heights in a single row per revision at all? It looks more like sqlite is loading it all into memory once and then using SQL to query it seems rather expensive.

Wouldn't it be better if we stored all the heights in a single blob, loading it all into memory in one step? That would allow much better compression using libz, because lots of height sequences are repetitive.

Up to now, we need heights for annotate and log, right? Both presumably use more that one rev_height per invocation...

== Using rowids or lookaside tables ==

The wiki has several paragraphs about using rowids or lookaside tables.
I don't particularly like that idea, because a) it increases the number
of lookups needed, b) savings are dubious and c) it is database specific
(okay, that probably doesn't count...). I might rethink b) as soon as we
switch to longer hashes, but 20 bytes hardly justify the extra row
headers (and indices) necessary.

We really do want to do this, but only in contexts where table A
stores a hash which is used as the sole key in a SELECT on table B.
For example: file_deltas.base, revision_ancestry.(parent,child),
rosters.id, roster_deltas.(id,base) and maybe revision_certs.id.  In
that context, doing it reduces lookups, because what sqlite does if
asked to SELECT * FROM b WHERE id = 'hash' is look up 'hash' in the
autoindex for id, which tells it the rowid, and then do a second
lookup in the actual table.

Doh! Really? I see, sqlite is very much not like Postgres... That must hurt as soon as those things don't fit into memory anymore, no?

Anyway, thank you for pointing this out. I'm still somewhat skeptical, because your example only shows the winning side of the coin. There are other queries, for example getting the parent of a revision. Currently, that's an index scan on the autoindex for revision_ancestry(child) (followed by the rowid lookup). If we use rowids in revision_ancestry, we'd first have to lookup the child revision's rowid in revisions, then lookup the rowid of the parent revision in revision_ancestry to finally get the real revision id from revisions by a rowid lookup. That seems rather expensive, or am I missing something because of thinking in terms of traditional databases?

Regards

Markus




Distribution of the different height values in my monotone database at some point in time. Others, like the xaraya, openembedded or botan repositories were similar.

height  number of
value   occurrences

0,      57888
1,      25721
2,      5737
3,      11909
4,      6889
5,      620
6,      360
7,      501
8,      792
9,      3329
10,     678
11,     555
12,     420
13,     7606
14,     456
15,     296
16,     162
17,     259
18,     504
19,     294
20,     252
21,     225
22,     275
23,     6014
24,     4675
25,     112
26,     310
27,     1761
28,     388
29,     97
30,     6838
31,     336
32,     502
33,     89
34,     200
35,     538
36,     667
37,     391
38,     75
39,     78
40,     86
41,     460
42,     721
43,     131
44,     60
45,     85
46,     70
47,     1314
48,     178
49,     65
50,     58
51,     1973
52,     57
53,     67
54,     107
55,     87
56,     58
57,     203
58,     5467
59,     39
60,     75
61,     46
62,     78
63,     950
64,     47
65,     128
66,     34
67,     51
68,     43
69,     45
70,     33
71,     110
72,     32
73,     46
74,     240
75,     50
76,     32
77,     32
78,     34
79,     59
80,     24
81,     31
82,     577
83,     25
84,     1157
85,     32
86,     75
87,     32
88,     29
89,     24
90,     22
91,     21
92,     24
93,     22
94,     25
95,     48
96,     23
97,     123
98,     31
99,     25
100,    20
101,    54
102,    29
103,    21
104,    18
105,    20
106,    18
107,    46
108,    19
109,    18
110,    16
111,    17
112,    42
113,    26
114,    168
115,    29
116,    21
117,    18
118,    15
119,    18
120,    92
121,    19
122,    15
123,    16
124,    15
125,    15
126,    20
127,    16
128,    30
129,    1257
130,    17
131,    18
132,    15
133,    15
134,    112
135,    448
136,    18
137,    13
138,    219
139,    75
140,    13
141,    11
142,    535
143,    11
144,    10
145,    10
146,    11
147,    10
148,    10
149,    11
150,    11
151,    119
152,    11
153,    9
154,    9
155,    9
156,    9
157,    9
158,    9
159,    9
160,    10
161,    9
162,    10
163,    37
164,    11
165,    9
166,    11
167,    15
168,    11
169,    8
170,    134
171,    13
172,    8
173,    9
174,    8
175,    9
176,    10
177,    8
178,    13
179,    8
180,    28
181,    12
182,    12
183,    15
184,    7
185,    7
186,    15
187,    7
188,    8
189,    7
190,    138
191,    8
192,    64
193,    7
194,    8
195,    10
196,    13
197,    8
198,    982
199,    7
200,    777
201,    7
202,    7
203,    7
204,    7
205,    7
206,    21
207,    3836
208,    11
209,    20
210,    6
211,    290
212,    8
213,    10
214,    6
215,    7
216,    5
217,    4
218,    4
219,    7
220,    5
221,    4
222,    6
223,    24
224,    4
225,    4
226,    1645
227,    3
228,    3
229,    4
230,    3
231,    3
232,    3
233,    3
234,    3
235,    3
236,    3
237,    3
238,    3
239,    3
240,    3
241,    3
242,    3
243,    3
244,    3
245,    3
246,    3
247,    3
248,    3
249,    3
250,    3
251,    3
252,    3
253,    3
254,    4
255,    3
256,    3
257,    3
258,    3
259,    3
260,    3
261,    3
262,    3
263,    3
264,    3
265,    3
266,    2
267,    2
268,    2
269,    2
270,    2
271,    2
272,    2
273,    2
274,    2
275,    6882
276,    2
277,    2
278,    2
279,    2
280,    2
281,    2
282,    2
283,    2
284,    2
285,    2
286,    2
287,    2
288,    2
289,    2
290,    2
291,    2
292,    2
293,    2
294,    2
295,    2
296,    2
297,    3
298,    2
299,    2
300,    2
301,    2
302,    2
303,    2
304,    2
305,    2
306,    2
307,    2
308,    2
309,    2
310,    2
311,    2
312,    2
313,    2
314,    2
315,    2
316,    2
317,    2
318,    2
319,    2
320,    2
321,    2
322,    2
323,    2
324,    2
325,    2
326,    2
327,    2
328,    2
329,    2
330,    2
331,    2
332,    2
333,    2
334,    2
335,    2
336,    2
337,    2
338,    2
339,    2
340,    2
341,    2
342,    2
343,    2
344,    2
345,    2
346,    2
347,    2
348,    2
349,    2
350,    2
351,    2
352,    2
353,    2
354,    2
355,    2
356,    2
357,    2
358,    2
359,    2
360,    2
361,    2
362,    2
363,    2
364,    2
365,    2
366,    3
367,    2
368,    2
369,    2
370,    4
371,    2
372,    2
373,    2
374,    2
375,    2
376,    6
377,    2
378,    2
379,    2
380,    2
381,    2
382,    2
383,    2
384,    2
385,    2
386,    2
387,    2
388,    2
389,    2
390,    2
391,    2
392,    2
393,    2
394,    2
395,    2
396,    2
397,    2
398,    2
399,    2
400,    2
401,    2
402,    2
403,    2
404,    2
405,    2
406,    2
407,    1
408,    1
409,    1
410,    1
411,    1
412,    1
413,    1
414,    1
415,    1
416,    1
417,    1
418,    1
419,    1
420,    1
421,    1
422,    1
423,    1
424,    1
425,    1
426,    1
427,    1
428,    1
429,    1
430,    1
431,    1




reply via email to

[Prev in Thread] Current Thread [Next in Thread]