gnu-arch-users

Re: [Gnu-arch-users] Inventory Tree


From: Thomas Lord
Subject: Re: [Gnu-arch-users] Inventory Tree
Date: Sun, 07 May 2006 14:38:54 -0700
User-agent: Thunderbird 1.5 (X11/20060313)

Hello Saurabh,

Thank you for your interest in a GNU Arch Summer of Code project.

I (Tom Lord) am the official mentor for this project.   Andy Tai will serve as unofficial co-mentor and official back-up mentor.

The deadline for applications is very tight.   Please make sure that you have read both:

    Google's guidelines:
    http://code.google.com/soc/studentfaq.html

and

    The FSF's guidelines:
    http://www.gnu.org/software/soc-projects/guidelines.html

In the FSF guidelines, please pay careful attention to the section titled "Your Proposal".

You asked some specific questions:


Saurabh Sehgal wrote:
> I have been trying to come up with creative and efficient algorithms to implement the feature described on the above website. However, in order to achieve my goals, I need to know what documentation am I going to have to work with, and what code is going to have to be changed in the GNU Arch software, for this tool to be successful. If there is anyone, who can be kind enough to point me in the right direction in my research, I would be more than grateful to them. I really want to be a part of this project, and become a successful contributer to GNU Arch. I am not looking for any algorithms, or any suggestions on implementation, since that is for me to come up with, I just need to know what kind of code am I going to have to work with, and what existing documentation that might exist on the GNU Arch archives that I can read over. If anyone can point me in the right direction, I will be more than grateful.

1. About algorithms.

There are some algorithm puzzles that come up when implementing tree inventory and whole tree diff.  Whole tree diff is especially hard.   Tree inventory is straightforward, but does require thinking about the performance of various file system operations.   Let's call those the "big algorithm problems".
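To make the point about file system performance concrete, here is a minimal sketch in Python.  It assumes, purely for illustration, that a tree inventory is just a sorted list of relative paths with a kind for each; the function name `inventory` and the kind labels are hypothetical and are not taken from tla or from any Arch 2.0 specification.   The sketch uses `os.scandir`, which on most platforms reports the entry type from the directory read itself and so avoids a separate stat call per file.

```python
import os

def inventory(root):
    """Return sorted (relative_path, kind) pairs for everything under root.

    Hypothetical illustration only: real tree inventory also has to
    classify and identify files, which this sketch does not attempt.
    """
    entries = []
    stack = [root]                      # directories still to be read
    while stack:
        directory = stack.pop()
        with os.scandir(directory) as it:
            for entry in it:
                rel = os.path.relpath(entry.path, root)
                if entry.is_symlink():
                    kind = "symlink"    # never followed, so no cycles
                elif entry.is_dir(follow_symlinks=False):
                    kind = "dir"
                    stack.append(entry.path)
                else:
                    kind = "file"
                entries.append((rel, kind))
    return sorted(entries)
```

Calling `inventory(".")` in a source tree returns one pair per file, directory, and symlink.   Sorting the result is a design choice: it makes two inventories of the same tree directly comparable.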

When implementing either one, there are lots of smaller choices, too.   Should such and such a data structure be an array, a linked list, or what?   Let's call these the "small algorithm problems".

So, actually, it is not realistic for you to single-handedly solve the big algorithm problems.   They are mostly solved already.   The problem space is pretty well understood by several of us.   It took a lot of us longer to come up with these solutions than you'll have time for in a summer project.

It's excellent if you think about the big algorithm problems.   You'll have to.   Your mentor will take various existing write-ups and translate them into simpler, context-specific write-ups for you.   You'll have to be able to understand these and fill in the small algorithm solutions yourself (I'll be here to help if you get stuck).   You'll have to be able to understand the big algorithms we describe and translate them into actual working code (again, with help should you get stuck).

Beyond algorithms, a lot of these projects come down to design problems rather than algorithm problems.  What's a good syntax design for a particular control file, for example?   Part of that is what will be comfortable for users to use.   Part of it is planning for the future.  Part of it is making sure the parser is easily implemented.
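As an illustration of how syntax design and parser simplicity interact, here is a minimal sketch in Python.  It assumes a hypothetical line-oriented control file in which each line is `<category> <pattern>` and `#` starts a comment.   Neither that syntax nor the name `parse_control` comes from tla or from any Arch 2.0 design; they are invented for the example.

```python
def parse_control(text):
    """Parse a hypothetical control file into (category, pattern) rules.

    Assumed syntax (not an Arch format): one rule per line,
    '<category> <pattern>', with '#' starting a comment.
    """
    rules = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop comment and whitespace
        if not line:
            continue                          # blank or comment-only line
        parts = line.split(None, 1)           # category, then rest of line
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected '<category> <pattern>'")
        rules.append((parts[0], parts[1]))
    return rules
```

A one-rule-per-line format like this is easy to parse and easy to extend with new categories later, but it cannot express a pattern containing `#`.   That is the kind of trade-off the design has to settle.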

Now, of course, if you have something original to say about the big algorithm problems then please do.  It is far from impossible you'll have an original contribution here.   It's just that it doesn't make sense to make your success at contributing depend on your having big breakthroughs here.    We aren't going to ask you to do anything that we don't know can be done.

2. About Code

You asked what code you'll have to work with.   Actually, that's the beauty of these projects for SoC, as far as I'm concerned.

Some background:  The GNU Arch architecture is made up of a lot of very separate components.   Tree inventory is one thing.  Whole tree diff and patch is a separate (optionally layered) thing.  And so on.

In the current release, tla, all of those separate components are mixed up into one big monolithic implementation.   In Arch 2.0, we want to keep those separate components truly separate.

Why do this?  Well, because tree inventory is useful all by itself -- separate from Arch.   Whole tree diff and patch is also useful separately and even more useful when combined with inventory.   There can be more than one implementation of tree inventory, and whole tree diff/patch is useful with both together or each separately.   Those components are useful with or without GNU Arch archive implementations.  For example, separately from GNU Arch, tree inventory and whole tree diff/patch should be useful in combination with git or Subversion.   Finally, having all of these components nicely separate makes them easier to maintain in the long term -- and makes the whole system easier to maintain.

Therefore: you would be starting from scratch -- from "the empty directory".   You don't have to use or modify any of our existing code to make a useful tool.   Now, time permitting, I may ask you to do some integration with other code -- that may come up.   But the core idea of these projects is that you can succeed by cleanly implementing some entirely new libraries and shell tools.

3. About Documentation

Have you used GNU Arch yourself?

For orientation, you should quickly go through the GNU Arch tutorial for `tla'.   That will give you a clearer sense of what we mean by tree inventory and whole tree diff and patch.   As a practical matter, I think you should use tla for your projects to make it easier for you and me to work together (and for Andy, too).

For your project, it is not essential to re-implement tla's tree inventory features in an upward compatible way.  It is probably better not to do that but instead to implement something similar but cleaner.   An upward compatibility feature may be useful later, but is not what we need for "Arch 2.0".   It's enough that you get the general idea of how tree inventory is used in "Arch 1.0" (tla) and how it fits into the larger system -- we can go from there.

Beyond that, because these projects are self-contained, I plan to take scattered existing materials and condense them into documents you can absorb in a day or two.

For example, on day 1 of your project, I'll hand you a specification for what we need Arch 2.0 tree inventory to initially do.   I may give some hints but that documentation won't tell you how to implement it.   It won't even specify every detail of the user interfaces and APIs.    Your job will be to make sure you basically understand that, to present me quickly with a plan for what code you'll start on in the coming days, and then to start hacking.   Aside from day-to-day questions, we'll formally touch base once a week to revise the emerging design, specification, documentation, and implementation.   At the end, you'll have produced a tree-inventory program that we can release on its own.   Then we can do the same for the harder problem of whole-tree diff/patch.


4. Other Considerations

This is a free software project using, roughly speaking, open source engineering practices.   Part of this project will be participating in the gnu-arch-users mailing list and part of the project may include integrating your code with code from other people (including me).

The critical thing from my perspective is to get these foundational layers of Arch 2.0 up and running and the mailing list and integration stuff takes a back seat to that.

An almost final step to these projects, if all else goes well, will be to work on publishing the results in appropriate ways.  We should give a final report to the gnu-arch-users list if we produce something useful.  We might want to inform other projects that we have made a tool that might be useful to them (warning way in advance: expect the predominant response to be nay-saying, no matter how good our results are).   We should take care to make sure that the sources remain available and ideally have a caretaker in the future.

A final step will be to figure out if and how you personally want to continue with this project.

-t

