Re: [lmi] Converting a proprietary svn repository to git

From: Vadim Zeitlin
Subject: Re: [lmi] Converting a proprietary svn repository to git
Date: Fri, 26 Feb 2016 14:24:36 +0100

On Fri, 26 Feb 2016 12:55:48 +0000 Greg Chicares <address@hidden> wrote:

GC> On 2016-02-26 02:01, Vadim Zeitlin wrote:
GC> > On Fri, 26 Feb 2016 00:17:43 +0000 Greg Chicares <address@hidden> wrote:
GC> > 
GC> > GC> Vadim--do you see any real reason not to use 'git svn' in this case,
GC> > GC> particularly with '--no-metadata'?
GC> > 
GC> >  In the simple case "git svn clone" works just fine. Things get more
GC> > complicated if you want to convert the existing branches and tags or want
GC> > to make some changes to the repository structure. I wrote a guide about
GC> > this some time ago if you're curious:
GC> > 
GC> >   http://www.tt-solutions.com/en/articles/advanced_svn_to_git
GC> In that article:
GC>   s/abovet/above/

 Fixed, thanks!

GC> > GC> # convert svn to git
GC> > GC> /home/greg/tainted/migration/svn_working_copy/repository[0]$cd ../..
GC> > GC> /home/greg/tainted/migration[0]$git svn clone \
GC> > GC>   file:///home/greg/tainted/migration/repository \
GC> > GC>   --authors-file=authors.txt --no-metadata --trunk=/ ./proprietary
GC> > 
GC> >  I guess your svn repository uses non-standard layout, which is why 
GC> > option is needed. But other than this, this is perfectly fine.
GC> It did indeed use a non-standard svn layout:

 Sorry, I should have been more precise about what I meant by "layout". In
svn, as you know, branches and tags are nothing more than a convention
(which always amazed me as somehow they managed to regress even compared to
cvs which svn was developed to replace) and the "standard layout" just
means having them in the correct places, i.e. have "trunk", "branches" and
"tags" as the only top-level directories in the repository, see e.g.

GC>   svn_working_copy/
GC>     repository/
GC>       data/
GC>         .svn/
GC>       src/
GC>         .svn/
GC>       test/
GC>         .svn/
GC>       .svn/

 This looks like a checked out version, so it doesn't necessarily say
anything about the layout of the repository if you have checked out just
the trunk. But if you have checked out the repository root (i.e. "svn info"
shows the same values for "URL:" and "Repository Root:"), then it is indeed
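
The "URL:" versus "Repository Root:" comparison can be scripted. The following is only a sketch: the function name is made up for illustration, and it assumes the English-language output of "svn info".

```shell
# Hypothetical helper: reads `svn info` output on stdin and succeeds (exit 0)
# only when the "URL:" and "Repository Root:" values are identical,
# i.e. when the working copy was checked out from the repository root.
is_checkout_of_root() {
    awk '
        /^URL: /             { url  = substr($0, 6) }   # text after "URL: "
        /^Repository Root: / { root = substr($0, 18) }  # text after "Repository Root: "
        END { exit !(url != "" && url == root) }
    '
}
```

Run from inside the working copy, it would be used as `svn info | is_checkout_of_root && echo "checked out from the root"`.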

GC> and after conversion I have:
GC>   proprietary/
GC>     data/  263 files, 12.2MB total
GC>     src/    37 files,  1.4MB total
GC>     test/  156 files, 11.2MB total
GC>     .git/
GC> Should I restructure that?

 There are no restrictions/conventions for organizing files in Git
repositories whatsoever, so you can do whatever you prefer. In Git branches
are branches and tags are tags and not just some awkwardly named copies of
the repository contents.

GC> (1) Is there some "standard" layout with an expected main directory
GC> like svn's "trunk"?


GC> (2) Should I use git submodules? (At a quick glance, this seems to bring
GC> little benefit in return for much complexity in our case.)

 No, submodules are arguably one of the most problematic parts of Git and
you shouldn't use them unless you're in the exact use case they were meant
for, i.e. you're using 3rd party libraries which you rarely modify/update
(and some people will tell you that you should avoid submodules even then,
but I'm not of this opinion myself).

GC> The subdirectories are interdependent as follows:
GC> - src/ is independent. It contains the source files described here:
GC>     # Files whose names match 'my_%.cpp' are taken as product data files,
GC>     # which are overridden by any customized files found in a special
GC>     # directory.
GC>     vpath my_%.cpp        $(src_dir)/../products/src
GC>   which lmi's makefiles use to build product_files$(EXEEXT).
GC> - data/ depends only on src/ . It contains the xml files created by
GC>   running product_files$(EXEEXT). These files are distributed to users.
GC> - test/ contains the lmi input files (.ill, .cns, etc.) used for
GC>   'make system_test' (which are independent and change infrequently), as
GC>   well as md5sums of the test results (which depend on data/ and also on
GC>   the code in the public lmi repository, and change frequently). The
GC>   regression-test results (about 110MB in 1400 xml files) are not stored
GC>   here; if all the dependencies are right, then Kim's system-test md5sums
GC>   and mine will match perfectly, and we virtually never have cause to
GC>   share any of the 110MB of output files.

 It looks like it would be possible to split this repository into "src" and
the rest, but it doesn't seem like there is any real reason to do it and,
without one, why bother? Whichever VCS you use, one repository is always
simpler to have than two.

GC> Then I have a pending changeset that, in concept, is a simple rename of a
GC> few files in test/

 git svn tracks svn "renames" provided you use a reasonably recent version of
the svn libraries, but it would probably be simpler to rename the files
directly in git, after the conversion.
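
A rename done on the git side might look like the sketch below. The helper name and its arguments are placeholders, not anything from the actual repository.

```shell
# Hypothetical helper: rename one tracked file and commit the rename.
# $1 = repository directory, $2 = old path, $3 = new path (all illustrative).
rename_in_git() {
    git -C "$1" mv "$2" "$3" &&
    git -C "$1" commit -q -m "Rename $2 to $3"
}
```

Git does not record renames explicitly; it detects them from content similarity when showing history, so `git log --follow -- <new path>` can still trace the file across the rename.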

GC> But first I'll have to get rid of '$Id', which makes most files
GC> mismatch the svn originals, and in particular affects every line of
GC> test/md5sums .

 Getting rid of them completely would be the best thing indeed. Git can
emulate svn keywords with smudge and clean filters, but I'd recommend
keeping things simple and not using them.
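
Removing the keywords could be done with a small filter along these lines. This is a sketch only: the function name is invented, and it simply deletes the keyword text, so any comment markers around it are left behind and may need tidying by hand.

```shell
# Hypothetical helper: drop svn "$Id...$" keywords from text read on stdin.
# Handles both the expanded form "$Id: file.cpp 123 2016-02-26 user $"
# and the unexpanded form "$Id$"; everything else is passed through unchanged.
strip_id_keyword() {
    sed -E 's/\$Id(:[^$]*)?\$//g'
}
```

For example, `strip_id_keyword < some_file > some_file.new` would write a keyword-free copy. Any checksums computed over files that contained the keyword would have to be regenerated afterwards.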

 Good luck,
