Github mirror and archive

From: Pjotr Prins
Subject: Github mirror and archive
Date: Thu, 6 Apr 2017 09:53:02 +0000

A bit of a long post, but I would like some feedback:

** Going off github

Because of the discussion around JOSS I did a thought experiment last
night, and the outcome scares me: what if I closed my Google account,
what if I closed Facebook and what if I closed github? I'll most likely
turn this into a blog entry.

Facebook would be the easy one (if they let you; I think you can't
actually remove it). I use Facebook about once a week to check on some
friends and read a cartoon. Closing Facebook would have no impact on my
life, though switching accounts would be a hassle because I would need
to re-register the friends I care about.

Google is a bit harder. I use Gmail to monitor the JOSS mailing list
and a few others. Because I run my own mail server, I can easily switch
Google accounts and re-register. You can't use Google Groups without a
Google account, but if I closed mine today I could come back whenever
I wanted with a different account. No real harm done. I also need an
account for the Google Summer of Code (GSoC). Google is famous for
tracking, and when you are logged in (as we normally are) it can track
the sites you visit. It is enlightening to visit

For example, I visited the Dlang website yesterday, which is
apparently tracked by Facebook AND Google. Google knows where I am,
that I rented a car and that my current interest is supercomputing. I
have decided to start switching Google accounts annually. Fortunately
Google caters for that, so I am not the only one. Even GSoC requires
you to re-register every year.

Now github. I am one of the early github adopters and created my first
repository in November 2008. Being a Ruby guy, I knew about the garage
startup and was very much in favour of what they were attempting. It
was (and perhaps still is) a cool distributed company. But now it has
become critical to my workflow. Too critical, I think, because even
the thought experiment of removing my account scares me. First of all,
github provides me with an internet persona. Anyone who wants to
assess my work can visit github and get a clear picture of what I am
working on. This is valuable, but I can probably move it elsewhere
without too much loss. Github does have a nice way of showing work,
though: it is easy to see which organizations I contribute to and what
kind of comments I post. I always tell students that their github
track record is more important (to people like me) than their
publication record in scientific journals. Code shows me much more
clearly what people (can) contribute. I therefore use github to assess
and monitor other people's work. Hosting your code elsewhere, as long
as it has a decent interface, is good enough for me, so I'll change
that recommendation to something more generic. Commenting on multiple
sites will require multiple accounts, though.

So I need an account for github-hosted repositories, but I can switch
accounts annually without too many concerns, though it will be a

There are a few projects and organizations, however, that make it
very hard to leave my account. One of them is my work environment
(GeneNetwork) and the other is the Journal of Open Source Software
(JOSS). Both require an account and, implicitly, require me to keep
using the same account over time. These two environments (and a few
others) depend on all members of the organization keeping stable
accounts. It would not do if everyone switched accounts every year.

GeneNetwork could move elsewhere, but that would make it much harder
to collaborate. People already have github accounts, which makes it
easy to contribute. Being on a different site would require people to
log in separately, and I am certain that would reduce contributions
(including bug reports): you want to make contributing as easy as
possible, especially in science, where people don't have the patience
to juggle accounts.

The key thing here is the issue tracker. Github made it so that you
cannot easily move the issue tracker to a different provider, and much
of github's value lies in its issue trackers.

This cluster effect is even more visible with JOSS. In fact, the
journal cannot exist without github, because its software stack (1) is
built on the issue tracker and (2) attracts authors and reviewers
through github's network. Removing my account would invalidate the
reviews I am involved in, and moving JOSS elsewhere would lose the
cluster effect.

In all, I have come to depend on github critically, because the
organizations I am involved in would be severely impacted if we moved
elsewhere. Even GNU Guix, which pointedly does not use github, would
be badly hampered if something happened to github. So many projects
are hosted on github that the distribution and deployment of free
software would be seriously disrupted if github went offline. In other
words, we have built a monster.

People have written to me that they are concerned about related
trends. For example, the Julia language bases its package
infrastructure on github and therefore trades robustness for
convenience. The only robust setup is a distributed one, i.e., one
that does not depend on a single party.

If github changed its ways (e.g., by adding advertising, introducing a
new charging model, or selling user behavior to third parties or
governments) or its policies (e.g., terms that make hosting
repositories there incompatible with the GPL; they are borderline
already) and we wanted to move out, it would be incredibly hard, not
least because some of our software is built on top of github's
interface and API. And because github's own software is not FOSS, we'd
be stuck somewhere in a dark land.

This is not a situation I want to be in. I don't like having all my
eggs in one basket. That is why I run my own mail and web servers, and
why I don't like Google Groups and github.

I hereby decide to (gradually) move all my projects off github and
remove my main account. I don't expect this to happen within a year,
but it is clear to me that I need to do it. I will still have github
accounts, but plan to change them on a regular basis.

For projects that depend on github (such as JOSS), we ought to build a
mirror of the issue tracker. Some software for this already exists
(hosted on, yes, github), so it should be fairly easy. Software
releases distributed through github should also be mirrored elsewhere.
GNU Guix, for example, should mirror and fully archive the source
tarballs that github provides.
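To give a feel for how small a basic issue-tracker mirror can be, here
is a minimal Python sketch. It assumes GitHub's public REST API (the
`/repos/{owner}/{repo}/issues` list endpoint with `state=all` and
page-based `per_page`/`page` parameters) and simply dumps every issue
plus its comments into one local JSON file per issue. The function
names (`fetch_json`, `paginate`, `mirror_issues`) are hypothetical and
this is not the existing software mentioned above; it does no
rate-limit handling and skips labels history, reactions and pull
request review comments.

```python
import functools
import json
import os
import urllib.request

API = "https://api.github.com"  # GitHub REST API base URL


def fetch_json(url, token=None):
    """GET a GitHub API URL and decode the JSON response body."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    if token:
        # Authenticated requests get a much higher rate limit.
        req.add_header("Authorization", f"token {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def paginate(url, fetch):
    """Yield every item of a paginated list endpoint, page by page.

    Simplification: stops at the first empty page instead of
    following the Link response header.
    """
    sep = "&" if "?" in url else "?"
    page = 1
    while True:
        batch = fetch(f"{url}{sep}per_page=100&page={page}")
        if not batch:
            return
        yield from batch
        page += 1


def mirror_issues(owner, repo, outdir, token=None, fetch=None):
    """Write each issue (open and closed) and its comments to outdir/<number>.json.

    GitHub's issues endpoint also returns pull requests; they are kept too.
    `fetch` can be swapped out (e.g. for testing). Returns the number of
    issues written.
    """
    if fetch is None:
        fetch = functools.partial(fetch_json, token=token)
    os.makedirs(outdir, exist_ok=True)
    count = 0
    for issue in paginate(f"{API}/repos/{owner}/{repo}/issues?state=all", fetch):
        # "comments" is a count; only query the comments endpoint when it is non-zero.
        comments = list(paginate(issue["comments_url"], fetch)) if issue.get("comments") else []
        path = os.path.join(outdir, f"{issue['number']}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"issue": issue, "comments": comments}, f, indent=2)
        count += 1
    return count
```

Usage would be something like
`mirror_issues("owner", "repo", "issues-backup", token=os.environ.get("GITHUB_TOKEN"))`,
run periodically (say from cron) so that a plain-file copy of the
tracker exists that does not depend on github staying up. Plain JSON
files are deliberately dumb: they can be committed to a git repository
anywhere and re-imported into another tracker later.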



