mediagoblin-devel

[GMG-Devel] Possibility of moving to an SQL database backend


From: Christopher Allan Webber
Subject: [GMG-Devel] Possibility of moving to an SQL database backend
Date: Sun, 13 Nov 2011 11:30:09 -0600
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.90 (gnu/linux)

Hi all,

Almost since the beginning of the project, there's been a lot of talk
on IRC about possibly moving from MongoDB to SQL.  We decided we'd
table the issue until 0.1.0 came out.  Well, it's out, and a lot more
conversation has been going on.  Please see:

http://wiki.mediagoblin.org/SQL_Database_Backend

In particular, see "pros and cons" and "Chris Webber's weigh-in".  In
fact, my weigh-in is below:

So I have several thoughts about the whole possible move to SQL.

First of all, after having written out the Pros & Cons of each, it
seems like MongoDB may add a lot of extra complexity without the gains
I expected in the areas I cared about.  The supposed win wasn't
scaling (and, as predicted, scaling down has been something we've had
to work around carefully); it was flexibility.  Does MongoDB allow for
extra flexibility?  (And we *do* need flexibility for MediaGoblin's
design.)  Yes, in the sense that you can dump in whatever you like, but
if you intend to query on any of those attributes, mostly no.  Indexes
are expensive, and we have to spend a lot of time carefully
pussyfooting around them.

While planning MediaGoblin, I knew that there were two patterns for
making things flexible in SQL... one of them is the table with "key,
value, type".  I thought that was unacceptably gross, and still do
think so.
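For anyone who hasn't run into the "key, value, type" pattern, here is
a minimal sketch of it using Python's built-in sqlite3 module.  The
table and column names (media_entries, media_attributes) are made up
for illustration and are not MediaGoblin's schema.  It shows why the
pattern feels gross: every value is stored as text, has to be cast back
using the "type" column, and a query on one attribute has to filter on
the key name and cast the value.

```python
import sqlite3

# Hypothetical "key, value, type" layout (often called entity-attribute-value).
# Table and column names are illustrative, not MediaGoblin's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE media_entries (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE media_attributes (
    media_entry INTEGER NOT NULL REFERENCES media_entries(id),
    key TEXT NOT NULL,
    value TEXT NOT NULL,   -- every value is stored as text
    type TEXT NOT NULL,    -- this column says how to cast it back
    PRIMARY KEY (media_entry, key)
);
""")

conn.execute("INSERT INTO media_entries (id, title) VALUES (1, 'Sunset')")
conn.executemany(
    "INSERT INTO media_attributes VALUES (?, ?, ?, ?)",
    [(1, "width", "640", "int"),
     (1, "height", "480", "int"),
     (1, "camera", "Nikon", "str")],
)

CASTS = {"int": int, "str": str}


def get_attributes(conn, entry_id):
    """Rebuild a dict of typed attributes for one media entry."""
    rows = conn.execute(
        "SELECT key, value, type FROM media_attributes WHERE media_entry = ?",
        (entry_id,),
    )
    return {key: CASTS[kind](value) for key, value, kind in rows}


# Querying on one attribute means filtering on the key name, and a numeric
# comparison needs an explicit CAST of the stored text.
wide_entries = conn.execute(
    "SELECT media_entry FROM media_attributes "
    "WHERE key = 'width' AND CAST(value AS INTEGER) > 600"
).fetchall()

print(get_attributes(conn, 1))
print(wide_entries)
```

Every attribute lookup goes through the same narrow table, so the
database can't give each attribute a real column type or its own
natural index.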

The other option is that you have a "main" table (like MediaEntry)
that records what "type" each entry is (such as "video", "image",
whatever), and external tables for each type hold the extra
type-specific data and point back to the MediaEntry via a foreign key.
Similarly, plugins would use their own external tables.  The main
reason I didn't want to deal with this is that I imagined migrations
becoming a convoluted mess.  It wasn't that we *wouldn't need*
migrations in MongoDB; it's that I hoped migrations would be less
nightmarish once extensible stuff was involved.  In retrospect this was
a pretty reactive response to a number of frustrating times I've had
to try to walk people through broken migrations.  But I'm getting the
sense that I'll have to walk people through database complexity or
breakage as much or more with MongoDB, and the complexity of managing
indexes for any sort of extensibility is as bad as or worse than
dealing with migrations in an extensible SQL setup.
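To make the second pattern concrete, here is a rough sketch, again in
plain Python sqlite3 rather than SQLAlchemy so it runs on its own.  The
table names (media_entries, image_data, video_data), the
EXTENSION_TABLES mapping and the load_entry() helper are all
hypothetical.  The point is that type-specific fields become real,
typed, indexable columns, the foreign key is enforced by the database,
and a query on them is an ordinary join.  A plugin could add its own
table the same way.

```python
import sqlite3

# Hypothetical schema: one "main" table plus one extension table per media type.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE media_entries (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    media_type TEXT NOT NULL              -- e.g. 'image' or 'video'
);
CREATE TABLE image_data (
    media_entry INTEGER PRIMARY KEY REFERENCES media_entries(id),
    width INTEGER NOT NULL,
    height INTEGER NOT NULL
);
CREATE TABLE video_data (
    media_entry INTEGER PRIMARY KEY REFERENCES media_entries(id),
    duration_seconds REAL NOT NULL
);
CREATE INDEX image_data_width ON image_data (width);
""")

conn.execute("INSERT INTO media_entries VALUES (1, 'Sunset', 'image')")
conn.execute("INSERT INTO image_data VALUES (1, 640, 480)")
conn.execute("INSERT INTO media_entries VALUES (2, 'Talk', 'video')")
conn.execute("INSERT INTO video_data VALUES (2, 93.5)")

# Fixed mapping from media type to extension table (never user input).
EXTENSION_TABLES = {"image": "image_data", "video": "video_data"}


def load_entry(conn, entry_id):
    """Merge the main row with its type-specific extension row."""
    title, media_type = conn.execute(
        "SELECT title, media_type FROM media_entries WHERE id = ?", (entry_id,)
    ).fetchone()
    table = EXTENSION_TABLES[media_type]
    cur = conn.execute(f"SELECT * FROM {table} WHERE media_entry = ?", (entry_id,))
    columns = [d[0] for d in cur.description]
    extra = dict(zip(columns, cur.fetchone()))
    extra.pop("media_entry")
    return {"title": title, "media_type": media_type, **extra}


# A query on a type-specific field is a plain join that can use the index.
wide_images = conn.execute(
    "SELECT m.title, i.width FROM media_entries AS m "
    "JOIN image_data AS i ON i.media_entry = m.id "
    "WHERE i.width > 600"
).fetchall()

print(load_entry(conn, 2))
print(wide_images)
```

The cost is exactly the one described above: every new media type or
plugin brings its own table, and each of those tables needs its own
migrations.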

So, okay.  I think I just made a pretty compelling case for moving
back to SQL.  So what then?

Two options have been proposed: try to support both SQL and MongoDB
at the same time, or create a branch that switches from MongoDB to
SQL.  I'm afraid that the former just doesn't sound like a good idea
to me at all... it seems like it'll result in a system with a
massively bloated codebase, hard for new contributors to work with,
hard to maintain, and full of "worst of both worlds"-type compromises.
Think about this for a second: how do you map things like migrations,
indexing, etc. over?  Do we really want to completely rewrite the
MongoDB query tools for SQL?  Some people have said that "it looks
like we have a pretty simple use of MongoDB so this layer won't be so
complex."  To me that sounds like classic hacker "well that can't be
so hard" underestimation of the complexity of the problem.  Anyway, I
already see a ton of complexities, and I'm sure there are more I
haven't even been able to see.

So the remaining option is to do a branch to switch from MongoDB ->
SQL.  There's some risk in this too... it's hard to maintain a big
overhaul branch while the mainline is constantly changing.  There's
also a risk of fracturing, and if we change our minds and stay with
MongoDB, there's even a risk of forking!  Not to mention that working
on a huge branch that never gets pulled in is incredibly
demoralizing.

But we can reduce all those risks if we can come to a *consensus* that
this is what we want to do.  So I propose at the next meeting we
discuss this and try to make sure we're at community consensus before
agreeing to move to SQL (if that's indeed what we intend to do) and if
we do so, move to SQL *entirely*.

Here's what I envision the path to that future will look like:
 - Create a branch that prototypes all the models being switched over
   to SQLAlchemy, including "multiple media types" implemented with a
   friendly API.
 - Assuming that works nicely, continue work in that branch to switch
   all code over to using SQL.
 - Figure out how to do migrations nicely in SQL, including with
   multiple media types (I have some thoughts on how to do this
   "nicely")
 - Create a MongoDB->SQL migration tool
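One possible shape for "migrations with multiple media types" is to
give the core and each media type (or plugin) its own numbered list of
migration steps and its own recorded version, so adding a media type
never has to touch anyone else's history.  This is only a hypothetical
sketch in plain Python sqlite3, not the approach I have in mind; the
MIGRATIONS registry, the migration_versions table and run_migrations()
are invented names.

```python
import sqlite3

# Hypothetical registry: each component (core, a media type, a plugin) owns
# an ordered list of steps. Step N moves that component to version N.
MIGRATIONS = {
    "core": [
        "CREATE TABLE media_entries (id INTEGER PRIMARY KEY, "
        "title TEXT NOT NULL, media_type TEXT NOT NULL)",
    ],
    "image": [
        "CREATE TABLE image_data (media_entry INTEGER PRIMARY KEY "
        "REFERENCES media_entries(id), width INTEGER, height INTEGER)",
        "ALTER TABLE image_data ADD COLUMN exif_json TEXT",
    ],
}


def run_migrations(conn, migrations):
    """Apply pending steps for every component; return (component, version) pairs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS migration_versions "
        "(component TEXT PRIMARY KEY, version INTEGER NOT NULL)"
    )
    applied = []
    for component, steps in migrations.items():
        row = conn.execute(
            "SELECT version FROM migration_versions WHERE component = ?",
            (component,),
        ).fetchone()
        current = row[0] if row else 0
        # Only the steps after the recorded version are still pending.
        for version, sql in enumerate(steps[current:], start=current + 1):
            conn.execute(sql)
            conn.execute(
                "INSERT OR REPLACE INTO migration_versions VALUES (?, ?)",
                (component, version),
            )
            conn.commit()  # record progress after each step
            applied.append((component, version))
    return applied


conn = sqlite3.connect(":memory:")
first_run = run_migrations(conn, MIGRATIONS)   # applies every step
second_run = run_migrations(conn, MIGRATIONS)  # nothing left to do
print(first_run, second_run)
```

Running it twice is safe: the second call finds every component already
at its latest version and applies nothing.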

Anyway, let's discuss this at next meeting!

