[directory-discuss] Debian/Ubuntu Database import
From: |
Andrew Engelbrecht |
Subject: |
[directory-discuss] Debian/Ubuntu Database import |
Date: |
Mon, 26 Mar 2012 00:04:40 -0400 |
User-agent: |
Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111114 Icedove/3.1.16 |
Hello Michael and directory-discuss,
I'd like to introduce Michael Faille, whom I met at the LibrePlanet
conference yesterday. He said that he would like to help out with the
directory, including the planned database import from either Debian or
Ubuntu. It's a big job, and I know that Joshua has been discussing it
with some people regarding how it should be done, so I hope he'll weigh
in on our strategy.
For those of you who don't know about the plan, the first long-term goal
I see is to pull up-to-date project version info from a distro's
repository data onto directory entry pages. Reading the values for
each package shouldn't be the hard part. I think the biggest challenge
will be matching directory.fsf.org project URLs/page names with a
distribution's package names, since the two lists will not always
agree. So there will have to be some manual matching and verification
wherever it is hard for an automatic script to find matches.
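As a rough illustration of the "reading values" step, here is a minimal
sketch in Python that parses text in the Debian "Packages" stanza layout
(blank-line-separated blocks of "Field: value" lines, with indented
continuation lines). The function name parse_packages and the sample
data are made up for this example; the versions shown are placeholders,
not real repository values.

```python
import re

def parse_packages(text):
    """Parse control-file style stanzas into a list of field dicts."""
    packages = []
    # Stanzas are separated by one or more blank lines.
    for stanza in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        key = None
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") and key is not None:
                # Indented line: continuation of the previous field.
                fields[key] += "\n" + line.strip()
            elif ":" in line:
                # Split on the first colon only, so values such as
                # "1:2.0-1" (epoch versions) stay intact.
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        if fields:
            packages.append(fields)
    return packages

# Placeholder sample data, invented for this sketch.
SAMPLE = """\
Package: gimp
Version: 1.0-1
Description: image editor
 Longer description text goes here.

Package: libarmadillo2
Version: 1:2.0-1
Description: C++ linear algebra library
"""

parsed = parse_packages(SAMPLE)
versions = {p["Package"]: p["Version"] for p in parsed}
```

With a mapping like versions in hand, the remaining work is deciding
which directory page each package name belongs to.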
For instance, in the directory, the project name "GIMP" corresponds to
"gimp" in the Debian repository. That's an easy match for a script to
find. However, matching "Armadillo: C++ library" and "libarmadillo2" is
a bit harder, and I think some will be more challenging than that. One
strategy for this issue could be to auto-generate a list of possible
matches for each project in the directory, based on similar names and
project homepage URLs. We could then split the task of human
selection/verification among many people.
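One possible way to generate such a candidate list is sketched below in
Python, using the standard library's difflib for name similarity and a
bonus when the homepage hosts agree. Everything here is an assumption
made for illustration: the helper names, the 0.6 cutoff, the 0.5
homepage bonus, and the rule of stripping a leading "lib" and trailing
digits from package names. Real data would certainly need tuning.

```python
import difflib
import re
from urllib.parse import urlparse

def normalize(project_name):
    """Lowercase, drop any subtitle after a colon, keep letters/digits."""
    head = project_name.lower().split(":")[0]
    return re.sub(r"[^a-z0-9]", "", head)

def name_keys(package):
    """Return the package name plus a variant without 'lib' or digits."""
    base = re.sub(r"[^a-z0-9]", "", package.lower())
    keys = {base}
    stripped = re.sub(r"\d+$", "", re.sub(r"^lib", "", base))
    if stripped:
        keys.add(stripped)
    return keys

def host(url):
    """Extract the host from a URL, ignoring a leading 'www.'."""
    netloc = urlparse(url).netloc.lower()
    return netloc[4:] if netloc.startswith("www.") else netloc

def candidates(project_name, project_url, packages, limit=5, cutoff=0.6):
    """Rank package names that might correspond to a directory project.

    packages maps package name -> homepage URL (or None if unknown).
    """
    target = normalize(project_name)
    scored = []
    for pkg, homepage in packages.items():
        score = max(
            difflib.SequenceMatcher(None, target, key).ratio()
            for key in name_keys(pkg)
        )
        # Matching homepage hosts are strong evidence, so add a bonus.
        if project_url and homepage and host(project_url) == host(homepage):
            score += 0.5
        if score >= cutoff:
            scored.append((score, pkg))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pkg for _, pkg in scored[:limit]]

# Illustrative package list; homepages are left out where not needed.
packages = {
    "gimp": "http://www.gimp.org/",
    "libarmadillo2": None,
    "bash": None,
}
gimp_matches = candidates("GIMP", "http://www.gimp.org/", packages)
arma_matches = candidates("Armadillo: C++ library", "", packages)
```

The output is only a ranked shortlist per project. A person would still
confirm or reject each suggestion, which is the part that can be divided
among volunteers.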
Once that is done, we can write a simple Python script using the
MediaWiki extension to auto-edit the "templates" on each project page so
that they include an entry listing the distro's package name. After
that, it will be easy to broaden the scope of data to import, such as
updated descriptions, since the groundwork will be laid. For instance,
Joshua was telling me and Michael that there is another database that
lists extra information beyond what's in bare-bones Debian "Packages"
files. This info is referenced by Debian or Ubuntu package name.
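To give a feel for the template-editing step, here is a small Python
sketch that adds or updates one parameter in a template call within a
page's wikitext. It works on plain strings only and does not talk to the
wiki. The template name "Entry" and the parameter name "Debian package"
are hypothetical, since the actual template layout isn't described
here, and the function does not handle nested templates.

```python
import re

def set_template_param(wikitext, template, param, value):
    """Add or replace |param=value in the first {{template ...}} call.

    Assumes the template body contains no nested {{...}} calls.
    """
    start = wikitext.find("{{" + template)
    if start == -1:
        return wikitext  # template not present; leave page unchanged
    end = wikitext.find("}}", start)
    if end == -1:
        return wikitext  # unterminated template; leave page unchanged
    body = wikitext[start:end]
    pattern = re.compile(r"(\|\s*" + re.escape(param) + r"\s*=)[^|]*")
    if pattern.search(body):
        # Parameter exists: overwrite its value.
        body = pattern.sub(lambda m: m.group(1) + value + "\n", body, count=1)
    else:
        # Parameter missing: append it before the closing braces.
        body = body.rstrip() + "\n|" + param + "=" + value + "\n"
    return wikitext[:start] + body + wikitext[end:]

# Hypothetical page text, invented for this sketch.
page = "{{Entry\n|Name=GIMP\n}}\nSome page text."
updated = set_template_param(page, "Entry", "Debian package", "gimp")
updated_again = set_template_param(updated, "Entry", "Debian package", "gimp")
```

Running the function twice with the same value leaves the page
unchanged the second time, so a script built this way could be re-run
safely after each round of manual verification.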
So Michael, while I was a bit unclear in my original description to you
at LP, I hope this gives you a better idea of what we have before us. If
you need a better explanation from me, I can answer questions. If this
is indeed something you wish to help us with, we would all love to hear
your thoughts.
Thanks, and welcome aboard. :)
-Andrew