Re: [hfdb] Re: Grand Unified Hardware Database

From:	Zenaan Harkness
Subject:	Re: [hfdb] Re: Grand Unified Hardware Database
Date:	Fri, 23 Jul 2004 14:10:35 +1000

On Fri, 2004-07-23 at 07:36, Richard Stallman wrote:
>     1.  Someone has to type all that XML markup, and verify it, and make it
>     consistent with the other XML.  
> 
> That would be a major issue if we had the amount of data that you were
> envisioning.  I suppose Till could give practical advice about methods
> for dealing with the issue.  With the amount of data that we are
> talking about having in the early stages, it should not be an issue.
> 
> It might be easy to write something to read the XML files into an
> RDBMS to check them.  If so, I'm not against it.  But if it would take
> more than a day of work, I'm against doing it NOW.

There is only one missing step in my mind with the RDBMS "storage"
option, as originally proposed by me and then also by James: the actual
data entry. James, feel free to address that question in reply to either
my previous email or this one, as you prefer.

There are three things we need to aim for:

* Simplicity of data entry, such that we can start now.

* Scriptability - it is important to me personally that we are not
reliant on "fragile" high-level GUI tools for data entry, and
therefore that we can add data from the command line. This implies that
it should be easy to put the data under version control (cvs, arch/tla),
and easy to make copies of the repository/data storage (my point in a
different email about making my own local copy of the database).

* Scalability. This I also mentioned in my previous email, but it's good
to set it as a definite goal - we must be able to grow our team, and not
have any one of us as a single point of "data entry" and therefore
"failure" of the hfdb process from an "impatient manufacturer wanting to
submit data" point of view.

Scriptability and scalability combine to make it easy for random people
to run their own copies of the database. This is a desirable feature.

>     A vendor makes many devices, and a driver works in many OSes.  How will
>     XML ensure that the vendor and the OS are referred to consistently?  More
>     important, why should they?
> 
> I don't follow--why should what?  Anyway, there will be just a few
> people entering data.  It won't be hard for them to ensure consistency.
> This gets to be a problem when you have a lot of people, but the problem
> you get in that case is even worse.

The problem is solved if an RDBMS validates the data as it goes in. The
reason is that you choose from the available "types of device" or "list
of current vendors", for example, so there is no issue of different
spellings in different XML/etc files: there is only one entry for each
vendor name, one entry for each "device type", etc.

With even a single person editing 100s and 1000s of files, it is all too
easy to have multiple "pieces of data" referring to the same entity,
simply due to errors when typing.
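
As a rough illustration of an RDBMS "validating data as it goes in",
here is a minimal sketch using Python's bundled sqlite3 module. The
table and column names are made up for the example and are not an
agreed hfdb schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE vendors (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE devices (
    id        INTEGER PRIMARY KEY,
    vendor_id INTEGER NOT NULL REFERENCES vendors(id),
    model     TEXT NOT NULL
);
INSERT INTO vendors (id, name) VALUES (7, 'Connexant');
""")

def try_sql(sql, params=()):
    """Run one statement; return False if the database rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

insert_device = "INSERT INTO devices (vendor_id, model) VALUES (?, ?)"
ok_device = try_sql(insert_device, (7, "Supermodem"))    # vendor 7 exists
bad_device = try_sql(insert_device, (99, "Supermodem"))  # there is no vendor 99
dup_vendor = try_sql("INSERT INTO vendors (name) VALUES (?)", ("Connexant",))
```

Note that the UNIQUE constraint only rejects exact duplicates. A
near-miss spelling of a vendor name would still be accepted as a new
vendor, so the protection comes from devices referring to a vendor id
rather than retyping the name.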

But as you mentioned elsewhere, if we imported XML files into a
(normalized) RDBMS, that would catch duplicates and other problems.

So the real question, as I see it, is how we enter data into this
RDBMS. It would be nice to be able to write up files and import them
with a simple script. I think this approaches the ideal interface
(satisfying the three goals above).
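
A "simple script" of this kind might look roughly like the sketch
below. It assumes a made-up XML layout (a <modem> element with vendor,
model, speed and interface attributes) and made-up table names; neither
is the actual hfdb format. Entries whose vendor or interface is not
already listed are reported instead of imported.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML layout and schema, for illustration only.
SAMPLE_XML = """
<devices>
  <modem vendor="Connexant" model="Supermodem" speed="56" interface="serial"/>
  <modem vendor="Conexant" model="Othermodem" speed="33" interface="serial"/>
</devices>
"""

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE vendors    (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE interfaces (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE modems (
        id           INTEGER PRIMARY KEY,
        vendor_id    INTEGER NOT NULL REFERENCES vendors(id),
        model        TEXT NOT NULL,
        speed_kbps   INTEGER NOT NULL,
        interface_id INTEGER NOT NULL REFERENCES interfaces(id)
    );
    INSERT INTO vendors (id, name) VALUES (7, 'Connexant');
    INSERT INTO interfaces (id, name) VALUES (2, 'serial');
    """)
    return conn

def lookup_id(conn, table, name):
    """Return the id for `name` in a lookup table, or None if it is not listed."""
    # `table` is always one of our own fixed table names, never user input.
    row = conn.execute("SELECT id FROM %s WHERE name = ?" % table, (name,)).fetchone()
    return row[0] if row else None

def import_modems(conn, xml_text):
    """Import every <modem> with a known vendor and interface.

    Returns the model names of the entries that were skipped.
    """
    skipped = []
    for node in ET.fromstring(xml_text.strip()).iter("modem"):
        vendor_id = lookup_id(conn, "vendors", node.get("vendor"))
        interface_id = lookup_id(conn, "interfaces", node.get("interface"))
        if vendor_id is None or interface_id is None:
            skipped.append(node.get("model"))
            continue
        conn.execute(
            "INSERT INTO modems (vendor_id, model, speed_kbps, interface_id) "
            "VALUES (?, ?, ?, ?)",
            (vendor_id, node.get("model"), int(node.get("speed")), interface_id),
        )
    conn.commit()
    return skipped

conn = make_db()
skipped = import_modems(conn, SAMPLE_XML)
```

Here the second entry is skipped because its vendor spelling does not
match the one entry in the vendors table, which is the kind of typing
error the import step is meant to surface.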

>       In a relational model, the vendors are in one
>     table and the OSes in another.  Each has an ID.  The Driver table need
>     merely refer to the appropriate IDs to capture the referent completely and
>     accurately.  
> 
> I think that the XML that Till uses does more or less the same thing.

XML files would have separate fields and sub fields for each piece of
data, yes.

However, the comparison with a _relational_ db here is that, taken
as-is, the "table" for a given device type would have a column for each
of these fields, containing the actual data (eg. the manufacturer name).

This means each device entry has its own copy of the manufacturer name,
for example, which means we have redundant data.

Removing the redundancy - by creating a "manufacturers" table, and then
storing for each device just the "manufacturer id" instead of the whole
manufacturer name - is called normalization.

This means that a particular manufacturer name only has to be gotten
right once. And if it's spelt wrong, it can be fixed in one place, and
it is immediately fixed for all devices by that manufacturer.
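
To make the "fix it once" point concrete, here is a small sketch using
Python's sqlite3 module. The manufacturer name, the models and the
schema are all invented for the example.

```python
import sqlite3

# Hypothetical normalized schema: devices store a manufacturer id, not a name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manufacturers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE devices (
    id              INTEGER PRIMARY KEY,
    manufacturer_id INTEGER NOT NULL REFERENCES manufacturers(id),
    model           TEXT NOT NULL
);
INSERT INTO manufacturers (id, name) VALUES (7, 'Acme Modmes');  -- misspelt on purpose
INSERT INTO devices (manufacturer_id, model) VALUES (7, 'Supermodem');
INSERT INTO devices (manufacturer_id, model) VALUES (7, 'Othermodem');
""")

def device_listing(conn):
    """Return (manufacturer name, model) for every device, in insertion order."""
    return conn.execute(
        "SELECT m.name, d.model FROM devices d "
        "JOIN manufacturers m ON m.id = d.manufacturer_id "
        "ORDER BY d.id"
    ).fetchall()

before = device_listing(conn)

# One UPDATE on the manufacturers table corrects the name for every device.
conn.execute("UPDATE manufacturers SET name = 'Acme Modems' WHERE id = 7")

after = device_listing(conn)
```

No row in the devices table is touched by the correction; both devices
pick up the new spelling through the join.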

This principle applies to many data fields.

XML schema validation is one way to do this for XML files. Importing XML
files into a suitably normalized RDBMS is another. Errors obviously have
to be fixed in each case, and the problematic files re-imported.

So far, we've been speaking of "thumbing the data into the DB" - which
means running a command line program (a special shell) that connects to
the DB, and then typing in suitable SQL "INSERT" statements, to get the
data into the database.

For example,

INSERT INTO modems VALUES ('Connexant', 'Supermodem', 56, 'serial');

or some such.

Except with IDs/normalization, I imagine it will be something more
like:

INSERT INTO modems VALUES (7, 'Supermodem', 56, 2);

where the numbers 7 and 2 are the manufacturer id and interface type id,
respectively.

This will kind of push us to either print out sheets containing these
lookup codes and/or implement web forms for submission of the various
device types (different types, having different features, need different
forms).
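
One possible alternative to lookup sheets is to let the database
resolve the ids from the names at insert time. The sketch below does
this with an INSERT ... SELECT through Python's sqlite3 module; the
add_modem helper and the schema are hypothetical, not something the
list has agreed on.

```python
import sqlite3

# Hypothetical schema, matching the ids used in the example INSERT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vendors    (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE interfaces (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE modems (
    vendor_id    INTEGER NOT NULL REFERENCES vendors(id),
    model        TEXT NOT NULL,
    speed_kbps   INTEGER NOT NULL,
    interface_id INTEGER NOT NULL REFERENCES interfaces(id)
);
INSERT INTO vendors (id, name) VALUES (7, 'Connexant');
INSERT INTO interfaces (id, name) VALUES (2, 'serial');
""")

def add_modem(conn, vendor, model, speed_kbps, interface):
    """Insert a modem by vendor/interface *name*.

    Returns True if a row was inserted, False if either name is unknown
    (the SELECT then yields no row, so nothing is inserted).
    """
    cur = conn.execute(
        "INSERT INTO modems (vendor_id, model, speed_kbps, interface_id) "
        "SELECT v.id, ?, ?, i.id "
        "FROM vendors v CROSS JOIN interfaces i "
        "WHERE v.name = ? AND i.name = ?",
        (model, speed_kbps, vendor, interface),
    )
    return cur.rowcount == 1

added = add_modem(conn, "Connexant", "Supermodem", 56, "serial")
missing = add_modem(conn, "Conexant", "Othermodem", 33, "serial")  # unknown spelling
stored = conn.execute("SELECT * FROM modems").fetchall()
```

The person entering data still types names, but a name that is not in
the lookup table results in no row being stored rather than a second
spelling of the same vendor.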

James, please correct me here as needed.

>     3.  Once we've got all those XML files in our CVS repository, how are we
>     to discover how many of each device type we have?

not needed

>   Which devices by a given vendor are supported?

possibly useful

>   How many printers can be used with a given OS?  
> 
> This isn't really the problem.

I agree that is not a necessary report.

However, we will want to report on which drivers support a particular
chipset (so I can find a driver for my hardware), and obviously "is this
particular device supported".

I'm pretty sure that even if we go the XML route, we will be entering
the data into an SQL database as well, for reports (and possibly data
validation).
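
For what it's worth, the "which drivers support a particular chipset"
report is a short join once the data is in SQL. The sketch below
assumes a many-to-many link table between drivers and chipsets; the
table names, driver names and chipset names are all invented.

```python
import sqlite3

# Hypothetical schema: one driver can support many chipsets and vice versa.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chipsets (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE drivers  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE driver_chipsets (
    driver_id  INTEGER NOT NULL REFERENCES drivers(id),
    chipset_id INTEGER NOT NULL REFERENCES chipsets(id),
    PRIMARY KEY (driver_id, chipset_id)
);
INSERT INTO chipsets (id, name) VALUES (1, 'chip-a');
INSERT INTO chipsets (id, name) VALUES (2, 'chip-b');
INSERT INTO drivers (id, name) VALUES (1, 'drv-one');
INSERT INTO drivers (id, name) VALUES (2, 'drv-two');
INSERT INTO driver_chipsets VALUES (1, 1);
INSERT INTO driver_chipsets VALUES (2, 1);
INSERT INTO driver_chipsets VALUES (2, 2);
""")

def drivers_for_chipset(conn, chipset):
    """Return the names of all drivers recorded as supporting `chipset`."""
    rows = conn.execute(
        "SELECT d.name FROM drivers d "
        "JOIN driver_chipsets dc ON dc.driver_id = d.id "
        "JOIN chipsets c ON c.id = dc.chipset_id "
        "WHERE c.name = ? ORDER BY d.name",
        (chipset,),
    )
    return [row[0] for row in rows]

chip_a_drivers = drivers_for_chipset(conn, "chip-a")
chip_b_drivers = drivers_for_chipset(conn, "chip-b")
unknown_drivers = drivers_for_chipset(conn, "chip-z")  # not in the database
```

An empty result answers the "is this particular device supported"
question for a chipset we have no driver recorded for.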

>     Even if I'm wrong, I'm right.  If the XML is any good, we can parse and
>     load it into the RDBMS anyway.  
> 
> By all means, please do.  I am trying to simplify what we need to do
> to get up and running.  I have no objection to reading the XML into
> an RDBMS and doing useful things with it then.  I want to exclude that
> from the problem we are tackling now.
