Re: [hfdb] Scope (Was Re: Grand Unified Hardware Database)


From: James K. Lowden
Subject: Re: [hfdb] Scope (Was Re: Grand Unified Hardware Database)
Date: Sun, 25 Jul 2004 19:03:46 -0400

On Sun, 25 Jul 2004 09:01:06 +1000, Zenaan Harkness <address@hidden>
wrote:
> On Sat, 2004-07-24 at 03:12, David Zeuthen wrote:
> > 
> > My hunch feeling is that we need to consider each device type
> > (printer, scanner, storage, camera, input, usb hub) in detail to look
> > at what information is needed to store. 
> 
> Agreed.

I think that's going to make the most sense, and I agree it's a hunch. 
How many kinds of devices need their own tables (as opposed to "Devices")
will depend on how different the devices' metadata are.  

> > Second, I think it's good to partition this data into three
> > categories:
> > 
> >  1. Data we extract from the hardware itself in order to uniquely
> >     identify what we're dealing with; and
> > 
> >  2. Data that is useful for software using the hardware but cannot be
> >     extracted from the hardware proper; and
> > 
> >  3. Metadata about the hardware, e.g. textual content consumed only by
> >     human beings for the purpose of giving more information about the
> >     keyboard
> 
> Useful for discussion purposes yes. The data will have a natural
> physical representation as we enter it (obviously we will want to
> clearly link related data).

What I lack atm is detailed knowledge of what we need to store.  As Zen
says, there's no particular reason to reflect these categories in the data
model, but I agree these are useful distinctions that can drive the model
and the discussion.  

One caveat: comment fields are notoriously abusable.  It's always
tempting to stuff data into a comment, e.g., "has 2 USB 2.0 ports".  The
problem with that sort of thing is that it's not (readily) searchable.  It
will be important that people loading the database understand that
comments (if we provide for them) are "metadata of last resort".  

> > So, looking at my keyboard, I'd say:
> > 
> > Data that can be extracted:
> > ===========================
> > The USB product id is 0x058f, the USB product id is 0x9410 and the USB
> > device revision is 0x0101. This uniquely identify my device. 

Hold on.  What's the difference between a USB product id and a USB product
id?  

> > (In some cases, like printers, we need a bit more, like the IEEE1284
> > ID string, since vendors erroneously reuse the supposed to be unique
> > keys I gave above.)

In which case, it doesn't uniquely define the device.  I guess the PK for
this table would be 

        UsbProductID
        UsbOtherProductID ;-)
        UsbDeviceRevision
        Ieee1284ID

Since we won't always have/need the IEEE1284 ID, it can be nullable, or,
if the purists among us insist not, then the column can default to -1 or
something.  
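To make the nullable-versus-default choice concrete, here is a minimal
sketch using Python's sqlite3 module.  The table names are hypothetical,
the column names are the ones listed above (joke included), and I use ''
rather than -1 as the default because the IEEE1284 ID is a string.  It
assumes SQLite's behaviour, where NULLs in a UNIQUE constraint count as
distinct from one another; other engines may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Variant A: the IEEE1284 ID is nullable.
CREATE TABLE UsbDevicesNullable (
    UsbProductID      INTEGER NOT NULL,
    UsbOtherProductID INTEGER NOT NULL,
    UsbDeviceRevision INTEGER NOT NULL,
    Ieee1284ID        TEXT,
    UNIQUE (UsbProductID, UsbOtherProductID, UsbDeviceRevision, Ieee1284ID)
);
-- Variant B: the IEEE1284 ID defaults to a sentinel ('' here).
CREATE TABLE UsbDevicesDefault (
    UsbProductID      INTEGER NOT NULL,
    UsbOtherProductID INTEGER NOT NULL,
    UsbDeviceRevision INTEGER NOT NULL,
    Ieee1284ID        TEXT NOT NULL DEFAULT '',
    PRIMARY KEY (UsbProductID, UsbOtherProductID, UsbDeviceRevision, Ieee1284ID)
);
""")

def insert_twice(table):
    """Insert the same keyboard twice; return how many rows were accepted."""
    accepted = 0
    for _ in range(2):
        try:
            conn.execute(
                f"INSERT INTO {table} "
                "(UsbProductID, UsbOtherProductID, UsbDeviceRevision) "
                "VALUES (?, ?, ?)",
                (0x058F, 0x9410, 0x0101),
            )
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # the key rejected the duplicate
    return accepted

nullable_rows = insert_twice("UsbDevicesNullable")  # duplicates slip through
default_rows = insert_twice("UsbDevicesDefault")    # duplicate is rejected
print(nullable_rows, default_rows)
```

In this sketch the nullable variant accepts the same keyboard twice, so
a nullable key column needs some other uniqueness guard; the default
variant does not have that problem.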


> > Data that is useful for software:
> > =================================
> > 
> > My laptop got a nonstandard key on the keyboard is a button with a
> > 'power off' symbol and the keycode is <some_kay_code_value> (I haven't
> > looked it up yet). So, we might be interested in storing this and one
> > day export it to HAL in this fashion

This is a perfect example of a DeviceFeature and a DriverFeature, and it
raises an interesting point.   The intersection of these features has to
be described:

1. Device features ignored by the driver.
2. Device features supported by the driver.
3. Device features required by the driver.
4. Device features not required by the driver.

Your keyboard is probably supported by the standard keyboard driver, but
the poweroff button is ignored and so inoperable (1).  Perhaps there is or
will be a driver that supports it (2).  That driver might not work with
keyboards lacking it (3).  But, perhaps that support will find its way
into the standard driver, which will continue to support keyboards without
it (4).  

(#3 and #4 are really subtypes of #2.)
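A rough sketch of how those cases might fall out of two tables, again
with Python's sqlite3.  The table layouts, the Required flag, and the
sample drivers (10 = today's standard driver, 20 = a later one that
handles the poweroff key) are all assumptions for illustration, not
anything from the current schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DeviceFeatures (
    DeviceID INTEGER NOT NULL,
    Feature  TEXT    NOT NULL,
    PRIMARY KEY (DeviceID, Feature)
);
CREATE TABLE DriverFeatures (
    DriverID INTEGER NOT NULL,
    Feature  TEXT    NOT NULL,
    Required INTEGER NOT NULL CHECK (Required IN (0, 1)),
    PRIMARY KEY (DriverID, Feature)
);
INSERT INTO DeviceFeatures VALUES (1, 'standard keys'), (1, 'poweroff key');
INSERT INTO DriverFeatures VALUES
    (10, 'standard keys', 1),
    (20, 'standard keys', 1),
    (20, 'poweroff key',  0);
""")

def classify(device_id, driver_id):
    """Map each feature of the device to its status under the driver."""
    rows = conn.execute(
        """
        SELECT d.Feature,
               CASE WHEN r.Feature IS NULL THEN 'ignored'
                    WHEN r.Required = 1    THEN 'supported, required'
                    ELSE 'supported, not required'
               END
        FROM DeviceFeatures AS d
        LEFT JOIN DriverFeatures AS r
               ON r.Feature = d.Feature AND r.DriverID = ?
        WHERE d.DeviceID = ?
        ORDER BY d.Feature
        """,
        (driver_id, device_id),
    )
    return dict(rows)

print(classify(1, 10))
print(classify(1, 20))
```

Under driver 10 the poweroff key comes back as ignored (case 1); under
driver 20 it is supported but not required (cases 2 and 4).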

> > > * It's got 16 special-function keys across the top, above the Fn
> > > keys:
> > >  - back
> > >  - forward
> > >  - stop
> > >  - refresh
> > >  - search
> > >  - favourites
> > >  - web/home
> > >  - mail
> > >  - mute
> > >  - volume -
> > >  - volume +
> > >  - play/pause
> > >  - stop
> > >  - prev track
> > >  - next track
> > >  - media
> > > * It's got 3 special function keys above the keypad:
> > >  - my computer
> > >  - calculator
> > >  - sleep
> > 
> > We should store these keycodes as category 2 data.

Here is where RMS's concern with the KISS principle comes into play. 
These may be useful.  We can design a database to hold them.  The operative
question is: who's going to input the data, and who's going to use it? 
The degree of complexity and detail in the database will be constrained
not by our imagination or analytical skills, but by the work and the
workers.  

> http://www.schemamania.org/projects/chd/chd.sql
> http://www.schemamania.org/projects/chd/chd.dia
and
  http://www.schemamania.org/projects/chd/chd.pdf ;-)

> Of course, with all the discussions of XML with RMS, we might have to
> look into XML import filter or something. I hope to do enough reading on
> it this week.
> 
> Which actually leads me to think, perhaps Richard didn't realise that we
> already have a pretty good start on an SQL database... hmmm.

I think RMS's real concern is with complexity and resources, meaning,
respectively, the work and us.  I'm still of the opinion that he's
oversimplifying the work, but we agree that what gets done is strictly a
function of who's willing to do what.  Absent rows, the model is moot.  

Likewise, he who's doing the work gets to decide what form it takes. 
Unless someone actually working on the data wants to use XML, then it's
not going to happen.  AFAIK, no one doing the work has objected to an
RDBMS, so as far as I'm concerned, that's the only technology we need to
address.  

> > We have a table for each physical bus type that contains the
> > properties we can extract from the hardware. 
> > 
> >  USBDevice
> >  IEEE1394Device
...
> James anticipated this in the schema he put together, and created a
> DeviceConnectors table

Right.  A device isn't a bus or a connector.  It *has* those things.  Else
how would one store a printer with a parallel and USB port?  
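For illustration, a minimal sketch of the "has" relationship in Python's
sqlite3.  Only the DeviceConnectors name comes from the schema under
discussion; the columns and the other two tables are assumptions and may
not match chd.sql.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Devices (
    DeviceID INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL
);
CREATE TABLE Connectors (
    ConnectorID INTEGER PRIMARY KEY,
    Name        TEXT NOT NULL UNIQUE
);
-- One row per connector a device has; a device may have several.
CREATE TABLE DeviceConnectors (
    DeviceID    INTEGER NOT NULL REFERENCES Devices (DeviceID),
    ConnectorID INTEGER NOT NULL REFERENCES Connectors (ConnectorID),
    PRIMARY KEY (DeviceID, ConnectorID)
);
INSERT INTO Devices VALUES (1, 'some printer');
INSERT INTO Connectors VALUES (1, 'parallel'), (2, 'USB');
INSERT INTO DeviceConnectors VALUES (1, 1), (1, 2);
""")

def connectors_of(device_id):
    """Return the names of all connectors the device has."""
    rows = conn.execute(
        """
        SELECT c.Name
        FROM DeviceConnectors AS dc
        JOIN Connectors AS c ON c.ConnectorID = dc.ConnectorID
        WHERE dc.DeviceID = ?
        """,
        (device_id,),
    )
    return sorted(name for (name,) in rows)

print(connectors_of(1))
```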

> I note in the current schema, there is a table "MechanismTypes" with a
> comment saying "laser, inkjet, dot matrix". I'm thinking it would make
> sense just to call this table "Printer types" or similar. James?

Well, OK.  I guess that's the vernacular.  I think mechanism type is more
accurate, and lets one imagine cataloging PrinterMechanisms (in that there
are many more printers than printer engines).  

> > with category 2 data. Each table got FOREIGN keys into the tables from
> > a. The PRIMARY key is a combination of said FOREIGN keys and other
> > useful identifying data (such as IEEE1284 ID for printers).
> 
> Perhaps this is one of the reasons that IDs instead of (primary fields)
> are used as primary key - it minimizes the duplication of data.

At some point, all devices are the same.  When you want to put a Device's
key as a foreign key in another table -- and that table doesn't care what
kind of a device it is -- then you need some uniform way to key the
devices.  The natural key for different kinds of devices differs; it makes
no sense, obviously, to try to use USB IDs for PCI cards.  So, every
device gets an ID assigned to it (by the database, btw).  This ID is
unique across devices: if you have a DeviceID, you can find its type
because there's no overlap between IDs assigned, say, to printers and
keyboards.  Neither can any inference about the device be made by
examining its ID, because it's completely arbitrary, just a number.  

The natural key's uniqueness ("category 1 data", sometimes) should also be
enforced where applicable.  
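A minimal sketch of both ideas together, in Python's sqlite3: one
database-assigned DeviceID shared by every kind of device, plus a UNIQUE
constraint on each kind's natural key.  The table names, the DeviceType
column, and the sample IEEE1284 string are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Devices (
    DeviceID   INTEGER PRIMARY KEY AUTOINCREMENT,  -- arbitrary, assigned by the database
    DeviceType TEXT NOT NULL
);
CREATE TABLE UsbKeyboards (
    DeviceID          INTEGER PRIMARY KEY REFERENCES Devices (DeviceID),
    UsbProductID      INTEGER NOT NULL,
    UsbOtherProductID INTEGER NOT NULL,
    UsbDeviceRevision INTEGER NOT NULL,
    UNIQUE (UsbProductID, UsbOtherProductID, UsbDeviceRevision)  -- natural key
);
CREATE TABLE Printers (
    DeviceID   INTEGER PRIMARY KEY REFERENCES Devices (DeviceID),
    Ieee1284ID TEXT NOT NULL UNIQUE                              -- natural key
);
""")

def add_device(device_type):
    cur = conn.execute("INSERT INTO Devices (DeviceType) VALUES (?)", (device_type,))
    return cur.lastrowid

def add_keyboard(product_id, other_product_id, revision):
    with conn:  # rolls back the Devices row too if the natural key is a duplicate
        device_id = add_device("keyboard")
        conn.execute(
            "INSERT INTO UsbKeyboards VALUES (?, ?, ?, ?)",
            (device_id, product_id, other_product_id, revision),
        )
    return device_id

def add_printer(ieee1284_id):
    with conn:
        device_id = add_device("printer")
        conn.execute("INSERT INTO Printers VALUES (?, ?)", (device_id, ieee1284_id))
    return device_id

def device_type_of(device_id):
    row = conn.execute(
        "SELECT DeviceType FROM Devices WHERE DeviceID = ?", (device_id,)
    ).fetchone()
    return row[0] if row else None

kb = add_keyboard(0x058F, 0x9410, 0x0101)
pr = add_printer("MFG:Example;MDL:Hypothetical;")  # made-up ID string
print(kb, device_type_of(kb), pr, device_type_of(pr))
```

Any other table can then carry a plain DeviceID foreign key without
caring what kind of device it points to.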

> > Many values here can be NULL; for Keyboard we probably want an
> > elements for each keycode for multimedia keys but for my example above
> > only one of them will contain useful data.
> 
> I think that proper normalization gets rid of most NULL values. You
> simply have these "types" tables for various devices. 

Normalization gurus like Chris Date are of one mind on NULLs: correct
logical design eliminates them, but feasible physical designs frequently
require them.  

A nullable column is just an optional relationship embedded in the table. 
To eliminate it, one need only place the column in a separate table having
the same key, and populate that table with only the non-null values. 
Extant technology won't make you happy when you do that, however, so we
use nulls where they "make sense" i.e., where they could obviously be
removed to another table, but aren't, for reasons of convenience.  
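As a sketch of that transformation in Python's sqlite3, here is the
poweroff keycode moved out of the keyboard table into its own table with
the same key.  The table names are hypothetical and the keycode 222 is a
placeholder, since the real value hasn't been looked up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Keyboards (
    DeviceID INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL
);
-- Same key as Keyboards; a row exists only when the keyboard has the key,
-- so the Keycode column itself never needs to be NULL.
CREATE TABLE KeyboardPowerKeys (
    DeviceID INTEGER PRIMARY KEY REFERENCES Keyboards (DeviceID),
    Keycode  INTEGER NOT NULL
);
INSERT INTO Keyboards VALUES (1, 'laptop keyboard'), (2, 'plain keyboard');
INSERT INTO KeyboardPowerKeys VALUES (1, 222);  -- placeholder keycode
""")

# The outer join reproduces what a nullable column would have stored.
rows = conn.execute(
    """
    SELECT k.Name, p.Keycode
    FROM Keyboards AS k
    LEFT JOIN KeyboardPowerKeys AS p ON p.DeviceID = k.DeviceID
    ORDER BY k.DeviceID
    """
).fetchall()
print(rows)
```

The NULL reappears in the join result for the plain keyboard, which is
the inconvenience that usually leads one to keep the nullable column in
the first place.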

> > I'm not a big database design myself, it looks like James is, so any
> > comments? How should we go about specifying the database schema?

In SQL, of course. ;-)  I like your ideas.  I'll incorporate them into the
model just as soon as someone trying to load something needs them.  As I
told Zen on Day 1, the effort of loading the data exposes the semantics of
the schema.  In the beginning, we'll learn something from almost every
row.  

> If we are to enter data in XML files, we need to just do so, submitting
> the (XML tagged) data to this list, and then we discuss what fields go
> where, and start to create some types lists, that we can use for fields,
> as well as XML tag names. It should be pretty straightforward I imagine.

I'm on record that XML is harder to generate than flat ASCII files, and
that the data model has to be vetted irrespective of whether it's in XML
or in an RDBMS.  It's easy until it gets hard, but in my experience it
gets hard pretty quickly.  

> I feel that my main limitation is my lack of XML experience
> right now, but I can see the benefits of simple file-based data entry.
> I'll go do some reading this week.

I really do think the XML/RDBMS debate is complexity in technology
clothing.  I would like to see us discussing data -- which will define the
complexity -- and let that speak for itself.  

--jkl



