
Re: [Bug-gnupedia] Re: Classification difficulty and incompleteness


From: Thomas E. Vaughan
Subject: Re: [Bug-gnupedia] Re: Classification difficulty and incompleteness
Date: Thu, 18 Jan 2001 15:36:52 -0600
User-agent: Mutt/1.3.12i

On Thu, Jan 18, 2001 at 01:34:16PM -0700, Mike Warren wrote:
>
> "Thomas E. Vaughan" <address@hidden> writes:
> 
> > If each article must have only a single, unified <content> section that
> > only supports a subset of HTML, then are we ruling out the possibility
> > that an article might have several HTML nodes, as in the default output
> > of latex2html?
> 
> No. The back-end representation need bear absolutely no resemblance to
> what is presented to the user. If one wants nodes split up after every
> title, then that can be easily accomplished.
>
> > And how many HTML tags shall we support?
> 
> I would argue for ``none''.

Then it seems we should write our own "latex2exml", where "EXML" refers to
our own encyclopedia-article XML.  I might take a crack at it with Perl
once we have our DTD hammered out.
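
To make the idea concrete, here is the flavor of converter I have in
mind, as a rough Perl sketch.  The element names (<content>, <title>,
<em>) are pure guesses until the DTD is settled, and a real converter
would need an actual LaTeX parser rather than a pile of regexes:

#!/usr/bin/perl -w
# latex2exml sketch: convert a tiny subset of LaTeX into guessed EXML
# tags.  All element names are placeholders until the real DTD exists.
use strict;

sub escape_xml {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

sub latex2exml {
    my ($tex) = @_;
    my $xml = escape_xml($tex);
    $xml =~ s/\\section\{([^}]*)\}/<title>$1<\/title>/g;   # \section{} -> <title>
    $xml =~ s/\\emph\{([^}]*)\}/<em>$1<\/em>/g;            # \emph{}    -> <em>
    return "<content>\n$xml\n</content>\n";
}

print latex2exml('\section{Pooh} Winnie is a \emph{very} hungry bear.');

Something this naive obviously falls over on verbatim environments,
math, nested braces, and so on, but it might be enough to start
exercising a draft DTD.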

> All we really need is <em> for emphasis.

Like LaTeX's '\emph{}'.

> Other examples one might consider:
> 
> <reference>Winnie the Pooh</reference>
> 
> which would correspond to a later bibliographic reference of the same
> title and be treated however it makes sense for the particular medium.

So we need to read BibTeX files and understand both '\ref{}' and '\cite{}'.
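
As a toy illustration of the \cite{} half (again, <reference> is just
borrowed from Mike's example, the .bib handling below ignores most of
real BibTeX syntax, and a serious version would use a proper BibTeX
parser):

#!/usr/bin/perl -w
# Sketch: replace \cite{key} with <reference> elements whose text is
# the title field pulled from a .bib file.  Purely illustrative.
use strict;

my %title_for;    # BibTeX key => title field

sub read_bib {
    my ($bibfile) = @_;
    open my $fh, '<', $bibfile or die "can't open $bibfile: $!";
    local $/;                                  # slurp the whole file
    my $bib = <$fh>;
    close $fh;
    while ($bib =~ /\@\w+\{\s*([^,\s]+)\s*,(.*?)\n\}/gs) {
        my ($key, $body) = ($1, $2);
        $title_for{$key} = $1
            if $body =~ /title\s*=\s*[{"]([^}"]+)[}"]/i;
    }
}

sub cites_to_references {
    my ($tex) = @_;
    $tex =~ s{\\cite\{([^\}]*)\}}{
        my $title = $title_for{$1} || $1;      # fall back to the raw key
        "<reference>$title</reference>";
    }ge;
    return $tex;
}

read_bib('pooh.bib') if -e 'pooh.bib';         # 'pooh.bib' is a made-up name
print cites_to_references('See \cite{milne26} for the canonical account.'), "\n";

Presumably '\ref{}' could work the same way against labels collected in
a first pass over the source.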

> When translated to PDF, this might be a [1]-type reference to a later
> bibliography entry.

Might we not generate PDF, where possible, directly via pdflatex?

> If the DTD is designed correctly, it will be possible to embed a
> MathML (or whatever is decided) formula directly into the content
> section.

That seems like a fun project in itself.  Surely someone has started
working on translating TeX math expressions into MathML.
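
For what it's worth, I imagine the embedding ending up something like
this in the article source, assuming the DTD opens a namespace island
for MathML inside <content>; the <p> wrapper and everything else here
is a guess:

<content>
  <p>Einstein's relation
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mrow>
        <mi>E</mi><mo>=</mo><mi>m</mi><msup><mi>c</mi><mn>2</mn></msup>
      </mrow>
    </math>
    ties energy to mass.</p>
</content>

The nice property is that a MathML-aware browser could render the
island directly, while the same element could be translated back to TeX
for the pdflatex route.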

> > In a previous message, I tried to bring up for discussion the issue of
> > just replacing the <content> section with a URL to a page that the
> > author provides.  At least at first, this would give scalability and
> > reduce the central resource requirements.
> 
> Yet it makes the content almost useless, as all the content would be in
> different formats. How could you go about producing a LaTeX translator
> for such a mess? At least with a well-defined DTD, this problem becomes
> possible (if still somewhat difficult).

Well, we could focus at the beginning on handling the metadata properly
and worry about storing content internally later, especially because the
math part is hard.  Obviously, not every URL on the Internet is fit to be
an entry in our encyclopedia.  But suppose we establish a policy: an
entry's URL must point to a page whose article content is in PDF or HTML
format, and the article must not link to any non-free material.  Once an
article passes that check, the author's content link, along with the
relevant metadata, gets added to the encyclopedia.

Authors will be writing in an appropriate source format (like LaTeX)
whether or not the XML translation software exists, and they can put
their work up on the Web long before that software is written, so we
could have a functioning system sooner and start debugging part of it.
When the XML translation software does become available, we can start
moving content into internal storage, if we really want to.
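
Concretely, I am picturing an entry that at first carries nothing but
metadata plus a pointer, something like this (every element name and
value below is made up, pending the DTD):

<article>
  <title>Winnie the Pooh</title>
  <author>Some Author</author>
  <category>Children's literature</category>
  <license>GNU FDL</license>
  <content href="http://example.org/pooh/article.html"/>
</article>

When the translation software exists, the href could be swapped for an
inline <content> section without touching any of the metadata.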

Has anyone thought about using Freenet or some such infrastructure to do
content mirroring?  Aside from author hosting and case-by-case decisions
about where to store frequently accessed content, what are the real
options for scalability if this thing takes off in a big way?

-- 
Thomas E. Vaughan <address@hidden>
CIMMS/NSSL, Norman, OK, USA



