bug-ncurses

[PATCH]Converting HTML to Docbook SGML/XML


From: Pradeep Padala
Subject: [PATCH]Converting HTML to Docbook SGML/XML
Date: Mon, 1 Jul 2002 19:51:07 -0400 (EDT)

Hi,
   I am providing a patch for tidy that converts HTML to DocBook SGML or
XML. I developed it as part of the Lampadas project
(http://www.lupercalia.net/scoop/story/2002/3/12/131530/117), which
aims to unify the many documentation formats in use and to ease LDP
management.
   It can be used by authors who write for the LDP, since not
everybody is familiar with DocBook.
   I started this while trying to convert the ncurses HTML documentation to
DocBook. I am also the author of the Ncurses-Programming-Howto.
   The following page shows a demo with the original HTML and the SGML
output produced by tidy with the patch applied:
   http://www.cise.ufl.edu/~ppadala/tidy

   If you want to use this feature, please follow the instructions below.

Instructions to patch and Usage
-------------------------------
The diff was made against the latest source (26 June 2002) from
tidy.sourceforge.net.

*) Download the latest source from
   http://tidy.sourceforge.net/src/tidy_src.tgz
*) Uncompress it
   tar zxvf tidy_src.tgz
*) Change to the tidy directory and apply the patch
   cd tidy
   patch -p0 < dbpatch
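
If you prefer to check that the patch applies cleanly before any file is
touched, the last step can be wrapped as in the sketch below. It assumes
GNU patch (for --dry-run) and that dbpatch has been saved inside the tidy
directory. The function name apply_patch_checked and the PATCH_BIN
variable are made up for this example; they are not part of tidy or of
the patch.

```shell
# Sketch: test a patch with --dry-run, then apply it only if the test passes.
# PATCH_BIN is a hypothetical override for the patch program (default: patch).
apply_patch_checked() {
    src_dir=$1       # directory to patch, e.g. tidy
    patch_file=$2    # patch file, given relative to src_dir, e.g. dbpatch
    (
        cd "$src_dir" || exit 1
        # --dry-run (GNU patch) reports problems without modifying files.
        "${PATCH_BIN:-patch}" -p0 --dry-run < "$patch_file" >/dev/null || exit 1
        # The dry run succeeded, so apply the patch for real.
        "${PATCH_BIN:-patch}" -p0 < "$patch_file"
    )
}
```

Usage, from the directory that contains tidy:
   apply_patch_checked tidy dbpatch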

The options -dbsgml and -dbxml enable the feature.
To output DocBook SGML from HTML, use
   ./tidy -dbsgml <html file>

For DocBook XML, use
   ./tidy -dbxml <html file>
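
To convert a whole directory of HTML files, the two commands above can be
wrapped in a small shell function. This is only a sketch: convert_dir and
TIDY_BIN are names invented for the example, and it assumes that the
patched tidy writes the converted document to standard output, as tidy
normally does for its cleaned-up markup.

```shell
# Sketch: convert every .html file in a directory to DocBook SGML or XML.
# TIDY_BIN is a hypothetical variable naming the patched binary (default: ./tidy).
# Assumption: the converted document is written to standard output.
convert_dir() {
    dir=$1
    mode=$2    # "sgml" or "xml", selects -dbsgml or -dbxml
    case $mode in
        sgml|xml) ;;
        *) echo "usage: convert_dir <dir> sgml|xml" >&2; return 2 ;;
    esac
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue    # no .html files: the glob stays literal
        out=${f%.html}.$mode       # foo.html -> foo.sgml or foo.xml
        "${TIDY_BIN:-./tidy}" "-db$mode" "$f" > "$out" ||
            echo "note: tidy exited non-zero for $f" >&2
    done
}
```

For example, run from the tidy directory:
   convert_dir ../ncurses-html sgml
Here ../ncurses-html is a placeholder path. tidy exits with a non-zero
status when it reports warnings or errors, so the note on stderr does
not necessarily mean that no output was written.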

The DocBook public identifiers are hardcoded for now. In the future, I will
add options to set them.

The output is far from perfect, but it does most of the dirty work. I will
prepare a web page explaining the transformations. It will take some time
:-) For the impatient, look in the source.

To Tidy developers
------------------
   Regarding the patch, I tried to intrude as little as possible on the
parsing code. I had to make DescendantOf() non-static, as it is
needed to figure out some of the hierarchy. Most of the code is in pprint.c.
If you want me to add another 'P' to the function names, let me know.
Presently the functions are named like PrintSgml, not PPrintSgml.

   I think somebody is working on a tidy library which can be
used by other applications. That would be great, and it would make it easy
to write applications like convert_html_to_dbsgml. I can help with the
library; I have spent enough time poring over the tidy functions :-) What
about Perl/XS? If nobody is working on it, I would like to work on that too.
Please let me know the status of these things.

Thank you,
Pradeep Padala

P.S. The patch is attached.

-- 
Perfection is our goal, excellence will be tolerated. -- J. Yahl
--

Attachment: dbpatch
Description: Text document

