From: Larry Kollar
Subject: Re: [Groff] Escaping punctuation marks in troff
Date: Wed, 29 Oct 2003 22:04:54 -0500
2. Secondly, I want to parse an HTML file and convert it to troff format, so I cannot change the characters.

Surely it could be filtered through sed on the way to troff...?
If it's plain HTML (not a lot of <span> or <font> tags), the best way to convert HTML to *roff is to first use HTML Tidy to make sure the file is well-formed XML, then use an XSLT script to convert to *roff. Use sed to clean up blank lines, change character entities (stuff like "&ldquo;") to *roff special characters, and so forth.

-- 
Larry Kollar k o l l a r @ a l l t e l . n e t
Unix Text Processing: "UTP Revival" http://home.alltel.net/kollar/utp/ (note new URL)
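
The sed cleanup step described above might look something like the following. This is only a minimal sketch: the function name entities_to_roff is made up for illustration, the handful of entities covered is an arbitrary sample, and it assumes the entities arrive at the sed stage as literal text such as "&ldquo;".

```shell
# Hypothetical helper (name chosen for illustration only): reads text on
# stdin, maps a few HTML character entities to *roff special-character
# escapes, and deletes blank lines.
entities_to_roff() {
    # In a sed replacement, "\\" yields a literal backslash and "\&" a
    # literal ampersand. "&amp;" is handled last so that input such as
    # "&amp;ldquo;" is not decoded twice.
    sed -e 's/&ldquo;/\\(lq/g' \
        -e 's/&rdquo;/\\(rq/g' \
        -e 's/&mdash;/\\(em/g' \
        -e 's/&copy;/\\(co/g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&amp;/\&/g' \
        -e '/^[[:space:]]*$/d'
}

# Example: feed it three lines, one of them blank.
printf '%s\n' '&ldquo;Hello&rdquo; &mdash; world' '' 'a &lt; b' | entities_to_roff
```

A real conversion would need a longer entity table, and the function would sit at the end of the pipeline, after the Tidy and XSLT steps.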