RE: [Groff] conversion to DOC format


From: Ted Harding
Subject: RE: [Groff] conversion to DOC format
Date: Wed, 04 Aug 2004 09:52:59 +0100 (BST)

On 03-Aug-04 Dean Allen Provins wrote:
> Fellow groff folk:
> 
> I am pleased to say that I managed to graduate, having written and
> successfully defended my thesis which I wrote using groff and friends.
> I should add that I am indebted to many on this list for their
> assistance.
> Without it, the document would not have been such a masterpiece (it
> looked good, even read acceptably well - but that isn't groff).

Dear Dr Provins, Congratulations!

> [...] 
> Many firms or government agencies require cover letters and resumes
> in Microsoft Word (i.e. .DOC) format.  Alas groff doesn't generate
> this format but of course, I prefer to use it (i.e. groff). Have any
> readers of this list faced this problem and found a way to convert?
> I am aware that OpenOffice has a DOC output capability, and it has
> occurred to me that I might be able to use that code in a groff post
> processor, but it makes sense to check with you first. If you have
> any suggestions, I would be pleased to hear from you.

Yes, I have faced this problem! It is not easy. I have adopted two
approaches. (Below I also suggest a third option, closer to what you
seem to be looking for.)

1. Exemplified by the 3-volume book "The Universal History of Numbers"
by Georges Ifrah, which I co-translated with David Bellos of Princeton
from "L'Histoire Universelle des Chiffres", being responsible for almost
all of vol 3 and parts of vols 1 and 2 (translation published 1998-2000
by The Harvill Press).

DB needed the material in Word .doc format (Apple user!); I did my bits
in groff. Printed groff output was useful for sending to the publisher
when it was necessary to show precisely how some parts of the text needed
to be laid out. The conversion to .doc was primitive and tedious:
working with the groff source (ms macros) a 'sed' script removed single
"\n" at the ends of input lines, and wrapped paragraph macros within
"\r\n". Thus each paragraph became one long line, as required by Word.

Formatting macros, requests and in-line escape sequences were preserved.
The result was then imported into Word as a text file, and these things
were searched for and the corresponding blocks of text were formatted
(type of paragraph, fonts, styles, ... ) within Word by hand, using
Word's standard GUI resources.
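
The line-joining step described above could be sketched along the
following lines. This is only an illustration in Python, not the original
'sed' script (which I am not reproducing here). The function name
join_paragraphs is made up for the example, and it assumes that any line
beginning with "." or "'" is a request or macro call, and that a blank
line ends a paragraph.

```python
def join_paragraphs(source: str) -> str:
    """Join the text lines of each paragraph into one long line.

    Assumption for this sketch: lines starting with '.' or "'" are
    requests or macro calls. They are kept on a line of their own, and
    every output line is terminated with CRLF, as Word expects.
    In-line escape sequences are left untouched.
    """
    out = []    # finished output lines
    para = []   # text lines of the paragraph currently being collected

    def flush():
        # Emit the collected paragraph as a single long line.
        if para:
            out.append(" ".join(para))
            para.clear()

    for line in source.splitlines():
        if line.startswith((".", "'")):
            flush()
            out.append(line)          # macro line is preserved as-is
        elif line.strip() == "":
            flush()                   # blank line ends the paragraph
        else:
            para.append(line.strip())
    flush()
    return "\r\n".join(out) + "\r\n"
```

Blank lines are dropped in this sketch, so anything relying on them for
vertical space would still need attention by hand in Word.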

Tables: The groff 'tbl' input was stripped of its "header" information,
leaving only the tabular entries with "#" as separator. This was then
converted into a Word table by "highlighting" it and using the Word
text->table utility with "#" as separator; then any detailed formatting
required within the table was again done by hand.
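
As a rough sketch of the table-stripping step, something like the
following Python would do. It is an illustration only: the function name
tbl_rows is invented, and it assumes simple tables in which the format
section ends at the first line ending in ".", with no ".T&" re-formatting
and no "T{ ... T}" text blocks.

```python
def tbl_rows(tbl_source: str) -> list:
    """Return only the data rows found between .TS and .TE.

    Assumptions for this sketch: the options and format lines run up to
    and including the first line that ends with '.', and the table uses
    neither '.T&' nor 'T{ ... T}' text blocks.
    """
    rows = []
    in_table = False
    in_format = False
    for line in tbl_source.splitlines():
        if line.startswith(".TS"):
            in_table, in_format = True, True
            continue
        if line.startswith(".TE"):
            in_table = False
            continue
        if not in_table:
            continue
        if in_format:
            # Skip options and format lines; the last one ends in '.'
            if line.rstrip().endswith("."):
                in_format = False
            continue
        if line in ("_", "=") or line.startswith("."):
            continue  # horizontal rules and embedded requests
        rows.append(line)
    return rows
```

The rows that come back still carry the "#" separators, which is what
Word's text->table conversion needs.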

Footnotes: Search for ".FS", then open a "footnote" in Word and
cut&paste the footnotes text. Close that, then delete the original.

And so on. As I say, primitive and tedious. I kept this up for several
hundred pages, as an act of defiance on behalf of groff and of
resistance to domination by software for which I have very little
respect! Potentially symptomatic of mental abnormality, for which
there might have been some basis had it not worked. But it did work.

2. Exemplified by a book of which I no longer have the details,
where my task was simply the production of formatted copy. This was
a geographical, historical and cultural bibliography for Turkey, and
in particular therefore needed Turkish characters (as well as those
for other European languages since articles from Danish, German etc.
were cited).

In this case it was possible to work with WordPerfect (WP-5.1, a
program whose passing I regret!) as intermediate format.

Since I fortunately possessed a copy of the technical specification
for a WordPerfect file, I was able to write a set of much more
sophisticated 'sed' scripts which converted much of the 'groff'
markup into WP's "hidden codes", including automatic conversion
of special characters into WP's 2-byte codes. Anything left over
could be picked clean by hand in the WP document, much more easily
than in the case of the Ifrah book.

Indexing was interesting, since in this case it was required that
non-Turkish words should be collated "English-fashion" where, for
instance, the Danish "å" is equivalent to English "a", but Turkish
words should be collated in Turkish alphabetical order (where, for
instance, Turkish "ç" comes after "c", and dotless-i precedes the
"i" with a dot, etc.). This was solved with makeindex, a TeX program
which will also work with groff/troff, since this allows you to
specify the indexing (sort) key and the index entry separately.
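
To illustrate the idea of a sort key kept separate from the printed entry
(makeindex writes this as "key@entry"), here is a small Python sketch of
the two collations. It is not what was used for the book. The function
names are invented, and the "English-fashion" key simply strips combining
accents, which handles "å" but not letters such as "ø" that have no
decomposition.

```python
import unicodedata

# The 29 letters of the modern Turkish alphabet, in alphabetical order.
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
RANK = {ch: i for i, ch in enumerate(TURKISH_ALPHABET)}


def turkish_lower(word: str) -> str:
    # Turkish casing: "I" lowers to dotless "ı" and "İ" lowers to "i",
    # so both are mapped by hand before calling str.lower().
    return word.replace("I", "ı").replace("İ", "i").lower()


def turkish_key(word: str) -> tuple:
    # Characters outside the alphabet sort after all Turkish letters.
    return tuple(RANK.get(ch, len(RANK)) for ch in turkish_lower(word))


def english_key(word: str) -> str:
    # Decompose accented letters and drop the combining marks,
    # so that "å" collates as "a".
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Each index entry is then sorted on whichever key suits its language,
while the printed form stays exactly as written.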

Once a WP document has been created, this can be imported into Word.
Most (but not necessarily all) of the WP formatting is preserved,
and what gets lost or garbled can be mended by hand in Word without
too much trouble.

3. There is a program 'troffcvt':
  "troffcvt is a translator that turns troff input into a form that
   can be more easily processed. The troffcvt distribution comes with
   postprocessors that turn troffcvt output into various destination formats
   such as HTML (Hypertext Markup Language), RTF (Rich Text Format)
   or plain text."

See (e.g.)

  http://www.snake.net/software/troffcvt/

It is several years since I last looked at this program, and I did not
think it was very good then. However, the version referred to in the
above URL seems to date from 2001, quite a bit later, and may well
be much better. In particular the table formatting seems to be good.

However, without trying it I cannot check whether one of the problems
(interpreting custom macros) is now improved. I must have another look!

If you can get it to work OK, then the RTF format should be importable
directly into Word. For the sort of document you are considering here
(letters and resumés), it may well be quite adequate.

4. Your post-processor suggestion seems to be addressed by 'troffcvt'.
However, as an aside, I was led to ponder what name should be given
to a post-processor which produced Word format (if possible).

It's a pity that 'grotty' is already pre-empted. However, I could
live with 'growl'.

Best wishes,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <address@hidden>
Fax-to-email: +44 (0)870 167 1972
Date: 04-Aug-04                                       Time: 09:52:59
------------------------------ XFMail ------------------------------


