[Freecats-Dev] OmegaT (cont., from Marc)
From: Henri Chorand
Subject: [Freecats-Dev] OmegaT (cont., from Marc)
Date: Thu, 27 Mar 2003 21:35:34 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003
Hi all,
Marc recently sent this message, which was intended for our mailing
list. Here it is for all to read. Good food for thought, indeed!
Cheers,
Henri
-------- Original Message --------
From: Marc Prior <address@hidden>
Re splitting OmegaT into modular components:
I have been thinking about a modular TM application for some time. Keith
and I had in fact already discussed this briefly. The background to this
was as follows:
I have been using Linux and promoting its use among other translators
for three years now. The absence of a CAT tool for Linux was a serious
deficit, both for my own use, and for my efforts to promote Linux. I had
been in contact with a large number of TM application vendors to discuss
the possibility of either a native Linux version, or a version which
could be used on Linux (e.g. by using a macro language such as
OpenOffice.org's). When it became clear that no vendor was interested in
supporting a user base of one, I took the plunge and began learning to
program.
My first efforts, around the end of 2001, were in ELF. This is the
integral macro language of the Applixware Office suite. I began by
"cloning" Wordfast's segmentation function. Initial progress was good
(the code may have been a disaster, but I got results that worked), as I
was able to use keystroke recording in order to produce routines without
any programming skills, and Applixware Office contains a huge library of
ready-made macros.
When proper programming became necessary, though, I ground to a halt.
The reason is that there is only one manual for ELF, and it is not
written for beginners. In any case, I had to spend more time translating
in order to eat.
I made a couple of forays into Java and Star Basic, but gave up on both.
I found Java much too difficult; Star Basic suffered from an almost
total lack of documentation.
In the summer of 2002, I discovered tcl/tk. tcl/tk had a number of
advantages. Firstly, everyone seemed to think it was easy to learn.
Secondly, there were a number of free tutorials on the web, including
manuals suitable for beginners. Thirdly, and crucially, I had discovered
several tcl/tk applications, free and open-source, that I could use for
my purposes. These were:
RSTool - a text segmenter, actually designed for text parsing for
linguistic purposes
tkXMLiVE - an XML editor
DING - a dictionary, essentially a GUI for grep
en-rus - a GUI for a Russian dictionary
glimpse - a text indexing and retrieval utility
tkglim - a tcl/tk UI for glimpse
With the exception of glimpse, all of these applications are tcl/tk,
free and open-source. My first encounter with tcl/tk was when I
discovered that, with a little modification, DING could be used to access
Wordfast translation memories. A developer may well say "so what", but
you must remember that at this point, I couldn't program.
That's how I became hooked on tcl/tk. I had the fantastically naive idea
that I could learn it sufficiently well to glue these applications
together, and so produce a translation memory without having to do much
programming. The user interface would be tkXMLiVE (with the minor
drawback that I would have to brush up on my Russian, as all the
documentation was in that language...). This I would modify to produce a
dual-window interface.
RSTool would segment the text, and the user would simply overwrite the
segmented text in one of the two windows. I would add a routine to
extract the segment pairs from the two windows and convert them to
Wordfast TM format, and the resulting memories could then be accessed
with either DING or glimpse+tkglim.
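The segment-and-extract step of that plan can be sketched roughly as follows. This is only an illustration in Python, not Marc's tcl/tk code: the naive sentence splitter stands in for RSTool, the two lists stand in for the two windows, and the tab-separated output is a hypothetical simplified format, not the real Wordfast TM layout.

```python
import re

# Naive segmentation rule (assumption, standing in for RSTool): a segment ends
# at '.', '!' or '?' followed by whitespace and an ASCII uppercase letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def segment(text: str) -> list[str]:
    """Split running text into sentence-like segments."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]


def extract_pairs(source_window: list[str],
                  target_window: list[str]) -> list[tuple[str, str]]:
    """Pair each source segment with the segment the user typed over it."""
    if len(source_window) != len(target_window):
        raise ValueError("both windows must hold the same number of segments")
    return list(zip(source_window, target_window))


def to_tab_separated(pairs: list[tuple[str, str]]) -> str:
    """Hypothetical simplified TM format: one 'source<TAB>target' line per pair."""
    return "\n".join(f"{src}\t{tgt}" for src, tgt in pairs)


if __name__ == "__main__":
    source = segment("Hallo Welt. Wie geht es? Gut.")
    # The user overwrites the copy in the second window, segment by segment.
    target = ["Hello world.", "How are you?", "Fine."]
    print(to_tab_separated(extract_pairs(source, target)))
```

Keeping the segments in two parallel lists means pairing is a simple positional zip, which is why the sketch refuses windows whose segment counts differ.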
This was, of course, utopian. It may well be that the task could be
accomplished with little effort by an experienced tcl developer, but if
one has no experience, the effort needed to analyse someone else's code
is as great as, if not greater than, the effort of learning to do it oneself. I learnt quite
a bit of tcl/tk in the process but made little progress with an
application. I also had to do some more translation, in order to eat again.
Then I discovered some of the features of OpenOffice.org, in particular
the Sections function. By this time, I had re-discovered the original
incarnation of OmegaT (this was before my contact with Keith) and was
trying to get it to import OOo files, so I was also familiarizing myself
with OOo's XML structure. The solution for a very basic TM was there for
the taking. In tcl/tk, I wrote a routine for detecting OOo paragraph
boundaries and inserting interleaved OOo sections between them. The text
in each paragraph was then copied into the section. Result: when OOo was
opened again, each paragraph was there twice; thanks to OOo's sections
function, the first paragraph was protected and could be hidden if
desired. The translator simply overwrote the second paragraph. Revision
was easy, as the paragraphs were interleaved; reading through the final
text was also easy, as the source text could be hidden. Another tcl
routine stripped the source segments from the file to produce the final version.
The translation memory was the "uncleaned" file, and since source and
target were kept together, kfind or glimpse were suitable ways of
accessing these files in the file system.
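The interleaving idea can be modelled in a few lines. This is a minimal sketch in Python rather than the original tcl routine, and it works on plain strings instead of OOo's XML: the Section class and its protected flag are hypothetical stand-ins for OOo sections, and the lookup function is a crude substitute for searching the "uncleaned" file with kfind or glimpse.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical model of one OOo section holding a single paragraph."""
    text: str
    protected: bool  # True for the source copy, False for the editable copy


def interleave(paragraphs: list[str]) -> list[Section]:
    """Duplicate each paragraph: a protected source copy, then an editable copy."""
    sections: list[Section] = []
    for para in paragraphs:
        sections.append(Section(para, protected=True))
        sections.append(Section(para, protected=False))
    return sections


def strip_source(sections: list[Section]) -> list[str]:
    """Produce the final text by dropping every protected (source) section."""
    return [s.text for s in sections if not s.protected]


def segment_pairs(sections: list[Section]) -> list[tuple[str, str]]:
    """Treat the 'uncleaned' file as a TM: pair each source with the next section."""
    return [(sections[i].text, sections[i + 1].text)
            for i in range(0, len(sections) - 1, 2)]


def lookup(pairs: list[tuple[str, str]], term: str) -> list[tuple[str, str]]:
    """Crude concordance search: pairs whose source contains the term."""
    return [p for p in pairs if term in p[0]]


if __name__ == "__main__":
    doc = interleave(["Hallo Welt.", "Wie geht es?"])
    # The translator simply overwrites each editable copy.
    doc[1].text = "Hello world."
    doc[3].text = "How are you?"
    print(strip_source(doc))
    print(lookup(segment_pairs(doc), "Welt"))
```

Because source and target stay adjacent in the working file, recovering the translation memory is just a matter of reading the sections two at a time, while the clean version only needs the protected copies filtered out.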
Once again, I ran out of time, but this time I did manage to produce a
prototype. It doesn't work properly, but the code is fully annotated and
should be comprehensible to anyone who knows tcl/tk, and anyone who is
interested is welcome to it.
The development of OmegaT has made this application largely superfluous
for me personally, but I still see great benefits in the basic concept.
Separating the user interface from the search/indexing engine means that
new user groups can be supported without the whole code having to be
re-written.
So this aspect of Free CATS is one which I am very hopeful about. In
particular, providing an in-line TM application for OpenOffice.org would
meet a demand which already exists, as some translators would like such
a product.
Most are using OOo on Windows, but it would provide a good introduction
to open-source software in general, in the way that OmegaT is already
doing.
Having said all that, it has to be technically possible. As I have
already discovered, just because something is a good idea doesn't mean
that it will work. In the meantime, OmegaT does work, and in my opinion
very well. So that, at the moment, is where I'm directing my efforts.
I'll say more about that in another message.
Marc