freecats-dev

[Freecats-Dev] OmegaT (cont., from Marc)


From: Henri Chorand
Subject: [Freecats-Dev] OmegaT (cont., from Marc)
Date: Thu, 27 Mar 2003 21:35:34 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Hi all,

Marc recently sent this message, which was intended for our mailing list. Here it is for all to read. Good food for thought, indeed!


Cheers,

Henri

-------- Original Message --------
From: Marc Prior <address@hidden>

Re splitting OmegaT into modular components:

I have been thinking about a modular TM application for some time. Keith and I had in fact already discussed this briefly. The background to this was as follows:

I have been using Linux and promoting its use among other translators for three years now. The absence of a CAT tool for Linux was a serious deficit, both for my own use, and for my efforts to promote Linux. I had been in contact with a large number of TM application vendors to discuss the possibility of either a native Linux version, or a version which could be used on Linux (e.g. by using a macro language such as OpenOffice.org's). When it became clear that no vendor was interested in supporting a user base of one, I took the plunge and began learning to program.

My first efforts, around the end of 2001, were in ELF. This is the integral macro language of the Applixware Office suite. I began by "cloning" Wordfast's segmentation function. Initial progress was good (the code may have been a disaster, but I got results that worked), as I was able to use keystroke recording in order to produce routines without any programming skills, and Applixware Office contains a huge library of ready-made macros. When proper programming became necessary, though, I ground to a halt. The reason is that there is only one manual for ELF, and it is not written for beginners. In any case, I had to spend more time translating in order to eat.

I made a couple of forays into Java and Star Basic, but gave up on both. Java I found much too difficult; Star Basic suffered from an almost total lack of documentation.

In the summer of 2002, I discovered tcl/tk, which had a number of advantages. Firstly, everyone seemed to think it was easy to learn. Secondly, there were a number of free tutorials on the web, including manuals suitable for beginners. Thirdly, and crucially, I had discovered several tcl/tk applications, free and open-source, that I could use for my purposes. These were:

RSTool - a text segmenter, actually designed for text parsing for linguistic purposes
tkXMLiVE - an XML editor
DING - a dictionary, essentially a GUI for grep
en-rus - a GUI for a Russian dictionary
glimpse - a text indexing and retrieval utility
tkglim - a tcl/tk UI for glimpse

With the exception of glimpse, all of these applications are tcl/tk, free and open-source. My first encounter with tcl/tk was when I discovered that with a little modification, DING could be used to access Wordfast translation memories. A developer may well say "so what", but you must remember that at this point, I couldn't program.

That's how I became hooked on tcl/tk. I had the fantastically naive idea that I could learn it sufficiently well to glue these applications together, and so produce a translation memory without having to do much programming. The user interface would be tkXMLiVE (there was the minor drawback there that I would have to brush up my Russian, as all the documentation was in that language...). This I would modify to produce a dual-window interface. RSTool would segment the text, and the user would simply overwrite the segmented text in one of the two windows. I would add a routine to extract the segment pairs from the two windows and convert them to Wordfast TM format, and the resulting memories could then be accessed with either DING or glimpse+tkglim.
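As a rough illustration of the pipeline described above (segment the text, pair each source segment with its translation, write the pairs out as a memory), here is a minimal sketch in Python rather than tcl/tk. It rests on several assumptions: a naive punctuation-based splitter stands in for RSTool, the output is a plain tab-separated file and does not reproduce the actual Wordfast TM layout, and the function names `segment`, `pair_segments` and `to_tsv` are invented for this example.

```python
import re

def segment(text):
    """Split text into sentence-like segments (naive stand-in for RSTool)."""
    # Break after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def pair_segments(source_segments, target_segments):
    """Pair source and target segments one-to-one; refuse mismatched lengths."""
    if len(source_segments) != len(target_segments):
        raise ValueError("source and target hold different numbers of segments")
    return list(zip(source_segments, target_segments))

def to_tsv(pairs):
    """Serialise pairs as tab-separated lines (hypothetical format, not Wordfast's)."""
    return "\n".join(f"{src}\t{tgt}" for src, tgt in pairs)

# Example: the two "windows" of the imagined dual-window editor.
source = "Der Hund schläft. Die Katze frisst."
target = "The dog sleeps. The cat eats."
print(to_tsv(pair_segments(segment(source), segment(target))))
```

A real segmenter would also have to cope with abbreviations, numbers and similar cases that this splitter gets wrong.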

This was, of course, utopian. It may well be that the task could be accomplished with little effort by an experienced tcl developer, but if one has no experience, the effort needed to analyse someone else's code is just as great as, if not greater than, the effort of learning to write it oneself. I learnt quite a bit of tcl/tk in the process but made little progress with an application. I also had to do some more translation, in order to eat again.

Then I discovered some of the features of OpenOffice.org, in particular the Sections function. By this time, I had re-discovered the original incarnation of OmegaT (this was before my contact with Keith) and was trying to get it to import OOo files, so I was also familiarizing myself with OOo's XML structure. The solution for a very basic TM was there for the taking. In tcl/tk, I wrote a routine for detecting OOo paragraph boundaries and inserting interleaved OOo sections between them. The text in each paragraph was then copied into the section. Result: when OOo was opened again, each paragraph was there twice; thanks to OOo's sections function, the first paragraph was protected and could be hidden if desired. The translator simply overwrote the second paragraph. Revision was easy, as the paragraphs were interleaved; reading through the final text was also easy as the source text could be hidden. Another tcl routine stripped the source segments from the file to produce the final version.
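The interleaving idea can be pictured without any OOo XML at all. The following Python sketch is only an illustration, not the tcl/tk prototype: it treats a document as a list of plain-text paragraphs and marks the protected source copies with a made-up `[SRC] ` prefix instead of real OOo sections. The names `SOURCE_MARK`, `interleave` and `strip_source` are assumptions made for this example.

```python
SOURCE_MARK = "[SRC] "  # hypothetical marker standing in for a protected OOo section

def interleave(paragraphs):
    """Return each paragraph twice: a marked source copy, then an editable copy."""
    out = []
    for para in paragraphs:
        out.append(SOURCE_MARK + para)  # protected source copy
        out.append(para)                # copy the translator overwrites
    return out

def strip_source(lines):
    """Drop the marked source copies, leaving only the translated text."""
    return [line for line in lines if not line.startswith(SOURCE_MARK)]

# Example: prepare a two-paragraph document, "translate" it, then clean it.
doc = ["Guten Morgen.", "Auf Wiedersehen."]
working = interleave(doc)
working[1] = "Good morning."  # translator overwrites the second copy
working[3] = "Goodbye."
print(strip_source(working))
```

One limitation of this simplification: a translated paragraph that happened to begin with the marker text would be stripped as well, which is one reason a structural mechanism such as sections is preferable to a text prefix.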

The translation memory was the "uncleaned" file, and since source and target were kept together, kfind or glimpse were suitable ways of accessing these files in the file system.
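Because source and target sit next to each other in the uncleaned file, a lookup can be as simple as a text search. The sketch below is a Python stand-in for what kfind or glimpse would do across the file system; it assumes the uncleaned file has been read into a list of lines in which each source line carries a hypothetical `[SRC] ` prefix and is followed directly by its translation. The function name `lookup` and the marker are inventions for this example.

```python
def lookup(lines, query, source_mark="[SRC] "):
    """Return (source, target) pairs whose source text contains the query.

    Matching is a case-insensitive substring test, roughly what grep -i does.
    """
    hits = []
    needle = query.lower()
    for i in range(len(lines) - 1):
        line = lines[i]
        if not line.startswith(source_mark):
            continue
        source_text = line[len(source_mark):]  # search the text, not the marker
        if needle in source_text.lower():
            hits.append((source_text, lines[i + 1]))
    return hits

# Example: an "uncleaned" file held in memory.
memory = [
    "[SRC] Guten Morgen.", "Good morning.",
    "[SRC] Auf Wiedersehen.", "Goodbye.",
]
print(lookup(memory, "morgen"))
```

An indexing tool such as glimpse would avoid rescanning every file on each query; this linear scan is only meant to show why keeping the pairs together makes retrieval easy.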

Once again, I ran out of time, but this time I did manage to produce a prototype. It doesn't work properly, but the code is fully annotated and should be comprehensible to anyone who knows tcl/tk, and anyone who is interested is welcome to it.

The development of OmegaT has made this application largely superfluous for me personally, but I still see great benefits in the basic concept. Separating the user interface from the search/indexing engine means that new user groups can be supported without the whole code having to be re-written. So this aspect of Free CATS is one which I am very hopeful about. In particular, providing an in-line TM application for OpenOffice.org would meet a demand which already exists, as some translators would like such a product. Most are using OOo on Windows, but it would provide a good introduction to open-source software in general, in the way that OmegaT is already doing.

Having said all that, it has to be technically possible. As I have already discovered, just because something is a good idea doesn't mean that it will work. In the meantime, OmegaT does work, and in my opinion very well. So that, at the moment, is where I'm directing my efforts. I'll say more about that in another message.

Marc




