[Freecats-Dev] Freecats & OmegaT (from Keith)


From: Henri Chorand
Subject: [Freecats-Dev] Freecats & OmegaT (from Keith)
Date: Thu, 27 Mar 2003 21:40:12 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Hi again,

I must have been sleeping, waiting in a bear's cavern for the spring to come... here is another message I should have forwarded earlier, this time from Keith, OmegaT's developer.

As the list has been very quiet (a bit too much) recently, I hope this message will revive it a bit.

Henri

-------- Original Message --------
Subject: Re: [Freecats-Dev] OmegaT
From: Keith Godfrey <address@hidden>


Hi Henri,
This started out as a quick response, but the fingers wouldn't stop
typing.... oh well  :-)   Something inspired me to take a rather
critical look at the direction of the project specification and to
provide some information from a software developer's perspective (which
is quite different from how a translator sees a CAT tool).  In summary,
I think the project needs to be reined in a bit and it needs a narrower
focus.  I might be a little frank in certain areas, so consider yourself
forewarned!

Also, Yves and others might not fully agree with me, but you know what
they say, opinions are like a$$holes - everybody's got one   ;-)

Henri Chorand wrote:

> Also, for now, if we examine Free CATS' list of aimed features, the
> ONLY request that we should ask Keith to do in OmegaT is to split its
> client & server parts via an API, if only to allow:
> - a multi-user mode with HTTP access
> - other clients to use the server component.

> <snip>
> If Keith goes along the lines of what we're asking him above, then I'm
> personally ready to drop my Free CATS project coordinator hat and
> I'll be happy helping at what I'm best.


I'm a little confused here (maybe because I haven't read the specs
yet... forgive me father!).

What exactly qualifies as the OmegaT server?
If you're talking of having FreeCATS as an embedded component of an
industrial strength word processor, which I do believe is on the wish
list, then we're only talking about a fuzzy matching engine because the
file management will be much better handled by the word processor.  If
you're talking about a stand-alone component capable of being slaved to
a web server, then we're talking about stripping the UI off OmegaT and
expanding the selection of file filters.  Either of these goals is quite
possible to achieve on its own, but a decision has to be made as to
which one to pursue.  To accomplish both of them together will
practically require advance knowledge of how both are to be designed
and how they will work, and will take about as much effort as splitting
OmegaT into two separate and independent applications and supporting
both.  In
general, the more 'flexible' you make software, the more difficult it is
to design, build and maintain, and often the end product is such a
series of compromises (like Windows) that, while doing practically
everything, it does nothing well.
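
To make the first option concrete, here is a minimal Java sketch of what
"only a matching engine" behind an API might look like.  All names
(MatchEngine, InMemoryEngine, ConsoleClient) are hypothetical and are not
taken from OmegaT; the lookup is exact-match only to keep the sketch
short, and a real engine would do fuzzy matching instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical API boundary: any client (word processor plug-in, web
// front end, stand-alone UI) would talk to the engine only through this.
interface MatchEngine {
    // Record a translated segment pair in the memory.
    void store(String source, String target);

    // Return the stored translations for a source segment (exact match only here).
    List<String> lookup(String source);
}

// Assumed toy implementation: keeps everything in a map, no files, no UI.
class InMemoryEngine implements MatchEngine {
    private final Map<String, List<String>> memory = new LinkedHashMap<>();

    public void store(String source, String target) {
        memory.computeIfAbsent(source, k -> new ArrayList<>()).add(target);
    }

    public List<String> lookup(String source) {
        return memory.getOrDefault(source, new ArrayList<>());
    }
}

// One possible client; it knows the interface, not the implementation.
class ConsoleClient {
    public static void main(String[] args) {
        MatchEngine engine = new InMemoryEngine();
        engine.store("Hello", "Bonjour");
        System.out.println(engine.lookup("Hello"));
    }
}
```

The point of the sketch is only that file handling and the UI sit on the
client side of the interface; the second option (a stand-alone component
behind a web server) would put file filters on the engine side as well.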

I realize that one of the benefits of a public forum for FreeCATS is to
work out what is possible and what is not, but my feeling is that there
are too many people adding to the wish list, most with very valid
desires, but not enough committed developers to bring the list back down
to earth, so the resultant spec is growing in complexity to something
that can never practically be achieved.  Many elements in existing CAT
tools are not dictated by what developers think is best for the
translator, but what the developer(s) can reasonably accomplish given
available time, tools, hardware and technology.

Take for example the desire to have full movement of segment markers.
This is a very important item for many translators, but what are they
willing to give up for such flexibility?  If one wishes to have the
seemingly trivial ability to move segment markers beyond the equivalent
of hard return boundaries, then one restricts oneself to operating
entirely within an industrial strength word processor.  (I can go into
details if you wish, but supporting that ability in a CAT tool will
require building and maintaining some very complex and complete file
filters to support the arbitrary formatting changes this would require,
and also the necessary infrastructure to support such filters - a rather
non-trivial task.  After doing this, you might as well extend the UI and
you'll end up with a fully functional word processor).

Moving segment markers within structural boundaries is relatively
simple, compared with the previous task, but it also has trade-offs,
primarily in string recall, fuzzy matching ability and performance.
When a segment marker is moved, a new 'string' is created, and the
entire database must be searched for strings identical or similar to
this one before information can be provided to the translator (there is
no pre-processing which can anticipate such resegmenting).  A small
database can be searched quickly, but the memory and processing power
required to search even a moderately sized database in real time is quite
large.  I'm not a Trados user, but I do recall hearing about serious
performance degradations as the translation memory size grows,
presumably because of the resegmenting ability (or alternatively, just a
poor search design).  OmegaT [currently] has fixed segment markers, but
by doing this it can provide fuzzy matching information from databases
literally of biblical proportions (early design estimates assumed the
translation memory could grow to somewhere near the size of the Bible
and it wouldn't degrade performance, assuming the machine had sufficient
memory.  It might take biblical time frames to have this TM fully
available, but OmegaT does the processing in the background and feeds
database matches to the engine, hence to the user, when they become
available).  Unnecessary overkill?  Oh yes.  But it was a design
trade-off in the interest of simplicity and performance that (1) greatly
assisted in OmegaT actually seeing the light of day and (2) enabled
OmegaT to function under the significant performance penalty of running
under a generic cross-platform architecture.  It was _NOT_ because
that's how I thought translators would work best (this underscore
emphasis is not for you but instead for the countless translators that
razz software developers because of design decisions that they don't
agree with while at the same time not understanding the design
intricacies and trade-offs that were involved in making them).
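
As a rough illustration of why a moved segment marker is expensive, the
Java sketch below compares a new string against every stored segment
and delivers the matches from a background thread once the scan is done.
This is not OmegaT's actual algorithm: the class name FuzzyScan, the use
of plain Levenshtein distance and the 0.8 threshold are all assumptions
made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical illustration; not OmegaT's actual matching code.
class FuzzyScan {

    // Levenshtein edit distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; 1.0 means the strings are identical.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / longest;
    }

    // Full scan: the new string is compared with every stored segment,
    // so the cost grows with the size of the translation memory.
    static List<String> scan(List<String> memory, String segment, double threshold) {
        List<String> hits = new ArrayList<>();
        for (String stored : memory) {
            if (similarity(stored, segment) >= threshold) hits.add(stored);
        }
        return hits;
    }

    // Runs the scan off the calling thread and hands the result to a callback,
    // so the caller is not blocked while the memory is searched.
    static CompletableFuture<Void> scanInBackground(List<String> memory, String segment,
                                                    double threshold,
                                                    Consumer<List<String>> onDone) {
        return CompletableFuture.supplyAsync(() -> scan(memory, segment, threshold))
                                .thenAccept(onDone);
    }

    public static void main(String[] args) {
        List<String> memory = Arrays.asList(
            "Open the file menu.", "Close the file menu.", "Save your work often.");
        // join() is only here so the demo waits before exiting.
        scanInBackground(memory, "Open the file menu", 0.8,
                         hits -> System.out.println(hits)).join();
    }
}
```

With fixed segments, comparisons like these can be queued ahead of time
for every segment in the document; with movable markers the string is
not known until the translator creates it, so the whole scan has to
start at that moment.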

If a translator wants to move segment markers and they either (a) don't
care if fuzzy matching information arrives by the time they finish
translating a resegmented phrase, or (b) they only want to use a small
translation memory, then there are great solutions for them.
Unfortunately, most of those tend to be proprietary, and some of these
solutions are so arrogant that they lock up your data to prevent you
from using it elsewhere (such as Trados).  I may adjust OmegaT to
support modifiable segments in the future (it is on my to-do list) but
my emphasis will remain on performance and on not implementing hack
solutions.  My available time is split between several projects right
now so I offer no timeframes.

Like this issue, there are many other things that involve trade-offs.
Sometimes certain features will need to be sacrificed in order to have
other features (you can't please all the people all of the time) and
sometimes the difficulty in implementing a desired feature may be more
than the developer resources can bear.  I know you're aware of this, but
I do feel it needs reiteration.

Finally, I think a committed developer (or developers) to actually own
the project needs to be found, maybe even a recent college grad from
Russia or Pakistan looking for experience.  If early releases of
FreeCATS are sufficiently promising, and if it becomes necessary, it
will be easier to raise 'donations' to support someone in one of the
eastern countries to enable them to continue development.  Raising
sufficient $$ to influence a western developer is not realistic - you'll
be bound by their available time and, most importantly, their interest
in working on such a group project.  I'm not saying this donation
concept will be necessary, but one might at least consider it as a
possibility.

>
> We started it because no free software tool has yet come and changed
> the world of CAT like Apache did for HTTP servers.


Apache is a very focused, special purpose tool, and as such, it is able
to do its task very well.  It also started as a very humble, small and
functional web server and was able to be enhanced to its present state
over several years because of its focused and limited design and a very
large and interested user base.
In order for FreeCATS to succeed, a narrow focus is required, with a firm
and finite line drawn limiting what FreeCATS is to accomplish, and that
line must be stuck to.  Later enhancements can be made.  Attempting to bridge
the gaps between all existing CAT tools will result in a spec for a tool
that will never be realized.  (Realize that Apache has been rewritten
from scratch more than once, iirc.)

Another point, while OmegaT is a very young and simple product (although
I do believe it is past the prototype stage), it does have over 12,000
lines of computer code - not huge, but still a significant size when
considering the reductions in code possible through using the Java
libraries for data structure management, Unicode support, string
processing and file I/O.  I only mention this as an example of how large
a project a stand-alone CAT tool will be when you start to include
all the features on the wish list.  OmegaT has consumed many man-months
of work, and that was from a software engineer who had worked developing
CAT tools before and was able to focus efforts through having some
experience.


> I'll install the latest 1.0.2 release.
> I'll first wait for Charles' or Keith's feedback about JVMs just to be
> sure - installing stuff on Linux is sometimes more complicated than it
> should be and I'm far from being a Linux power user yet.


It's quite easy - just download a Java runtime environment and unpack it
somewhere you're happy with (typically that is either your home
directory or under /usr/local/).  I use the Sun release (either 1.3 or
1.4 is fine) - while there may be better ones out there, I have had no
problems running either on my 3 year old laptop.
To run OmegaT, specify the full path to 'java' (such as
/usr/local/java/bin/java) in the 'OmegaT' script in your OmegaT
directory, and you should be good to go.




