freecats-dev

Re: [Freecats-Dev] OmegaT


From: Henri Chorand
Subject: Re: [Freecats-Dev] OmegaT
Date: Thu, 27 Mar 2003 23:09:31 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Hi,

Here are a few thoughts brought by Keith's last message:

> Something inspired me to take a rather critical look at the
> direction of the project specification and to provide some
> information from a software developer's perspective (which
> is quite different than how a translator sees a CAT tool).

Sure. We definitely need this sort of feedback.

> In summary, I think the project needs to be reined in a bit
> and it needs a narrower focus.  I might be a little frank in
> certain areas, so consider yourself forewarned!

Yep. As I already said, captain, there is no software architect yet on this ship, so the seat is vacant.

> Also, Yves and others might not fully agree with me, but you
> know what they say, opinions are like a$$holes - everybody's
> got one   ;-)

Well, now that I posted your message (at last), everybody has a chance to say something :-)

>> Also, for now, if we examine Free CATS' list of aimed features,
>> the ONLY request that we should ask Keith to do in OmegaT is to
>> split its client & server parts via an API, if only to allow:
>> - a multi-user mode with HTTP access
>> - other clients to use the server component.

<snip>
>> If Keith goes along the lines of what we're asking him above,
>> then I'm personally ready to drop my Free CATS project coordinator
>> hat and I'll be happy helping at what I'm best.

> I'm a little confused here (maybe because I haven't read the specs
> yet... forgive me father!).
>
> What exactly qualifies as the OmegaT server?  If you're talking of
> having FreeCATS as an embedded component of an industrial strength
> word processor, which I do believe is on the wish list, then we're
> only talking about a fuzzy matching engine because the file
> management will be much better handled by the word processor.

We haven't exactly decided yet if we were to prefer a word-processor embedded solution or a standalone editor-based one.

We translators well know the pros and cons of these, which is why we would, ideally, want both ;-))

Once we agree on separate client and server parts, we may have several clients using the same TM server. Some of these clients may even be developed by external teams for their own dedicated uses.

Another reason is that we deeply need a multi-user TM server (for industrial-strength CAT projects), and we thought HTTP access would be so nice. Our company is small, yet we often work in multi-user mode (up to 6 translators on the same project). We would be very happy to see a TM server that accepts between 10 & 20 simultaneous users on a dedicated server with a standard 512/128 DSL line. If such a free tool ever comes up, I bet it's going to be a major success.

For instance, if we start from OmegaT and go in this direction, we may:
- agree on an API that fits your code as well as other tools (Yves will certainly suggest interesting things)
- draw a line within your code modules (classes) so as to decide which is needed by a translation client and/or a TM server.
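
To illustrate where such a line might be drawn, here is a minimal Python sketch. It is not OmegaT's actual code or any agreed Free CATS API: the names TMServer, InMemoryTM, Match and translate_segment are all hypothetical. The idea is only that the server side exposes lookup and storage, while file filters and the editor stay on the client side.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Match:
    source: str   # stored source segment
    target: str   # its stored translation
    score: float  # similarity to the query, from 0.0 to 1.0


class TMServer(Protocol):
    """Hypothetical server-side interface: storage and matching only."""

    def lookup(self, segment: str, min_score: float) -> List[Match]: ...

    def store(self, source: str, target: str) -> None: ...


class InMemoryTM:
    """Toy stand-in for a TM server; it only returns exact matches."""

    def __init__(self) -> None:
        self._units: Dict[str, str] = {}

    def lookup(self, segment: str, min_score: float = 1.0) -> List[Match]:
        target = self._units.get(segment)
        if target is None:
            return []
        return [Match(segment, target, 1.0)]

    def store(self, source: str, target: str) -> None:
        self._units[source] = target


def translate_segment(tm: TMServer, segment: str) -> str:
    """Client-side helper: return the best stored translation, or ''."""
    matches = tm.lookup(segment, 1.0)
    return matches[0].target if matches else ""
```

Any client written against the TMServer interface (a standalone editor, a word-processor plug-in, or an HTTP wrapper) could then share the same server implementation.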


> If you're talking about a stand-alone component capable of being
> slaved to a web server, then we're talking about stripping the UI
> off OmegaT and expanding the selection of file filters.

As we believe the files to be translated are to reside on the translation client's local filesystem, the file filters will be managed at the client level.

> Either of these goals is quite possible to achieve on its own, but
> a decision has to be made as to which one to pursue.

Sure. We chose the server, because we believe it's important to create a "proof-of-concept" and to test the technology in the wild.

You may have read that Yves Champollion promised to make a future version of WordFast compatible with our server. We could later undertake the development of a standalone translation client and/or of an Open Office plug-in.

> To accomplish both of them together will practically require
> advance knowledge about how both are to be designed and work and
> will take about as much effort as splitting OmegaT into two separate
> and independent applications and supporting both.
>
> In general, the more 'flexible' you make software, the more difficult
> it is to design, build and maintain, and often the end product is
> such a series of compromises (like Windows) that, while doing
> practically everything, it does nothing well.

This is a sound warning.

> I realize that one of the benefits of a public forum for FreeCATs
> is to work out what is possible and what is not, but my feeling
> is that there are too many people adding on the wish list, most
> with very valid desires, but not enough committed developers to
> bring the list back down to earth, so the resultant spec is growing
> in complexity to something that can never practically be achieved.

Apart from HTTP access, everything that was included in our specification documents has been implemented in one or more proprietary CAT tools.

> Many elements in existing CAT tools are not dictated by what
> developers think is best for the translator, but what the developers
> can reasonably accomplish given available time, tools, hardware and
> technology.

True, which is why, if Free CATS is to succeed, it will begin with something small that is strong enough to evolve later, once it has attracted a lot of users.

> Take for example the desire to have full movement of segment
> markers.  This is a very important item for many translators, but
> what are they willing to give up for such flexibility?  If one
> wishes to have the seemingly trivial ability to move segment markers
> beyond the equivalent of hard return boundaries, then one restricts
> oneself to operating entirely within an industrial strength word
> processor.  (I can go into details if you wish, but supporting that
> ability in a CAT tool will require building and maintaining some
> very complex and complete file filters to support the arbitrary
> formatting changes such would require, and also the necessary
> infrastructure to support such filters - a rather non-trivial task.
> After doing this, you might as well extend the UI and you'll end up
> with a fully functional word processor).

Well, this is exactly the kind of advice we're looking for.

> Moving segment markers within structural boundaries is relatively
> simple, compared with the previous task, but it also has trade-offs,
> primarily in string recall, fuzzy matching ability and performance.
> When a segment marker is moved, a new 'string' is created, and the
> entire database must be searched for strings identical or similar to
> this one before information can be provided to the translator (there
> is no pre-processing which can anticipate such resegmenting).

Sure. Even so, like with Trados and WordFast, a decent user parameterization of segmenting should reduce the frequency of resegmenting operations to a very low value. At this stage, if it takes time (CPU), never mind.

We don't pretend it's needed every couple of sentences, only that it will be needed from time to time. I personally translate a lot and rarely resegment text. When I do it, it's often because of Trados inefficiencies, but I would feel very unhappy if I was not allowed to.
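
To make the cost of such a lookup concrete, here is a minimal Python sketch of the full scan that a resegmented string would trigger. It is only an illustration under stated assumptions: it uses the standard library's difflib for similarity, which is not necessarily how OmegaT or any other CAT tool scores matches, and fuzzy_lookup, the 0.7 threshold and the sample memory are all hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_lookup(segment, memory, threshold=0.7):
    """Scan every stored source string and return matches, best first.

    memory maps source segments to translations. The scan is linear in
    the size of the memory, which is the cost paid for each new string.
    """
    results = []
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold:
            results.append((score, source, target))
    results.sort(key=lambda item: item[0], reverse=True)
    return results


# Hypothetical sample memory (English to French).
memory = {
    "Open the file.": "Ouvrez le fichier.",
    "Close the file.": "Fermez le fichier.",
    "Save your work before closing.": "Enregistrez votre travail avant de fermer.",
}

# A segment that only exists after the user moved a marker.
hits = fuzzy_lookup("Open the files.", memory)
```

With fixed segments, this kind of scan could in principle be prepared ahead of time; after a manual resegmenting it has to run on demand, which is acceptable if it happens rarely.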

My only "harsh" criticism of OmegaT at this stage is that, as explained by Marc, OmegaT only allows paragraph-level segmenting.

We know that a large number of translation agency customers expect us to be able to use more sophisticated segmenting features than fixed paragraph-level segmenting.

> (...) I'm not a Trados user, but I do recall hearing about serious
> performance degradations as the translation memory size grows,
> presumably because of the resegmenting ability (or alternatively,
> just a poor search design).  OmegaT [currently] has fixed segment
> markers, but by doing this it can provide fuzzy matching
> information from databases literally of biblical proportions
> (early design estimates assumed the translation memory could grow
> to somewhere near the size of the Bible and it wouldn't degrade
> performance, assuming the machine had sufficient memory).

Sure. Anyway, I don't think our points of view are really contradictory.

Basically, we translators mostly need to perform segmenting at sentence level. Enabling the user to activate one or more of several optional, pre-defined delimiters (Tab, ":", ". ", "[line break]", ".[CR]", "[CR]") - and to interactively modify the proposed segmentation in statistically rare circumstances - seems to me a reasonable feature to implement.
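
As a rough illustration of this feature, here is a minimal Python sketch. It is not taken from any existing tool; the delimiter names, the regular expressions, and the segment and merge_with_next functions are all hypothetical, and "[CR]" is assumed to mean a newline character.

```python
import re

# Hypothetical names for the optional delimiters listed above.
DELIMITERS = {
    "tab": r"\t",
    "colon": r":",
    "period_space": r"\. ",
    "period_cr": r"\.\n",
    "cr": r"\n",
}


def segment(text, enabled):
    """Split text after each enabled delimiter, dropping empty segments."""
    if not enabled:
        return [text.strip()] if text.strip() else []
    pattern = "|".join(DELIMITERS[name] for name in enabled)
    segments = []
    start = 0
    for match in re.finditer(pattern, text):
        # The delimiter stays attached to the segment it terminates.
        piece = text[start:match.end()].strip()
        if piece:
            segments.append(piece)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


def merge_with_next(segments, index):
    """Manual correction: join a segment with the one that follows it."""
    merged = segments[index] + " " + segments[index + 1]
    return segments[:index] + [merged] + segments[index + 2:]


auto = segment("See e.g. the manual. Done.", ["period_space"])
fixed = merge_with_next(auto, 0)
```

Here the automatic pass wrongly splits after "e.g.", and the single manual merge restores the intended sentence, which is the rare interactive case described above.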

> (...).  Unnecessary overkill?  Oh yes.  But it was a design
> trade-off in the interest of simplicity and performance that
> (1) greatly assisted in OmegaT actually seeing the light of day and
> (2) enabled OmegaT to function under the significant performance
> penalty of running under a generic cross-platform architecture.
> It was _NOT_ because that's how I thought translators would
> work best (...)

Sure.

> I may adjust OmegaT to support modifiable segments in the future
> (it is on my to-do list) but my emphasis will remain on performance
> and in not implementing hack solutions.  My available time is split
> between several projects right now so I offer no timeframes.

As I was only asking you to consider this issue, I'm very satisfied with your answer.

> Finally, I think a committed developer (or developers) to actually
> own the project needs to be found, maybe even a recent college grad
> from Russia or Pakistan looking for experience.  If early releases
> of FreeCATS are sufficiently promising, and if it becomes necessary,
> it will be easier to raise 'donations' to support someone in one of
> the eastern countries to enable them to continue development.
> Raising sufficient $$ to influence a western developer is not
> realistic - you'll be bound by their available time and, most
> importantly, their interest in working on such a group project.
> I'm not saying this donation concept will be necessary, but one
> might at least consider it as a possibility.

1) Money - Donation option
We thought about that. A collective donation is something we're ready to organize among ourselves.

At this stage, it would be very useful if you could provide us with the following:
- An estimate of the workload required (in man-days) for you to code the next round of new features (interactive segmenting, splitting the client and server portions, TMX support and other suggestions you certainly have), adding a little margin for debugging (with our help).
- The daily price you would request for doing this work.

Just to give a crude estimate, 200 translators each donating 50 Euros/USD makes 10,000 Euros/USD available. Kirk already collected a list of translators' mailing lists which we can use to raise interest - and funds - once we clearly define our goals and spend some time in "marketing" the whole idea.

2) Development resources - other option
Also note that, weak in coding skills as it now is, our present team might be able to help as follows.

We have recently contacted two teachers at a famous French engineering school and they seemed eager to help us. They suggested they could provide a few man-months of development time by final-year students. As you would be able to direct their efforts, it could prove very valuable, at least for some parts of the job.

Of course, they have yet to decide if our project is worth it, but I consider that if we tell them we want to join forces with you and Marc (I'll come back quickly to Marc's message), it means we won't be starting from scratch, but from OmegaT 1.0.2.
I believe it will make quite a difference in terms of credibility.

You can also be sure that we can help by publicizing your project and bringing attention, testers, documentation writers (me for instance, along with Kirk & several others) & so on.

So, let us know your thoughts,

Henri




