
Re: [Aspell-user] Looking for usage advice


From: Christoph Hintermüller
Subject: Re: [Aspell-user] Looking for usage advice
Date: Sun, 30 Jan 2005 23:02:43 +0100
User-agent: KMail/1.6.2

On Sunday, 30 January 2005 at 20:31, Grzegorz Adam Hankiewicz wrote:
First question: which aspell version do you use, <= 0.33, 0.50.X or even 0.60.X?
[...]
> The book is written in XML. This is not a problem for aspell and its
> html/sgml mode. Each chapter is stored in a separate file, and so far
> translators have gone individually through each file. When the file
> has gone through an initial translation, we fire up aspell on it.
> 
> Since the text contains lots of technical words not included in the
> default dictionaries, and also sometimes text in English which is
> left verbatim, aspell reports many false positives which have to
> be ignored.
Are the English passages in the file at least marked by recognizable
delimiters (XML tags), like
  <en> this is some english text </en>
or
  "[...] this is some english text [...]"
or something similar? If the delimiter pairs are unique and the texts in the
different languages do not overlap, aspell 0.60.X could ease the spell-checking
process. It has a context filter which separates a text into two contexts, one
visible and one invisible to the spell checker, as long as the two are
separated by at least one pair of delimiters. In that case a two-pass spell
check would do the trick: one pass with the initial context visible, for
checking the Spanish text, and one pass with the initial context invisible,
for checking the English text. If you collect all the settings for both passes
in two mode files, spell-es and spell-en, including the selection of the
Spanish and English dictionaries respectively, then the following calls would
be enough:
  aspell --mode spell-es -c <file-to-spell>
  aspell --mode spell-en -c <file-to-spell>
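
To make the visible/invisible idea concrete, here is a minimal Python sketch of
a two-state context switch. It is not Aspell's implementation; the function
name select_context and its parameters are made up for illustration. It blanks
out whatever lies outside the selected context and keeps character offsets
unchanged, so each of the two passes sees only "its" language.

```python
def select_context(text, delimiters, initial_visible=True):
    """Blank out the invisible context; keep length and line breaks intact.

    delimiters is a list of (opening, closing) string pairs. Every opening
    delimiter toggles visibility, and the matching closing one toggles it back.
    Illustrative sketch only, not Aspell's own filter code.
    """
    out = []
    visible = initial_visible
    closing = None  # closing delimiter we are currently waiting for
    i = 0
    while i < len(text):
        if closing is None:
            # Outside a delimited region: does an opening delimiter start here?
            pair = next((d for d in delimiters if text.startswith(d[0], i)), None)
            if pair:
                out.append(" " * len(pair[0]))  # delimiters are never spelled
                closing = pair[1]
                visible = not visible
                i += len(pair[0])
                continue
        elif text.startswith(closing, i):
            out.append(" " * len(closing))
            i += len(closing)
            closing = None
            visible = not visible
            continue
        ch = text[i]
        out.append(ch if visible or ch == "\n" else " ")
        i += 1
    return "".join(out)
```

Called with initial_visible=True, the sketch keeps the text outside the
delimiters (the Spanish pass); with initial_visible=False it keeps only the
text between them (the English pass).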

If simple context switching is not suitable, but there is still a set of
rules for distinguishing the parts to check against the English dictionary,
the parts to check against the Spanish dictionary and the parts not to check
at all, then with 0.60.X you can code your own text filter to do the job.
> 
> Once the translator has gone through the XML file with aspell, we
> create an "ignore" file from the spell checked document. This ignore
> file is created with the list command and piped to a hidden file. On
> posterior aspell runs, this hidden file is converted into a custom
> dictionary (with "create master") and added on the commandline.
> 
Again, in 0.60.X you could collect all your settings in a mode file and call
aspell as in the examples above.
[...]
> Possibly the best improvement we could find is if aspell was able
> to recognise different languages in the document being scanned.
> Reading the mailing list archives I've found out that this feature is
> not planned due to the intrinsic difficulty of detecting correctly
> a language.  However, in the kind of documents we translate
> usually english text is left alone in specific tags, like <screen>
> or <quote>.
See above. The following command-line examples hint at what the mode files
spell-en and spell-es could contain. I do not mention the parameters for the
ignore file, as it has to be split into an es-ignore and an en-ignore file.
The '\' character in the following only denotes that the line is continued;
do not add it literally.

  aspell --add-filter context --add-context-delimiters "<screen> </screen>" \
  --add-context-delimiters "<quote> </quote>" --language-tag en \
  --dont-context-initial-visible -c <file-to-spell>
  aspell --add-filter context --add-context-delimiters "<screen> </screen>" \
  --add-context-delimiters "<quote> </quote>" --language-tag=es \
  --context-initial-visible -c <file-to-spell>

Sadly this only works with Aspell 0.60.X. Furthermore, I do not know exactly
whether the context filter is run before the XML filter or whether it is part
of the XML mode; afaik it is. Therefore I didn't add any
--rem-all-context-delimiters to the above lines.
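
If both passes are to be run over many chapter files, a small wrapper can
assemble the two command lines. The following Python sketch only builds and
prints the argument lists; it does not call aspell. The option names are copied
verbatim from the commands in this message and are not checked against any
particular Aspell release. The helper name build_pass and the file name
chapter1.xml are hypothetical.

```python
import shlex

# Delimiter pairs taken from the commands quoted in this message.
DEFAULT_DELIMITERS = (("<screen>", "</screen>"), ("<quote>", "</quote>"))


def build_pass(lang, initial_visible, path, delimiters=DEFAULT_DELIMITERS):
    """Return the argv list for one spell-check pass (hypothetical helper)."""
    argv = ["aspell", "--add-filter", "context"]
    for opening, closing in delimiters:
        # Each pair is passed as one argument: "opening closing".
        argv += ["--add-context-delimiters", f"{opening} {closing}"]
    argv += ["--language-tag", lang]
    # Flag spelling as given in the message; an assumption for other versions.
    argv.append("--context-initial-visible" if initial_visible
                else "--dont-context-initial-visible")
    argv += ["-c", path]
    return argv


if __name__ == "__main__":
    # Print both passes for a made-up file name instead of running them.
    for lang, visible in (("es", True), ("en", False)):
        print(shlex.join(build_pass(lang, visible, "chapter1.xml")))
```

To actually run a pass, the returned list could be handed to subprocess.run,
which avoids shell quoting problems with the "<" and ">" in the delimiters.
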
> 
> Possibly heresy in itself, it could be useful if aspell had a basic
> XML scanner and was more aware of the format of the document it is
> parsing, providing user customised hooks whenever specific tags are
> found. By basic I mean really dumb word matching: if the tag <quote
> is found and the user specified this as a hook, aspell could maybe
> change to another dictionary on the fly, prompt the user whether
> this change is OK (showing it at the same time on the screen for the
> user to judge), maybe pipe the bit of text to yet another program,
> etc, until the byte sequence </quote> had been found.
> 
Why? There is not only XML; there are other text file formats too. Thus I
think it would be better to add a multi-context filter to aspell. This filter
should be capable not only of distinguishing visible from invisible text, but
also of handling a visible context within an invisible one and an invisible
context within a visible one.
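
A rough Python sketch of what such a multi-context filter might do, assuming
each opening delimiter is mapped to a closing delimiter and a label (a language
code, or None for "do not spell"). Nothing like this exists in aspell as far as
this message states; the function name segment and the rule table are purely
illustrative. A stack keeps track of nesting, so a visible context inside an
invisible one (and the reverse) is handled.

```python
def segment(text, rules, default):
    """Split text into (label, chunk) pairs, honouring nested contexts.

    rules maps an opening delimiter to (closing delimiter, label); a label of
    None marks text that should not be spell checked. Hypothetical sketch.
    """
    stack = [(None, default)]  # (closing delimiter awaited, active label)
    segments = []
    buf = []

    def flush():
        # Emit the collected characters under the currently active label.
        if buf:
            segments.append((stack[-1][1], "".join(buf)))
            buf.clear()

    i = 0
    while i < len(text):
        closing = stack[-1][0]
        if closing and text.startswith(closing, i):
            flush()
            stack.pop()  # leave the current context
            i += len(closing)
            continue
        opening = next((o for o in rules if text.startswith(o, i)), None)
        if opening:
            flush()
            stack.append(rules[opening])  # enter a nested context
            i += len(opening)
            continue
        buf.append(text[i])
        i += 1
    flush()
    return segments
```

With <screen> mapped to None and <en> mapped to "en", an English comment inside
a <screen> block would be reported as an "en" chunk surrounded by unchecked
chunks, while everything outside keeps the default label.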
[...]
As a final question: are you able to switch to aspell 0.60.X if you are not
using it already?

cu
Xris



