--- aspell-0.60/manual/aspell.info 2004-08-26 21:23:37.000000000 -0700 +++ aspell.info 2004-09-09 15:21:21.828705032 -0700 @@ -157,11 +157,11 @@ Aspell Ispell Netscape Microsoft 4.0 Word 97 -Open Source x x +Open Source x x Suggestion 88-98 54 55-70? 71 -Intelligence -Personal part x x x -of Suggestions +Intelligence +Personal part x x x +of Suggestions Alternate Dictionaries x x ? ? International Support x x ? ? @@ -396,7 +396,7 @@ Version 1.0 of LyX provides support for Aspell's learning from user's mistakes feature. - To use aspell with LyX 1.0 either change the `spell_command' option + To use Aspell with LyX 1.0 either change the `spell_command' option in the `.lyxrc' file or use the `run-with-aspell' utility. 3.2.4 With VIM @@ -561,7 +561,7 @@ OPTION [VALUE] - There may any number of spaces between the option and the value + There may be any number of spaces between the option and the value however it can only be spaces, i.e. there is no `=' between the option name and the value. @@ -693,7 +693,7 @@ (list) Extra dictionaries to use. dict-alias - (list) create dictionarie aliases. Each entry has the form `FROM + (list) create dictionary aliases. Each entry has the form `FROM TO'. Will override any system dictionaries that are present. @@ -706,7 +706,7 @@ encoding (string) The encoding the input text is in. Valid values include, but not limited to, `iso-8859-*', `utf-8', `ucs-2', `ucs-4'. When - using the aspell utility the default encoding is based on the + using the Aspell utility the default encoding is based on the current locale. Thus if your locale currently uses the `utf-8' encoding than everything will be in UTF-8. The `ucs-2' and `ucs-4' encodings are intended to be used by other programs using @@ -856,9 +856,9 @@ visible while the delimited ones are hidden. add|rem-context-delimiters - (list) Add or remove pairs of delimiters. 
This allows to specify - the character, or sequences of characters, which should be used to - switch contexts and and therefore have to be escaped by `\' if + (list) Add or remove pairs of delimiters. This allows you to + specify the character, or sequences of characters, which should be + used to switch contexts and therefore have to be escaped by `\' if they should appear literally. The two delimiting chars belonging to one pair have to be separated by a space character. If multiple pairs are specified by one `add|rem-context-delimiters' @@ -943,7 +943,7 @@ suggest (boolean) Suggest possible replacements in `pipe' mode. If false Aspell will simply report the misspelling and make no attempt at - suggestions possible corrections. + suggestions or possible corrections.  File: aspell.info, Node: Dumping Configuration Values, Next: Notes on Various Options, Prev: The Options, Up: Customizing Aspell @@ -980,8 +980,8 @@ ----------------------------------------------- Aspell now has filter support. You can either select from individual -filters or chose a filter mode. To select a filter mode use the `mode' -option. You may chose from `none', `url', `email', `sgml', `ccpp', +filters or choose a filter mode. To select a filter mode use the `mode' +option. You may choose from `none', `url', `email', `sgml', `ccpp', `tex' and any other available on your system. The default mode is `url'. Individual filters can be added with the option `add-filter' and removed with the `rem-filter' option. The currently available @@ -990,7 +990,7 @@ text from one format to another. To check which filters are available use `aspell dump filters'. To -check which filter modea are available use `aspell dump modes'. The +check which filter modes are available use `aspell dump modes'. The `aspell help' command will also list all available filter and filter modes. @@ -1025,7 +1025,7 @@ The SGML filter allows you to spell check SGML, HTML, XHTML, and XML files. 
In most cases everything within a tag `' will be skipped by the spell checker. The +attrib2="a whole sentence">' will be skipped by the spell checker. The SGML/HTML/XML that Aspell supports is a slight superset of most DTDs (Document Type Definitions) and can spell check the often non-conforming HTML found on the web. @@ -1047,8 +1047,8 @@ sgml-check This is a list of attributes whose values you do want spell checked. By default, 'alt' ( alternate text) is a member of - the check list since it is text th at is seen by a web page - viewer. You may also want 'value' to be on the check li st since + the check list since it is text that is seen by a web page + viewer. You may also want 'value' to be on the check list since that is the text put on buttons: add-filter-sgml-check value In this case `' will be flagged as @@ -1063,7 +1063,7 @@ 4.4.1.5 HTML Filter ................... -The `html' filter is like the SGML Filter Mode but specilized for HTML. +The `html' filter is like the SGML Filter Mode but specialized for HTML. By default, 'script' and 'style' are members of the skip list in HTML mode. @@ -1172,7 +1172,7 @@ instead of `the' or `hapoy' instead of `happy'. However in order to do this well Aspell needs to know the layout of the keyboard via the keyboard definition file. The keyboard definition file simply -identifies the keys on the keyboard and and which of them are right +identifies the keys on the keyboard and which of them are right next to each other. It has an extension of `.kbd' and and all non-ASCII characters are expected to be in UTF-8. @@ -1184,12 +1184,12 @@ key a A á Á - It generally only necessary to list keys which type more than one + It is generally only necessary to list keys which type more than one distinct letter as Aspell can derive the rest from the language data file. For example, it is not necessary to include the previously mentioned key. 
- To identify two keys as being right next to each other simpile list + To identify two keys as being right next to each other simply list the type keys right after each other. For example the line: as @@ -1197,7 +1197,7 @@ will indicate that `a' and `s' are right next to each other. If `as' is listed as a entry it is not necessary to list `sa' as an entry as that will be done automatically. Also by "right next to each other" I -mean to keys that are close enough together that it is easy to type one +mean two keys that are close enough together that it is easy to type one instead of the other. On most keyboards this means keys that are to the left or to the right of each other and _not_ keys that are below or above it. @@ -1215,7 +1215,7 @@ --------------------------------------------- In order to understand what these suggestion modes do, a basic -understanding of how aspell works is required. For that see *Note +understanding of how Aspell works is required. For that, see *Note Aspell Suggestion Strategy::. The suggestion modes are as follows. @@ -1240,7 +1240,7 @@ normal This method normally looks for soundslikes within two edit distance apart and performs typo-analysis unless it is turned off. Is is - several slower than fast mode but it returns better suggestions. + much slower than fast mode but it returns better suggestions. This mode gets 93% of the words. slow @@ -1248,16 +1248,16 @@ bad-spellers This method also looks for soundslikes within two edit distances - apart but is more tailored for the bad speller whereas `fast' or - `normal' are more tailored to strike a good balance between typos + apart but is tailored more for the bad speller, whereas `fast' or + `normal' are tailored more to strike a good balance between typos and true misspellings. 
This mode never performs typo-analysis and returns a _huge_ number of words for the really bad spellers who - can't seam to get the spelling anything close to what it should + can't seem to get the spelling of anything close to what it should be. If the misspelled word looks anything like the correct spelling it is bound to be found _somewhere_ on the list of 100 or more suggestions. This mode gets 98% of the words. - If jump tables where not used than the `normal' option is identical + If jump tables were not used, then the `normal' option is identical to `fast'. And the `slow' option is identical to the `normal' if jump tables were used. @@ -1286,8 +1286,8 @@ The `aspell-import' Perl script will look for old personal dictionaries and will import them into GNU Aspell. It will look for both Ispell and -Aspell ones. To use it just run it from the command prompt. If you -get an error about `/usr/bin/perl' not being found than instead try +Aspell ones. To use it, just run it from the command prompt. If you +get an error about `/usr/bin/perl' not being found, then instead, try `perl BINDIR/aspell-import'. When running the script if you get a message like: @@ -1333,7 +1333,7 @@ dictionary is selected by creating an alias from one dictionary name to another. This option is most useful when there is more than one dictionary for a given language. For example `add-dict-alias en_US -en_US-w_accents' will cause Aspell to chose the accented version of the +en_US-w_accents' will cause Aspell to choose the accented version of the American English dictionary instead of the non-accented version. To add an alias use: @@ -1381,7 +1381,7 @@ where BASE is the name of the word list and WORDLIST is the list of words separated by white space. The name of the word list will automatically be converted to all lowercase. 
The `./' is important -because without it aspell will create the word list in the normal word +because without it Aspell will create the word list in the normal word list directory. If you are trying to create a word list in a language other than English check the Aspell `data-dir' (usually `/usr/share/aspell', use `aspell dump config' to find out what it is on @@ -1395,12 +1395,12 @@ `--master=BASE'. During the creating of the dictionary you may get a number of -warnings or errors about invalid words or affixes. By default aspell +warnings or errors about invalid words or affixes. By default Aspell will skip any invalid words and remove invalid affixes. If you rather -that Aspell simply accepts all words given than the option +that Aspell simply accepts all words given then the option `--dont-validate-words' can be specified. To avoid checking if affixes are valid use the option `--dont-validate-affixes'. However, rather -than disable checking it is preferable to clean the input word list. +than disable checking, it is preferable to clean the input word list. This can be done by using the command aspell --local-data-dir=./ --lang=LANG clean < WORDLIST > RESULT @@ -1469,7 +1469,7 @@ In order for Aspell to be able to correctly recognize a dictionary based on the setting of the `LANG' environment variable the dictionaries need to be located somewhere Aspell can find them and they -need to be _multi_ dictionaries. Where aspell looks for dictionaries +need to be _multi_ dictionaries. Where Aspell looks for dictionaries depends on the value of the `dict-dir' and `word-list-path' option. `dict-dir' is generally `PREFIX/lib/aspell', and `word-list-path' is @@ -1524,7 +1524,7 @@ 6 Writing programs to use Aspell ******************************** -There are two main ways to use aspell from within your application. +There are two main ways to use Aspell from within your application. Through the external C API or through a pipe. 
The internal Aspell API can be used directly but that is not recommended as the actual Aspell API is constantly changing. @@ -1545,7 +1545,7 @@ classes. The two main classes are `AspellConfig' and `AspellSpeller'. The `AspellConfig' class is used to set initial defaults and to change spell checker specific options. The `AspellSpeller' class does most of -the real work. It is responsible for managing the dictionaries, +the real work. This class is responsible for managing the dictionaries, checking if a word is in the dictionary, and coming up with suggestions among other things. There are many helper classes the important ones are `AspellWordList', `AspellMutableWordList', `Aspell*Enumeration'. @@ -1559,7 +1559,7 @@ ----------- To use Aspell your application should include `aspell.h'. In order to -insure that all the necessary libraries are linked in libtool should be +ensure that all the necessary libraries are linked in libtool should be used to perform the linking. When using libtool simply linking with `-laspell' should be all that is necessary. When using shared libraries you might be able to simply link `-laspell', but this is not @@ -1630,10 +1630,10 @@ is null terminated. If the string is a cast from `const u16int *' or `const u32int *' then `size' is the amount of space in bytes the string takes up after being cast to `const char *' and not the true size of -the string. `sspell_speller_check' will return `0' is it is not found +the string. `aspell_speller_check' will return `0' if it is not found and non-zero otherwise. - If the word is not correct than the `suggest' method can be used to + If the word is not correct, then the `suggest' method can be used to come up with likely replacements. 
AspellWordList * suggestions = aspell_speller_suggest(spell_checker, @@ -1708,7 +1708,7 @@ 6.2 Through A Pipe ================== -When given the `pipe' or `-a' command Aspell goes into a pipe mode that +When given the `pipe' or `-a' command, Aspell goes into a pipe mode that is compatible with `ispell -a'. Aspell also defines its own set of extensions to Ispell pipe mode. @@ -1804,7 +1804,7 @@ num of items: item1, item2, etc - _(Part of the preceding section was directly copied out of the + _(Part of the preceding section was directly copied out of the Ispell manual)_  @@ -1823,9 +1823,9 @@ beginning, beaning, begging, ... -so the user selects _beginning_. However than, later on in the -document the user misspells it as _begng_ (*not* _beging_). Normally -aspell will suggest. +so the user selects _beginning_. However, later on in the document +the user misspells it as _begng_ (*not* _beging_). Normally Aspell +will suggest. began, begging, begin, begun, ... @@ -1859,13 +1859,13 @@ 7 Adding Support For Other Languages ************************************ -Before you consider adding support for Aspell first make sure that +Before you consider adding support for Aspell, first make sure that someone else has not already done it. A good number of dictionaries off the Aspell home page at `http://aspell.net'. If your language is not listed above please send me a note and I will work with you on adding support. - Adding a language to aspell is fairly straightforward. You basically + Adding a language to Aspell is fairly straightforward. You basically need to create the language data file, and compile a new word list. * Menu: @@ -1904,11 +1904,11 @@ `name' is the name of the language and should be the same as the file name (without the `.dat'). - `charset' is the 8-bit character set aspell will expect the word -lists to be formatted in. If possible chose from one of the standard + `charset' is the 8-bit character set Aspell will expect the word +lists to be formatted in. 
If possible choose from one of the standard ones provided with Aspell. These are `iso-8859-*', `koi8-*', or `viscii'. If your language does not require any non-ascii characters -chose `iso-8859-1'. If one of these standard character sets is not +choose `iso-8859-1'. If one of these standard character sets is not suitable for your language than you can create a new one. *Note Creating A New Character Set::. @@ -1918,7 +1918,7 @@ The encoding the language data files are expected to be in as well as the default encoding to use when saving the personal dictionaries. It can be either `utf-8' or any of the 8-bit - encoding that Aspell supports. If not set that it defaults to + encoding that Aspell supports. If not set then it defaults to `charset'. `special' @@ -1931,14 +1931,14 @@ CHAR is the non letter character in question. BEGIN, MIDDLE, END are either a `-' or a `*'. A star for BEGIN means that the character can begin a word, a `-' means it can't. The same is - true for MIDDLE and END. For example the entry for the `'' in + true for MIDDLE and END. For example, the entry for the `'' in English is: ' -*- To include more than one middle character just list them one after - another on the same line. for example to make both the `'' and - the `-' a middle character use the following line in the language + another on the same line. For example, to make both the `'' and + the `-' a middle character, use the following line in the language data file: special ' -*- - -*- @@ -1947,13 +1947,13 @@ The name of the soundslike data for the language. The data is expected to be in the file `NAME_phonetic.dat'. - If NAME is `simpile' than a very simpile soundslike is used. This - is nearly as powerful as full phonetic soundslike but it can be - computed a lot faster. (*note The Simpile Soundslike::) + If NAME is `simpile' then a very 'simpile soundslike' is used. + This is nearly as powerful as full phonetic soundslike but it can + be computed a lot faster. 
(*note The Simpile Soundslike::) - If the soundslike name is `none', or this option is not spcefied, + If the soundslike name is `none', or this option is not specified, than no soundslike will be used. The effective soundslike is the - word converted to all lowercase and possible with accents stripped + word converted to all lowercase and possibly with accents stripped depending on the `store-as' option. For languages with phonetic spelling the difference will not be very noticeable. However, for languages with non-phonetic spelling there will be a noticeable @@ -1986,9 +1986,9 @@ *Note Affix Compression::. `store-as' - How the words are indexed in the dictionary. If "stripped" than - the word is indexed in a lower case and deaccented form. If - "lower" than the word is indexed in a lower case form but with + How the words are indexed in the dictionary. If "stripped" then + the word is indexed in a lower case and de-accented form. If + "lower", then the word is indexed in a lower case form but with accent info still intact. This just controls how the word is indexed, not how it is stored. The default is "stripped" unless affix compression is used. @@ -2055,9 +2055,9 @@ version string can be anything but it should be changed whenever a new version of the translation array is released. This is important because it will keep Aspell from using a compiled dictionary with the -wrong set of rules. For example if when coming up with suggestion for -`hallo' Aspell will use the new rules to come up with the soundslike -say `H*L*' but if `hello' is stored in the dictionary using the old +wrong set of rules. For example, when coming up with a suggestion for +`hallo', Aspell will use the new rules to come up with the soundslike +say `H*L*', but if `hello' is stored in the dictionary using the old rules as `HL' instead of `H*L*' Aspell will never be able to come up with `hello'. 
So to solve this problem Aspell checks if the version strings match and aborts with an error if they don't. Thus it is @@ -2091,14 +2091,14 @@ would match any `DGE', `DGI' and `DGY' and replace them with `J'. This way you can reduce several rules to one. - Before the search string one or more dashes `-' may be placed. + Before the search string, one or more dashes `-' may be placed. Those search strings will be matched totally but only the beginning of -the string will be replaced. Furthermore for these rules no follow-up +the string will be replaced. Furthermore, for these rules no follow-up rule will be searched (what this is will be explained later). The rule `TCH-- '-> _ will match any word containing `TCH' (like `match') but will only replace the first character `T' with an empty string. The number of dashes determines how many characters from the end will not -be replaced. After the replacement the search for transformation rules +be replaced. After the replacement, the search for transformation rules continues with the not replaced `CH'! If a `<' is appended to the search string, the search for @@ -2189,18 +2189,18 @@ "KK" (as desired) if `collapse_result' is set to 1. That's why the English rules have `collapse_result' set to `0'. - By default all accents are removed from a word before it is matched + By default, all accents are removed from a word before it is matched to the soundslike rules. If you do not want this then add the line remove_accents 0 - at the beginning of your file. The exact defination of an accent is + at the beginning of your file. The exact definition of an accent is language dependent and is controlled via the character set file. 
If you -set remove_accents to '0' than you should also set "store-as" to "lower" +set remove_accents to '0' then you should also set "store-as" to "lower" in the language data file (not the phonetic transformation file) otherwise Aspell will have problems when both the accented and the -de-accented version of a word appear in the dictoinary; it will consider -one of them as incorrectly spelled. +de-accented version of a word appear in the dictionary; it will +consider one of them as incorrectly spelled. 7.3.2 How do I start finally? ----------------------------- @@ -2212,7 +2212,7 @@ 7.3.2.1 Things that come in handy ................................. -First of all you need to have a large word list of the language you +First of all, you need to have a large word list of the language you want to make phonetics for. It should contain about as many words as the dictionary of the spell checker. If you don't have such a list, you will probably find an Ispell dictionary at @@ -2238,19 +2238,19 @@ ........................................ Normal text comparison works well as long as the typer misspells a word -because he pressed one key he didn't really want to press. In this -cases mostly one character differs from the original word. +because he pressed one key he didn't really want to press. In these +cases, mostly one character differs from the original word. In cases where the writer didn't know about the correct spelling of -the word however the word may have several characters that differ from -the original word but usually the word would still sound like the -original word. Someone might think for example that `tough' is spelled -`taff'. No spell checker without phonetic code will come to the idea -that this might be `tough' but a spell checker who knows that `taff' -would be pronounced like `tough' will make good suggestions to the -user. Another example could be `funetik' and `phonetic'. 
+the word, the word may have several characters that differ from the +original word but usually the word would still sound like the original. +Someone might think that `tough' is spelled `taff'. No spell checker +without phonetic code will come to the idea that this might be `tough', +but a spell checker which knows that `taff' would be pronounced like +`tough' will make good suggestions to the user. Another example could +be `funetik' and `phonetic'. - From this examples you can see that the phonetic transformation + From these examples you can see that the phonetic transformation should not be too fussy and too precise. If you implement a whole phonetic dictionary as you can find it in books this will not be very useful because then there could still be many characters differing from @@ -2264,7 +2264,7 @@ is spoken like "F and so we have a `PH -> F' rule. If you take a closer look you will even see that vowels sound very -similar in English language: `contradiction', `cuntradiction', +similar in the English language: `contradiction', `cuntradiction', `cantradiction' or `centradiction' in fact sound nearly the same, don't they? Therefore the English phonetic replacement rules not only reduce all vowels to one but even remove them all (removing is done by just @@ -2299,8 +2299,8 @@ does what you want. Another good way to check that changes you make to your rules don't -have any evil side effects is to create another list from your word -list which contains not only the word of the word list but also the +have any bad side effects is to create another list from your word list +which contains not only the word of the word list but also the corresponding phonetic version of this word on the same line. If you do this once before the change and once after the change you can make a diff (see `man diff') to see what _really_ changed. 
To do this use the @@ -2318,7 +2318,7 @@ During your work you should write down your basic ideas so that other people are able to understand what you did (and you still know about it after a few weeks). The English table has a huge documentation -appended for example. +appended as an example. Now you can start experimenting with all the things you just read and perhaps set up a nice phonetic transformation table for your language @@ -2430,7 +2430,7 @@ partially expand a word with affix infomation so that the affix flags do not effect the first 3 letters of the word. This will allow Aspell to get more accurate results when scanning the list for near misses -since the full word can be used and not just the root. Specifing this +since the full word can be used and not just the root. Specifying this option, however, will also effectively expand any prefixes. Thus this option should not be used for prefix heavy languages such as Hebrew. @@ -2564,7 +2564,7 @@ invent one. The new charset will only be used by Aspell internally. If the option `data-encoding' is set to `utf-8', and your current locale character type is always set to `utf-8', than you can use UTF-8 -for everything and not worry your self that an 8-bit character set is +for everything and not worry yourself that an 8-bit character set is being used internally. If your language has no more than 210 distinct symbols, including different capitalizations and accents, than Aspell can support it. @@ -2582,7 +2582,7 @@ ========================================= The character set data files that Aspell uses can be fine tuned. If -this is done that character set file should be renamed to `LANG.cset'. +this is done, that character set file should be renamed to `LANG.cset'. The line(s) starting with `=' do not specify the exact name of the file, but rather the name of the unicode mapping that is used. Unless @@ -2665,23 +2665,22 @@ ============================= There is a very good reason I use 8-bit characters in Aspell. 
Speed and -simplicity. While many parts of my code can fairly be easily be -converted to some sort of wide character as my code is clean. Other -parts can not be. +simplicity. While many parts of my code can fairly easily be converted +to some sort of wide character since my code is clean, other parts can +not be. One of the reasons because is many, many places I use a direct lookup to find out various information about characters. With 8-bit characters this is very feasible because there is only 256 of them. With 16-bit wide characters this will waste a LOT of space. With 32-bit characters -this is just plain impossible. Converting the lookup tables to some -other form, while certainly possible, will degrade performance -significantly. +this is just plain impossible. Converting the lookup tables to another +form is certainly possible, but degrades performance significantly. Furthermore, some of my algorithms relay on words consisting only on a small number of distinct characters (often around 30 when case and accents are not considered). When the possible character can consist of -any Unicode character this number because several thousand, if that. In -order for these algorithms to still be used some sort of limit will +any Unicode character this number becomes several thousand, if that. In +order for these algorithms to still be used, some sort of limit will need to be placed on the possible characters the word can contain. If I impose that limit, I might as well use some sort of 8-bit characters set which will automatically place the limit on what the characters can @@ -2689,44 +2688,44 @@ There is also the issue of how I should store the word lists in memory? As a string of 32 bit wide characters. Now that is using up 4 -times more memory than charters would and for languages that can fit +times more memory than 8-bit characters would and for languages that can fit within an 8-bit character that is, in my view, a gross waste of memory. 
So maybe I should store them is some variable width format such as -UTF-8. Unfortunately, way, way to many of may algorithms will simply +UTF-8. Unfortunately, way, way too many of my algorithms will simply not work with variable width characters without significant modification which will very likely degrade performance. So the solution is to work with the characters as 32-bit wide characters and than convert it to a shorter representation when storing them in the -lookup tables. Now than can lead to an inefficiency. I could also use +lookup tables. Now that can lead to an inefficiency. I could also use 16 bit wide characters however that may not be good enough to hold all -of future versions of Unicode and it has the same problems. +future versions of Unicode and it has the same problems. As a response to the space waste used by storing word lists in some sort of wide format some one asked: - Since hard drive are cheaper and cheaper, you could store + Since hard drive are cheaper and cheaper, you could store a dictionary in a usable (uncompressed) form and use it directly with memory mapping. Then the efficiency would directly depend on the disk caching method, and only the used part of the - dictionaries would relay be loaded into memory. You would no more + dictionaries would really be loaded into memory. You would no more have to load plain dictionaries into main memory, you'll just want to compute some indexes (or something like that) after mapping. However, the fact of the matter is that most of the dictionary will be read into memory anyway if it is available. If it is not available than there would be a good deal of disk swaps. Making characters 32-bit -wide will increase the change that there are more disk swap. So the -bottom line is that it will be cheaper to convert the characters from +wide will increase the chance that there are more disk swaps. 
So the +bottom line is that it is more efficient to convert the characters from something like UTF-8 into some sort of wide character. I could also use some sort of disk space lookup table such as the Berkeley Database. However this will *definitely* degrade performance. The bottom line is that keeping Aspell 8-bit internally is a very well though out decision that is not likely to change any time soon. -Fell free to challenge me on it, but, don't expect me to change my mind +Feel free to challenge me on it, but, don't expect me to change my mind unless you can bring up some point that I have not thought of before -and quite possible a patch to solve cleanly convert Aspell to Unicode -internally with out a serious performance lost OR serious memory usage +and quite possibly a patch to cleanly convert Aspell to Unicode +internally without a serious performance loss OR serious memory usage increase.  @@ -2735,7 +2734,7 @@ Appendix B Languages Which Aspell can Support ********************************************* -Even though Aspell will remain 8-bit internally it should still be be +Even though Aspell will remain 8-bit internally it should still be able to support any written languages not based on a logographic script. The only logographic writing system in current use are those based on hànzi which includes Chinese, Japanese, and sometimes Korean. 
@@ -2851,13 +2850,13 @@ kj Kwanyama Latin - - kk Kazakh Cyrillic - - kl Kalaallisut / Latin Maybe - - Greenlandic + Greenlandic kn Kannada Kannada - - kok Konkani Latin Maybe - kr Kanuri Latin - - ks Kashmiri Arabic - - ku Kurdish Arabic, Cyrillic, Maybe - - Latin + Latin kv Komi Cyrillic - - kw Cornish Latin Maybe - ky Kirghiz Cyrillic - - @@ -2870,7 +2869,7 @@ lt Lithuanian Latin 0.60 - lu Luba-Katanga Latin - - luo Luo (Kenya and Latin Maybe - - Tanzania) + Tanzania) lv Latvian Latin 0.60 - mg Malagasy Latin Maybe - @@ -2979,7 +2978,7 @@ B.1.1 Notes on Latin Languages ------------------------------ -Any word that can be written using on of the Latin ISO-8859 character +Any word that can be written using one of the Latin ISO-8859 character sets (ISO-8859-1,2,3,4,9,10,13,14,15,16) can be written, in decomposed form, using the ASCII characters, the 23 additional letters: @@ -3034,7 +3033,7 @@ characters to the previously specifed Unicode code-points, any modified ISO-8859 character set can be used for any Latin languages covered by ISO-8859. Of course decomposing every single accented character wastes -a lot of space, so only characters that can be not be represented in the +a lot of space, so only characters that cannot be represented in the precomposed form should be broken up. By using this trick it is possible to store foreign words in the correctly accented form in the dictionary even if the precomposed character is not in the current @@ -3063,7 +3062,7 @@ two parts based on the Consonant and Vowel parts. This encoding of the syllabary is far more useful to Aspell than if they were stored in UTF-8 or UTF-16. In fact, the exiting suggestion strategy of Aspell will work -well with this encoding with out any additional modifications. However, +well with this encoding without any additional modifications. However, additional improvements may be possible by taking advantage of the consonant-vowel structure of this encoding. 
@@ -3113,9 +3112,9 @@ but that there are no spaces between words. This means that there is no easy way to split a sentence into individual words. However, it is still possible to spell check these scripts, it is just a lot more -difficult. I will be happy to work within someone who is interested in +difficult. I will be happy to work with someone who is interested in adding Thai, Khmer, or Lao support to Aspell, but it is not likely -something I will do in the foreseeable future. +something I will do on my own in the foreseeable future. B.2.2 Languages which use Hànzi Characters ------------------------------------------ @@ -3128,10 +3127,10 @@ until full Unicode support is implemented. However, I am not even sure if these languages need spell checking since hànzi characters are generally not entered in directly. Furthermore even if Aspell could -spell check hànzi the exiting suggestion strategy will not work well at -all, and thus a completely new strategy will need to be developed. -However, it is is the case that hànzi needs to be spell checked and you -know something about the issues involved please fell free to contact me. +spell check hànzi the existing suggestion strategy will not work well +at all, and thus a completely new strategy will need to be developed. +However, if it is the case that hànzi needs to be spell checked and you +know something about the issues involved please feel free to contact me. B.2.3 Japanese -------------- @@ -3155,17 +3154,17 @@ B.2.4 Hangul ------------ -Koren in generally written in hangul or a mixture of han and hangul. In -Hangul letters individual letters, known as jamo, are grouped together +Korean is generally written in hangul or a mixture of han and hangul. +In Hangul letters, individual letters, known as jamo, are grouped together in syllable blocks. 
Unicode allows Hangul to be stored in one of three ways, (A) Individual jamo letters (Hangul Compatibility Jamo, U+3130 - U+318F), (D) decomposed jamo (Hangul Jamo, U+1100 - U+11FF), and (C) precoposed sylable blocks (Hangul Syllables, U+AC00 - U+D7AF). In order -for Aspell to work with Hangul it needs to be form A. Unfortually the +for Aspell to work with Hangul it needs to be form A. Unfortunately the existing Normalization code in Aspell will not be able to adequately deal with converting Hangul from form D and C to form A and back again. -However, once this code is written Aspell should be able to spell check -Hangul with out any problem. +However, once this code is written, Aspell should be able to spell check +Hangul without any problem.  File: aspell.info, Node: Multiple Scripts, Next: Planned Dictionaries, Prev: Unsupported, Up: Languages Which Aspell can Support @@ -3175,13 +3174,13 @@ Aspell should be able to check text written in the same language but in multiple scripts with some work. If the number of unique symbols in -both scripts is less than 210 than a special character set can be used -to allow both scripts to be encoding in the same dictionary. However +both scripts is less than 210, then a special character set can be used +to allow both scripts to be encoded in the same dictionary. However this may not be the most efficient solution. An alternate solution is -to store each script in its own dictionary and allow Aspell to chose +to store each script in its own dictionary and allow Aspell to choose the correct dictionary based on which script the given word is written -in. Aspell currently does not support this mode of spell checking -however it is something that I hope to eventually support. +in. Aspell currently does not support this mode of spell checking but +it is something that I hope to eventually support.  
File: aspell.info, Node: Planned Dictionaries, Next: References, Prev: Multiple Scripts, Up: Languages Which Aspell can Support @@ -3299,20 +3298,20 @@ * Xhosa (xh) -If you are interested please contact him at scannell at slu edu. +If you are interested, please contact him at scannell at slu edu. A free spell checker for the Arabic (ar) languages called Duali can be found at `http://www.arabeyes.org/project.php?proj=duali'. It uses its own form of affix compression. This needs to be converted to Aspell format. The author, Mohammed Elzubeir, expressed an interest in -converting it but I have not herd from him in a while. If you are +converting it but I have not heard from him in a while. If you are interested in helping with this conversion please let me know. An Ispell hash table has been discovered for Albanian (sq) at `http://www.7kosova.com/kde-shqip/ispell/ispell.html'. However, the raw word list is not provided and the author has not been responding to -emails. If you know how to disassemble a Ispell hash table or the wear -abouts of the word list used to create the hash table I would +emails. If you know how to disassemble an Ispell hash table or the +whereabouts of the word list used to create the hash table I would appreciate hearing from you. A dictionary marked as "Planned" or "Maybe" but not listed in the @@ -3386,9 +3385,9 @@ sophisticated support for compound words in Aspell but it was too limiting and no one used it. - After receiving feedback from several people it seams that acceptable + After receiving feedback from several people it seems that acceptable support for compound words involved two basically independent parts. -If this is not suffecent for your language please let me know. +If this is not sufficient for your language please let me know. 
Part One ======== @@ -3413,7 +3412,7 @@ [^ey] [aeiou]y - It does not seam necessary to change the beginning of a word when + It does not seem necessary to change the beginning of a word when forming compounds Part Two @@ -3426,7 +3425,7 @@ can be given a set of rules to describe how it can be used in a compound word for example - A + B: indicates that category A may appear at beginning of a + A + B: indicates that category A may appear at the beginning of a word when followed by a category B word. When combined it is then considered a category B word. A + C + B: here a C word may only appear between an A or B word @@ -3470,12 +3469,12 @@ Many languages, including English, have words with non-letter symbols in them. For example the apostrophe. These symbols generally appear in the middle of a word, but they can also appear at the end, such as in an -abbreviation. If a symbol can _only_ appear as part of a word than +abbreviation. If a symbol can _only_ appear as part of a word then Aspell can treat it as if it were a letter. However, the problem is most of these symbols have other uses. For example, the apostrophe is often used as a single quote and the -abbreviations marker is also used as a period. Thus, Aspell can not +abbreviations marker is also used as a period. Thus, Aspell cannot blindly treat them as if they were letters. Aspell currently handles the case where the symbol can only appear in @@ -3501,18 +3500,18 @@ Numbers in words present a different challenge to Aspell. If Aspell treats numbers as letters than every possible number a user might write -in a document must be specified in the dictionary. This could be -easily be solved by having special code to assume all numbers are -correctly spelled. But what about something like "4th". Since the +in a document must be specified in the dictionary. This could easily +be solved by having special code to assume all numbers are correctly +spelled. Yet, what about something like "4th". 
Since the "th" suffix can appear after any number we are left with the same problem. The solution would be to have a special symbol for "any number". Words with spaces in them, such as foreign phrases, are even more -trouble to deal with. The basic problem is that when tokonizing a +trouble to deal with. The basic problem is that when tokenizing a string there is no good way to keep phrases together. One solution is to use trial and error. If a word is not in the dictionary try grouping it -with the previous or next word and see if the combined word is the +with the previous or next word and see if the combined word is in the dictionary. But what if the combined word is not, should the misspelled word be grouped when looking for suggestions? One solution is to also store each part of the phrase in the dictionary, but tag it as part of a @@ -3520,7 +3519,7 @@ To further complicate things, most applications that use spell checkers are accustom to parsing the document themselves and sending it -to the spell checker a word at a time. In order to support word with +to the spell checker a word at a time. In order to support words with spaces in them a more complicated interface will be required.  @@ -3537,9 +3536,9 @@ or U+0061 LATIN SMALL LETTER O + U+0308 COMBINING DIAERESIS - By performing normalization first Aspell will only see one of these + By performing normalization first, Aspell will only see one of these representations. The exact form of normalization depends on the -language. Give the choice of +language. Given the choice of: 1. Precomposed character @@ -3547,8 +3546,8 @@ 3. Base letter only -if the precomposed charter is in the target character set then (1), if -both the base and combing character is present than (2), otherwise (3). +if the precomposed character is in the target character set, then (1), if +both the base and combining characters are present, then (2), otherwise (3). Unicode Normalization is now implemented in Aspell 0.60. 
@@ -3565,7 +3564,7 @@ converting all words to lowercase before looking them up in the dictionary won't work because the conversion of `SS' to lowercase is ambiguous; it can be `ss' or `ß'. I do plan on dealing with this -eventually, however. +eventually.  File: aspell.info, Node: Context Sensitive Spelling, Prev: German Sharp S, Up: Language Related Issues @@ -3612,9 +3611,9 @@ multi-threaded I would like it to be thread safe so that it can be used by multi-threaded programs. There are several areas of Aspell that that are potently thread unsafe (such as accessing a global - pool) and several several classes which have the potential of - being used by more than one thread (such as the personal - dictionary). _[In Progress]_. + pool) and several classes which have the potential of being + used by more than one thread (such as the personal dictionary). + _[In Progress]_. * Enhance *ispell.el* so that it will work better with the new Aspell. _[In Progress]_. @@ -3631,17 +3630,17 @@ ------------------------------------- I would like to get these done. However, I may still consider Aspell -finished with out. They will probably eventually get implemented. +finished without them. They will probably eventually get implemented. However, I could still use help with them. * Better support for *compound words*. The support for _conditional_ compound words found in Aspell versions 0.50 and earlier is no - longer available since no one seams to be using it. Support for + longer available since no one seems to be using it. Support for _unconditional_ compound words will still be available. *Note Compound Words::. * Be able to accept *words with spaces in them* as many languages - have words, such as a word in a foreign phrase, which only make + have words, such as a word in a foreign phrase, which only make sense when followed by other words. *Note Words With Symbols in Them::. @@ -3655,9 +3654,9 @@ * Use Lawrence Philips' new *Double Metaphone algorithm*. 
See `http://aspell.net/metaphone/'. The main task involved here is converting the algorithm into table form. This will take some time - but their is no real programming experience is required. If you - want to help with Aspell but don't have any real programming - experience, this would be a great place to start. + but there is no real programming experience required. If you want + to help with Aspell but don't have much programming experience, + this would be a great place to start. * Rank suggestions based on *frequency information*. Both global frequency and document specific frequency can be used. The latter @@ -3674,7 +3673,7 @@ complicated. For example if the word is misspelled which dictionary should it use for the suggestions? - * Write a *GUI* for the aspell utility. Ideally it should be able to + * Write a *GUI* for the Aspell utility. Ideally it should be able to do everything the Aspell utility can do and not just be able spell check a document. @@ -3790,7 +3789,7 @@ where it should skip words. It could also probably do a very good job on programming languages code. - If you are interested in helping be out with this or just have + If you are interested in helping me out with this or just have general comments about the idea please let me know.  @@ -3829,11 +3828,11 @@ D.3.3 Email the Personal Dictionary ----------------------------------- -Some one suggest in a personal email: +Someone suggested in a personal email: - Have you thought of adding a function to aspell, that - when the + Have you thought of adding a function to Aspell, that - when the personal dictionary has grown significantly - sends the user's - personal dictionary to the maintainer of the corresponding aspell + personal dictionary to the maintainer of the corresponding Aspell dictionary? (if the user allows it) It would be a very useful service to the dictionary maintainers, @@ -3910,7 +3909,7 @@ correctly. 
If you do not have Ispell or the traditional Unix `spell' utility -installed on your system than you should also copy the compatibly +installed on your system then you should also copy the compatibly scripts `ispell' and `spell' located in the `scripts/' directory into your binary directory which is usually `/usr/local/bin' so that programs that expect the `ispell' or `spell' command will work @@ -3922,9 +3921,9 @@ E.2 Curses Notes ================ -If you are having problems compiling `check_funs.cpp' than the most +If you are having problems compiling `check_funs.cpp' then the most likely reason is due to incompatibilities with the curses implementation -on your system. If this is the case than you can explicitly disable the +on your system. If this is the case then you can explicitly disable the curses library with `--disable-curses'. By doing this you will lose the nice full screen interface but hopefully you will be able to at least get Aspell to compile correctly. @@ -3949,10 +3948,9 @@ In order for Aspell to correctly spell check UTF-8 documents the "wide" version of the curses library must be installed. This is different from -the normal version of curses library, the and is normally named -`libcursesw' (with a `w' at the end) or `libncursesw'. With out the -right curses version installed UTF-8 documents will not display -correctly. +the normal version of curses library, and is normally named `libcursesw' +(with a `w' at the end) or `libncursesw'. Without the right curses +version installed UTF-8 documents will not display correctly. In addition your system must also support the `mblen' function. Although this function was defined in the ISO C89 standard (ANSI @@ -3973,8 +3971,8 @@ stored in `SHAREDIR/aspell' before Aspell 0.60. The format of the character data files has changed. The new -character data files are installed with Aspell so you shouldn't have to -worry about it unless you made a custom one. 
+character data files are installed with Aspell so you should not have +to worry about it unless you made a custom one. The dictionary option `strip-accents' has been removed. For this reason the old English dictionary (up to 0.51) will no longer work. A @@ -3985,26 +3983,26 @@ -------------------------- The Aspell 0.60 library is binary compatible with the Aspell 0.50 -library. For this reason I chose _not_ to increment the major version +library. For this reason I chose _not_ to increment the major version number of the shared (so-name) library by default which means programs that were compiled for Aspell 0.50 will also work for Aspell 0.60. However, this means that having both Aspell 0.50 and Aspell 0.60 installed at the same time can be pragmatic. If you wish to allow both -Aspell 0.50 and 0.60 to be installed at the same time than you can use +Aspell 0.50 and 0.60 to be installed at the same time then you can use the configure option `--incremented-soname' which will increment so-name. You should only use this option if you know what you are -doing. It is up to you to some how insure that both the Aspell 0.50 and +doing. It is up to you to somehow ensure that both the Aspell 0.50 and 0.60 executables can coexist. If after incrementing the so-name you wish to allow programs compiled for Aspell 0.50 to use Aspell 0.60 instead (thus implying that Aspell -0.50 is not installed) than you can use a special compatibility library +0.50 is not installed) then you can use a special compatibility library which can be found in the `lib5' directory. This directory will not be entered when building or installing Aspell so you must manually build and install this library. You should build it after the rest of Aspell is built. The order in which this library is installed, with relation to the rest of Aspell, is also important. 
If it is installed _after_ -the rest of Aspell than new programs will link to the old library +the rest of Aspell then new programs will link to the old library (which will work for Aspell 0.50 or 0.60) when built, if installed _before_, new programs will link with the new library (Aspell 0.60 only). @@ -4020,7 +4018,7 @@ with Aspell so there in no longer two separate libraries you have to worry about. - Because of the massive changes between Aspell/Pspell and aspell 0.50 + Because of the massive changes between Aspell/Pspell and Aspell 0.50 you may want to clean out the old files before installing the the new Aspell. To do so do a `make uninstall' in the original Aspell and Pspell source directories. @@ -4053,7 +4051,7 @@ provided. Due to a change in the way dictionaries are handled, scanning for -`.pwli' files in order to get find out which dictionaries are available +`.pwli' files in order to find out which dictionaries are available will no longer work. This means that programs that relied on this technique may have problems finding dictionaries. Fortunately, GNU Aspell now provided a uniform way to list all installed dictionaries @@ -4340,7 +4338,7 @@ forcing applications to relink whenever a new Aspell version is out which was due to the use of the libtool '-release' flag. - * Fixed Makefiles so that aspell can be built outside the source tree + * Fixed Makefiles so that Aspell can be built outside the source tree (i.e. with VPATH). * Updated the section on compiling with Win32.