emacs-devel

Re: Emacs i18n


From: Juri Linkov
Subject: Re: Emacs i18n
Date: Thu, 21 Mar 2019 23:45:31 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (x86_64-pc-linux-gnu)

>   > Indeed, a complete implementation of all Russian morphological rules
>   > takes ~1600 lines of dense Perl code:
>
>   > http://www.linkov.net/files/nlp/Lingua-RU-Inflect.pm
>
>   > I can't imagine how to include all these rules to gettext.
>
> I agree with you about that.  What I propose is something else.
>
> 1. I do not propose implementing them all.  Only some -- whichever ones
> we think are worth while.
>
> 2. I do not propose putting any of this in gettext.
> What I propose would be Emacs code that operates on the strings that
> come from gettext.

The flaw in your proposal is that it assumes purely algorithmic
inflection, whereas inflection in Russian is actually dictionary-based
(the algorithms only process words taken from the dictionary).
To inflect a word you need a large dictionary of all words, in which
each word has at least the following lexical properties:
- part of speech
- noun grammatical gender: masculine, feminine, neuter
- noun animacy: animate, inanimate
- inflection type
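
To make the shape of such a dictionary concrete, here is a minimal
sketch of one entry.  The class name, the field names and the "1a"
inflection-type label are placeholders of mine, not taken from any
existing dictionary format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LexicalEntry:
    """One dictionary record with the lexical properties listed above."""
    lemma: str
    part_of_speech: str       # e.g. "noun"
    gender: Optional[str]     # "masculine", "feminine", "neuter"; None for non-nouns
    animate: Optional[bool]   # noun animacy; None for non-nouns
    inflection_type: str      # key selecting the declension pattern (placeholder label)

# A hypothetical one-word lexicon; a real one would cover the whole vocabulary.
LEXICON = {
    "файл": LexicalEntry("файл", "noun", "masculine", False, "1a"),
}
```

An inflection algorithm would first look the word up here and only then
apply the rules selected by inflection_type, which is why the rules
alone are not enough.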

And the main parameters that influence the declension are:
- grammatical case (one of 6 basic: nominative, genitive, dative,
  accusative, instrumental, prepositional plus some additional)
- number: singular and plural.  Dual is not a separate grammatical
  number; it only influences the choice of case and number for words
  after numerals:
  for 1    - nominative case, singular
  for 2..4 - genitive case, singular
  for 5..  - genitive case, plural
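
The numeral rule above can be sketched as a small function.  The table
only covers the simplest numbers; the handling of larger ones (the
choice follows the last digit, except that 11..14 take genitive plural)
is my addition here, and the function name is hypothetical:

```python
from typing import Tuple

def numeral_government(n: int) -> Tuple[str, str]:
    """Return the (case, number) required of a noun after the numeral n."""
    n = abs(n)
    last_two = n % 100
    last = n % 10
    # 1, 21, 31, ... but not 11
    if last == 1 and last_two != 11:
        return ("nominative", "singular")
    # 2..4, 22..24, ... but not 12..14
    if 2 <= last <= 4 and not 12 <= last_two <= 14:
        return ("genitive", "singular")
    # 0, 5..20, 25..30, ...
    return ("genitive", "plural")
```

Note that this only selects the required case and number.  Producing
the actual word form still needs the dictionary lookup.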

An additional problem is that there are many exceptions:
some words have an additional form called "count form"
https://en.wikipedia.org/wiki/Russian_declension#Count_form

For instance, the correct form is "5 байт" (literally "5 byte"), even
though the general rule requires genitive plural after 5 for most
other words.  For bytes the regular genitive plural is incorrect:
"5 байтов" (5 bytes).

Such exceptions are marked in the dictionary with a special
property that takes one of the following values:

- mandatory: only the count form is allowed for such units
  of measure as amperes, watts, volts, bits, bytes, etc.
- optional: both forms are accepted for such units as angstroms,
  gauss, (kilo)grams, decibels, carats, microns, ohms, röntgen, etc.
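
A minimal sketch of how such a property could be consulted when a
numeral requires genitive plural.  The class, the field names and the
policy strings are hypothetical, and "файл" is added only as an example
of a regular word without a count form:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class UnitForms:
    genitive_plural: str
    count_form: Optional[str]         # None if the word has no count form
    count_form_policy: Optional[str]  # "mandatory", "optional" or None

def forms_after_five(unit: UnitForms) -> List[str]:
    """Return the acceptable forms after numerals that take genitive plural."""
    if unit.count_form is not None:
        if unit.count_form_policy == "mandatory":
            return [unit.count_form]
        if unit.count_form_policy == "optional":
            return [unit.count_form, unit.genitive_plural]
    return [unit.genitive_plural]

BYTE = UnitForms("байтов", "байт", "mandatory")
GRAM = UnitForms("граммов", "грамм", "optional")
FILE = UnitForms("файлов", None, None)

for unit in (BYTE, GRAM, FILE):
    print("5 " + " / 5 ".join(forms_after_five(unit)))
```

The point is that the choice between the two forms cannot be derived
from the spelling of the word; it has to come from the dictionary.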


