bug-gnu-emacs

bug#55694: [PATCH] Add support for the Batak scripts


From: समीर सिंह Sameer Singh
Subject: bug#55694: [PATCH] Add support for the Batak scripts
Date: Sun, 29 May 2022 17:14:57 +0530

Thank you for the feedback!

>> From: समीर सिंह Sameer Singh
>>  <lumarzeli30@gmail.com>
>> Date: Sun, 29 May 2022 06:21:33 +0530
>>
>> This time the Batak scripts are added to Emacs.
>> Since the Batak scripts are actually a collection of five scripts:
>> Toba, Karo, Pakpak, Mandailing, and Simalungun.
>
> I think the above are _languages_, not scripts.  They all used the
> Batak script for writing in the past, but they aren't scripts.

The term "Batak" is not just the name of the script; it is also a collective term for a group of tribes in Sumatra.
So adding "Batak" to the language name, as in "the Batak Karo language" or "the Batak Pakpak language", is also correct,
though omitting it also seems fine.

For example, see the Indonesian Wikipedia pages for these languages.

https://id.wikipedia.org/wiki/Bahasa_Karo
Bahasa Batak Karo atau bahasa Karo adalah sebuah bahasa Austronesia dalam rumpun bahasa Batak.
(Translation: "Karo Batak language _or_ Karo language is an Austronesian language in the Batak language family.")

https://id.wikipedia.org/wiki/Bahasa_Mandailing
Here the infobox uses "Bahasa Batak Mandailing" instead of just "Bahasa Mandailing". ("Bahasa" means "language"; it comes from the Sanskrit word भाषा bhāṣā.)

> Is this greeting common to all the languages using the Batak script?
> I don't think so.  So perhaps we should have several greetings, one
> for each language?

This greeting (Horas) is common in all but one language (Batak Karo).
Even though it is the same in Batak Toba, Pakpak, Mandailing and Simalungun, there are slight variations in the way it is written.
Still, since they represent just one Unicode block, writing one greeting may be sufficient to show that this script is supported.
Though we can also have multiple greetings, it is up to you.

> See above: this should distinguish between the script name and the
> language names.  Something like
>
>   **** Karo language using the Batak script and its language environment

See my first point

> Btw, according to this article:
>
>   https://en.wikipedia.org/wiki/Batak_languages
>
> there are 2 more languages that used Batak; why aren't they included?

Sadly, I could not find any information about them; the Unicode proposals only discuss the five languages.
See points 7.1 to 7.5 of this document: https://www.unicode.org/wg2/docs/n3320.pdf
There is no mention of Alas-Kluet or Angkola.

> The input methods look almost identical, with a few minor deviations.
> Are the differences real or are they mistakes?  If they are mistakes,
> we can have just one input method for all the languages using Batak.
> And if the differences are real, can we still have only one input
> method, where the different variants of the same ASCII letter are
> selected by the user at typing time?  It seems un-economical to have
> so many input methods that are almost identical.

Ok, I will merge them into one input method.
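
One way a merged method could work is to offer the language-specific letter forms as alternative candidates for a single key.  The following is only a minimal sketch under stated assumptions: the package name "batak-sketch" and the key assignments are hypothetical and do not come from the patch.  The letters are the ones that appear in the Toba, Simalungun, and Mandailing sample texts.

```elisp
;; Hypothetical sketch of a single Batak input method.  The package
;; name and the key assignments are assumptions for illustration only.
(require 'quail)

(quail-define-package
 "batak-sketch" "Batak" "ᯅ" t
 "Illustrative Batak input method offering variant letters as candidates.")

(quail-define-rules
 ;; A vector gives several candidate translations for one key sequence,
 ;; so the variant is chosen while typing.
 ("h" ["ᯂ" "ᯃ" "ᯄ"])   ; HA as in the Toba, Simalungun, Mandailing samples
 ("r" ["ᯒ" "ᯓ"])       ; RA as in the Toba and Simalungun samples
 ("s" ["ᯘ" "ᯙ" "ᯚ"]))  ; SA as in the Toba, Simalungun, Mandailing samples
```

Note that this sketch leaves the package's DETERMINISTIC argument at its default of nil, unlike the patch, because a deterministic package allows only one translation per key sequence.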


On Sun, May 29, 2022 at 12:43 PM Eli Zaretskii <eliz@gnu.org> wrote:
> From: समीर सिंह Sameer Singh
>  <lumarzeli30@gmail.com>
> Date: Sun, 29 May 2022 06:21:33 +0530
>
> This time the Batak scripts are added to Emacs.
> Since the Batak scripts are actually a collection of five scripts:
> Toba, Karo, Pakpak, Mandailing, and Simalungun.

I think the above are _languages_, not scripts.  They all used the
Batak script for writing in the past, but they aren't scripts.

> I have provided 5 different language environments and input-methods for them.
>
> Please review the patch.
> --- a/etc/HELLO
> +++ b/etc/HELLO
> @@ -28,6 +28,7 @@ Amharic (አማርኛ)      ሠላም
>  Arabic (العربيّة)    السّلام عليكم
>  Armenian (հայերեն)   Բարև ձեզ
>  Balinese (ᬅᬓ᭄ᬱᬭᬩᬮᬶ)  ᬒᬁᬲ᭄ᬯᬲ᭄ᬢ᭄ᬬᬲ᭄ᬢᬸ
> +Batak (ᯘᯮᯒᯗ᯲ᯅᯗᯂ᯲)    ᯂᯬᯒᯘ᯲

Is this greeting common to all the languages using the Batak script?
I don't think so.  So perhaps we should have several greetings, one
for each language?

> --- a/etc/NEWS
> +++ b/etc/NEWS
> @@ -826,6 +826,11 @@ corresponding language environments are:
>  **** Balinese script and language environment
>  **** Javanese script and language environment
>  **** Sundanese script and language environment
> +**** Batak Karo script and language environment
> +**** Batak Toba script and language environment
> +**** Batak Pakpak script and language environment
> +**** Batak Mandailing script and language environment
> +**** Batak Simalungun script and language environment

See above: this should distinguish between the script name and the
language names.  Something like

  **** Karo language using the Batak script and its language environment

> +(set-language-info-alist
> + "Batak Karo" '((charset unicode)
> +                (coding-system utf-8)
> +                (coding-priority utf-8)
> +                (input-method . "batak-karo")
> +                (sample-text . "Batak Karo (ᯘᯬᯒᯗ᯳ᯆᯗᯂ᯳)    ᯔᯧᯐᯬᯀᯱᯐᯬᯀᯱ")
> +                (documentation . "\
> +Batak Karo language and its script are supported in this language environment.")))

Likewise here.  The doc string should say something like

  Karo language using the Batak script is supported in this language
  environment.
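
Applied to the Karo entry from the patch, that suggestion might look like the sketch below.  Only the documentation string differs from the posted patch; keeping the environment name "Batak Karo" is an assumption, since the thread does not settle the final name.

```elisp
;; Sketch only: the Karo entry from the patch with the doc string
;; reworded as suggested.  Keeping the name "Batak Karo" is an assumption.
(set-language-info-alist
 "Batak Karo" '((charset unicode)
                (coding-system utf-8)
                (coding-priority utf-8)
                (input-method . "batak-karo")
                (sample-text . "Batak Karo (ᯘᯬᯒᯗ᯳ᯆᯗᯂ᯳)    ᯔᯧᯐᯬᯀᯱᯐᯬᯀᯱ")
                (documentation . "\
Karo language using the Batak script is supported in this language
environment.")))
```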

> +
> +(set-language-info-alist
> + "Batak Toba" '((charset unicode)
> +                (coding-system utf-8)
> +                (coding-priority utf-8)
> +                (input-method . "batak-toba")
> +                (sample-text . "Batak Toba (ᯘᯮᯮᯒᯖ᯲ᯅᯖᯂ᯲)    ᯂᯬᯒᯘ᯲")
> +                (documentation . "\
> +Batak Toba language and its script are supported in this language environment.")))
> +
> +(set-language-info-alist
> + "Batak Pakpak" '((charset unicode)
> +                  (coding-system utf-8)
> +                  (coding-priority utf-8)
> +                  (input-method . "batak-pakpak")
> +                  (sample-text . "Batak Pakpak (ᯘᯮᯒᯗ᯲ᯅᯗᯂ᯲)    ᯂᯬᯒᯘ᯲")
> +                  (documentation . "\
> +Batak Pakpak language and its script are supported in this language environment.")))
> +
> +(set-language-info-alist
> + "Batak Mandailing" '((charset unicode)
> +                      (coding-system utf-8)
> +                      (coding-priority utf-8)
> +                      (input-method . "batak-mandailing")
> +                      (sample-text . "Batak Mandailing (ᯚᯮᯒᯖ᯲ᯅᯖᯄᯱ᯲)    ᯄᯬᯒᯚ᯲")
> +                      (documentation . "\
> +Batak Mandailing language and its script are supported in this language environment.")))
> +
> +(set-language-info-alist
> + "Batak Simalungun" '((charset unicode)
> +                      (coding-system utf-8)
> +                      (coding-priority utf-8)
> +                      (input-method . "batak-simalungun")
> +                      (sample-text . "Batak Simalungun (ᯙᯮᯮᯓᯖ᯳ᯅᯖᯃ᯳)    ᯃᯬᯓᯙ᯲")
> +                      (documentation . "\
> +Batak Simalungun language and its script are supported in this language environment.")))
> +

Btw, according to this article:

  https://en.wikipedia.org/wiki/Batak_languages

there are 2 more languages that used Batak; why aren't they included?

> +(quail-define-package
> + "batak-karo" "Batak Karo" "ᯂᯒᯭ" nil "Batak Karo phonetic input method."
> + nil t t t t nil nil nil nil nil t)
> +
> +(quail-define-rules

The input methods look almost identical, with a few minor deviations.
Are the differences real or are they mistakes?  If they are mistakes,
we can have just one input method for all the languages using Batak.
And if the differences are real, can we still have only one input
method, where the different variants of the same ASCII letter are
selected by the user at typing time?  It seems un-economical to have
so many input methods that are almost identical.

Thanks.
