bug-texinfo

Re: support of Japanese in makeinfo


From: Patrice Dumas
Subject: Re: support of Japanese in makeinfo
Date: Sat, 3 Dec 2011 11:42:04 +0100
User-agent: Mutt/1.4.2.2i

Hello,

On Fri, Mar 04, 2005 at 01:52:27PM +0900, Katsumi Yamaoka wrote:
> Hello,
> 
> There are four encodings which are generally used for Japanese
> text; they are ISO-2022-JP, EUC-JP, SHIFT_JIS and UTF-8.
> ISO-2022-JP uses only 7-bit data, but some characters which are
> special to makeinfo, for example, `@' and `{', will appear in
> the raw data.  EUC-JP uses only 8-bit data, is popular in the
> Unix-like systems as well as ISO-2022-JP.  SHIFT_JIS uses a
> mixture of 7-bit and 8-bit data, mainly used in MS Windows.

There are some changes ongoing on that front.  First of all, now that
we use Perl, all the encodings known to Perl are supported.  There is
also some progress on line breaking for East Asian languages; you can
have a look at
https://savannah.gnu.org/bugs/index.php?22696
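
As a rough illustration of the encoding point quoted above, here is a
small Python sketch (Python is used only for illustration; tp/ itself
is Perl).  It encodes two sample characters in each of the four
encodings and reports whether the bytes are 7-bit, 8-bit, or both, and
whether a literal `@' shows up in the raw data.  The function name
byte_profile and the sample string are made up for this example.

```python
SAMPLE = "だア"  # hiragana DA followed by katakana A (arbitrary sample)


def byte_profile(text, encoding):
    """Encode text and summarize which kinds of bytes appear."""
    raw = text.encode(encoding)
    return {
        "raw": raw,
        "has_7bit": any(b < 0x80 for b in raw),
        "has_8bit": any(b >= 0x80 for b in raw),
        # '@' is special to makeinfo, so its presence in raw data matters.
        "has_at_sign": b"@" in raw,
    }


if __name__ == "__main__":
    for name in ("iso2022_jp", "euc_jp", "shift_jis", "utf-8"):
        print(name, byte_profile(SAMPLE, name))
```

This is why the input has to be decoded according to its declared
encoding before any @-command parsing is done, rather than scanned as
raw bytes.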

> 1. Japanese words are not separated by spaces as if they look
>    "Japanesewordsarenotseparatedbyspaces".
>
> 2. Lines can be broken even in the middle of a word.  However,
>    ASCII words embedded in Japanese sentences cannot be broken.

What we use in tp/ is Unicode::EastAsianWidth, to determine whether
a character is double width and, at the same time, whether it is
possible to end the line right after it.
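
The idea can be sketched in Python with the standard unicodedata
module, which exposes the same East Asian Width property.  This is
not the code in tp/ (which is Perl and uses Unicode::EastAsianWidth);
the function names are made up, and treating "wide or space" as the
only break opportunities is a simplifying assumption.

```python
import unicodedata


def is_wide(ch):
    """True for characters classified as Wide (W) or Fullwidth (F)."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def display_width(text):
    """Column count, counting wide characters as two columns.

    Combining characters and control characters are not handled.
    """
    return sum(2 if is_wide(ch) else 1 for ch in text)


def break_positions(text):
    """Indexes i such that the line may end right after text[i].

    Assumption for this sketch: a break is allowed after any wide
    character and after a space, and nowhere else.
    """
    return [i for i, ch in enumerate(text) if is_wide(ch) or ch == " "]
```

With this rule an ASCII word embedded in Japanese text is never split,
while a line can end after any Japanese character.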

> 3. There's no space between the Japanese comma or the Japanese
>    period and the beginning of the next sentence.

What do you mean by 'Japanese comma' or 'Japanese period'?  Is it
a Unicode character, or a plain , or .?

> 4. There are special rules called `kinsoku'.  Some characters
>    including Japanese comma, period and dash, small kana
>    characters and the right parenthesis should not be placed at
>    the beginning of lines.  Contrarily, the left parenthesis and
>    so forth should not be placed at the end of lines.  In Emacs,
>    they are defined in the lisp/international/kinsoku.el file.

That is not supported right now.  My opinion is that it is more of
a language support matter that should live in Perl or in a Perl module
used by tp, rather than something we do only for Texinfo.  Are there Unicode
classes for kinsoku characters?  Is the description on
http://en.wikipedia.org/wiki/Line_breaking_rules_in_East_Asian_language
correct?
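
To make the kinsoku rule quoted above concrete, here is a minimal
Python sketch.  The two character sets are hand-written, hypothetical
and deliberately incomplete; they are not taken from kinsoku.el or
from any standard.  (Unicode Standard Annex #14 defines line breaking
classes, but Python's standard library does not expose them.)  The
wrapping counts characters rather than display columns and does not
protect embedded ASCII words.

```python
# Hypothetical, incomplete sets for illustration only.
NO_LINE_START = set("、。）」』ーぁぃぅぇぉっゃゅょァィゥェォッャュョ")
NO_LINE_END = set("（「『")


def can_break_between(before, after):
    """True if a line may end with `before` and the next start with `after`."""
    return before not in NO_LINE_END and after not in NO_LINE_START


def wrap(text, width):
    """Greedy wrap at `width` characters, moving breaks left when forbidden."""
    lines = []
    start = 0
    while len(text) - start > width:
        cut = start + width
        # Move the break point left until the kinsoku rules allow it.
        while cut > start and not can_break_between(text[cut - 1], text[cut]):
            cut -= 1
        if cut == start:
            # No allowed break found: fall back to a hard break.
            cut = start + width
        lines.append(text[start:cut])
        start = cut
    lines.append(text[start:])
    return lines
```

For example, wrapping "これは、テスト。" at 3 characters moves the first
break one character to the left so that the comma does not start a
line.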

> I would greatly appreciate making the makeinfo command support
> them.  I attach an example of the Japanese TexInfo file below.

I ran it through tp, but I don't know what to look at to see if it is
correct, not to mention that it does not display the correct characters
for me.  If you want to, you could give tp a try and see if the result
is correct, by checking out the CVS version of texinfo and doing

cd tp
cp ~/my_japanese_file.texi .
./texi2any.pl my_japanese_file.texi

and look at the result with an info reader.

-- 
Pat


