help-gnu-emacs

Re: Grep Japanese characters


From: Filipp Gunbin
Subject: Re: Grep Japanese characters
Date: Mon, 16 Jul 2018 23:11:36 +0300
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (darwin)

On 13/07/2018 17:36 +0300, Eli Zaretskii wrote:

>> From: Filipp Gunbin <fgunbin@fastmail.fm>
>> Cc: help-gnu-emacs@gnu.org
>> Date: Fri, 13 Jul 2018 17:06:38 +0300
>>
>> > The conclusion is that UTF-8 can be used as a locale's codeset
>> > (good!), but sending UTF-8 text to the console still doesn't work well
>> > (not so good).  So if people use this knob in Windows 10, they should
>> > arrange for console input and output to be in some codepage other than
>> > 65001 (a.k.a. UTF-8).
>> [..]
>>
>> But in message <86pnzsbnvu.fsf@misasa.okayama-u.ac.jp> above it was
>> reported that grepping of these non-ascii chars worked from emacs, no?
>
> When you grep from Emacs, the results of the search are not displayed
> by the Windows console, they get read by Emacs and displayed by Emacs.
> And (GUI) Emacs can display _any_ character supported by the fonts
> installed on the systems, regardless of the codepage.  But if people
> run Grep from the shell prompt, they will see unreadable output, even
> on Windows 10 with that setting in effect.
>
>> And what does "using as locale's codeset" then mean in your message?
>
> A locale's most general specification is ll_CC.ENC, where ll is the
> language, CC is the country, and ENC is the encoding.  Example from
> Posix systems: pt_BR.UTF-8, for Brazilian variety of Portuguese with
> UTF-8 encoding.  Example from Windows: French_Canada.1252 (where 1252
> is the codepage used for encoding).  The ENC part is also known as
> "codeset".
>
> More about that, for Windows in particular, here:
>
>   https://msdn.microsoft.com/en-us/library/x99tb11d.aspx
>
> You will see that the MS doc still says UTF-8 is not supported as the
> ENC part.

Thanks.  I'm familiar with the locale concept, but was not sure what
"codeset" means.
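
To make the ll_CC.ENC layout described above concrete, here is a minimal
sketch in Python that splits a locale name into its three parts.  The
helper name split_locale is hypothetical, and the sketch assumes the
plain ll_CC.ENC form only; it does not handle POSIX "@modifier"
suffixes or any other variation.

```python
def split_locale(name):
    """Split 'll_CC.ENC' into (language, country, codeset).

    Parts that are absent from the name come back as None.
    """
    # Everything after the first '.' is the codeset (the ENC part).
    base, _, codeset = name.partition(".")
    # Everything after the first '_' in the remainder is the country.
    language, _, country = base.partition("_")
    return language or None, country or None, codeset or None


posix_parts = split_locale("en_US.UTF-8")
windows_parts = split_locale("French_Canada.1252")
print(posix_parts)
print(windows_parts)
```

In both naming styles the codeset is simply whatever follows the dot:
an encoding name in the POSIX-style example, a codepage number in the
Windows one.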

I'm still a bit lost here.  It seems that sending text to and receiving
it from subprocesses works with that Win10 setting, which is why
grepping from M-x shell started to work.  Output in graphical Emacs will
display correctly as long as the font covers the characters.

But the interactions with the console confuse me; I guess I need to read
more on that before I can ask something meaningful.  In particular, it's
unclear to me why grep outputs Japanese correctly in the OP (with
LC_ALL=en_US.UTF-8), while you say that sending UTF-8 text to the
console will not work.
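
The general point in the quoted explanation, that what you see depends
on who decodes the bytes, can be illustrated with a small Python sketch.
It is a simplified model only: the sample string and the choice of
cp437 as the "other" codepage are assumptions for illustration, and it
does not reproduce the Windows console's actual behavior under codepage
65001.

```python
# Sample text and codepage are illustrative assumptions, not taken
# from the thread.
text = "日本語"

# What a program such as grep would write to its output: raw bytes.
raw = text.encode("utf-8")

# A reader that decodes the bytes as UTF-8 (as Emacs can do for
# subprocess output) recovers the original characters.
decoded_as_utf8 = raw.decode("utf-8")

# A reader that assumes a different codepage sees unrelated glyphs.
# cp437 maps every byte value, so this decode cannot fail.
decoded_as_cp437 = raw.decode("cp437")

print(decoded_as_utf8 == text)
print(decoded_as_cp437 == text)
```

The bytes are identical in both cases; only the decoding step differs,
which is why the same grep output can look right in one place and
unreadable in another.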


