Re: Grep Japanese characters
From: Filipp Gunbin
Subject: Re: Grep Japanese characters
Date: Mon, 16 Jul 2018 23:11:36 +0300
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (darwin)
On 13/07/2018 17:36 +0300, Eli Zaretskii wrote:
>> From: Filipp Gunbin <fgunbin@fastmail.fm>
>> Cc: help-gnu-emacs@gnu.org
>> Date: Fri, 13 Jul 2018 17:06:38 +0300
>>
>> > The conclusion is that UTF-8 can be used as a locale's codeset
>> > (good!), but sending UTF-8 text to the console still doesn't work well
>> > (not so good). So if people use this knob in Windows 10, they should
>> > arrange for console input and output to be in some codepage other than
>> > 65001 (a.k.a. UTF-8).
>> [..]
>>
>> But in message <86pnzsbnvu.fsf@misasa.okayama-u.ac.jp> above it was
>> reported that grepping of these non-ascii chars worked from emacs, no?
>
> When you grep from Emacs, the results of the search are not displayed
> by the Windows console, they get read by Emacs and displayed by Emacs.
> And (GUI) Emacs can display _any_ character supported by the fonts
> installed on the system, regardless of the codepage. But if people
> run Grep from the shell prompt, they will see unreadable output, even
> on Windows 10 with that setting in effect.
>
>> And what does "using as locale's codeset" mean in your message, then?
>
> A locale's most general specification is ll_CC.ENC, where ll is the
> language, CC is the country, and ENC is the encoding. Example from
> Posix systems: pt_BR.UTF-8, for the Brazilian variety of Portuguese with
> UTF-8 encoding. Example from Windows: French_Canada.1252 (where 1252
> is the codepage used for encoding). The ENC part is also known as
> "codeset".
>
> More about that, for Windows in particular, here:
>
> https://msdn.microsoft.com/en-us/library/x99tb11d.aspx
>
> You will see that the MS doc still says UTF-8 is not supported as the
> ENC part.
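
To make the ll_CC.ENC layout concrete, here is a minimal sketch in Python that splits a locale name into language, country and codeset. The names split_locale and LocaleName are made up for this illustration, and the sketch assumes the plain ll_CC.ENC form only; it does not handle "@modifier" suffixes or consult the system's locale database.

```python
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    """Parts of a locale name written as ll_CC.ENC (hypothetical helper type)."""
    language: str
    country: Optional[str]
    codeset: Optional[str]


def split_locale(name: str) -> LocaleName:
    """Split 'll_CC.ENC' into its parts; CC and ENC may be absent."""
    # Everything after the first '.' is the ENC part, i.e. the codeset.
    base, _, codeset = name.partition(".")
    # Before the '.', the first '_' separates language from country.
    language, _, country = base.partition("_")
    return LocaleName(language, country or None, codeset or None)


for name in ("pt_BR.UTF-8", "French_Canada.1252", "C"):
    print(name, "->", split_locale(name))
```

With pt_BR.UTF-8 (Brazilian Portuguese) the codeset part comes out as "UTF-8", and with French_Canada.1252 it comes out as "1252", the codepage number.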
Thanks. I'm familiar with the locale concept, but was not sure what
"codeset" means.

I'm still a bit lost here. It seems that sending data to and receiving
it from subprocesses works with that Win10 setting, which is why
grepping from M-x shell started to work. The output displays correctly
in graphical Emacs as long as the font has the needed glyphs.

But the interaction with the console confuses me; I guess I need to
read more about it before I can ask something meaningful. In
particular, it's unclear to me why grep outputs Japanese correctly in
the OP (with LC_ALL=en_US.UTF-8), while you say that sending UTF-8 text
to the console will not work.
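
One part of this can be shown in isolation: the same output bytes are readable or not depending on which encoding the reader assumes. The sketch below is only an illustration in Python. The helper as_seen_by is a made-up name, and cp437 is an arbitrary stand-in for "some non-UTF-8 console codepage". It does not model what the Windows console does in codepage 65001, so it does not answer the question about the OP's output.

```python
def as_seen_by(raw: bytes, codepage: str) -> str:
    """Decode raw output bytes the way a reader assuming `codepage` would."""
    # errors="replace" keeps the demo from raising on undecodable bytes.
    return raw.decode(codepage, errors="replace")


# "Japanese language" in kanji, written with escapes so this file stays ASCII.
sample = "\u65e5\u672c\u8a9e"
# The bytes a program using a UTF-8 locale would write to its stdout.
raw = sample.encode("utf-8")

for codepage in ("utf-8", "cp437"):
    shown = as_seen_by(raw, codepage)
    # ascii() keeps the demo's own output independent of the terminal encoding.
    print(codepage, ascii(shown), "matches original:", shown == sample)
```

A reader that decodes the bytes as UTF-8, as Emacs can do when it reads subprocess output, recovers the original three characters. A reader that assumes cp437 gets nine unrelated characters, one per byte.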