From: Alexis
Subject: Re: "args-out-of-range" error when using data from external process on Windows
Date: Thu, 18 Apr 2024 21:20:55 +1000
User-agent: mu4e 1.12.4; emacs 29.3

Thanks again for your assistance!
As some additional context: i haven't actively used a Windows
system in more than a decade - it was Windows 7 - and even then, i
was running it in a VM in order to run some other software. i've
also never used Windows outside of an "Australian English"
context, and have never done any dev work on the Windows
platform. So i've got only a minimal idea of how Windows does
various things nowadays, and have never needed to become familiar
with sysadmin-/dev-level Windows documentation. Until now. :-)
Specific responses inline below.
> I don't think I understand the part about setting LC_ALL. First,
> AFAIK Windows programs generally ignore LC_* environment
> variables. If you read the Microsoft documentation of
> 'setlocale', here:
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170
> you will not see any reference to environment variables there.
Thanks for this link; it gives me a good starting point to explore
the Win docs on this issue.
> The Windows 'setlocale' supports only LC_* _categories_ in
> direct calls to the function, and doesn't consider the
> corresponding environment variables. The Emacs source code
> doesn't reference LC_* environment variables on MS-Windows,
> either. So how did the user set LC_ALL, and why did it have any
> effect whatsoever on the issue?
They didn't say; all they wrote
(https://github.com/flexibeast/ebuku/issues/31#issuecomment-2058171986)
was:
I ... changed my LC_ALL to zh_CN.UTF-8. Ebuku can find the db
now.
i'll ask them.
> Second, the user sets a UTF-8 locale, which as I wrote up-thread
> is not a good idea on MS-Windows. It could well cause failures
> in invoking external programs from Emacs, if the arguments to
> those programs include non-ASCII characters. In general, on
> MS-Windows Emacs can only safely invoke programs with non-ASCII
> characters in the command-line arguments if those characters can
> be encoded by the system codepage, in this case codepage-936
> AFAIU.
Thanks, i'll add that to the information i pass back to the user
on that GitHub issue.
> Regarding the "invalid string for collation: Invalid argument"
> error: how does ebuku determine the LOCALE argument with which
> it calls string-collate-lessp? It is important to understand
> which locale w32_compare_strings was called with in that case.
The single use of `string-collate-lessp` doesn't pass any LOCALE
argument, as i just wanted it to use the user's current locale for
sorting a given bookmark's tags into the appropriate
lexicographical order.
> Finally, the issues with Windows-style file names with drive
> letters and with file names that begin with "~" lead me to
> believe that perhaps the underlying program 'buku' is not a
> native Windows program, but a Cygwin or MSYS program, in which
> case there could be incompatibilities both regarding file names
> and regarding handling of non-ASCII characters (Cygwin and MSYS
> use UTF-8 by default, whereas the native Windows build of Emacs
> does not).
Sorry; i mentioned in my first email, but didn't reiterate in my
second, that `buku` is Python-based.
> You need to take a good look at whether non-ASCII characters are
> passed to 'buku' in this case, and how the output from 'buku' is
> decoded.
👍
> Also, ebuku-buku-path and ebuku-database-path should both be
> quoted with shell-quote-argument (but I don't think this is a
> problem in this case). Can ARGS include whitespace or characters
> special for the Windows shell? If so, each argument should be
> quoted with shell-quote-argument as well.
Thanks, noted.
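As a rough analogy only (in Python, since buku itself is Python-based): `subprocess.list2cmdline` turns a list of arguments into a single Windows command line using the Microsoft C runtime quoting rules. Unlike `shell-quote-argument`, it does not escape cmd.exe metacharacters such as `&` or `^`, so it is not a model of exactly what Emacs does. The argument values below are made-up placeholders, not real buku options.

```python
import subprocess

# Hypothetical arguments: a path containing a space, and a value
# containing embedded double quotes.
argv = ["buku", r"C:\My Files\bookmarks.db", 'say "hi"']

# list2cmdline wraps arguments containing spaces or tabs in double
# quotes and backslash-escapes embedded double quotes (MS C runtime
# rules). It does NOT escape cmd.exe metacharacters such as & or ^.
command_line = subprocess.list2cmdline(argv)
print(command_line)
```

Without the quoting, the space in the path would split it into two separate arguments on the receiving side.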
> How output is decoded when it is put into the temporary buffer
> is also of interest -- what is the value of
> buffer-file-coding-system in the temporary buffer after reading
> output, in the OP's case?
*nod*
> Emacs on MS-Windows
> cannot use UTF-8 when encoding command-line arguments for
> sub-programs, it can only use the system codepage. Using
> set-language-environment as above will force Emacs to encode
> command-line arguments in UTF-8, which could very well be the
> reason for some of these problems.
Ah okay.
> No.
> The issue is complicated by several factors and will take a long
> post to explain. The upshot is that for passing non-ASCII
> characters safely to subprograms on their command lines, Emacs
> should use the system codepage, not UTF-8 or anything else (and
> definitely not UTF-16). This might require some tricky juggling
> with coding-system related settings when you call call-process,
> because coding-system-for-write is used for both encoding of the
> command-line arguments and of the stuff we send to the
> sub-program, so if they both can include non-ASCII characters,
> some care is in order. (By contrast, coding-system-for-read can
> be always bound to UTF-8 to decode the output correctly --
> assuming 'buku' outputs UTF-8 encoded text on MS-Windows.)
That's very helpful, thank you.
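A minimal sketch of the "decode the output as UTF-8" half of that advice, again in Python and only as an analogy to binding coding-system-for-read: capture the raw bytes and decode them with an explicit encoding, so the result does not depend on the locale's preferred encoding. The helper name run_and_decode is hypothetical, and a child Python process stands in for buku, on the assumption (as above) that buku writes UTF-8.

```python
import subprocess
import sys


def run_and_decode(argv, output_encoding="utf-8"):
    """Run argv and decode its stdout with an explicit encoding.

    Hypothetical helper. Decoding raw bytes ourselves, instead of
    letting the runtime pick an encoding from the locale, is the
    analogue of binding coding-system-for-read in Emacs.
    """
    completed = subprocess.run(argv, capture_output=True, check=True)
    return completed.stdout.decode(output_encoding)


# Stand-in for buku: a child Python process that writes the UTF-8 bytes
# of CRAB (U+1F980) followed by " rust". Its command line is pure ASCII.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write('\\U0001F980 rust'.encode('utf-8'))",
]
text = run_and_decode(child)
```

Keeping the non-ASCII data in the child's output rather than on its command line sidesteps the argument-encoding problem entirely.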
> The more important question is: can the CRAB emoji be safely
> encoded by codepage 936, the system codepage of the OP? If not,
> and if that emoji can appear in the command-line arguments of a
> 'buku' invocation (as opposed to in the text we write to or read
> from 'buku'), then this character cannot be used at all with this
> package on MS-Windows.
> (And please note that Emacs now has native SQLite support,
> which should make many of these complications simply disappear.)
It would certainly make many things easier to just interact with
the db directly. That said, doing so would involve a substantial
rewrite, and i've got many things on my plate nowadays, including
supporting disabled loved ones while having chronic health issues
myself. But maybe i can open an issue requesting help to start and
develop a branch doing such a rewrite.
> As for why the problems disappear when the CRAB emoji is
> removed: as I wrote elsewhere, that's probably because all the
> other characters are plain ASCII, so all the encoding-related
> issues don't matter.
*nod*
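The CRAB question can be probed outside Emacs with a small Python check, using Python's "cp936" codec (an alias for its GBK codec) as an approximation of codepage 936. Python's codec tables are not Windows' own conversion routines, so treat the result as a strong hint rather than proof. The helper name unencodable_chars is hypothetical.

```python
def unencodable_chars(text, codepage="cp936"):
    """Return the characters in text that the given codepage cannot encode.

    Hypothetical helper. In Python, "cp936" is an alias for the GBK
    codec, used here as an approximation of Windows codepage 936.
    """
    bad = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


crab_tag = "\U0001F980rust"               # CRAB (U+1F980) followed by ASCII
bad_in_crab = unencodable_chars(crab_tag)  # only the crab is reported
bad_in_ascii = unencodable_chars("rust")   # empty: plain ASCII always fits
bad_in_cjk = unencodable_chars("中文")      # empty: GBK covers these
```

This is consistent with the problems going away once the emoji is removed: everything left is ASCII, which every codepage can represent.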
> They don't have any effect on Emacs on MS-Windows, that's for
> sure. Whether they have effect on 'buku' depends on whether
> it's a native MS-Windows program or Cygwin/MSYS program, and
> also on its code (a program could potentially augment the MS
> 'setlocale' function with its own code which looks at the LC_*
> environment variables, and does TRT in the application code).
*nod*
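For illustration, this is roughly what "augmenting setlocale" could look like inside an application: a hypothetical helper that resolves a locale name from the environment using the POSIX precedence (LC_ALL, then the category variable such as LC_COLLATE, then LANG). Whether buku does anything like this is not known; the final fallback to "C" is a choice made here, since POSIX leaves the default implementation-defined.

```python
import os


def effective_locale(category_var, environ=None):
    """Resolve a locale name from environment variables, POSIX-style.

    Hypothetical helper: LC_ALL wins, then the category variable
    (e.g. "LC_COLLATE"), then LANG. Empty values count as unset.
    Falls back to "C" (an assumption; POSIX says implementation-defined).
    """
    env = os.environ if environ is None else environ
    for name in ("LC_ALL", category_var, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


# Mirrors the OP's report: LC_ALL set on top of some other LANG.
example_env = {"LANG": "en_AU.UTF-8", "LC_ALL": "zh_CN.UTF-8"}
chosen = effective_locale("LC_COLLATE", example_env)  # LC_ALL takes precedence
```

The Microsoft 'setlocale' itself performs no such lookup, so any effect of LC_ALL would have to come from code like this in the program.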
>> But what should i do to handle the more general case of an
>> arbitrary encoding? Do i need to have a defcustom, with
>> 'reasonable defaults', that the user can set if necessary,
>> which i use as the value to pass to coding-system-for-read?
> That depends on what encoding 'buku' expects on input and what
> encoding it uses on output. If it always uses UTF-8,
> you just need to make sure Emacs uses UTF-8 when encoding and
> decoding text passed to and from 'buku' (but note the caveat
> about encoding the command-line arguments -- these _must_ be
> encoded in the system codepage). If, OTOH, the encoding used by
> 'buku' can be changed dynamically, and Emacs cannot know what it
> is (for example, if it is determined by the encoding of the text
> put in the SQL database by the user), then a user option is in
> order.
Great, thank you.
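If a user option does turn out to be needed, its shape might be a single, validated encoding setting that applies only to text exchanged with buku. Here is a rough Python analogue of such a defcustom; the class and field names are hypothetical, and command-line arguments are deliberately left out, since those must use the system codepage regardless of user preference.

```python
import codecs
from dataclasses import dataclass


@dataclass(frozen=True)
class BukuIOOptions:
    """Hypothetical user-settable option, akin to an Emacs defcustom.

    io_encoding covers only text written to / read from buku.
    Command-line arguments are not covered: they must use the
    system codepage, not a user choice.
    """

    io_encoding: str = "utf-8"

    def __post_init__(self):
        # Reject unknown encoding names up front (raises LookupError).
        codecs.lookup(self.io_encoding)

    def encode_input(self, text: str) -> bytes:
        return text.encode(self.io_encoding)

    def decode_output(self, raw: bytes) -> str:
        return raw.decode(self.io_encoding)


default = BukuIOOptions()
raw = default.encode_input("\U0001F980 rust")
assert default.decode_output(raw) == "\U0001F980 rust"
```

Validating the name when the option is set means a typo surfaces immediately, rather than as a confusing decoding error the next time buku is run.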
>> As i interpret their comments in the above discussions so far,
>> yes, they had themselves set LANG to "zh_CN.UTF-8" (and yes, as
>> described above, had definitely set `set-language-environment`
>> to "UTF-8").
> NOT RECOMMENDED!
*chuckle* i'll be sure to pass this on. :-)
Thanks again!
Alexis.
Re: "args-out-of-range" error when using data from external process on Windows, Eli Zaretskii, 2024/04/18