Re: "args-out-of-range" error when using data from external process on W

emacs-devel

From:	Alexis
Subject:	Re: "args-out-of-range" error when using data from external process on Windows
Date:	Thu, 18 Apr 2024 21:20:55 +1000
User-agent:	mu4e 1.12.4; emacs 29.3


Thanks again for your assistance!

As some additional context: i haven't actively used a Windows system in more than a decade - it was Windows 7 - and even then, i was running it in a VM in order to run some other software. i've also never used Windows outside of an "Australian English" context, and have never done any dev work on the Windows platform. So i've got only a minimal idea of how Windows does various things nowadays, and have never needed to become familiar with sysadmin-/dev-level Windows documentation. Until now. :-)


Specific responses inline below.

I don't think I understand the setting of LC_ALL part. First, AFAIK Windows programs generally ignore LC_* environment variables. If you read the Microsoft documentation of 'setlocale', here:
  
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170

you will not see any reference to environment variables there.

Thanks for this link; it gives me a good starting point to explore the Win docs on this issue.

The Windows 'setlocale' supports only LC_* _categories_ in direct calls to the function, and doesn't consider the corresponding environment variables. The Emacs source code doesn't reference LC_* environment variables on MS-Windows, either. So how did the user set LC_ALL, and why did it have any effect whatsoever on the issue?

They didn't say; all they wrote (https://github.com/flexibeast/ebuku/issues/31#issuecomment-2058171986) was:

I ... changed my LC_ALL to zh_CN.UTF-8. Ebuku can find the db now.


i'll ask them.

Second, the user sets a UTF-8 locale, which as I wrote up-thread is not a good idea on MS-Windows. It could well cause failures in invoking external programs from Emacs, if the arguments to those programs include non-ASCII characters. In general, on MS-Windows Emacs can only safely invoke programs with non-ASCII characters in the command-line arguments if those characters can be encoded by the system codepage, in this case codepage-936 AFAIU.

Thanks, i'll add that to the information i pass back to the user on that GitHub issue.

Regarding the "invalid string for collation: Invalid argument" error: how does ebuku determine the LOCALE argument with which it calls string-collate-lessp? It is important to understand what was the locale with which w32_compare_strings was called in that case.

The single use of `string-collate-lessp` doesn't pass any LOCALE argument, as i just wanted it to use the user's current locale for sorting a given bookmark's tags into the appropriate lexicographical order.
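
A minimal sketch of the kind of call described above: sorting tags with `string-collate-lessp` and no LOCALE argument. The helper name `my/sort-tags` and the fallback to `string-lessp` when collation signals an error are illustrative assumptions, not ebuku's actual code.

```elisp
;; Hypothetical helper, not taken from ebuku.
(defun my/sort-tags (tags)
  "Return a sorted copy of TAGS, a list of strings.
Collate using the current locale.  If collation signals an error
(such as the \"Invalid string for collation\" error in the
report), fall back to plain `string-lessp' ordering."
  (condition-case nil
      (sort (copy-sequence tags) #'string-collate-lessp)
    (error (sort (copy-sequence tags) #'string-lessp))))
```

The fallback hides the underlying locale problem rather than fixing it, so it would only suit a package that prefers degraded ordering over a hard failure.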

Finally, the issues with Windows-style file names with drive letters and with file names that begin with "~" lead me to believe that perhaps the underlying program 'buku' is not a native Windows program, but a Cygwin or MSYS program, in which case there could be incompatibilities both regarding file names and regarding handling of non-ASCII characters (Cygwin and MSYS use UTF-8 by default, whereas the native Windows build of Emacs does not).

Sorry; i mentioned in my first email, but didn't reiterate in my second, that `buku` is Python-based.

You need to take a good look at whether non-ASCII characters are passed to 'buku' in this case, and how the output from 'buku' is decoded.


👍

Also, ebuku-buku-path and ebuku-database-path should both be quoted with shell-quote-argument (but I don't think this is a problem in this case). Can ARGS include whitespace or characters special for the Windows shell? If so, each argument should be quoted with shell-quote-argument as well.


Thanks, noted.
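
As a rough illustration of the quoting advice above, the sketch below builds a shell command line in which the program and every argument go through `shell-quote-argument`. The function name `my/quote-command` is hypothetical and is not part of ebuku.

```elisp
;; Hypothetical helper; the name is illustrative only.
(defun my/quote-command (program &rest args)
  "Return a shell command string running PROGRAM with ARGS.
PROGRAM and each element of ARGS are quoted with
`shell-quote-argument', so whitespace and characters special to
the shell are passed through literally."
  (mapconcat #'shell-quote-argument (cons program args) " "))
```

Such a string is only needed for functions that go through a shell, such as `shell-command-to-string`. `call-process` takes its arguments as a list and does not invoke a shell.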

How output is decoded when it is put into the temporary buffer is also of interest -- what is the value of buffer-file-coding-system in the temporary buffer after reading output, in the OP's case?


*nod*
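
One way to gather the information asked for above might look like the following sketch. It runs a program into a temporary buffer and reports both `buffer-file-coding-system` and `last-coding-system-used` afterwards. The function name is hypothetical.

```elisp
;; Hypothetical diagnostic helper, not part of ebuku.
(defun my/process-output-coding (program &rest args)
  "Run PROGRAM with ARGS and report how its output was decoded.
Return a list (OUTPUT BUFFER-CODING LAST-CODING), where
BUFFER-CODING is `buffer-file-coding-system' in the temporary
buffer and LAST-CODING is `last-coding-system-used' after the
call."
  (with-temp-buffer
    (apply #'call-process program nil t nil args)
    (list (buffer-string)
          buffer-file-coding-system
          last-coding-system-used)))
```

Asking the affected user to evaluate something like this with the same program and arguments that fail would show which decoding Emacs actually picked on their system.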

Emacs on MS-Windows cannot use UTF-8 when encoding command-line arguments for sub-programs, it can only use the system codepage. Using set-language-environment as above will force Emacs to encode command-line arguments in UTF-8, which could very well be the reason for some of these problems.


Ah okay.

No.
The issue is complicated by several factors and will take a long post to explain. The upshot is that for passing non-ASCII characters safely to subprograms on their command lines, Emacs should use the system codepage, not UTF-8 or anything else (and definitely not UTF-16). This might require some tricky juggling with coding-system related settings when you call call-process, because coding-system-for-write is used for both encoding of the command-line arguments and of the stuff we send to the sub-program, so if they both can include non-ASCII characters, some care is in order. (By contrast, coding-system-for-read can be always bound to UTF-8 to decode the output correctly -- assuming 'buku' outputs UTF-8 encoded text on MS-Windows.)


That's very helpful, thank you.
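
A minimal sketch of the binding described above, under two assumptions: that the program emits UTF-8, and that `locale-coding-system` reflects the system codepage on MS-Windows. The function name is hypothetical, and the sketch covers only the simple case where nothing is sent to the program's standard input.

```elisp
;; Hypothetical wrapper; assumes the program writes UTF-8 output.
(defun my/call-with-codepage-args (program &rest args)
  "Run PROGRAM with ARGS and return its output as a string.
Output is decoded as UTF-8.  Command-line arguments are encoded
with `locale-coding-system', which on MS-Windows is expected to
correspond to the system codepage."
  (let ((coding-system-for-read 'utf-8)
        (coding-system-for-write locale-coding-system))
    (with-temp-buffer
      (apply #'call-process program nil t nil args)
      (buffer-string))))
```

If non-ASCII text also had to be written to the program's input, the single `coding-system-for-write` binding would apply to both the arguments and that input, which is the case where extra care is needed.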

The more important question is: can CRAB emoji be safely encoded by codepage 936, the system codepage of the OP? If not, and if that emoji can appear in the command-line arguments of a 'buku' invocation (as opposed to in the text we write to or read from 'buku'), then this character cannot be used at all with this package on MS-Windows.
(And please note that Emacs now has a native SQLite support, which should make many of these complications simply disappear.)

It would certainly make many things easier to just interact with the db directly. That said, doing so would involve a substantial rewrite, and i've got many things on my plate nowadays, including supporting disabled loved ones while having chronic health issues myself. But maybe i can open an issue requesting help to start and develop a branch doing such a rewrite.
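
For a sense of what direct database access involves, here is a small sketch using the built-in SQLite functions available from Emacs 29. The table and column names are invented for the demo and are not claimed to match buku's real schema. The function name is hypothetical.

```elisp
;; Sketch only: the `demo' table is invented and is NOT buku's schema.
(defun my/sqlite-roundtrip-demo ()
  "Round-trip a non-ASCII title through an in-memory SQLite database.
Return the title read back, or nil if SQLite support is missing."
  (when (and (fboundp 'sqlite-available-p) (sqlite-available-p))
    (let ((db (sqlite-open)))          ; no FILE argument: in-memory db
      (unwind-protect
          (progn
            (sqlite-execute
             db "CREATE TABLE demo (id INTEGER PRIMARY KEY, title TEXT)")
            (sqlite-execute db "INSERT INTO demo (title) VALUES (?)"
                            (list "crab \U0001F980"))
            ;; `sqlite-select' returns a list of rows, each row a list.
            (caar (sqlite-select db "SELECT title FROM demo")))
        (sqlite-close db)))))
```

No external process or command line is involved here, so the codepage restrictions on command-line arguments do not come into play.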

As for why the problems disappear when the CRAB emoji is removed: as I wrote elsewhere, that's probably because all the other characters are plain ASCII, so all the encoding-related issues don't matter.


*nod*
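
Whether a given string survives encoding in a particular codepage can be checked from Lisp. The sketch below uses `unencodable-char-position`; the helper name is hypothetical, and `cp936` is used as the coding-system name for the OP's codepage.

```elisp
;; Hypothetical helper for checking command-line arguments in advance.
(defun my/encodable-p (string coding-system)
  "Return non-nil if all of STRING can be encoded by CODING-SYSTEM."
  (null (unencodable-char-position
         0 (length string) coding-system nil string)))
```

Evaluating `(my/encodable-p "\U0001F980" 'cp936)` and comparing it with the result for a plain ASCII string gives a quick answer for the CRAB emoji case. A package could run such a check on each argument before invoking the external program and report a clear error instead of failing later.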

They don't have any effect on Emacs on MS-Windows, that's for sure. Whether they have effect on 'buku' depends on whether it's a native MS-Windows program or Cygwin/MSYS program, and also on its code (a program could potentially augment the MS 'setlocale' function with its own code which looks at the LC_* environment variables, and does TRT in the application code).


*nod*

But what should i do to handle the more general case of an arbitrary encoding? Do i need to have a defcustom, with 'reasonable defaults', that the user can set if necessary, which i use as the value to pass to coding-system-for-read?
That depends on what encoding does 'buku' expect on input and what encoding does it use on output. If it always uses UTF-8, you just need to make sure Emacs uses UTF-8 when encoding and decoding text passed to and from 'buku' (but note the caveat about encoding the command-line arguments -- these _must_ be encoded in the system codepage). If, OTOH, the encoding used by 'buku' can be changed dynamically, and Emacs cannot know what it is (for example, if it is determined by the encoding of the text put in the SQL database by the user), then a user option is in order.


Great, thank you.
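
Should a user option turn out to be necessary, it might look roughly like this. The option name, its default, and the wrapper function are all assumptions made for illustration. ebuku does not currently define them.

```elisp
;; Hypothetical user option; name and default are assumptions.
(defcustom my/buku-output-coding-system 'utf-8
  "Coding system used to decode output read from the external program."
  :type 'coding-system
  :group 'processes)

(defun my/call-decoding-with-option (program &rest args)
  "Run PROGRAM with ARGS, decoding its output per the user option.
The output is returned as a string."
  (let ((coding-system-for-read my/buku-output-coding-system))
    (with-temp-buffer
      (apply #'call-process program nil t nil args)
      (buffer-string))))
```

The option only affects decoding of what is read back. Encoding of command-line arguments is a separate matter, per the caveat about the system codepage.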

As i interpret their comments in the above discussions so far, yes, they had themselves set LANG to "zh_CN.UTF-8" (and yes, as described above, had definitely `set-language-environment` as "UTF-8").
NOT RECOMMENDED!


*chuckle* i'll be sure to pass this on. :-)

Thanks again!


Alexis.
