emacs-devel
Re: CSV parsing and other issues (Re: LC_NUMERIC)


From: Maxim Nikulin
Subject: Re: CSV parsing and other issues (Re: LC_NUMERIC)
Date: Fri, 11 Jun 2021 23:58:24 +0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

On 10/06/2021 23:57, Eli Zaretskii wrote:
>> From: Maxim Nikulin Date: Thu, 10 Jun 2021 23:28:59 +0700
>
> For processing CSV, if there's a need to know whether the
> locale uses the comma as a decimal separator, we could
> indeed extend locale-info.  But such an extension is almost
> trivial and doesn't even touch on the significant problems
> in the rest of the discussion.
>

You forgot `setlocale(LC_NUMERIC, "C")', didn't you?

#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

int main(void) {
        /* Adopt the user's locale from the environment. */
        setlocale(LC_ALL, "");
        printf("%c", *nl_langinfo(RADIXCHAR));
        /* Reset only the numeric category to "C". */
        setlocale(LC_NUMERIC, "C");
        printf("%c\n", *nl_langinfo(RADIXCHAR));
        return 0;
}

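In a locale whose decimal separator is a comma, the output is ",.": once LC_NUMERIC has been reset to "C", nl_langinfo no longer reports the user's separator. There is nl_langinfo_l(3), which can query a locale without changing the global one, but it requires more work.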
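To show what "more work" could look like, here is a rough sketch of the nl_langinfo_l(3) route. It assumes a POSIX.1-2008 libc such as glibc, the helper name radix_char_for is made up for illustration, and like the program above it keeps only the first byte of the separator, which is not enough for locales with a multibyte one.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: store the first byte of the decimal separator
   of the named locale in *OUT, leaving the process-wide locale alone.
   Return 0 on success, -1 if the locale cannot be created. */
int radix_char_for(const char *locale_name, char *out)
{
        locale_t loc = newlocale(LC_NUMERIC_MASK, locale_name, (locale_t)0);

        if (loc == (locale_t)0)
                return -1;
        *out = *nl_langinfo_l(RADIXCHAR, loc);
        freelocale(loc);
        return 0;
}
```

Passing "" as the locale name should pick up the user's environment, so the comma stays available even while LC_NUMERIC is "C" globally.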

After rows have been split into cells, the numbers in them may still need to be parsed ("2,34" to 2.34). That is why the quality of CSV file import is closely tied to the handling of number formats.
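As an illustration of that second step, a minimal sketch of parsing a cell with an explicitly supplied decimal separator, so that the result does not depend on the current LC_NUMERIC. The function name parse_decimal is hypothetical, and grouping separators and exponents are deliberately left out.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: parse an optionally signed decimal number whose
   fractional part is introduced by RADIX (e.g. ',' for "2,34").
   Return true and store the value in *OUT only if the whole string
   is a valid number. */
bool parse_decimal(const char *s, char radix, double *out)
{
        double sign = 1.0, value = 0.0, scale = 1.0;
        bool seen_digit = false;

        if (*s == '+' || *s == '-') {
                if (*s == '-')
                        sign = -1.0;
                s++;
        }
        /* Integer part. */
        for (; isdigit((unsigned char)*s); s++) {
                value = value * 10.0 + (*s - '0');
                seen_digit = true;
        }
        /* Optional fractional part after the separator. */
        if (radix != '\0' && *s == radix) {
                s++;
                for (; isdigit((unsigned char)*s); s++) {
                        value = value * 10.0 + (*s - '0');
                        scale *= 10.0;
                        seen_digit = true;
                }
        }
        if (!seen_digit || *s != '\0')
                return false;
        *out = sign * value / scale;
        return true;
}
```

With ',' as the separator this accepts "2,34" and rejects "2.34", so a wrong guess about the file's number format shows up as an error instead of a silently wrong value.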

>> I was trying to support Boruch that buffer-local variables
>> may be important part of locale context, more precise than
>> global settings,
>
> They are more precise, but they don't support mixed
> languages in the same buffer, something that happens in
> Emacs very frequently.

In some cases I would prefer a uniform format for numbers and dates even
when the language alternates within the buffer, e.g. in my private notes.

> Here's a trivial example:
>
>     (insert (downcase (buffer-substring POS1 POS2)))
>
> Contrast with
>
>     (insert (downcase "FOO"))

Either `set-text-properties' should be called on "FOO" before it is passed to `downcase', or a `locale-downcase' function taking LOCALE as its first argument should be added. Moreover, such a `locale-downcase' function could be used to implement higher-level functions that work with implicit locales. LOCALE could be resolved through a hierarchy: a user override for the particular call, then text properties, buffer-local variables, and global settings.
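To make the idea concrete at the C level, a sketch under several assumptions: the names resolve_locale and locale_downcase are hypothetical, the precedence order is just one reading of the hierarchy above, and tolower_l (POSIX.1-2008) handles only single-byte characters here, so this is not how Emacs would case-convert multibyte text.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical precedence: per-call override, text property,
   buffer-local variable, global setting.  Return the first one set. */
const char *resolve_locale(const char *call_override,
                           const char *text_property,
                           const char *buffer_local,
                           const char *global_default)
{
        if (call_override != NULL)
                return call_override;
        if (text_property != NULL)
                return text_property;
        if (buffer_local != NULL)
                return buffer_local;
        return global_default;
}

/* Lowercase the single-byte string S in place using the rules of the
   named locale, without changing the process-wide locale.
   Return 0 on success, -1 if the locale cannot be created. */
int locale_downcase(const char *locale_name, char *s)
{
        locale_t loc = newlocale(LC_CTYPE_MASK, locale_name, (locale_t)0);

        if (loc == (locale_t)0)
                return -1;
        for (; *s != '\0'; s++)
                *s = (char)tolower_l((unsigned char)*s, loc);
        freelocale(loc);
        return 0;
}
```

The point is only that the locale travels as an explicit argument; a higher-level function would call resolve_locale first and pass the result down.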

> Yes: what we have already in Emacs.  That covers a lot of
> the same Unicode turf that ICU handles, because we import
> and use the same Unicode files and tables.

In addition to the Unicode data in the Emacs sources, there are plenty of XML files in cldr-common-39.0.zip (common/main/*.xml), see https://www.unicode.org/Public/cldr/39/ for the archive. They include rules for number formatting, described in https://unicode.org/reports/tr35/tr35-numbers.html

Of course, human-style number formatting, currencies, financial style, etc. may be left out, and an implementation may be limited to grouping and decimal separators (leaving other features to further requests). There is the newlocale(3) function in glibc to obtain a minimal subset of properties. I am not familiar with other platforms.
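For a sense of how small the "grouping and decimal separators only" subset is, here is a sketch. The function name format_grouped is hypothetical, groups are fixed at three digits (CLDR patterns allow other groupings, which this ignores), and the separators are strings so that a multibyte separator can be passed.

```c
#include <stdio.h>
#include <string.h>

/* Append LEN bytes of SRC to DST (capacity SIZE, *USED bytes filled),
   keeping DST NUL-terminated.  Return 0, or -1 if it does not fit. */
static int append(char *dst, size_t size, size_t *used,
                  const char *src, size_t len)
{
        if (*used + len + 1 > size)
                return -1;
        memcpy(dst + *used, src, len);
        *used += len;
        dst[*used] = '\0';
        return 0;
}

/* Hypothetical helper: write INT_PART with GROUP_SEP between groups of
   three digits, followed by DECIMAL_SEP and FRAC_DIGITS when the latter
   is non-empty.  Return 0 on success, -1 if DST is too small. */
int format_grouped(char *dst, size_t size, unsigned long int_part,
                   const char *frac_digits,
                   const char *group_sep, const char *decimal_sep)
{
        char digits[32];
        size_t used = 0;
        int n = snprintf(digits, sizeof digits, "%lu", int_part);

        if (n < 0 || (size_t)n >= sizeof digits || size == 0)
                return -1;
        dst[0] = '\0';
        for (int i = 0; i < n; i++) {
                /* A separator goes before digit i when the number of
                   digits still to be written is a multiple of three. */
                if (i > 0 && (n - i) % 3 == 0
                    && append(dst, size, &used, group_sep,
                              strlen(group_sep)) != 0)
                        return -1;
                if (append(dst, size, &used, &digits[i], 1) != 0)
                        return -1;
        }
        if (frac_digits != NULL && *frac_digits != '\0') {
                if (append(dst, size, &used, decimal_sep,
                           strlen(decimal_sep)) != 0
                    || append(dst, size, &used, frac_digits,
                              strlen(frac_digits)) != 0)
                        return -1;
        }
        return 0;
}
```

Called with 1234567, "89", "." and ",", this writes "1.234.567,89" into the buffer; the separators themselves would come from CLDR data or from the locale.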




