
Re: CSV parsing and other issues (Re: LC_NUMERIC)

From: Maxim Nikulin
Subject: Re: CSV parsing and other issues (Re: LC_NUMERIC)
Date: Sat, 12 Jun 2021 21:41:48 +0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

On 11/06/2021 04:10, Stefan Monnier wrote:
>> There are plenty of CSV dialects. If decimal separator is
>> "," then office software uses ";" instead of comma as cell
>> (field) separator.
> But there's no reason to presume that a given CSV file was
> generated in the same locale as the one we're currently
> using.
> So the locale could be one ingredient in the machinery used
> to guess which separator was used, but I'm not sure it would
> be of much help.

You are right. My expectation is still that ";" is mostly used in locales with a comma as the decimal separator, and in such cases it must be tried with higher priority, because records may contain plenty of both characters.
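To illustrate the priority argument, here is a minimal sketch in Python (not the code from the bug report, and not how Org mode does it). The function name `guess_separator`, the candidate list, and the "same field count in every record" criterion are all assumptions made for the example.

```python
import csv


def guess_separator(text, decimal_sep="."):
    """Guess the field separator of CSV text (hypothetical heuristic)."""
    # With a decimal comma, ";" is tried before ","; otherwise "," comes first.
    if decimal_sep == ",":
        candidates = [";", ",", "\t"]
    else:
        candidates = [",", ";", "\t"]
    lines = [line for line in text.splitlines() if line.strip()]
    for sep in candidates:
        rows = list(csv.reader(lines, delimiter=sep))
        widths = {len(row) for row in rows}
        # Accept the first candidate that splits every record into the
        # same number of fields, provided there is more than one field.
        if len(widths) == 1 and widths.pop() > 1:
            return sep
    return None


sample = "1,5;2,5\n3,5;4,5\n"
print(guess_separator(sample, decimal_sep=","))
print(guess_separator(sample, decimal_sep="."))
```

For this sample both "," and ";" split every record consistently, so only the order in which the candidates are tried decides the outcome.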


The question was originally raised in the context of an attempt to improve separator guessing:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=47885
The patches have other problems, however. Advanced options for table import are likely more suitable for e.g. csv-mode and may become an unnecessary burden in org-mode (especially if kill-yank worked well in both directions).

Certainly users should have the opportunity to explicitly specify the dialect of the files they are going to import.

> [ BTW, I'll take the opportunity to advocate for the use of
>   TSV instead, which is slightly less ill-defined.  ]

In the real world one often does not have full control of the file formats one has to deal with. In simple cases I can use space-separated, fixed-width columns of numbers. On the other hand, downloaded bank statements are precisely CSV with ";" as the delimiter and in a legacy Windows 8-bit encoding (and such files have a kind of header whose column count varies and differs from that of the table that follows).
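A rough sketch in Python of reading such a statement, under assumptions that are mine and not taken from any real bank export: the encoding is taken to be cp1251, the table is taken to be the trailing run of rows that share the column count of the last row, and `read_statement` is a hypothetical helper name.

```python
import csv


def read_statement(raw, encoding="cp1251", delimiter=";"):
    """Return only the table rows of a statement given as bytes."""
    text = raw.decode(encoding)  # the encoding is an assumption
    reader = csv.reader(text.splitlines(), delimiter=delimiter)
    rows = [row for row in reader if row]  # drop blank lines
    if not rows:
        return []
    # Assume the last row belongs to the table and defines its width.
    width = len(rows[-1])
    start = 0
    for i, row in enumerate(rows):
        if len(row) != width:
            # A row of a different width belongs to the preamble,
            # so the table can only start after it.
            start = i + 1
    return rows[start:]
```

A preamble row that happens to have the same width as the table and directly precedes it would be kept, so an explicit option to skip a given number of lines would still be useful.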

So the ability to get the decimal separator for the current locale may slightly improve the user experience when importing CSV files, at least in Org mode. However, it is just one aspect of supporting locale-aware number formats in Emacs.
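For comparison only, this is how the same information is exposed in Python through the standard `locale` module. It is an analogy and says nothing about what an Emacs interface should look like; `decimal_separator` is a hypothetical helper name.

```python
import locale


def decimal_separator(locale_name=""):
    """Return the decimal point of a locale ("" means the user's default)."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        # May raise locale.Error if the named locale is not installed.
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        return locale.localeconv()["decimal_point"]
    finally:
        # Restore the previous setting, since setlocale is process-wide.
        locale.setlocale(locale.LC_NUMERIC, saved)


print(decimal_separator("C"))
```

The result for the default locale depends on the user's environment, which is exactly why it can only serve as a hint when guessing the dialect of a file.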
