Re: [Bug-datamash] Control the decimal separator
From: Assaf Gordon
Subject: Re: [Bug-datamash] Control the decimal separator
Date: Tue, 17 Oct 2017 17:35:34 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0
Hello,
On 2017-10-16 08:21 AM, Magnus Göransson wrote:
> I used LC_NUMERIC environment variable to control decimal separator, for
> example:
> export LC_NUMERIC=sv_SE.UTF-8
> In my bash-script solved the problem that my decimals (",") was
> incorrectly interpreted when the script was running from crontab where
> the environment is very limited. The error from datamash was "invalid
> numeric value in line" due to the wrong interpretation.
Thank you for the report.
To ensure I understand the problem, can you confirm the following:
1.
In the sv_SE locale, thousands are separated by a space and decimals
(fractions) by a comma.
I see the following on my computer:
$ env LC_NUMERIC=sv_SE.UTF-8 printf "%'.1f\n" 300.3 2000.2 1000.1
300,3
2 000,2
1 000,1
$ env LC_NUMERIC=en_CA.UTF-8 printf "%'.1f\n" 300.3 2000.2 1000.1
300.3
2,000.2
1,000.1
2.
When dealing only with fractions (no thousands separators),
I see that "datamash" works correctly if LC_NUMERIC is set to the
matching locale, which is similar to other GNU programs (e.g. "sort"):
$ env LC_NUMERIC=sv_SE.UTF-8 printf "%'.1f\n" 1.5 1.1 1.7
1,5
1,1
1,7
$ env LC_NUMERIC=sv_SE.UTF-8 printf "%'.1f\n" 1.5 1.1 1.7 \
|LC_NUMERIC=sv_SE.UTF-8 sort -k1g,1 -s
1,1
1,5
1,7
With correct locale:
$ env LC_NUMERIC=sv_SE.UTF-8 printf "%'.1f\n" 1.5 1.1 1.7 \
| LC_NUMERIC=sv_SE.UTF-8 datamash sum 1
4,3
With incorrect locale:
$ env LC_NUMERIC=sv_SE.UTF-8 printf "%'.1f\n" 1.5 1.1 1.7 \
| LC_NUMERIC=en_CA.UTF-8 datamash sum 1
datamash: invalid numeric value in line 1 field 1: '1,5'
Is that what you are experiencing?
Or do you get datamash errors even in the correct locale?
3.
When the space used as a thousands separator is included, I see that
datamash fails to parse the values:
$ env LC_NUMERIC=sv_SE.UTF-8 printf "%'.1f\n" 1000.1 300.3 2000.2
1 000,1
300,3
2 000,2
$ env LC_NUMERIC=sv_SE.UTF-8 printf "%'.1f\n" 1000.1 300.3 2000.2 \
| LC_ALL=sv_SE.UTF-8 datamash sum 1
datamash: invalid numeric value in line 1 field 1: '1 000,1'
However, I see that other GNU programs also do not parse these numbers:
$ env LC_NUMERIC=sv_SE.UTF-8 printf "%'.1f\n" 1000.1 300.3 2000.2 \
| LC_ALL=sv_SE.UTF-8 sort -k1g,1 -s --debug
sort: using ‘sv_SE.UTF-8’ sorting rules
1 000,1
_
2 000,2
_
300,3
_____
---
You mentioned only "decimal separator" issues - these should be solved
when specifying LC_NUMERIC=sv_SE.UTF-8 before executing 'datamash'.
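
Since the original report involved a script run from crontab, where the environment is minimal, here is a small sketch of how such a script might confirm that the locale exists before relying on it. The helper name locale_listed is hypothetical (it is not part of datamash), and the name comparison assumes that `locale -a` may print "sv_SE.utf8" rather than "sv_SE.UTF-8":

```shell
# Hypothetical helper: succeed if the locale name in $1 appears in the
# list read from stdin. The comparison ignores case and hyphens, because
# `locale -a` commonly prints "sv_SE.utf8" for "sv_SE.UTF-8".
locale_listed() {
    want=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | tr -d '-')
    tr 'A-Z' 'a-z' | tr -d '-' | grep -qxF "$want"
}

# Set the locale explicitly instead of relying on the cron environment.
if locale -a 2>/dev/null | locale_listed sv_SE.UTF-8; then
    export LC_NUMERIC=sv_SE.UTF-8
else
    echo "sv_SE.UTF-8 is not installed; decimal commas will not parse" >&2
fi
```

Placing this near the top of the script keeps every later datamash call in the same numeric locale.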
As for the thousands separator: the current code does not support a space
as a separator, in line with other GNU programs.
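
If the input does contain such separators, one possible workaround is to normalize the numbers before they reach datamash. The following is only a sketch under stated assumptions: the helper name normalize_sv_numbers is made up for illustration, the input is a single column of numbers, and the separator is a plain ASCII space (some systems may emit a no-break space instead, which this does not handle):

```shell
# Hypothetical helper: convert sv_SE-style numbers such as "1 000,1"
# into C-locale form ("1000.1"). Assumes one number per line; it would
# corrupt input where spaces or commas are field separators.
normalize_sv_numbers() {
    # First drop the spaces used as thousands separators,
    # then turn the decimal comma into a decimal point.
    sed -e 's/ //g' -e 's/,/./g'
}

# Example: show the normalized values.
printf '1 000,1\n300,3\n2 000,2\n' | normalize_sv_numbers
```

The normalized output uses a decimal point, so it would be piped to datamash under the C locale, for example `... | normalize_sv_numbers | LC_ALL=C datamash sum 1`.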
Is the above sufficient to work around the issue, or do you experience
other issues?
regards,
- assaf