gnuastro-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[gnuastro-commits] master 8837a5a8: Table: date-to-millisec is a new col


From: Mohammad Akhlaghi
Subject: [gnuastro-commits] master 8837a5a8: Table: date-to-millisec is a new column arithmetic operator
Date: Fri, 11 Feb 2022 09:33:52 -0500 (EST)

branch: master
commit 8837a5a8ac40f3c0241499e5c663eba99d3e04ad
Author: Mohammad Akhlaghi <mohammad@akhlaghi.org>
Commit: Mohammad Akhlaghi <mohammad@akhlaghi.org>

    Table: date-to-millisec is a new column arithmetic operator
    
    Until now, when 'date-to-sec' was given a time-string with sub-second
    precision, it would add that fractional value to the large integer and save
    the result as a 64-bit floating point type.
    
    However, the number of significant digits in the floating point standard is
    limited (its in logarithmic scale) and as a result for two different time
    strings like below, it would give the same result (even though they are
    more than one second apart.
    
        date                        seconds
        ----                        -------
        2018-05-04T04:16:46.7774    1525410945
        2018-05-04T04:15:25.8326    1525410945
    
    With this commit, the issue has been solved in two steps:
    
     - 'date-to-sec' operator now only returns 64-bit signed integer numbers
       (no more floating points), and will therefore ignore any sub-second
       precision.
    
     - 'date-to-millisec' operator has been added to column arithmetic to let
       users count (as an integer: thus preserving every digit properly) the
       number of milli-seconds since the Unix epoch. Users who need sub-second
       precision can use this.
    
    As a result of these changes, the two dates above can now be read in units
    of seconds (second column) and milli-seconds with all digits properly
    preserved:
    
        date                        seconds       milli-seconds
        ----                        -------       -------------
        2018-05-04T04:16:46.7774    1525411006    1525411006777
        2018-05-04T04:15:25.8326    1525410925    1525410925832
    
    Notice how the correct seconds of both dates is now different from the
    common one that was initially reported (when converting it to 64-bit
    floating point).
    
    This bug was reported by Zohreh Ghaffari and Pedram Ashofteh Ardakani.
    
    This fixes bug #61976.
---
 NEWS                   | 13 ++++++++++
 bin/table/arithmetic.c | 70 +++++++++++++++++++++++++-------------------------
 bin/table/arithmetic.h |  1 +
 doc/gnuastro.texi      | 21 +++++++++++----
 4 files changed, 65 insertions(+), 40 deletions(-)

diff --git a/NEWS b/NEWS
index 54a32466..39122d2c 100644
--- a/NEWS
+++ b/NEWS
@@ -77,6 +77,10 @@ See the end of the file for license conditions.
      option, see the changed features below). Like the new '--noblank',
      this option can also be called multiple times, so '--noblankend=1
      --noblankend=2' is equivalent to '--noblankend=1,2'.
+   - New operator for column arithmetic:
+     - date-to-millisec: convert the formatted date string into the number
+       of milli-seconds (as a 64-bit integer) since Unix-time (00:00:00 of
+       1970-01-01).
 
   Match:
    - k-d tree based matching has been added for finding the matching rows,
@@ -136,6 +140,12 @@ See the end of the file for license conditions.
      without a blank magnitude, so you don't need to have the magnitude
      column any more!). The old behavior of this option is now available in
      the new '--noblankend' option. This point was raised by Samane Raji.
+   - 'date-to-sec' operator now only returns the number of seconds and is
+     always in a 64-bit signed integer format. Until now, if the input had
+     sub-second precision it would return a 64-bit floating point type;
+     however, that resulted in bug #61976 (where the loss of precision due
+     to the floating-point's logarithmic scale for too many significanat
+     digits caused incorrect coversions).
 
   Library:
    - gal_fits_tab_read: now takes the number of threads to read the desired
@@ -181,6 +191,9 @@ See the end of the file for license conditions.
               Ardakani.
   bug #62008: Arithmetic not deleting existing output when the 'makenew' is
               used (no FITS file exists in the reverse polish notation).
+  bug #61976: Table arithmetic date-to-sec produces same result for
+              different times (separated by about 1 second). This bug was
+              reported by Zohreh Ghaffari and Pedram Ashofteh Ardakani.
 
 
 
diff --git a/bin/table/arithmetic.c b/bin/table/arithmetic.c
index d85a84f1..f271c779 100644
--- a/bin/table/arithmetic.c
+++ b/bin/table/arithmetic.c
@@ -133,6 +133,7 @@ arithmetic_operator_name(int operator)
       case ARITHMETIC_TABLE_OP_IMGTOWCS: out="img-to-wcs"; break;
       case ARITHMETIC_TABLE_OP_DATETOSEC: out="date-to-sec"; break;
       case ARITHMETIC_TABLE_OP_DISTANCEFLAT: out="distance-flat"; break;
+      case ARITHMETIC_TABLE_OP_DATETOMILLISEC: out="date-to-millisec"; break;
       case ARITHMETIC_TABLE_OP_DISTANCEONSPHERE: out="distance-on-sphere"; 
break;
       default:
         error(EXIT_FAILURE, 0, "%s: a bug! Please contact us at %s to fix "
@@ -188,6 +189,8 @@ arithmetic_set_operator(struct tableparams *p, char *string,
         { op=ARITHMETIC_TABLE_OP_IMGTOWCS; *num_operands=0; }
       else if( !strcmp(string, "date-to-sec"))
         { op=ARITHMETIC_TABLE_OP_DATETOSEC; *num_operands=0; }
+      else if( !strcmp(string, "date-to-millisec"))
+        { op=ARITHMETIC_TABLE_OP_DATETOMILLISEC; *num_operands=0; }
       else if( !strcmp(string, "distance-flat"))
         { op=ARITHMETIC_TABLE_OP_DISTANCEFLAT; *num_operands=0; }
       else if( !strcmp(string, "distance-on-sphere"))
@@ -649,63 +652,59 @@ arithmetic_datetosec(struct tableparams *p, gal_data_t 
**stack,
   size_t i, v;
   int64_t *iarr;
   gal_data_t *out;
+  double subsec=NAN;
   char *subsecstr=NULL;
-  double *darr, subsec=NAN;
+  char *unit, *name, *comment;
 
   /* Input dataset. */
   gal_data_t *in=arithmetic_stack_pop(stack, operator, NULL);
   char **strarr=in->array;
 
-  /* Output metadata. */
-  char *unit="sec";
-  char *name="UNIXSEC";
-  char *comment="Unix seconds (from 00:00:00 UTC, 1 January 1970)";
-
   /* Make sure the input has a 'string' type. */
   if(in->type!=GAL_TYPE_STRING)
     error(EXIT_FAILURE, 0, "the operand given to 'date-to-sec' "
           "should have a string type, but it is '%s'",
           gal_type_name(in->type, 1));
 
+  /* Output metadata. */
+  switch(operator)
+    {
+    case ARITHMETIC_TABLE_OP_DATETOSEC:
+      unit="sec";
+      name="UNIXSEC";
+      comment="Unix seconds (from 00:00:00 UTC, 1 January 1970)";
+      break;
+    case ARITHMETIC_TABLE_OP_DATETOMILLISEC:
+      unit="msec";
+      name="UNIXMILLISEC";
+      comment="Unix milli-seconds (from 00:00:00 UTC, 1 January 1970)";
+      break;
+    default:
+      error(EXIT_FAILURE, 0, "%s: a bug! Please contact us at %s "
+            "to fix the problem. The operator code %d isn't "
+            "recognized", __func__, PACKAGE_BUGREPORT, operator);
+    }
+
   /* Allocate the output dataset. */
   out=gal_data_alloc(NULL, GAL_TYPE_INT64, 1, &in->size, NULL, 1,
                      p->cp.minmapsize, p->cp.quietmmap, name, unit,
                      comment);
 
-  /* Convert each input string (first try as an integer, assuming no
-     sub-second, or floating point precision).  */
+  /* Convert each input string into number of seconds. */
   iarr=out->array;
   for(i=0; i<in->size; ++i)
     {
+      /* Read the number of seconds and sub-seconds and write into the
+         output. */
       v=gal_fits_key_date_to_seconds(strarr[i], &subsecstr,
                                      &subsec);
-      iarr[i] = v==GAL_BLANK_SIZE_T ? GAL_BLANK_INT64 : v;
-      if(subsecstr) break;
-    }
-
-  /* If a sub-second string was present, then save the output as double
-     precision floating point. */
-  if(subsecstr)
-    {
-      /* Free the initially allocated output (as an integer). */
-      free(subsecstr);
-      gal_data_free(out);
-
-      /* Allocate a double-precision output. */
-      out=gal_data_alloc(NULL, GAL_TYPE_FLOAT64, 1, &in->size, NULL, 1,
-                         p->cp.minmapsize, p->cp.quietmmap, name, unit,
-                         comment);
-
-      /* Convert the date. */
-      darr=out->array;
-      for(i=0; i<in->size; ++i)
-        {
-          subsecstr=NULL;
-          v=gal_fits_key_date_to_seconds(strarr[i], &subsecstr,
-                                               &subsec);
-          darr[i] = v==GAL_BLANK_SIZE_T ? NAN : v;
-          if(subsecstr) { darr[i]+=subsec; free(subsecstr); }
-        }
+      iarr[i] = ( v==GAL_BLANK_SIZE_T
+                  ? GAL_BLANK_INT64
+                  : ( operator == ARITHMETIC_TABLE_OP_DATETOSEC
+                      ? v
+                      : (isnan(subsec)
+                         ? v*1000
+                         : v*1000 + (int64_t)(subsec*1000) ) ) );
     }
 
   /* Clean up and put the resulting calculation back on the stack. */
@@ -839,6 +838,7 @@ arithmetic_operator_run(struct tableparams *p,
           break;
 
         case ARITHMETIC_TABLE_OP_DATETOSEC:
+        case ARITHMETIC_TABLE_OP_DATETOMILLISEC:
           arithmetic_datetosec(p, stack, token->operator);
           break;
 
diff --git a/bin/table/arithmetic.h b/bin/table/arithmetic.h
index c9175361..c1306c5b 100644
--- a/bin/table/arithmetic.h
+++ b/bin/table/arithmetic.h
@@ -40,6 +40,7 @@ enum arithmetic_operators
   ARITHMETIC_TABLE_OP_IMGTOWCS,
   ARITHMETIC_TABLE_OP_DATETOSEC,
   ARITHMETIC_TABLE_OP_DISTANCEFLAT,
+  ARITHMETIC_TABLE_OP_DATETOMILLISEC,
   ARITHMETIC_TABLE_OP_DISTANCEONSPHERE,
 };
 
diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index b1e0400e..463f688f 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -11826,11 +11826,12 @@ See the @option{degree-to-ra} for more on the format 
of the output.
 @cindex Unix epoch time
 @cindex Time, Unix epoch
 @cindex Epoch, Unix time
-The popped operand should be a string column in the FITS date format (most 
generally: @code{YYYY-MM-DDThh:mm:ss.ddd...}).
-This operator will return the corresponding Unix epoch time (number of seconds 
that have passed since 00:00:00 Thursday, January 1st, 1970).
-The returned operand will be named @code{UNIXSEC} (short for Unix-seconds).
-If any of the times have sub-second precision, the numeric datatype of the 
column will be 64-bit floating point type.
-Otherwise it will be a 64-bit, signed integer, see @ref{Numeric data types}.
+Return the number of seconds from the Unix epoch time (00:00:00 Thursday, 
January 1st, 1970).
+The input (popped) operand should be a string column in the FITS date format 
(most generally: @code{YYYY-MM-DDThh:mm:ss.ddd...}).
+
+The returned operand will be named @code{UNIXSEC} (short for Unix-seconds) and 
will be a 64-bit, signed integer, see @ref{Numeric data types}.
+If the input string has sub-second precision, it will be ignored because 
floating point numbers cannot accurately store numbers with many significant 
digits.
+To preserve sub-second precision, please use @code{date-to-millisec}.
 
 For example, in the example below we are using this operator, in combination 
with the @option{--keyvalue} option of the Fits program, to sort your desired 
FITS files by observation date (value in the @code{DATE-OBS} keyword in example 
below):
 
@@ -11843,6 +11844,16 @@ $ astfits *.fits --keyvalue=DATE-OBS --colinfoinstdout 
\
 
 If you don't need to see the Unix-seconds any more, you can add a 
@option{-cFILENAME} (short for @option{--column=FILENAME}) at the end.
 For more on @option{--keyvalue}, see @ref{Keyword inspection and manipulation}.
+
+@item date-to-millisec
+Return the number of milli-seconds from the Unix epoch time (00:00:00 
Thursday, January 1st, 1970).
+The input (popped) operand should be a string column in the FITS date format 
(most generally: @code{YYYY-MM-DDThh:mm:ss.ddd...}, where @code{.ddd} is the 
optional sub-second component).
+
+The returned operand will be named @code{UNIXMILLISEC} (short for Unix 
milli-seconds) and will be a 64-bit, signed integer, see @ref{Numeric data 
types}.
+The returned value is not a floating point type because for large numbers, 
floating point datatypes loose single-digit precision (which is important here).
+
+Other than the units of the output, this operator behaves similarly to 
@code{date-to-sec}.
+See the description of that operator for an example.
 @end table
 
 @node Operation precedence in Table, Invoking asttable, Column arithmetic, 
Table



reply via email to

[Prev in Thread] Current Thread [Next in Thread]