gnuastro-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[gnuastro-commits] master cef3878: Statistics: Sigma-clipping results as


From: Mohammad Akhlaghi
Subject: [gnuastro-commits] master cef3878: Statistics: Sigma-clipping results as single-valued measurements
Date: Wed, 1 May 2019 21:27:58 -0400 (EDT)

branch: master
commit cef38789d6cb679ed4a4d5974962a9fa19ab81be
Author: Mohammad Akhlaghi <address@hidden>
Commit: Mohammad Akhlaghi <address@hidden>

    Statistics: Sigma-clipping results as single-valued measurements
    
    Until now, the only way to get Sigma-clipped results from Statistics was to
    use the `--sigmaclip' option. It prints a lot of information along with the
    final result. This is useful for visual inspection, but makes automatic use
    of the output a little hard.
    
    With this commit, Statistics has four new options to print the
    sigma-clipped results as single numbers: `--sigclip-number',
    `--sigclip-median', `--sigclip-mean', `--sigclip-std'. A fully working
    demonstration was also added in the book (under `--sigclip-median') on how
    to use it to select certain rows of a table.
    
    While doing this a few other issues were addressed:
    
     - I noticed that Table's `--range' option isn't removing blank pixels.
    
     - I also noticed that the `gal_statistics_no_blank_sorted' wasn't taking
       `inplace' into account when the array is already sorted.
    
     - Since this is the first commit, working in the library after the
       previous release, the library soname version was incremented to 8.
    
    This was suggested by Raul Infante-Sainz.
---
 NEWS                         |  7 ++++
 bin/statistics/args.h        | 57 +++++++++++++++++++++++++++++++
 bin/statistics/statistics.c  | 26 ++++++++++++--
 bin/statistics/ui.c          | 16 +++++++++
 bin/statistics/ui.h          |  4 +++
 bin/table/table.c            | 13 +++++--
 configure.ac                 |  2 +-
 doc/announce-acknowledge.txt |  1 +
 doc/gnuastro.texi            | 81 ++++++++++++++++++++++++++++++++++++++++++--
 lib/statistics.c             |  7 ++--
 10 files changed, 200 insertions(+), 14 deletions(-)

diff --git a/NEWS b/NEWS
index 65a2f6f..000053b 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,13 @@ See the end of the file for license conditions.
 
 ** New features
 
+  Statistics:
+   --sigclip-number, --sigclip-median, --sigclip-mean, --sigclip-std: Do
+     sigma-clipping and only print the desired value as a single-value
+     measurement. Until now sigma-clipping results included a lot of
+     visually useful information, which also made automatic usage of
+     results hard. These options fix this issue.
+
 ** Removed features
 
 ** Changed features
diff --git a/bin/statistics/args.h b/bin/statistics/args.h
index 7fbd64b..3eaed38 100644
--- a/bin/statistics/args.h
+++ b/bin/statistics/args.h
@@ -305,6 +305,63 @@ struct argp_option program_options[] =
       GAL_OPTIONS_NOT_SET,
       ui_add_to_single_value
     },
+    {
+      "sigclip-number",
+      UI_KEY_SIGCLIPNUMBER,
+      0,
+      0,
+      "Number of elements after sigma-clipping.",
+      UI_GROUP_SINGLE_VALUE,
+      &p->singlevalue,
+      GAL_OPTIONS_NO_ARG_TYPE,
+      GAL_OPTIONS_RANGE_0_OR_1,
+      GAL_OPTIONS_NOT_MANDATORY,
+      GAL_OPTIONS_NOT_SET,
+      ui_add_to_single_value
+    },
+    {
+      "sigclip-median",
+      UI_KEY_SIGCLIPMEDIAN,
+      0,
+      0,
+      "Sigma-clipped median.",
+      UI_GROUP_SINGLE_VALUE,
+      &p->singlevalue,
+      GAL_OPTIONS_NO_ARG_TYPE,
+      GAL_OPTIONS_RANGE_0_OR_1,
+      GAL_OPTIONS_NOT_MANDATORY,
+      GAL_OPTIONS_NOT_SET,
+      ui_add_to_single_value
+    },
+    {
+      "sigclip-mean",
+      UI_KEY_SIGCLIPMEAN,
+      0,
+      0,
+      "Sigma-clipped mean.",
+      UI_GROUP_SINGLE_VALUE,
+      &p->singlevalue,
+      GAL_OPTIONS_NO_ARG_TYPE,
+      GAL_OPTIONS_RANGE_0_OR_1,
+      GAL_OPTIONS_NOT_MANDATORY,
+      GAL_OPTIONS_NOT_SET,
+      ui_add_to_single_value
+    },
+    {
+      "sigclip-std",
+      UI_KEY_SIGCLIPSTD,
+      0,
+      0,
+      "Sigma-clipped standard deviation.",
+      UI_GROUP_SINGLE_VALUE,
+      &p->singlevalue,
+      GAL_OPTIONS_NO_ARG_TYPE,
+      GAL_OPTIONS_RANGE_0_OR_1,
+      GAL_OPTIONS_NOT_MANDATORY,
+      GAL_OPTIONS_NOT_SET,
+      ui_add_to_single_value
+    },
+
 
 
 
diff --git a/bin/statistics/statistics.c b/bin/statistics/statistics.c
index 333b432..30686d9 100644
--- a/bin/statistics/statistics.c
+++ b/bin/statistics/statistics.c
@@ -95,12 +95,13 @@ statistics_print_one_row(struct statisticsparams *p)
   double arg, *d;
   gal_list_i32_t *tmp;
   size_t dsize=1, counter;
-  gal_data_t *tmpv, *out=NULL, *num=NULL, *min=NULL, *max=NULL;
   gal_data_t *sum=NULL, *med=NULL, *meanstd=NULL, *modearr=NULL;
+  gal_data_t *tmpv, *sclip=NULL, *out=NULL, *num=NULL, *min=NULL, *max=NULL;
 
   /* The user can ask for any of the operators more than once, also some
      operators might return more than one usable value (like mode). So we
-     will calculate the desired values once, and then print them. */
+     will calculate the desired values once, and then print them any number
+     of times. */
   for(tmp=p->singlevalue; tmp!=NULL; tmp=tmp->next)
     switch(tmp->v)
       {
@@ -125,11 +126,21 @@ statistics_print_one_row(struct statisticsparams *p)
       case UI_KEY_MODEQUANT:
       case UI_KEY_MODESYM:
       case UI_KEY_MODESYMVALUE:
-        modearr = ( modearr ? modearr
+        modearr = ( modearr
+                    ? modearr
                     : gal_statistics_mode(p->sorted, p->mirrordist, 0) );
         d=modearr->array;
         if(d[2]<GAL_STATISTICS_MODE_GOOD_SYM) d[0]=d[1]=NAN;
         break;
+      case UI_KEY_SIGCLIPSTD:
+      case UI_KEY_SIGCLIPMEAN:
+      case UI_KEY_SIGCLIPNUMBER:
+      case UI_KEY_SIGCLIPMEDIAN:
+        sclip = ( sclip
+                  ? sclip
+                  : gal_statistics_sigma_clip(p->sorted, p->sclipparams[0],
+                                              p->sclipparams[1], 0, 1) );
+        break;
 
       /* Will be calculated as printed. */
       case UI_KEY_QUANTILE:
@@ -172,6 +183,14 @@ statistics_print_one_row(struct statisticsparams *p)
           out=statistics_pull_out_element(modearr, 2); mustfree=1; break;
         case UI_KEY_MODESYMVALUE:
           out=statistics_pull_out_element(modearr, 3); mustfree=1; break;
+        case UI_KEY_SIGCLIPSTD:
+          out=statistics_pull_out_element(sclip,   3); mustfree=1; break;
+        case UI_KEY_SIGCLIPMEAN:
+          out=statistics_pull_out_element(sclip,   2); mustfree=1; break;
+        case UI_KEY_SIGCLIPMEDIAN:
+          out=statistics_pull_out_element(sclip,   1); mustfree=1; break;
+        case UI_KEY_SIGCLIPNUMBER:
+          out=statistics_pull_out_element(sclip,   0); mustfree=1; break;
 
         /* Not previously calculated. */
         case UI_KEY_QUANTILE:
@@ -213,6 +232,7 @@ statistics_print_one_row(struct statisticsparams *p)
   if(max)     gal_data_free(max);
   if(sum)     gal_data_free(sum);
   if(med)     gal_data_free(med);
+  if(sclip)   gal_data_free(sclip);
   if(meanstd) gal_data_free(meanstd);
   if(modearr) gal_data_free(modearr);
 }
diff --git a/bin/statistics/ui.c b/bin/statistics/ui.c
index df633ae..a85881d 100644
--- a/bin/statistics/ui.c
+++ b/bin/statistics/ui.c
@@ -462,6 +462,18 @@ ui_read_check_only_options(struct statisticsparams *p)
           error(EXIT_FAILURE, 0, "`--mirrordist' is required for the "
                 "mode-related single measurements (`--mode', `--modequant', "
                 "`--modesym', and `--modesymvalue')");
+        break;
+      case UI_KEY_SIGCLIPSTD:
+      case UI_KEY_SIGCLIPMEAN:
+      case UI_KEY_SIGCLIPNUMBER:
+      case UI_KEY_SIGCLIPMEDIAN:
+        if( isnan(p->sclipparams[0]) )
+          error(EXIT_FAILURE, 0, "`--sclipparams' is necessary with "
+                "sigma-clipping measurements.\n\n"
+                "`--sclipparams' takes two values (separated by a comma) for "
+                "defining the sigma-clip: the multiple of sigma, and tolerance 
"
+                "(<1) or number of clips (>1).");
+        break;
       }
 
 
@@ -667,6 +679,10 @@ ui_make_sorted_if_necessary(struct statisticsparams *p)
       case UI_KEY_MEDIAN:
       case UI_KEY_QUANTILE:
       case UI_KEY_QUANTFUNC:
+      case UI_KEY_SIGCLIPSTD:
+      case UI_KEY_SIGCLIPMEAN:
+      case UI_KEY_SIGCLIPNUMBER:
+      case UI_KEY_SIGCLIPMEDIAN:
         is_necessary=1;
         break;
       }
diff --git a/bin/statistics/ui.h b/bin/statistics/ui.h
index de4782c..8b2422b 100644
--- a/bin/statistics/ui.h
+++ b/bin/statistics/ui.h
@@ -100,6 +100,10 @@ enum option_keys_enum
   UI_KEY_CHECKSKY,
   UI_KEY_IGNOREBLANKINTILES,
   UI_KEY_SCLIPPARAMS,
+  UI_KEY_SIGCLIPNUMBER,
+  UI_KEY_SIGCLIPMEDIAN,
+  UI_KEY_SIGCLIPMEAN,
+  UI_KEY_SIGCLIPSTD,
 };
 
 
diff --git a/bin/table/table.c b/bin/table/table.c
index 7577064..af5fd1d 100644
--- a/bin/table/table.c
+++ b/bin/table/table.c
@@ -78,7 +78,7 @@ table_range(struct tableparams *p)
   double *rarr;
   gal_data_t *mask;
   struct list_range *tmp;
-  gal_data_t *ref, *perm, *range;
+  gal_data_t *ref, *perm, *range, *blmask;
   size_t i, g, b, *s, *sf, one=1, ngood=0;
   gal_data_t *min, *max, *ltmin, *gemax, *sum;
 
@@ -107,11 +107,18 @@ table_range(struct tableparams *p)
       /* Set the reference column to read values from. */
       ref=tmp->v;
 
-      /* Find all the bad elements (smaller than the minimum and larger than
-         the maximum) so we can flag them. */
+      /* Find all the bad elements (smaller than the minimum, larger than
+         the maximum or blank) so we can flag them. */
       ltmin=gal_arithmetic(GAL_ARITHMETIC_OP_LT, 1, numok,   ref,   min);
       gemax=gal_arithmetic(GAL_ARITHMETIC_OP_GE, 1, numok,   ref,   max);
+      blmask = ( gal_blank_present(ref, 1)
+                 ? gal_arithmetic(GAL_ARITHMETIC_OP_ISBLANK, 1, 0, ref)
+                 : NULL );
+
+      /* Merge all the flags into one array. */
       ltmin=gal_arithmetic(GAL_ARITHMETIC_OP_OR, 1, inplace, ltmin, gemax);
+      if(blmask)
+        ltmin=gal_arithmetic(GAL_ARITHMETIC_OP_OR, 1, inplace, ltmin, blmask);
 
       /* Add these flags to all previous flags. */
       mask=gal_arithmetic(GAL_ARITHMETIC_OP_OR, 1, inplace, mask, ltmin);
diff --git a/configure.ac b/configure.ac
index ee6654a..adccad8 100644
--- a/configure.ac
+++ b/configure.ac
@@ -50,7 +50,7 @@ AC_CONFIG_MACRO_DIRS([bootstrapped/m4])
 
 # Library version, see the GNU Libtool manual ("Library interface versions"
 # section for the exact definition of each) for
-GAL_CURRENT=7
+GAL_CURRENT=8
 GAL_REVISION=0
 GAL_AGE=0
 GAL_LT_VERSION="${GAL_CURRENT}:${GAL_REVISION}:${GAL_AGE}"
diff --git a/doc/announce-acknowledge.txt b/doc/announce-acknowledge.txt
index 97a8052..dff317d 100644
--- a/doc/announce-acknowledge.txt
+++ b/doc/announce-acknowledge.txt
@@ -1,5 +1,6 @@
 Alphabetically ordered list to acknowledge in the next release.
 
+Raul Infante-Sainz
 David Valls-Gabaud
 
 
diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index 4cf422d..bcd7a13 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -11722,6 +11722,9 @@ The chosen column doesn't have to be in the output 
columns. This is good
 when you just want to select using one column's values, but don't need that
 column anymore afterwards.
 
+For one example of using this option, see the example under
address@hidden in @ref{Invoking aststatistics}.
+
 @item -s STR
 @item --sort=STR
 Sort the output rows based on the values in the @code{STR} column (can be a
@@ -15752,11 +15755,11 @@ iteration:
 @itemize
 @item
 When a certain number of iterations has taken place (second value to the
address@hidden option is larger than 1).
address@hidden option is larger than 1).
 @item
 When the new measured standard deviation is within a certain tolerance
-level of the old one (second value to the @option{--sigclip} option is less
-than 1). The tolerance level is defined by:
+level of the old one (second value to the @option{--sclipparams} option is
+less than 1). The tolerance level is defined by:
 
 @dispmath{\sigma_{old}-\sigma_{new} \over \sigma_{new}}
 
@@ -16400,6 +16403,78 @@ distributions are no longer symmetric, see 
@option{--mode} and Appendix C
 of @url{https://arxiv.org/abs/1505.01664, Akhlaghi and Ichikawa 2015} for
 more.
 
address@hidden --sigclip-number
+Number of elements after applying @mymath{\sigma}-clipping (see @ref{Sigma
+clipping}). @mymath{\sigma}-clipping configuration is done with the
address@hidden option.
+
address@hidden --sigclip-median
+Median after applying @mymath{\sigma}-clipping (see @ref{Sigma
+clipping}). @mymath{\sigma}-clipping configuration is done with the
address@hidden option.
+
+Here is one scenario where this can be useful: assume you have a table and
+you would like to only select the rows that are within the sigma-clipping
+range. Let's assume your table is called @file{table.fits} and you want to
+only keep the rows that have a value in @code{COLUMN} within the
address@hidden range (to @mymath{3\sigma}, with a tolerance of
+0.1). This command will return the sigma-clipped median and standard
+deviation (used to define the range later).
+
address@hidden
+$ aststatistics table.fits -cCOLUMN --sclipparams=3,0.1 \
+                --sigclip-median --sigclip-std
address@hidden example
+
address@hidden GNU AWK
+You can then use the @option{--range} option of Table (see @ref{Table}) to
+select the proper rows. But for that, you need the actual starting and
+ending values of the range (@mymath{\rm{median}\pm s\times\sigma}). So
+these raw numbers alone aren't enough.
+
+To get starting and ending of the range (and put a address@hidden,}' between 
them,
+ready to be used in @option{--range}), pipe the result into AWK. But in
+AWK, we'll also need the multiple of @mymath{\sigma}, so we'll define it as
+a shell variable (@code{s}) before calling Statistics (note how @code{$s}
+is used two times now):
+
address@hidden
+$ s=3
+$ aststatistics table.fits -cCOLUMN --sclipparams=$s,0.1 \
+                --sigclip-median --sigclip-std           \
+     | awk '@{s='$s'; printf("%f,%f\n", $1-s*$2, $1+s*$2)@}'
address@hidden example
+
+We'll just need to save the printed value from the command above in another
+shell variable (@code{r}), not print it. In Bash, can do this by putting
+the whole statement within a @code{$()}:
+
address@hidden
+$ s=3
+$ r=$(aststatistics table.fits -cCOLUMN --sclipparams=$s,0.1 \
+                    --sigclip-median --sigclip-std           \
+        | awk '@{s='$s'; printf("%f,%f\n", $1-s*$2, $1+s*$2)@}')
+$ echo $r      # Just to confirm.
address@hidden example
+
+Now you can use Table with the @option{--range} option to only print the
+rows that have a value in @code{COLUMN} within the desired range:
+
address@hidden
+$ asttable table.fits --range=COLUMN,$r
address@hidden example
+
+
address@hidden --sigclip-mean
+Mean after applying @mymath{\sigma}-clipping (see @ref{Sigma
+clipping}). @mymath{\sigma}-clipping configuration is done with the
address@hidden option.
+
address@hidden --sigclip-std
+Standard deviation after applying @mymath{\sigma}-clipping (see @ref{Sigma
+clipping}). @mymath{\sigma}-clipping configuration is done with the
address@hidden option.
+
 @end table
 
 The list of options below are for those statistical operations that output
diff --git a/lib/statistics.c b/lib/statistics.c
index bbf1c56..5cab9b6 100644
--- a/lib/statistics.c
+++ b/lib/statistics.c
@@ -1398,13 +1398,12 @@ gal_statistics_no_blank_sorted(gal_data_t *input, int 
inplace)
         }
       else noblank=contig;
 
-      /* If there are any non-blank elements, make sure the array is
-         sorted. After this step, we won't be dealing with `noblank' any
-         more but with `sorted'. */
+      /* Make sure the array is sorted. After this step, we won't be
+         dealing with `noblank' any more but with `sorted'. */
       if(noblank->size)
         {
           if( gal_statistics_is_sorted(noblank, 1) )
-            sorted=noblank;
+            sorted = inplace ? noblank : gal_data_copy(noblank);
           else
             {
               if(inplace) sorted=noblank;



reply via email to

[Prev in Thread] Current Thread [Next in Thread]