From: Mohammad Akhlaghi
Subject: [gnuastro-commits] master 209a1eb 1/2: Statistics: new option to show quantile of the mean
Date: Sat, 31 Jul 2021 21:48:20 -0400 (EDT)

branch: master
commit 209a1eb311caa09968ea6b3ffe009d74d846ef4c
Author: Mohammad Akhlaghi <mohammad@akhlaghi.org>
Commit: Mohammad Akhlaghi <mohammad@akhlaghi.org>

    Statistics: new option to show quantile of the mean
    
    The quantile of the mean is a very good measure of skewness in a
    distribution. However, until now there was no simple option in the
    Statistics program to measure it: you needed to first ask for the mean in
    one call to statistics, then use the '--quantfunc' option in a second call.
    
    With this commit, Statistics now has the '--quantofmean' option, which
    lets users find the quantile of the mean alongside any other measures
    they may need, all in one command.
---
 NEWS                        |  5 +++++
 bin/statistics/args.h       | 14 ++++++++++++++
 bin/statistics/statistics.c | 19 ++++++++++++++++++-
 bin/statistics/ui.c         |  1 +
 bin/statistics/ui.h         |  1 +
 doc/gnuastro.texi           | 15 +++++++++++++++
 6 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 7e6e6c4..3766433 100644
--- a/NEWS
+++ b/NEWS
@@ -62,6 +62,11 @@ See the end of the file for license conditions.
      pipes (standard input). This is not for inputs to the script (which
      should always be files), but for the script's internal pipes.
 
+  Statistics:
+   --quantofmean: the quantile of the mean of the input dataset. This is a
+     very good statistic to measure skewness in a distribution, see the
+     description of this option in the book for more.
+
   Library:
    - Arithmetic macros:
      - GAL_ARITHMETIC_OP_BOX_AROUND_ELLIPSE
diff --git a/bin/statistics/args.h b/bin/statistics/args.h
index 2abc538..5e568a2 100644
--- a/bin/statistics/args.h
+++ b/bin/statistics/args.h
@@ -252,6 +252,20 @@ struct argp_option program_options[] =
       ui_add_to_single_value
     },
     {
+      "quantofmean",
+      UI_KEY_QUANTOFMEAN,
+      0,
+      0,
+      "Quantile of the mean.",
+      UI_GROUP_SINGLE_VALUE,
+      &p->singlevalue,
+      GAL_OPTIONS_NO_ARG_TYPE,
+      GAL_OPTIONS_RANGE_0_OR_1,
+      GAL_OPTIONS_NOT_MANDATORY,
+      GAL_OPTIONS_NOT_SET,
+      ui_add_to_single_value
+    },
+    {
       "mode",
       UI_KEY_MODE,
       0,
diff --git a/bin/statistics/statistics.c b/bin/statistics/statistics.c
index 437928b..2db8113 100644
--- a/bin/statistics/statistics.c
+++ b/bin/statistics/statistics.c
@@ -120,8 +120,9 @@ statistics_print_one_row(struct statisticsparams *p)
         sum = sum ? sum : gal_statistics_sum(p->input);              break;
       case UI_KEY_MEDIAN:
         med = med ? med : gal_statistics_median(p->sorted, 0); break;
-      case UI_KEY_MEAN:
       case UI_KEY_STD:
+      case UI_KEY_MEAN:
+      case UI_KEY_QUANTOFMEAN:
         meanstd = meanstd ? meanstd : gal_statistics_mean_std(p->input);
         break;
       case UI_KEY_MODE:
@@ -196,18 +197,34 @@ statistics_print_one_row(struct statisticsparams *p)
 
         /* Not previously calculated. */
         case UI_KEY_QUANTILE:
+          mustfree=1;
           arg = statistics_read_check_args(p);
           out = gal_statistics_quantile(p->sorted, arg, 0);
           break;
 
         case UI_KEY_QUANTFUNC:
+          mustfree=1;
           arg = statistics_read_check_args(p);
           tmpv = gal_data_alloc(NULL, GAL_TYPE_FLOAT64, 1, &dsize,
                                 NULL, 1, -1, 1, NULL, NULL, NULL);
           *((double *)(tmpv->array)) = arg;
           tmpv = gal_data_copy_to_new_type_free(tmpv, p->input->type);
           out = gal_statistics_quantile_function(p->sorted, tmpv, 0);
+          gal_data_free(tmpv);
           break;
+
+        case UI_KEY_QUANTOFMEAN:
+          mustfree=1;
+          tmpv=statistics_pull_out_element(meanstd, 0);
+          out = gal_statistics_quantile_function(p->sorted, tmpv, 0);
+          gal_data_free(tmpv);
+          break;
+
+        /* The option isn't recognized. */
+        default:
+          error(EXIT_FAILURE, 0, "%s: a bug! Please contact us at %s so we "
+                "can address the problem. Operation code %d not recognized",
+                __func__, PACKAGE_BUGREPORT, tmp->v);
         }
 
       /* Print the number. Note that we don't want any extra white space
diff --git a/bin/statistics/ui.c b/bin/statistics/ui.c
index 72944fe..62e387e 100644
--- a/bin/statistics/ui.c
+++ b/bin/statistics/ui.c
@@ -738,6 +738,7 @@ ui_make_sorted_if_necessary(struct statisticsparams *p)
       case UI_KEY_QUANTILE:
       case UI_KEY_QUANTFUNC:
       case UI_KEY_SIGCLIPSTD:
+      case UI_KEY_QUANTOFMEAN:
       case UI_KEY_SIGCLIPMEAN:
       case UI_KEY_SIGCLIPNUMBER:
       case UI_KEY_SIGCLIPMEDIAN:
diff --git a/bin/statistics/ui.h b/bin/statistics/ui.h
index 22a3969..40716d9 100644
--- a/bin/statistics/ui.h
+++ b/bin/statistics/ui.h
@@ -82,6 +82,7 @@ enum option_keys_enum
   UI_KEY_MODESYM,
   UI_KEY_MODESYMVALUE,
   UI_KEY_QUANTFUNC,
+  UI_KEY_QUANTOFMEAN,
   UI_KEY_ASCIICFP,
   UI_KEY_HISTOGRAM2D,
   UI_KEY_MIRROR,
diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index 50b4958..d929e4e 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -16266,6 +16266,21 @@ Formally it is known as the ``Quantile function''.
 
 Since the dataset is not continuous this function will find the nearest element of the dataset and use its position to estimate the quantile function.
 
+@item --quantofmean
+@cindex Quantile of the mean
+Print the quantile of the mean in the dataset.
+This is a very good measure for detecting skewness or outliers.
+The concept is used by programs like NoiseChisel to identify the presence of signal in a tile of the image (because signal in noise causes skewness).
+
+For example, take this simple array: @code{1 2 20 4 5 6 3}.
+The mean is approximately @code{5.86}.
+The nearest element to this mean is @code{6} and the quantile of @code{6} in this distribution is 0.8333.
+Here is how we got to this: in the sorted dataset (@code{1 2 3 4 5 6 20}), @code{6} is the 5-th element (counting from zero, since a quantile of zero corresponds to the minimum, by definition) and the maximum is the 6-th element (again, counting from zero).
+So the quantile of the mean in this case is @mymath{5/6=0.8333}.
+
+In the example above, if we had @code{7} instead of @code{20} (which was an outlier), then the mean would be @code{4} and the quantile of the mean would be 0.5 (which by definition, is the quantile of the median), showing no outliers.
+As the number of elements increases, the mean itself is less affected by a small number of outliers, but skewness can be nicely identified by the quantile of the mean.
+
 @item -O
 @itemx --mode
 Print the mode of all used elements.
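
The worked example in the new documentation text (sort the data, find the
element nearest to the mean, divide its zero-based position by the position
of the maximum) can be sketched in plain C as follows. This is only an
illustration of the arithmetic and not the code added by this commit: the
names 'quantile_of_mean' and 'compare_doubles' are hypothetical, the sketch
does not use the Gnuastro library, and its tie-breaking (the lower position
wins when two elements are equally close to the mean) is an assumption that
may differ from 'gal_statistics_quantile_function'.

```c
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort(): ascending order of doubles. */
static int
compare_doubles(const void *a, const void *b)
{
  double x = *(const double *)a, y = *(const double *)b;
  return (x > y) - (x < y);
}

/* Hypothetical helper (not part of Gnuastro): quantile of the mean,
   estimated from the position of the sorted element nearest to the
   mean. Returns NaN for fewer than two elements or on allocation
   failure. */
double
quantile_of_mean(const double *values, size_t n)
{
  size_t i, nearest = 0;
  double sum = 0.0, mean, *sorted;

  if (values == NULL || n < 2) return NAN;

  /* Work on a copy so the caller's array keeps its order. */
  sorted = malloc(n * sizeof *sorted);
  if (sorted == NULL) return NAN;
  for (i = 0; i < n; ++i) { sorted[i] = values[i]; sum += values[i]; }
  mean = sum / (double)n;
  qsort(sorted, n, sizeof *sorted, compare_doubles);

  /* Zero-based position of the element closest to the mean. On a tie
     the lower position is kept (an assumption of this sketch). */
  for (i = 1; i < n; ++i)
    if (fabs(sorted[i] - mean) < fabs(sorted[nearest] - mean))
      nearest = i;

  free(sorted);

  /* Position 0 is quantile 0 (minimum), position n-1 is quantile 1. */
  return (double)nearest / (double)(n - 1);
}
```

Called on the seven values 1 2 20 4 5 6 3, this returns 5/6 (the element
nearest to the mean is 6, at position 5 of 0 to 6). Replacing 20 with 7
gives a mean of 4 and a result of 0.5, matching the documentation text.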


