gnuastro-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[gnuastro-commits] master e57f8e8: Ability to set window size in gal_sta


From: Mohammad Akhlaghi
Subject: [gnuastro-commits] master e57f8e8: Ability to set window size in gal_statistics_outlier_positive
Date: Fri, 14 Dec 2018 22:49:48 -0500 (EST)

branch: master
commit e57f8e8fbd154950e2a34782d0551c54c67d4110
Author: Mohammad Akhlaghi <address@hidden>
Commit: Mohammad Akhlaghi <address@hidden>

    Ability to set window size in gal_statistics_outlier_positive
    
    Until now, this function would fix the window size to half the size of the
    input dataset. But that was too restrictive (in cases where the outlier
    break is before the median). With this commit, the user can set the window
    size as a new argument.
---
 NEWS                      |  9 +++++----
 doc/gnuastro.texi         | 36 +++++++++++++++++++-----------------
 lib/gnuastro/statistics.h |  6 +++---
 lib/statistics.c          | 18 +++++++++---------
 lib/tile-internal.c       | 15 +++++++++++----
 5 files changed, 47 insertions(+), 37 deletions(-)

diff --git a/NEWS b/NEWS
index 5341eda..1043f92 100644
--- a/NEWS
+++ b/NEWS
@@ -107,14 +107,15 @@ GNU Astronomy Utilities NEWS                          -*- 
outline -*-
     --meanmedqdiff: new name for `--modmedqdiff'. Similar to NoiseChisel.
 
   Library:
-    - gal_data_copy_to_allocated: Also copies string metadata (e.g., name).
-    - gal_fits_key_write: filename and hdu instead of FITS pointer.
-    - gal_fits_key_write_version: filename and hdu instead of FITS pointer.
-    - gal_fits_key_write_filename: write at the top or end of the input list.
     - gal_array_read: list of strings (from standard input) acceptable.
     - gal_array_read_to_type: list of strings (from stin) acceptable.
     - gal_array_read_one_ch: list of strings (from stdin) acceptable.
     - gal_array_read_one_ch_to_type: list of strings (from stdin) acceptable.
+    - gal_data_copy_to_allocated: Also copies string metadata (e.g., name).
+    - gal_fits_key_write: filename and hdu instead of FITS pointer.
+    - gal_fits_key_write_version: filename and hdu instead of FITS pointer.
+    - gal_fits_key_write_filename: write at the top or end of the input list.
+    - gal_statistics_outlier_positive: Window-size is now adjustable (new 
argument).
     - gal_table_info: list of strings (from stdin) acceptable.
     - gal_table_read: list of strings (from stdin) acceptable.
     - gal_txt_table_info: list of strings (from stdin) acceptable.
diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index b3605f3..a4d4e01 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -27977,7 +27977,7 @@ above.
 @end deftypefun
 
 
address@hidden {gal_data_t *} gal_statistics_outlier_positive (gal_data_t 
@code{*input}, float @code{sigma}, float @code{sigclip_multip}, float 
@code{sigclip_param}, int @code{inplace}, int @code{quiet})
address@hidden {gal_data_t *} gal_statistics_outlier_positive (gal_data_t 
@code{*input}, size_t @code{window_size}, float @code{sigma}, float 
@code{sigclip_multip}, float @code{sigclip_param}, int @code{inplace}, int 
@code{quiet})
 Find the first positive outlier in the @code{input} distribution. The
 returned dataset contains a single element: the first positive outlier. It
 is one of the dataset's elements, in the same type as the input. If the
@@ -27985,23 +27985,22 @@ process fails for any reason (for example no outlier 
was found), a
 @code{NULL} pointer will be returned.
 
 All (possibly existing) blank elements are first removed from the input
-dataset, then it is sorted. A sliding window equal to half the width of the
-dataset elements is parsed over the dataset. Starting from the middle of
-the dataset (median) in the direction of increasing values. This window is
-used as a reference. The first element where the distance to the previous
+dataset, then it is sorted. A sliding window of @code{window_size} elements
+is parsed over the dataset. Starting from the @code{window_size}-th element
+of the dataset, in the direction of increasing values. This window is used
+as a reference. The first element where the distance to the previous
 (sorted) element is @code{sigma} units away from the distribution of
 distances in its window is considered an outlier and returned by this
 function.
 
 Formally, if we assume there are @mymath{N} non-blank elements. They are
-first sorted. Searching for the outlier starts on element @mymath{N/2}
-(integer division). Let's take @mymath{v_i} to be the @mymath{i}-th element
-of the sorted input (with no blank values) and @mymath{m} and
address@hidden as the @mymath{\sigma}-clipped median and standard
-deviation from the distances of the previous @mymath{N/2} elements (not
-including @mymath{v_i}). If the value given to @code{sigma} is displayed
-with @mymath{s}, the @mymath{i}-th element is considered as an outlier when
-the condition below is true.
+first sorted. Searching for the outlier starts on element @mymath{W}. Let's
+take @mymath{v_i} to be the @mymath{i}-th element of the sorted input (with
+no blank values) and @mymath{m} and @mymath{\sigma} as the
address@hidden median and standard deviation from the distances of
+the previous @mymath{W} elements (not including @mymath{v_i}). If the value
+given to @code{sigma} is displayed with @mymath{s}, the @mymath{i}-th
+element is considered as an outlier when the condition below is true.
 
 @dispmath{{(v_i-v_{i-1})-m\over \sigma}>s}
 
@@ -28017,10 +28016,13 @@ done within the input dataset's allocated space. 
Otherwise, this function
 will internally allocate (and later free) the necessary space to keep the
 intermediate space that this process requires.
 
-If @code{quiet!=0}, this function will report a warning when the moving
-window size is not much smaller than the number of non-blank elements. In
-such cases, the outlier estimation may be wrong: the outlier point may lie
-before the first checked element.
+If @code{quiet!=0}, this function will report the parameters every time it
+moves the window as a separate line with several columns. The first column
+is the value, the second (in square brackets) is the sorted index, the
+third is the distance of this element from the previous one. The Fourth and
+fifth (in parenthesis) are the median and standard deviation of the
+sigma-clipped distribution within the window and the last column is the
+difference between the third and fourth, divided by the fifth.
 
 @end deftypefun
 
diff --git a/lib/gnuastro/statistics.h b/lib/gnuastro/statistics.h
index 2b84756..ec3ecd2 100644
--- a/lib/gnuastro/statistics.h
+++ b/lib/gnuastro/statistics.h
@@ -174,9 +174,9 @@ gal_statistics_sigma_clip(gal_data_t *input, float multip, 
float param,
                           int inplace, int quiet);
 
 gal_data_t *
-gal_statistics_outlier_positive(gal_data_t *input, float sigma,
-                                float sigclip_multip, float sigclip_param,
-                                int inplace, int quiet);
+gal_statistics_outlier_positive(gal_data_t *input, size_t window_size,
+                                float sigma, float sigclip_multip,
+                                float sigclip_param, int inplace, int quiet);
 
 
 
diff --git a/lib/statistics.c b/lib/statistics.c
index 11a04d1..dbe0326 100644
--- a/lib/statistics.c
+++ b/lib/statistics.c
@@ -2097,7 +2097,7 @@ gal_statistics_sigma_clip(gal_data_t *input, float 
multip, float param,
 /* Find the first outlier in a distribution. */
 #define OUTLIER_BYTYPE(IT) {                                            \
     IT *arr=nbs->array;                                                 \
-    for(i=nbs->size/2;i<nbs->size;++i)                                  \
+    for(i=window_size;i<nbs->size;++i)                                  \
       {                                                                 \
         /* Fill in the distance array. */                               \
         for(j=0; j<wtakeone; ++j)                                       \
@@ -2109,9 +2109,10 @@ gal_statistics_sigma_clip(gal_data_t *input, float 
multip, float param,
         sarr=sclip->array;                                              \
                                                                         \
         /* For a check */                                               \
-        /*printf("%f: %f (%f, %f) %f\n", (float)(arr[i]),          */   \
-        /*       (float)(arr[i]-arr[i-1]), sarr[1], sarr[3],       */   \
-        /*       (((double)(arr[i]-arr[i-1])) - sarr[1])/sarr[3]); */  \
+        if(quiet==0)                                                    \
+          printf("%f [%zu]: %f (%f, %f) %f\n", (float)(arr[i]), i,      \
+                 (float)(arr[i]-arr[i-1]), sarr[1], sarr[3],            \
+                 (((double)(arr[i]-arr[i-1])) - sarr[1])/sarr[3]);      \
                                                                         \
         /* Terminate the loop if the dist. is larger than requested. */ \
         /* This shows we have reached the first outlier's position. */  \
@@ -2132,18 +2133,17 @@ gal_statistics_sigma_clip(gal_data_t *input, float 
multip, float param,
       }                                                                 \
   }
 gal_data_t *
-gal_statistics_outlier_positive(gal_data_t *input, float sigma,
-                                float sigclip_multip, float sigclip_param,
-                                int inplace, int quiet)
+gal_statistics_outlier_positive(gal_data_t *input, size_t window_size,
+                                float sigma, float sigclip_multip,
+                                float sigclip_param, int inplace, int quiet)
 {
   float *sarr;
   double *darr;
-  size_t i, j, one=1, wtakeone, window_size;
+  size_t i, j, one=1, wtakeone;
   gal_data_t *dist, *sclip, *nbs, *out=NULL;
 
   /* Remove all blanks and sort the dataset. */
   nbs=gal_statistics_no_blank_sorted(input, inplace);
-  window_size=nbs->size/2;
 
   /* Only continue if the window size is more than 2 elements (out
      "outlier" is hard to define on smaller datasets). */
diff --git a/lib/tile-internal.c b/lib/tile-internal.c
index 114b02d..7e51fe2 100644
--- a/lib/tile-internal.c
+++ b/lib/tile-internal.c
@@ -44,7 +44,7 @@ tileinternal_no_outlier_work(gal_data_t *first, gal_data_t 
*second,
                              size_t tottilesinch, double *outliersclip,
                              float outliersigma)
 {
-  gal_data_t *outlier;
+  gal_data_t *outlier, *nbs;
   size_t i, osize=first->size;
   size_t start=tottilesinch*channelid;
   float *oa1=NULL, *oa2=NULL, *oa3=NULL;
@@ -82,9 +82,11 @@ tileinternal_no_outlier_work(gal_data_t *first, gal_data_t 
*second,
   /* Find the quantile and remove all tiles that are more than it in the
      first array. */
   arr1=first->array;
-  outlier=gal_statistics_outlier_positive(first, outliersigma,
+  nbs=gal_statistics_no_blank_sorted(first, 0);
+  outlier=gal_statistics_outlier_positive(nbs, nbs->size/2, outliersigma,
                                           outliersclip[0], outliersclip[1],
                                           0, 1);
+  gal_data_free(nbs);
   if(outlier)
     {
       o = *((float *)(outlier->array));
@@ -101,9 +103,11 @@ tileinternal_no_outlier_work(gal_data_t *first, gal_data_t 
*second,
      on each dataset to later remove any tile that is blank in atleast one
      of them. */
   arr2=second->array;
-  outlier=gal_statistics_outlier_positive(second, outliersigma,
+  nbs=gal_statistics_no_blank_sorted(second, 0);
+  outlier=gal_statistics_outlier_positive(nbs, nbs->size, outliersigma,
                                           outliersclip[0], outliersclip[1],
                                           0, 0);
+  gal_data_free(nbs);
   if(outlier)
     {
       o = *((float *)(outlier->array));
@@ -116,9 +120,12 @@ tileinternal_no_outlier_work(gal_data_t *first, gal_data_t 
*second,
   if(third)
     {
       arr3=third->array;
-      outlier=gal_statistics_outlier_positive(third, outliersigma,
+      nbs=gal_statistics_no_blank_sorted(third, 0);
+      outlier=gal_statistics_outlier_positive(nbs, nbs->size/2,
+                                              outliersigma,
                                               outliersclip[0],
                                               outliersclip[1], 0, 0);
+      gal_data_free(nbs);
       if(outlier)
         {
           o = *((float *)(outlier->array));



reply via email to

[Prev in Thread] Current Thread [Next in Thread]