gnuastro-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[gnuastro-commits] master fc96718: Dimensions and Statistics libraries n


From: Mohammad Akhlaghi
Subject: [gnuastro-commits] master fc96718: Dimensions and Statistics libraries now documented
Date: Mon, 1 May 2017 21:48:19 -0400 (EDT)

branch: master
commit fc96718f50a025abc6137de97cb2759f4f8dbf3c
Author: Mohammad Akhlaghi <address@hidden>
Commit: Mohammad Akhlaghi <address@hidden>

    Dimensions and Statistics libraries now documented
    
    These two headers are now fully documented in the manual. An new library
    demo was also added to show how to use `GAL_DIMENSION_NEIGHBOR_OP'. In the
    process some argument names were also corrected (mainly for easier reading
    and correspondance with the manual), for example `ind' was changed to
    `index' and `data' was changed to `input'.
---
 doc/gnuastro.texi         | 469 ++++++++++++++++++++++++++++++++++------------
 lib/dimension.c           |  12 +-
 lib/gnuastro/dimension.h  |   6 +-
 lib/gnuastro/statistics.h |  12 +-
 lib/statistics.c          |  83 ++++----
 5 files changed, 409 insertions(+), 173 deletions(-)

diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index c3ff0f4..92fbe12 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -588,7 +588,8 @@ Multithreaded programming (@file{threads.h})
 
 Library demo programs
 
-* Library demo - reading a image::
+* Library demo - reading a image::  Read a FITS image into memory.
+* Library demo - inspecting neighbors::  Inspect the neighbors of a pixel.
 
 Developing
 
@@ -17714,13 +17715,69 @@ convert an index into a coordinate and vice-versa 
along with several other
 issues for example issues with the neighbors of an element in a
 multi-dimensional context.
 
address@hidden size_t gal_dimension_total_size (size_t @code{ndim}, size_t 
@code{*dsize})
+Return the total number of elements for a dataset with @code{ndim}
+dimensions that has @code{dsize} elements along each dimension.
address@hidden deftypefun
+
address@hidden {size_t *} gal_dimension_increment (size_t @code{ndim}, size_t 
@code{*dsize})
+Return an allocated array that has the number of elements necessary to
+increment an index along every dimension. For example along the fastest
+dimension (last element in the @code{dsize} and returned arrays), the value
+is @code{1} (one).
address@hidden deftypefun
+
address@hidden size_t gal_dimension_num_neighbors (size_t @code{ndim})
+The maximum number of neighbors (any connectivity) that a data element can
+have in @code{ndim} dimensions. Effectively, this function just returns
address@hidden (where @mymath{n} is the number of dimensions).
address@hidden deftypefun
+
address@hidden {Function-like macro} GAL_DIMENSION_FLT_TO_INT (@code{FLT})
+Calculate the integer pixel position that the floating point @code{FLT}
+number belongs to. In the FITS format (and thus in Gnuastro), the center of
+each pixel is allocated on an integer (not it edge), so the pixel which
+hosts a floating point number cannot simply be found with internal type
+conversion.
address@hidden deffn
+
address@hidden void gal_dimension_add_coords (size_t @code{*c1}, size_t 
@code{*c2}, size_t @code{*out}, size_t @code{ndim})
+For every dimension, add the coordinates in @code{c1} with @code{c2} and
+put the result into @code{out}. In other words, for dimension @code{i} run
address@hidden;}. Hence @code{out} may be equal to any one of
address@hidden or @code{c2}.
address@hidden deftypefun
+
address@hidden size_t gal_dimension_coord_to_index (size_t @code{ndim}, size_t 
@code{*dsize}, size_t @code{*coord})
+Return the index (counting from zero) from the coordinates in @code{coord}
+(counting from zero) assuming the dataset has @code{ndim} elements and the
+size of the dataset along each dimension is in the @code{dsize} array.
address@hidden deftypefun
+
address@hidden void gal_dimension_index_to_coord (size_t @code{index}, size_t 
@code{ndim}, size_t @code{*dsize}, size_t @code{*coord})
+Fill in the @code{coord} array with the coordinates that correspond to
address@hidden assuming the dataset has @code{ndim} elements and the size of
+the dataset along each dimension is in the @code{dsize} array. Note that
+both @code{index} and each value in @code{coord} are assumed to start from
address@hidden (zero). Also that the space which @code{coord} points to must
+already be allocated before calling this function.
address@hidden deftypefun
+
address@hidden size_t gal_dimension_dist_manhattan (size_t @code{*a}, size_t 
@code{*b}, size_t @code{ndim})
+Return the manhattan distance (see
address@hidden://en.wikipedia.org/wiki/Taxicab_geometry, Wikipedia}) between
+the two coordinates @code{a} and @code{b} (each an array of @code{ndim}
+elements).
address@hidden deftypefun
+
 @deffn {Function-like macro} GAL_DIMENSION_NEIGHBOR_OP (@code{index}, 
@code{ndim}, @code{dsize}, @code{connectivity}, @code{dinc}, @code{operation})
 Parse the neighbors of the element located at @code{index} and do the
 requested operation on them. This is defined as a macro to allow easy
 definition of any operation on the neighbors of a given element without
 having to use loops within your source code (the loops are implemented by
-this macro). The input arguments to this function-like macro are described
-below:
+this macro). For an example of using this function, please see @ref{Library
+demo - inspecting neighbors}. The input arguments to this function-like
+macro are described below:
 
 @table @code
 @item index
@@ -17760,8 +17817,11 @@ provide you a @code{nind} variable that can be used as 
the index of the
 neighbor that is currently being studied. It is defined as address@hidden
 ndim;}'. Note that @code{operation} will be repeated the number of times
 there is a neighbor for this element.
-
 @end table
+
+This macro works fully within its own @address@hidden@}} block and except for 
the
address@hidden variable that shows the neighbor's index, all the variables
+within this macro's block start with @code{gdn_}.
 @end deffn
 
 @node Linked lists, FITS files, Dimensions, Gnuastro library
@@ -19631,148 +19691,281 @@ The output will be: @code{2, 0, 1, 3}.
 @node Statistical operations, Multithreaded programming, Qsort functions, 
Gnuastro library
 @subsection Statistical operations (@file{statistics.h})
 
-This header includes a large variety of very basic statistical operators
-for various data types. Please note that we expect to greatly simplify the
-functions here for easier and more general usage in future
-releases. Ideally, they should be completely removed and the GNU Scientific
-Library's functions should be used. Since these functions are used in
-various parts of Gnuastro for multiple purposes, in this first library
-release (Gnuastro 0.2), there might be parallels, or non-homogeneous
-arguments. Such situations arise here because of the history of Gnuastro:
-the libraries grew out of the programs, so it will take a little while to
-correct.
-
address@hidden Structure GAL_STATISTICS_MAX_SIG_CLIP_CONVERGE
-The maximum number of times to try for @mymath{\sigma}-clipping (see
address@hidden clipping}).
+After reading a dataset into memory from a file or fully simulating it with
+another process, the most common processes that will be done on it are
+statistical operations to let you quantify different aspects of the
+data. the functions in this section describe Gnuastro's current set of
+tools for this job. All these functions can work on any numeric data type
+natively (see @ref{Numeric data types}) and can also work on tiles over a
+dataset. Hence the inputs and outputs are in Gnuastro's @ref{Generic data
+container}.
+
address@hidden  Macro GAL_STATISTICS_SIG_CLIP_MAX_CONVERGE
+The maximum number of clips, when @mymath{\sigma}-clipping should be done
+by convergence. If the clipping does not converge before making this many
+clips, all sigma-clipping outputs will be NaN.
address@hidden deffn
+
address@hidden  Macro GAL_STATISTICS_MODE_GOOD_SYM
+The minimum acceptable symmetricity of the mode calculation. If the
+symmetricity of the derived mode is less than this value, all the returned
+values by @code{gal_statistics_mode} will have a value of NaN.
 @end deffn
 
address@hidden void gal_statistics_set_bins (float @code{*sorted}, size_t 
@code{size}, size_t @code{numbins}, float @code{min}, float @code{max}, float 
@code{onebinvalue}, float @code{quant}, float @code{**obins})
-Given the input @code{sorted} array with @code{size} elements, a given
-number of @code{numbins}, an optional @code{min} and @code{max}, find the
-the bin starting points and keep them in @code{obins}. @code{obins} has two
-columns, and this function only fills the first column. The second will be
-filled by the histogram or cumulative distribution plots.
-
-If @code{onebinvalue} is NaN, then the bins above will be reported.
-However, if it is not NaN, all the bins will be shifted such that the lower
-value of one of the bins will be placed on this value of this variable (if
-it is in the range of the data). When @code{min==max}, the actual data
-range will be used. When @code{quant} is not zero (let's say it has the
-value @mymath{q}, for quantile) and @code{min==max} then the automatic
-range finder will use the values at the quantiles @mymath{q} and
address@hidden
address@hidden  Macro GAL_STATISTICS_SORTED_NOT
address@hidden Macro GAL_STATISTICS_SORTED_INCREASING
address@hidden Macro GAL_STATISTICS_SORTED_DECREASING
+Macros used to identify if the dataset is sorted and increasing, sorted and
+decreasing or not sorted.
address@hidden deffn
+
address@hidden  Macro GAL_STATISTICS_BINS_INVALID
address@hidden Macro GAL_STATISTICS_BINS_REGULAR
address@hidden Macro GAL_STATISTICS_BINS_IRREGULAR
+Macros used to identify if the regularity of the bins when defining bins.
address@hidden deffn
+
address@hidden Number
address@hidden {gal_data_t *} gal_statistics_number (gal_data_t @code{*input})
+Return a single-element @code{uint64} dataset containing the number of
+non-blank elements in @code{input}.
 @end deftypefun
 
address@hidden void gal_statistics_histogram (float @code{*sorted}, size_t 
@code{size}, float @code{*bins}, size_t @code{numbins}, int @code{normhist}, 
int @code{maxhistone})
-Build the histogram of the data (@code{sorted}, with @code{size} elements)
-in the @code{bins} array which also contains the starting bin values and
-has @code{numbins} rows. See @code{gal_statistics_set_bins}.
address@hidden Minimum
address@hidden {gal_data_t *} gal_statistics_minimum (gal_data_t @code{*input})
+Return a single-element dataset containing the minimum non-blank value in
address@hidden The numerical datatype of the output is the same as
address@hidden
address@hidden deftypefun
 
-When @code{normhist} is non-zero, the histogram will be normalized so the
-sum of all the bin values is unity. When @code{maxhistone} is non-zero, the
-histogram is scaled such that the bin with the largest value gets a value
-of 1.
address@hidden Maximum
address@hidden {gal_data_t *} gal_statistics_maximum (gal_data_t @code{*input})
+Return a single-element dataset containing the maximum non-blank value in
address@hidden The numerical datatype of the output is the same as
address@hidden
 @end deftypefun
 
address@hidden void gal_statistics_cumulative_fp (float @code{*sorted}, size_t 
@code{size}, float @code{*bins}, size_t @code{numbins}, int @code{normcfp})
-Build the cumulative frequency plot of the data (@code{sorted}, with
address@hidden elements) in the @code{bins} array which also contains the
-starting bin values and has @code{numbins} rows. See
address@hidden When @code{normcfp} is non-zero, the
-cumulative frequency plot will be normalized so the last bin has a value of
-1.
address@hidden Sum
address@hidden {gal_data_t *} gal_statistics_sum (gal_data_t @code{*input})
+Return a single-element (@code{double} or @code{float64}) dataset
+containing the sum of the non-blank values in @code{input}.
 @end deftypefun
 
address@hidden void gal_statistics_save_hist (float @code{*sorted}, size_t 
@code{size}, size_t @code{numbins}, char @code{*filename}, char @code{*comment})
-Create a histogram of the data in @code{sorted} (and @code{size} elements)
-with @code{numbins} and save it into an ASCII text file named
address@hidden with a possible set of comments (pointed to by
address@hidden before he table.
address@hidden Mean
address@hidden Average
address@hidden {gal_data_t *} gal_statistics_mean (gal_data_t @code{*input})
+Return a single-element (@code{double} or @code{float64}) dataset
+containing the mean of the non-blank values in @code{input}.
 @end deftypefun
 
address@hidden size_t gal_statistics_index_from_quantile (size_t @code{size}, 
float @code{quant})
-Given the number of elements (@code{size}), return the index corresponding
-to the quantile @code{quant}.
address@hidden Standard deviation
address@hidden {gal_data_t *} gal_statistics_std (gal_data_t @code{*input})
+Return a single-element (@code{double} or @code{float64}) dataset
+containing the standard deviation of the non-blank values in @code{input}.
 @end deftypefun
 
address@hidden int gal_statistics_sigma_clip_converge (float @code{*array}, int 
@code{o1_n0}, size_t @code{num_elem}, float @code{sigma_multiple}, float 
@code{accuracy}, float @code{*outave}, float @code{*outmed}, float 
@code{*outstd}, int @code{print})
-Calculate the @mymath{\sigma}-clipped (by convergence) average, median and
-standard deviation on the array @code{array} with @code{num_elem} elements
-(see @ref{Sigma clipping}). If @code{o1_n0==1}, then it is assumed that the
-input array is sorted. @code{sigma_multiple} is the multiple of
address@hidden used in the @mymath{\sigma}-clipping. If the
address@hidden did converge, then this function will return
address@hidden, otherwise, it will return @code{1}. If @code{print} is non-zero,
-then the average, median and standard deviation of each step are printed.
address@hidden {gal_data_t *} gal_statistics_mean_std (gal_data_t @code{*input})
+Return a two-element (@code{double} or @code{float64}) dataset containing
+the mean and standard deviation of the non-blank values in
address@hidden The first element of the returned dataset is the mean and the
+second is the standard deviation.
+
+This function will calculate both values in one pass over the
+dataset. Hence when both the mean and standard deviation of a dataset are
+necessary, this function is much more efficient than calling
address@hidden and @code{gal_statistics_std} separately.
 @end deftypefun
 
address@hidden int gal_statistics_sigma_clip_certain_num (float @code{*array}, 
int @code{o1_n0}, size_t @code{num_elem}, float @code{sigma_multiple}, size_t 
@code{numtimes}, float @code{*outave}, float @code{*outmed}, float 
@code{*outstd}, int @code{print})
-Calculate the @mymath{\sigma}-clipped (with a fixed number of times)
-average, median and standard deviation on the array @code{array} with
address@hidden elements (see @ref{Sigma clipping}). If @code{o1_n0==1},
-then it is assumed that the input array is sorted. @code{sigma_multiple} is
-the multiple of @mymath{\sigma} and @code{numtimes} is the number of times
address@hidden is done. If the @mymath{\sigma}-clipping was
-normally done, then this function will return @code{0}, otherwise, it will
-return @code{1}. If @code{print} is non-zero, then the average, median and
-standard deviation of each step are printed.
address@hidden Median
address@hidden {gal_data_t *} gal_statistics_median (gal_data_t @code{*input}, 
int @code{inplace})
+Return a single-element dataset containing the median of the non-blank
+values in @code{input}. The numerical datatype of the output is the same as
address@hidden
+
+Calculating the median involves sorting the dataset and removing blank
+values, for better performance (and less memory usage), you can give a
+non-zero value to the @code{inplace} argument. In this case, the sorting
+and removal of blank elements will be done directly on the input
+dataset. However, after this function the original dataset may have changed
+(if it wasn't sorted or had blank values).
 @end deftypefun
 
address@hidden void gal_statistics_remove_outliers_flat_cdf (float 
@code{*sorted}, size_t @code{*outsize})
-Remove the outliers in a distribution based on the slope of the cumulative
-frequency plot (when it becomes too flat). Before this function,
address@hidden will point to the number of elements in @code{sorted},
-after this function, it will be the number of elements that are not
-outliers.
address@hidden Quantile
address@hidden size_t gal_statistics_quantile_index (size_t @code{size}, double 
@code{quantile})
+Return the index of the element that has a quantile of @code{quantile}
+assuming the dataset has @code{size} elements.
 @end deftypefun
 
address@hidden Macro GAL_STATISTICS_MODE_LOW_QUANTILE
-The lower quantile used in calculating the mode.
address@hidden deffn
address@hidden size_t gal_statistics_quantile (gal_data_t @code{*input}, double 
@code{quantile}, int @code{inplace})
+Return a single-element dataset containing the value with in a quantile
address@hidden of the non-blank values in @code{input}. The numerical
+datatype of the output is the same as @code{input}. See
address@hidden for a description of @code{inplace}.
address@hidden deftypefun
 
address@hidden Macro GAL_STATISTICS_MODE_HIGH_QUANTILE
-The lower quantile used in calculating the mode.
address@hidden deffn
address@hidden size_t gal_statistics_quantile_function_index (gal_data_t 
@code{*input}, gal_data_t @code{*value}, int @code{inplace})
+Return the index of the quantile function (inverse quantile) of
address@hidden at @code{value}. In other words, this function will return the
+index of the nearest element (of a sorted and non-blank) @code{input} to
address@hidden See @code{gal_statistics_median} for a description of
address@hidden
address@hidden deftypefun
 
address@hidden Macro GAL_STATISTICS_MODE_SYM_GOOD
-The acceptable value for the mode calculation symmetry.
address@hidden deffn
address@hidden {gal_data_t *} gal_statistics_quantile_function (gal_data_t 
@code{*input}, gal_data_t @code{*value}, int @code{inplace})
+Return a single-element (@code{double} or @code{float64}) dataset
+containing the quantile function of the non-blank values in @code{input} at
address@hidden In other words, this function will return the quantile of
address@hidden in @code{input}. See @code{gal_statistics_median} for a
+description of @code{inplace}.
address@hidden deftypefun
 
address@hidden Macro GAL_STATISTICS_MODE_LOW_QUANT_GOOD
-The lowest quantile that an acceptable mode can have.
address@hidden deffn
address@hidden {gal_data_t *} gal_statistics_mode (gal_data_t @code{*input}, 
float @code{mirrordist}, int @code{inplace})
+Return a four-element (@code{double} or @code{float64}) dataset that
+contains the mode of the @code{input} distribution. This function
+implements the non-parametric algorithm to find the mode that is described
+in Appendix C of @url{https://arxiv.org/abs/1505.01664, Akhlaghi and
+Ichikawa [2015]}.
+
+In short it compares the actual distribution and its ``mirror
+distribution'' to find the mode. In order to be efficient, you can
+determine how far the comparison goes away from the mirror through the
address@hidden parameter (think of it as a multiple of sigma/error). See
address@hidden for a description of @code{inplace}.
+
+The output array has the following elements (in the given order, note that
+counting in C starts from 0).
address@hidden
+array[0]: mode
+array[1]: mode quantile.
+array[2]: symmetricity.
+array[3]: value at the end of symmetricity.
address@hidden example
address@hidden deftypefun
 
address@hidden Macro GAL_STATISTICS_MODE_SYMMETRICITY_LOW_QUANT
-Lower quantile for calculating the symmetricity.
address@hidden deffn
address@hidden {gal_data_t *} gal_statistics_mode_mirror_plots (gal_data_t 
@code{*input}, gal_data_t @code{*value}, size_t @code{numbins}, int 
@code{inplace}, double @code{*mirror_val})
+Make a mirrored histogram and cumulative frequency plot (with
address@hidden) with the mirror distribution of the @code{input} with a
+value at @code{value}.
 
address@hidden Structure gal_statistics_mode_params
-The input parameters for the mode functions, this is mainly an internal
-structure and will become hidden in future releases.
address@hidden deffn
+The output is a list of data structures (see @ref{List of gal_data_t}): the
+first is the bins with one bin at the mirror point, the second is the
+histogram with a maximum of one and the third is the cumulative frequency
+plot (with a maximum of one).
address@hidden deftypefun
 
address@hidden void gal_statistics_mode_mirror_plots (float @code{*sorted}, 
size_t @code{size}, size_t @code{mirrorindex}, float @code{min}, float 
@code{max}, size_t @code{numbins}, char @code{*histsname}, char 
@code{*cfpsname}, float @code{mirrorplotdist})
-Create a mirror plot based on the index given by @code{mirrorindex}. The
-mode calculation function procedure is defined in Appendix C of Akhlaghi
-and Ichikawa (2015, ApJS 220, 1. arXiv:1505.01664).
+
address@hidden int gal_statistics_is_sorted (gal_data_t @code{*input})
+Return the respective sort macro (see above) for the @code{input} dataset.
address@hidden deftypefun
+
address@hidden void gal_statistics_sort_increasing (gal_data_t @code{*input})
+Sort the input dataset (in place) in an increasing order.
address@hidden deftypefun
+
address@hidden void gal_statistics_sort_decreasing (gal_data_t @code{*input})
+Sort the input dataset (in place) in a decreasing order.
address@hidden deftypefun
+
address@hidden {gal_data_t *} gal_statistics_no_blank_sorted (gal_data_t 
@code{*input}, int @code{inplace})
+Remove all the blanks and sort the input dataset. If @code{inplace} is
+non-zero this will happen on the input dataset (and the output will be the
+same as the input). However, if @code{inplace} is zero, this function will
+allocate a new copy of the dataset that is sorted and has no blank values.
address@hidden deftypefun
+
address@hidden {gal_data_t *} gal_statistics_regular_bins (gal_data_t 
@code{*input}, gal_data_t @code{*inrange}, size_t @code{numbins}, double 
@code{onebinstart})
+Generate an array of regularly spaced elements as a 1D array (column) of
+type @code{double} (i.e., @code{float64}, it has to be double to account
+for small differences on the bin edges). The input arguments are discribed
+below
+
address@hidden @code
address@hidden input
+The dataset you want to apply the bins to. This is only necessary if the
+range argument is not complete, see below. If @code{inrange} has all the
+necessary information, you can pass a @code{NULL} pointer for this.
+
address@hidden inrange
+This dataset keeps the desired range along each dimension of the input data
+structure, it has to be in @code{float} (i.e., @code{float32}) type.
+
address@hidden
address@hidden
+If you want the full range of the dataset (in any dimensions, then just set
address@hidden to @code{NULL} and the range will be specified from the
+minimum and maximum value of the dataset (@code{input} cannot be
address@hidden in this case).
+
address@hidden
+If there is one element for each dimension in range, then it is viewed as a
+quantile (Q), and the range will be: `Q to 1-Q'.
+
address@hidden
+If there are two elements for each dimension in range, then they are
+assumed to be your desired minimum and maximum values. When either of the
+two are NaN, the minimum and maximum will be calculated for it.
address@hidden itemize
+
address@hidden numbins
+The number of bins: must be larger than 0.
+
address@hidden onebinstart
+A desired value for onebinstart. Note that with this option, the bins won't
+start and end exactly on the given range values, it will be slightly
+shifted to accommodate this request.
address@hidden table
 @end deftypefun
 
address@hidden float gal_statistics_mode_value_from_sym (float @code{*sorted}, 
size_t @code{size}, size_t @code{modeindex}, float @code{sym})
-It happens that you have the symmetricity and you want the flux value at
-that point, this function will do that job. The mode calculation function
-procedure is defined in Appendix C of Akhlaghi and Ichikawa (2015, ApJS
-220, 1. arXiv:1505.01664).
+
address@hidden {gal_data_t *} gal_statistics_histogram (gal_data_t 
@code{*input}, gal_data_t @code{*bins}, int @code{normalize}, int @code{maxone})
+Make a histogram of all the elements in the given dataset with bin values
+that are defined in the @code{inbins} structure (see
address@hidden). @code{inbins} is not mandatory, if you
+pass a @code{NULL} pointer, the bins structure will be built within this
+function based on the @code{numbins} input. As a result, when you have
+already defined the bins, @code{numbins} is not used.
 @end deftypefun
 
address@hidden void gal_statistics_mode_index_in_sorted (float @code{*sorted}, 
size_t @code{size}, float @code{errorstdm}, size_t @code{*modeindex}, float 
@code{*modesym})
-Find the quantile of the mode of a sorted distribution. The return value is
-either @code{0} (not accurate) or @code{1} (accurate). The mode calculation
-function procedure is defined in Appendix C of Akhlaghi and Ichikawa (2015,
-ApJS 220, 1. arXiv:1505.01664).
+
address@hidden {gal_data_t *} gal_statistics_cfp (gal_data_t @code{*input}, 
gal_data_t @code{*bins}, int @code{normalize})
+Make a cumulative frequency plot (CFP) of all the elements in @code{input}
+with bin values that are defined in the @code{bins} structure (see
address@hidden).
+
+The CFP is built from the histogram: in each bin, the value is the sum of
+all previous bins in the histogram. Thus, if you have already calculated
+the histogram before calling this function, you can pass it onto this
+function as the data structure in @code{bins->next} (see @code{List of
+gal_data_t}). If @code{bin->next!=NULL}, then it is assumed to be the
+histogram. If it is @code{NULL}, then the histogram will be calculated
+internally and freed after the job is finished.
+
+When a histogram is given and it is normalized, the CFP will also be
+normalized (even if the normalized flag is not set here): note that a
+normalized CFP's maximum value is 1.
 @end deftypefun
 
 
address@hidden {gal_data_t *} gal_statistics_sigma_clip (gal_data_t 
@code{*input}, float @code{multip}, float @code{param}, int @code{inplace}, int 
@code{quiet})
+Apply @mymath{\sigma}-clipping on a given dataset and return a dataset that
+contains the results. For a description of @mymath{\sigma}-clipping see
address@hidden clipping}. @code{multip} is the multiple of the standard
+deviation (@mymath{\sigma} that is used to define outliers in each round of
+clipping.
+
+The role of @code{param} is determined based on its value. If @code{param}
+is larger than @code{1} (one), it must be an integer and will be
+interpretted as the number clips to do. If it is less than @code{1} (one),
+it is interpretted as the tollerance level to stop the iteration.
+
+The output dataset has the following elements:
address@hidden
+array[0]: Number of points used.
+array[1]: Median.
+array[2]: Mean.
+array[3]: Standard deviation.
address@hidden example
address@hidden deftypefun
 
 
 @node Multithreaded programming, World Coordinate System, Statistical 
operations, Gnuastro library
@@ -20016,10 +20209,11 @@ use those that are generated by Gnuastro after 
@command{make check} in the
 
 
 @menu
-* Library demo - reading a image::
+* Library demo - reading a image::  Read a FITS image into memory.
+* Library demo - inspecting neighbors::  Inspect the neighbors of a pixel.
 @end menu
 
address@hidden Library demo - reading a image,  , Library demo programs, 
Library demo programs
address@hidden Library demo - reading a image, Library demo - inspecting 
neighbors, Library demo programs, Library demo programs
 @subsection Library demo - reading a FITS image
 
 The following simple program demonstrates how to read a FITS image into
@@ -20080,7 +20274,50 @@ main(void)
 @end example
 
 
address@hidden Library demo - inspecting neighbors,  , Library demo - reading a 
image, Library demo programs
address@hidden Library demo - inspecting neighbors
+
+The following simple program shows ho you can inspect the neighbors of a
+pixel using the @code{GAL_DIMENSION_NEIGHBOR_OP} function-like macro that
+was introduced in @ref{Dimensions}. For easy linking/compilation of this
+program along with a first run see @ref{BuildProgram}. Before running, also
+change the @code{filename} and @code{hdu} variable values to specify an
+existing FITS file and/or extension/HDU.
 
address@hidden
+#include <stdio.h>
+#include <gnuastro/fits.h>
+#include <gnuastro/dimension.h>
+
+int
+main(void)
address@hidden
+  double sum;
+  float *array;
+  size_t i, num, *dinc;
+  gal_data_t *input=gal_fits_img_read_to_type("input.fits", "1",
+                                              GAL_TYPE_FLOAT32, -1);
+
+  /* To avoid the `void *' pointer and have `dinc'. */
+  array=input->array;
+  dinc=gal_dimension_increment(input->ndim, input->dsize);
+
+  /* Go over all the pixels. */
+  for(i=0;i<input->size;++i)
+    @{
+      num=0;
+      sum=0.0f;
+      GAL_DIMENSION_NEIGHBOR_OP( i, input->ndim, input->dsize,
+                                 input->ndim, dinc,
+                                 @{++num; sum+=array[nind];@} );
+      printf("%zu: num: %zu, sum: %f\n", i, num, sum);
+    @}
+
+  /* Clean up and return. */
+  gal_data_free(input);
+  return EXIT_SUCCESS;
address@hidden
address@hidden example
 
 
 
diff --git a/lib/dimension.c b/lib/dimension.c
index fe683c4..293df4a 100644
--- a/lib/dimension.c
+++ b/lib/dimension.c
@@ -167,7 +167,7 @@ gal_dimension_coord_to_index(size_t ndim, size_t *dsize, 
size_t *coord)
    this is that this function will often be called with a loop and a single
    allocated space would be enough for the whole loop. */
 void
-gal_dimension_index_to_coord(size_t ind, size_t ndim, size_t *dsize,
+gal_dimension_index_to_coord(size_t index, size_t ndim, size_t *dsize,
                              size_t *coord)
 {
   size_t d, *dinc;
@@ -180,13 +180,13 @@ gal_dimension_index_to_coord(size_t ind, size_t ndim, 
size_t *dsize,
 
     /* One dimensional dataset. */
     case 1:
-      coord[0] = ind;
+      coord[0] = index;
       break;
 
     /* 2D dataset. */
     case 2:
-      coord[0] = ind / dsize[1];
-      coord[1] = ind % dsize[1];
+      coord[0] = index / dsize[1];
+      coord[1] = index % dsize[1];
       break;
 
     /* Higher dimensional datasets. */
@@ -201,11 +201,11 @@ gal_dimension_index_to_coord(size_t ind, size_t ndim, 
size_t *dsize,
       for(d=0;d<ndim;++d)
         {
           /* Set the coordinate value for this dimension. */
-          coord[d] = ind / dinc[d];
+          coord[d] = index / dinc[d];
 
           /* Replace the index with its remainder with the number of
              elements in all faster dimensions. */
-          ind  %= dinc[d];
+          index  %= dinc[d];
         }
 
       /* Clean up. */
diff --git a/lib/gnuastro/dimension.h b/lib/gnuastro/dimension.h
index 9d71d68..226d534 100644
--- a/lib/gnuastro/dimension.h
+++ b/lib/gnuastro/dimension.h
@@ -65,8 +65,8 @@ gal_dimension_num_neighbors(size_t ndim);
 /************************************************************************/
 /********************          Coordinates         **********************/
 /************************************************************************/
-#define GAL_DIMENSION_FLT_TO_INT(FLT) (FLT)-(int)(FLT) > 0.5f    \
-  ? (int)(FLT)+1 : (int)(FLT);
+#define GAL_DIMENSION_FLT_TO_INT(FLT) ( (FLT)-(int)(FLT) > 0.5f  \
+                                        ? (int)(FLT)+1 : (int)(FLT) )
 
 void
 gal_dimension_add_coords(size_t *c1, size_t *c2, size_t *out, size_t ndim);
@@ -75,7 +75,7 @@ size_t
 gal_dimension_coord_to_index(size_t ndim, size_t *dsize, size_t *coord);
 
 void
-gal_dimension_index_to_coord(size_t ind, size_t ndim, size_t *dsize,
+gal_dimension_index_to_coord(size_t index, size_t ndim, size_t *dsize,
                              size_t *coord);
 
 
diff --git a/lib/gnuastro/statistics.h b/lib/gnuastro/statistics.h
index 0701135..eba914d 100644
--- a/lib/gnuastro/statistics.h
+++ b/lib/gnuastro/statistics.h
@@ -104,13 +104,13 @@ gal_statistics_mean_std(gal_data_t *input);
 gal_data_t *
 gal_statistics_median(gal_data_t *input, int inplace);
 
+size_t
+gal_statistics_quantile_index(size_t size, double quantile);
+
 gal_data_t *
 gal_statistics_quantile(gal_data_t *input, double quantile, int inplace);
 
 size_t
-gal_statistics_quantile_index(size_t size, double quant);
-
-size_t
 gal_statistics_quantile_function_index(gal_data_t *input, gal_data_t *value,
                                        int inplace);
 
@@ -143,13 +143,13 @@ gal_statistics_mode_mirror_plots(gal_data_t *input, 
gal_data_t *value,
  ****************************************************************/
 
 int
-gal_statistics_is_sorted(gal_data_t *data);
+gal_statistics_is_sorted(gal_data_t *input);
 
 void
-gal_statistics_sort_increasing(gal_data_t *data);
+gal_statistics_sort_increasing(gal_data_t *input);
 
 void
-gal_statistics_sort_decreasing(gal_data_t *data);
+gal_statistics_sort_decreasing(gal_data_t *input);
 
 gal_data_t *
 gal_statistics_no_blank_sorted(gal_data_t *input, int inplace);
diff --git a/lib/statistics.c b/lib/statistics.c
index 419b9af..8922fe4 100644
--- a/lib/statistics.c
+++ b/lib/statistics.c
@@ -282,19 +282,19 @@ gal_statistics_median(gal_data_t *input, int inplace)
 /* For a given size, return the index (starting from zero) that is at the
    given quantile.  */
 size_t
-gal_statistics_quantile_index(size_t size, double quant)
+gal_statistics_quantile_index(size_t size, double quantile)
 {
   double floatindex;
 
-  if(quant<0.0f || quant>1.0f)
+  if(quantile<0.0f || quantile>1.0f)
     error(EXIT_FAILURE, 0, "%s: the input quantile should be between 0.0 "
-          "and 1.0 (inclusive). You have asked for %g", __func__, quant);
+          "and 1.0 (inclusive). You have asked for %g", __func__, quantile);
 
   /* Find the index of the quantile. */
-  floatindex=(double)(size-1)*quant;
+  floatindex=(double)(size-1)*quantile;
 
   /*
-  printf("quant: %f, size: %zu, findex: %f\n", quant, size, floatindex);
+  printf("quantile: %f, size: %zu, findex: %f\n", quantile, size, floatindex);
   */
   /* Note that in the conversion from float to size_t, the floor
      integer value of the float will be used. */
@@ -977,7 +977,7 @@ statistics_make_mirror(gal_data_t *noblank_sorted, size_t 
index,
    distribution of the input with a value at `value'.
 
    The output is a linked list of data structures: the first is the bins
-   with on bin at the mirror point, the second is the histogram with a
+   with one bin at the mirror point, the second is the histogram with a
    maximum of one and the third is the cumulative frequency plot. */
 gal_data_t *
 gal_statistics_mode_mirror_plots(gal_data_t *input, gal_data_t *value,
@@ -1047,7 +1047,7 @@ gal_statistics_mode_mirror_plots(gal_data_t *input, 
gal_data_t *value,
      - 2: dataset is sorted and decreasing.                  */
 
 #define IS_SORTED(IT) {                                                 \
-    IT *aa=data->array, *a=data->array, *af=a+data->size-1;             \
+    IT *aa=input->array, *a=input->array, *af=a+input->size-1;          \
     if(a[1]>=a[0]) do if( *(a+1) < *a ) break; while(++a<af);           \
     else           do if( *(a+1) > *a ) break; while(++a<af);           \
     return ( a==af                                                      \
@@ -1058,14 +1058,14 @@ gal_statistics_mode_mirror_plots(gal_data_t *input, 
gal_data_t *value,
   }
 
 int
-gal_statistics_is_sorted(gal_data_t *data)
+gal_statistics_is_sorted(gal_data_t *input)
 {
   /* A one-element dataset can be considered, sorted, so we'll just return
      1 (for sorted and increasing). */
-  if(data->size==1) return GAL_STATISTICS_SORTED_INCREASING;
+  if(input->size==1) return GAL_STATISTICS_SORTED_INCREASING;
 
   /* Do the check. */
-  switch(data->type)
+  switch(input->type)
     {
     case GAL_TYPE_UINT8:     IS_SORTED( uint8_t  );    break;
     case GAL_TYPE_INT8:      IS_SORTED( int8_t   );    break;
@@ -1079,7 +1079,7 @@ gal_statistics_is_sorted(gal_data_t *data)
     case GAL_TYPE_FLOAT64:   IS_SORTED( double   );    break;
     default:
       error(EXIT_FAILURE, 0, "%s: type code %d not recognized",
-            __func__, data->type);
+            __func__, input->type);
     }
 
   /* Control shouldn't reach this point. */
@@ -1095,13 +1095,13 @@ gal_statistics_is_sorted(gal_data_t *data)
 
 /* This function is ignorant to blank values, if you want to make sure
    there is no blank values, you can call `gal_blank_remove' first. */
-#define STATISTICS_SORT(QSORT_F) {                                        \
-    qsort(data->array, data->size, gal_type_sizeof(data->type), QSORT_F); \
+#define STATISTICS_SORT(QSORT_F) {                                      \
+    qsort(input->array, input->size, gal_type_sizeof(input->type), QSORT_F); \
   }
 void
-gal_statistics_sort_increasing(gal_data_t *data)
+gal_statistics_sort_increasing(gal_data_t *input)
 {
-  switch(data->type)
+  switch(input->type)
     {
     case GAL_TYPE_UINT8:
       STATISTICS_SORT(gal_qsort_uint8_increasing);    break;
@@ -1125,7 +1125,7 @@ gal_statistics_sort_increasing(gal_data_t *data)
       STATISTICS_SORT(gal_qsort_float64_increasing);  break;
     default:
       error(EXIT_FAILURE, 0, "%s: type code %d not recognized",
-            __func__, data->type);
+            __func__, input->type);
     }
 }
 
@@ -1135,9 +1135,9 @@ gal_statistics_sort_increasing(gal_data_t *data)
 
 /* See explanations above `gal_statistics_sort_increasing'. */
 void
-gal_statistics_sort_decreasing(gal_data_t *data)
+gal_statistics_sort_decreasing(gal_data_t *input)
 {
-  switch(data->type)
+  switch(input->type)
     {
     case GAL_TYPE_UINT8:
       STATISTICS_SORT(gal_qsort_uint8_decreasing);    break;
@@ -1161,7 +1161,7 @@ gal_statistics_sort_decreasing(gal_data_t *data)
       STATISTICS_SORT(gal_qsort_float64_decreasing);  break;
     default:
       error(EXIT_FAILURE, 0, "%s: type code %d not recognized",
-            __func__, data->type);
+            __func__, input->type);
     }
 }
 
@@ -1272,16 +1272,16 @@ gal_statistics_no_blank_sorted(gal_data_t *input, int 
inplace)
 /****************************************************************
  ********     Histogram and Cumulative Frequency Plot     *******
  ****************************************************************/
-/* Generate an array of regularly spaced elements. For a 1D dataset, the
-   output will be 1D, for 2D, it will be 2D.
+/* Generate an array of regularly spaced elements.
 
    Input arguments:
 
-     * The `data' set you want to apply the bins to. This is only necessary
-       if the range argument is not complete, see below. If `range' has all
-       the necessary information, you can pass a NULL pointer for `data'.
+     * The `input' set you want to apply the bins to. This is only
+       necessary if the range argument is not complete, see below. If
+       `range' has all the necessary information, you can pass a NULL
+       pointer for `input'.
 
-     * The `range' data structure keeps the desired range along each
+     * The `inrange' data structure keeps the desired range along each
        dimension of the input data structure, it has to be in float32
        type. Note that if
 
@@ -1308,7 +1308,7 @@ gal_statistics_no_blank_sorted(gal_data_t *input, int 
inplace)
   account for small differences on the bin edges.
 */
 gal_data_t *
-gal_statistics_regular_bins(gal_data_t *data, gal_data_t *inrange,
+gal_statistics_regular_bins(gal_data_t *input, gal_data_t *inrange,
                             size_t numbins, double onebinstart)
 {
   size_t i;
@@ -1341,7 +1341,7 @@ gal_statistics_regular_bins(gal_data_t *data, gal_data_t 
*inrange,
           /* If the minimum isn't set (is blank), find it. */
           if( isnan(ra[0]) )
             {
-              tmp=gal_data_copy_to_new_type_free(gal_statistics_minimum(data),
+              tmp=gal_data_copy_to_new_type_free(gal_statistics_minimum(input),
                                                  GAL_TYPE_FLOAT64);
               min=*((double *)(tmp->array));
               gal_data_free(tmp);
@@ -1352,7 +1352,7 @@ gal_statistics_regular_bins(gal_data_t *data, gal_data_t 
*inrange,
              value, so all points are included. */
           if( isnan(ra[1]) )
             {
-              tmp=gal_data_copy_to_new_type_free(gal_statistics_maximum(data),
+              tmp=gal_data_copy_to_new_type_free(gal_statistics_maximum(input),
                                                  GAL_TYPE_FLOAT64);
               max=*((double *)(tmp->array))+1e-6;
               gal_data_free(tmp);
@@ -1366,11 +1366,11 @@ gal_statistics_regular_bins(gal_data_t *data, 
gal_data_t *inrange,
   /* No range was given, find the minimum and maximum. */
   else
     {
-      tmp=gal_data_copy_to_new_type_free(gal_statistics_minimum(data),
+      tmp=gal_data_copy_to_new_type_free(gal_statistics_minimum(input),
                                          GAL_TYPE_FLOAT64);
       min=*((double *)(tmp->array));
       gal_data_free(tmp);
-      tmp=gal_data_copy_to_new_type_free(gal_statistics_maximum(data),
+      tmp=gal_data_copy_to_new_type_free(gal_statistics_maximum(input),
                                          GAL_TYPE_FLOAT64);
       max=*((double *)(tmp->array)) + 1e-6;
       gal_data_free(tmp);
@@ -1379,7 +1379,7 @@ gal_statistics_regular_bins(gal_data_t *data, gal_data_t 
*inrange,
 
   /* Allocate the space for the bins. */
   bins=gal_data_alloc(NULL, GAL_TYPE_FLOAT64, 1, &numbins, NULL,
-                      0, data->minmapsize, "bin_center", data->unit,
+                      0, input->minmapsize, "bin_center", input->unit,
                       "Center value of each bin.");
 
 
@@ -1429,13 +1429,13 @@ gal_statistics_regular_bins(gal_data_t *data, 
gal_data_t *inrange,
    the bins, `numbins' is not used. */
 
 #define HISTOGRAM_TYPESET(IT) {                                         \
-    IT *a=data->array, *af=a+data->size;                                \
+    IT *a=input->array, *af=a+input->size;                              \
     do if( *a>=min && *a<max) ++h[ (size_t)( (*a-min)/binwidth ) ];     \
     while(++a<af);                                                      \
   }
 
 gal_data_t *
-gal_statistics_histogram(gal_data_t *data, gal_data_t *bins, int normalize,
+gal_statistics_histogram(gal_data_t *input, gal_data_t *bins, int normalize,
                          int maxone)
 {
   size_t *h;
@@ -1463,7 +1463,7 @@ gal_statistics_histogram(gal_data_t *data, gal_data_t 
*bins, int normalize,
   /* Allocate the histogram (note that we are clearning it so all values
      are zero. */
   hist=gal_data_alloc(NULL, GAL_TYPE_SIZE_T, bins->ndim, bins->dsize,
-                      NULL, 1, data->minmapsize, "hist_number", "counts",
+                      NULL, 1, input->minmapsize, "hist_number", "counts",
                       "Number of data points within each bin.");
 
 
@@ -1476,7 +1476,7 @@ gal_statistics_histogram(gal_data_t *data, gal_data_t 
*bins, int normalize,
 
   /* Go through all the elements and find out which bin they belong to. */
   h=hist->array;
-  switch(data->type)
+  switch(input->type)
     {
     case GAL_TYPE_UINT8:     HISTOGRAM_TYPESET(uint8_t);     break;
     case GAL_TYPE_INT8:      HISTOGRAM_TYPESET(int8_t);      break;
@@ -1490,7 +1490,7 @@ gal_statistics_histogram(gal_data_t *data, gal_data_t 
*bins, int normalize,
     case GAL_TYPE_FLOAT64:   HISTOGRAM_TYPESET(double);      break;
     default:
       error(EXIT_FAILURE, 0, "%s: type code %d not recognized",
-            __func__, data->type);
+            __func__, input->type);
     }
 
 
@@ -1562,10 +1562,9 @@ gal_statistics_histogram(gal_data_t *data, gal_data_t 
*bins, int normalize,
 
    When a histogram is given and it is normalized, the CFP will also be
    normalized (even if the normalized flag is not set here): note that a
-   normalized CFP's maximum value is 1.
-*/
+   normalized CFP's maximum value is 1. */
 gal_data_t *
-gal_statistics_cfp(gal_data_t *data, gal_data_t *bins, int normalize)
+gal_statistics_cfp(gal_data_t *input, gal_data_t *bins, int normalize)
 {
   double sum;
   float *f, *ff, *hf;
@@ -1584,7 +1583,7 @@ gal_statistics_cfp(gal_data_t *data, gal_data_t *bins, 
int normalize)
   /* Prepare the histogram. */
   hist = ( bins->next
            ? bins->next
-           : gal_statistics_histogram(data, bins, 0, 0) );
+           : gal_statistics_histogram(input, bins, 0, 0) );
 
 
   /* If the histogram has float32 type it was given by the user and is
@@ -1596,13 +1595,13 @@ gal_statistics_cfp(gal_data_t *data, gal_data_t *bins, 
int normalize)
       sum=0.0f;
       ff=(f=hist->array)+hist->size; do sum += *f++;   while(f<ff);
       if(sum!=1.0f)
-        hist=gal_statistics_histogram(data, bins, 0, 0);
+        hist=gal_statistics_histogram(input, bins, 0, 0);
     }
 
 
   /* Allocate the cumulative frequency plot's necessary space. */
   cfp=gal_data_alloc( NULL, hist->type, bins->ndim, bins->dsize,
-                      NULL, 1, data->minmapsize,
+                      NULL, 1, input->minmapsize,
                       ( hist->type==GAL_TYPE_FLOAT32
                         ? "cfp_normalized" : "cfp_number" ),
                       ( hist->type==GAL_TYPE_FLOAT32



reply via email to

[Prev in Thread] Current Thread [Next in Thread]