gnuastro-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[gnuastro-commits] master f5d7d1a 19/19: Match: added tests, completed d


From: Mohammad Akhlaghi
Subject: [gnuastro-commits] master f5d7d1a 19/19: Match: added tests, completed docs of new k-d tree based matching
Date: Sun, 14 Nov 2021 20:41:01 -0500 (EST)

branch: master
commit f5d7d1a351f7eb659f0cbd7f8943a3b53d0c43c6
Author: Mohammad Akhlaghi <mohammad@akhlaghi.org>
Commit: Mohammad Akhlaghi <mohammad@akhlaghi.org>

    Match: added tests, completed docs of new k-d tree based matching
    
    Until now I was busy implementing, testing, and imptoving the various
    phases of the k-d tree based matching. Finally, things seem to be in place
    properly so it was necessary to add tests in 'make check' and complete the
    uncomplete documentation of this new feature (with the new "Matching
    algorithms" section in the book).
    
    With this commit, some tests were added, and the functions were cleaned and
    made more clear, while fixing some problems that occurred in the previous
    implementation (for small tables, edge-effects of the binning algorithm
    were causing problems, and not letting matches getting found). Also some of
    the new features have now been explained in 'NEWS'.
---
 NEWS                                          |  36 +++-
 THANKS                                        |   2 +-
 bin/match/main.h                              |   4 +-
 bin/match/match.c                             |  43 ++--
 bin/match/ui.c                                |   5 +-
 bin/table/table.c                             |   3 +-
 doc/gnuastro.texi                             | 198 ++++++++++++------
 lib/binary.c                                  |  39 ++++
 lib/match.c                                   | 283 +++++++++++++++++---------
 lib/statistics.c                              |  12 +-
 tests/Makefile.am                             |   9 +-
 tests/match/{kdtree.sh => kdtree-internal.sh} |  10 +-
 tests/match/{kdtree.sh => kdtree-separate.sh} |  13 +-
 tests/match/merged-cols.sh                    |   2 +-
 tests/match/{positions.sh => sort-based.sh}   |   7 +-
 15 files changed, 460 insertions(+), 206 deletions(-)

diff --git a/NEWS b/NEWS
index 3df4fcd..ace08dd 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,14 @@ See the end of the file for license conditions.
 
 ** New features
 
+  All programs:
+   - Columns of FITS tables are now read in parallel (when '--numthreads'
+     isn't set to '1', and if the dependency CFITSIO library was built with
+     '--enable-reentrant'). In scenarios where many columns may be
+     necessary (for example the Table or Match programs), this will greatly
+     improve the running time of the programs. This task was implemented
+     based on feedback from Andrés Del Pino Molina.
+
   Crop:
    --widthinpix: when in WCS-mode, the value given to '--width' will be
      interpreted in units of pixels. This will allow having crops of a
@@ -23,6 +31,23 @@ See the end of the file for license conditions.
      Sánchez Benavente.
    --catrowhdu: The HDU(s) of the FITS file(s) given to '--catrowfile'.
 
+  Match:
+   - k-d tree based matching has been added for finding the matching rows,
+     and is set to the default mode. It will do the matching in parallel,
+     which will improve the speed when many rows are present. The k-d tree
+     can also optionally be saved in a file for matching new tables with
+     the same one (further improving running speed). For a full description
+     of Match's new behavior, please see the new "Matching algorithms"
+     section of the book. This feature was implemented with the help of
+     Sachin Kumar Singh.
+   --kdtree: new option to specify the algorithm used by Match. It can take
+     four values: 'internal' (which is the default), 'build' (to build a
+     k-d tree), 'FILE.fits' (name of k-d tree to import), or 'disable' (to
+     do the matching with the old sort-based method). For more, please see
+     the new "Matching algorithms" section of the book.
+   --kdtreehdu: the HDU of the external k-d tree (if a FITS file was given
+     to '--kdtree').
+
 ** Removed features
 
 ** Changed features
@@ -37,10 +62,17 @@ See the end of the file for license conditions.
      '--rawoutput' option. This renaming was done to avoid such confusions
      and was raised by Sepideh Eskandarlou.
 
+  Match:
+   - By default (when '--quiet' isn't called), Match will print the names
+     of the inputs and timings of important steps to the standard output
+     (on the command-line).
+
   Library:
-   - gal_fits_tab_read: now takes the number of threads (only relevant for
+   - gal_fits_tab_read: now takes the number of threads to read the desired
+     columns in parallel (this will greatly improve reading speed when
+     there are many columns and many rows).
+   - gal_table_read: simlar to 'gal_fits_tab_read' (only relevant for
      reading FITS tables).
-   - gal_table_read: simlar to 'gal_fits_tab_read'.
    - gal_blank_initialize: also works on string data, but initializing only
      a tile over a larger block of elements is only supported for numeric
      data types.
diff --git a/THANKS b/THANKS
index 1d27c4c..ede64b2 100644
--- a/THANKS
+++ b/THANKS
@@ -36,7 +36,7 @@ support in Gnuastro. The list is ordered alphabetically (by 
family name).
     Nushkia Chamba                       chamba@iac.es
     Benjamin Clement                     benjamin.clement@univ-lyon1.fr
     Nima Dehdilani                       nimadehdilani@gmail.com
-    Andres Del Pino Molina               adelpino@cefca.es
+    Andrés Del Pino Molina               adelpino@cefca.es
     Antonio Diaz Diaz                    antonio@gnu.org
     Alexey Dokuchaev                     danfe@freebsd.org
     Pierre-Alain Duc                     pierre-alain.duc@astro.unistra.fr
diff --git a/bin/match/main.h b/bin/match/main.h
index bd958d2..5db5c9e 100644
--- a/bin/match/main.h
+++ b/bin/match/main.h
@@ -86,8 +86,8 @@ struct matchparams
   char              *out2name;  /* Name of second matched output.       */
   gal_list_str_t  *stdinlines;  /* Lines given by Standard input.       */
   int              kdtreemode;  /* The k-d tree mode.                   */
-  gal_data_t      *kdtreedata;
-  size_t           kdtreeroot;
+  gal_data_t      *kdtreedata;  /* The k-d tree data.                   */
+  size_t           kdtreeroot;  /* The root node of the k-d tree.       */
 
   /* Output: */
   time_t              rawtime;  /* Starting time of the program.        */
diff --git a/bin/match/match.c b/bin/match/match.c
index 902cdf8..c0179c7 100644
--- a/bin/match/match.c
+++ b/bin/match/match.c
@@ -287,8 +287,6 @@ match_arrange(void *in_prm)
       c=0;
       for(tmp=map->cat; tmp!=NULL; tmp=tmp->next) if(c++==index) break;
 
-      printf("%zu: %s\n", index, tmp->name);
-
       /* Rearrange this columns' elements. */
       match_arrange_in_new_col(map->p, tmp, map->permutation,
                                map->nummatched);
@@ -531,17 +529,17 @@ match_catalog_write_one_col(struct matchparams *p, 
gal_data_t *a,
         break;
 
       default:
-        error(EXIT_FAILURE, 0, "a bug! Please contact us at %s to fix the "
-              "problem. the value to strarr[%zu][0] (%c) is not recognized",
-              PACKAGE_BUGREPORT, i, strarr[i][0]);
+        error(EXIT_FAILURE, 0, "a bug! Please contact us at %s to "
+              "fix the problem. the value to strarr[%zu][0] (%c) "
+              "is not recognized", PACKAGE_BUGREPORT, i, strarr[i][0]);
       }
 
   /* A small sanity check. */
   if(a || b)
-    error(EXIT_FAILURE, 0, "%s: a bug! Please contact us to fix the problem. "
-          "The two 'a' and 'b' arrays must be NULL by this point: "
-          "'a' %s NULL, 'b' %s NULL", __func__, a?"is not":"is",
-          b?"is not":"is");
+    error(EXIT_FAILURE, 0, "%s: a bug! Please contact us to fix "
+          "the problem. The two 'a' and 'b' arrays must be NULL "
+          "by this point: 'a' %s NULL, 'b' %s NULL", __func__,
+          a?"is not":"is", b?"is not":"is");
 
   /* Reverse the table and write it out. */
   gal_list_data_reverse(&cat);
@@ -573,7 +571,8 @@ match_catalog_kdtree_build(struct matchparams *p)
   kdtree = gal_kdtree_create(p->cols1, &root);
   if(!p->cp.quiet)
     {
-      if( asprintf(&msg, "k-d tree constructed (%zu rows).", p->cols1->size)<0 
)
+      if( asprintf(&msg, "k-d tree constructed (%zu rows).",
+                   p->cols1->size)<0 )
         error(EXIT_FAILURE, errno, "asprintf allocation");
       gal_timing_report(&t1, msg, 1);
       free(msg);
@@ -643,7 +642,8 @@ match_catalog_kdtree(struct matchparams *p, size_t 
*nummatched)
                              p->cp.quietmmap, nummatched);
       if(!p->cp.quiet)
         {
-          if( asprintf(&msg, "... done (%zu matches found).", *nummatched)<0 )
+          if( asprintf(&msg, "... %zu matches found, done!",
+                       *nummatched)<0 )
             error(EXIT_FAILURE, errno, "asprintf allocation");
           gal_timing_report(&t1, msg, 1);
           free(msg);
@@ -651,7 +651,11 @@ match_catalog_kdtree(struct matchparams *p, size_t 
*nummatched)
       gal_list_data_free(p->kdtreedata);
       break;
 
-    /* No 'default' necessary because the modes include disabling. */
+    /* Abort if the mode isn't recognized (its a bug!). */
+    default:
+      error(EXIT_FAILURE, 0, "%s: a bug! Please contact us at %s to fix "
+            "the problem! The code %d isn't recognized for 'kdtreemode'",
+            __func__, PACKAGE_BUGREPORT, p->kdtreemode);
     }
 
   /* Return the final match. */
@@ -684,7 +688,7 @@ match_catalog_sort_based(struct matchparams *p, size_t 
*nummatched)
   /* Let the user know that it finished. */
   if(!p->cp.quiet)
     {
-      if( asprintf(&msg, "... done (%zu matches found).", *nummatched)<0 )
+      if( asprintf(&msg, "... %zu matches found, done!", *nummatched)<0 )
         error(EXIT_FAILURE, errno, "asprintf allocation");
       gal_timing_report(&t1, msg, 1);
       free(msg);
@@ -713,13 +717,10 @@ match_catalog(struct matchparams *p)
       mcols=match_catalog_kdtree(p, &nummatched);
 
       /* If the user just asked to build a k-d tree, no futher processing
-         is necessary. */
+         is necessary, so don't continue. */
       if(p->kdtreemode==MATCH_KDTREE_BUILD) return;
     }
-
-  /* If k-d tree-based matching wasn't used, use the sort-based
-     matching. */
-  if(mcols==NULL)
+  else
     mcols=match_catalog_sort_based(p, &nummatched);
 
   /* If the output is to be taken from the input columns (it isn't just the
@@ -730,8 +731,8 @@ match_catalog(struct matchparams *p)
       if(!p->cp.quiet)
         {
           gettimeofday(&t1, NULL);
-          printf("  - Re-arranging matched rows "
-                 "(skip this with '--logasoutput')...\n");
+          printf("  - Arranging matched rows (skip this with "
+                 "'--logasoutput')...\n");
         }
 
       /* Read (and possibly write) the outputs. Note that we only need to
@@ -769,7 +770,7 @@ match_catalog(struct matchparams *p)
 
       /* Let the user know. */
       if( !p->cp.quiet )
-        gal_timing_report(&t1, "... done.", 1);
+        gal_timing_report(&t1, "... done!", 1);
     }
 
   /* Write the raw information in a log file if necessary.  */
diff --git a/bin/match/ui.c b/bin/match/ui.c
index 51ef3e2..560c420 100644
--- a/bin/match/ui.c
+++ b/bin/match/ui.c
@@ -887,7 +887,7 @@ ui_read_columns(struct matchparams *p)
   if( !p->cp.quiet
       && p->kdtreemode!=MATCH_KDTREE_BUILD
       && p->kdtreemode!=MATCH_KDTREE_DISABLE
-      && p->cols1->size > p->cols2->size )
+      && p->cols1->size > (2*p->cols2->size) )
     error(EXIT_SUCCESS, 0, "TIP: the matching speed will GREATLY IMPROVE "
           "if you swap the two inputs. Currently the second input has "
           "fewer rows than the first. In the k-d tree based matching, "
@@ -895,7 +895,7 @@ ui_read_columns(struct matchparams *p)
           "the k-d will be constructed for the first input. Fewer "
           "first-input rows therefore means a smaller tree, and more "
           "second-input rows will be better distributed in your "
-          "system's CPU");
+          "system's CPU threads");
 
   /* Free the extra spaces. */
   gal_list_str_free(cols1, 1);
@@ -1209,7 +1209,6 @@ ui_read_check_inputs_setup(int argc, char *argv[], struct 
matchparams *p)
     gal_options_as_fits_keywords(&p->cp);
 
 
-  /* Report the starting information. */
   /* Let the user know that processing has started. */
   if(!p->cp.quiet)
     {
diff --git a/bin/table/table.c b/bin/table/table.c
index c5bb8ac..e0cc27b 100644
--- a/bin/table/table.c
+++ b/bin/table/table.c
@@ -915,7 +915,8 @@ table_catrows(struct tableparams *p)
       hdu=table_catrows_findhdu(filell->v, &hdull);
       new=gal_table_read(filell->v, hdu, NULL, p->columns,
                          p->cp.searchin, p->cp.ignorecase,
-                         p->cp.minmapsize, p->cp.quietmmap, NULL);
+                         p->cp.numthreads, p->cp.minmapsize,
+                         p->cp.quietmmap, NULL);
 
       /* Make sure that the same number of columns were extracted from this
          table as they were from the original table. */
diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index d6dfe28..ba6c6b9 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -576,7 +576,7 @@ Invoking MakeCatalog
 
 Match
 
-* Matching algorithms::
+* Matching algorithms::         Different ways to find the match
 * Invoking astmatch::           Inputs, outputs and options of Match
 
 Modeling and fitting
@@ -19960,87 +19960,113 @@ The aperture can be a circle or an ellipse with any 
orientation.
 @subsection Matching algorithms
 
 Matching involves two catalogs, let's call them catalog A (with N rows) and 
catalog B (with M rows).
-The most basic algorithm that immediately comes to mind is this:
+The most basic matching algorithm that immediately comes to mind is this:
 for each row in A (let's call it @mymath{A_i}), go over all the rows in B 
(@mymath{B_j}, where @mymath{0<j<M}) and calculate the distance 
@mymath{|B_j-A_i|}.
-If this distance is less than a certain acceptable distance threshold (or 
radius), consider @mymath{A_i} and @mymath{B_j} as a match.
-
-During the parsing, once you have found that @mymath{A_i} and @mymath{B_j} 
satisfy the match above, you can even remove @mymath{B_j} from the search of 
matches for @mymath{A_k} (where @mymath{k>i}).
-Therefore, as you go down A (and more matches are found), you have to 
calculate less distances (there are fewer elements in B that remain to bex 
checked).
-However, this will introduce an important bias: @mymath{B_j} may actually be 
closer to @mymath{A_k} than to the matched @mymath{A_i}!
-But because @mymath{A_i} happened to be before @mymath{A_k} in your table, you 
removed @mymath{B_j} from the potential search domain of @mymath{A_k}.
-You will therefore loose that match, and replace it by a false match between 
@mymath{A_i} and @mymath{B_j}!
-
-In a single-dimentional match, this bias depends on the sorting of your two 
datasets (leading to different matches if you shuffle your datasets).
-But it will get more complex as you add dimensionality (for example catalogs 
dervied from 2D images or 3D cubes, where you have 2 and 3 different 
coordinates for each point).
-Here is how we adress this problem in Gnuastro's Match program: as we are 
doing the first parsing, we keep a log of all elements in A (@mymath{A_i} and 
@mymath{A_k} in the example above) that were within the radial threshold of 
@mymath{B_j}.
-Afterwards, for each @mymath{B_j} we can find the nearest element of A, 
irrespective of how the inputs were sorted, or their dimensionality.
+If this distance is less than a certain acceptable distance threshold (or 
radius, or aperture), consider @mymath{A_i} and @mymath{B_j} as a match.
 
-On the other hand, the basic parsing algorithm mentioned above is very 
computationally expensive:
-@mymath{N\times M} distances have to measured, and calculating the distance 
requires a square root and power of 2 (in 2 dimensions it would be 
@mymath{\sqrt{(B_{ix}-A_{ix})^2+(B_{iy}-A_{iy})^2}}), which are not simple 
operations for the CPU.
+This basic parsing algorithm is very computationally expensive:
+@mymath{N\times M} distances have to measured, and calculating the distance 
requires a square root and power of 2: in 2 dimensions it would be 
@mymath{\sqrt{(B_{ix}-A_{ix})^2+(B_{iy}-A_{iy})^2}}.
+If an elliptical aperture is necessary, it can even get more complicated, see 
@ref{Defining an ellipse and ellipsoid}.
+Such operations are not simple, and will consume many cycles of your CPU!
 As a result, this basic algorithm will become terribly slow as your datasets 
grow in size.
 For example when N or M exceed hundreds of thousands (which is common in the 
current days with datasets like the European Space Agency's Gaia mission).
-Therefore that basic parsing algoritm will take too much time and we need to 
use more efficient ways to parse the two catalogs.
-Gnuastro's Match currently has two such algorithms:
+Therefore that basic parsing algoritm will take too much time and more 
efficient ways to @emph{find the nearest neighbor} need to be found.
+Gnuastro's Match currently has algorithms for finding the nearest neighbor:
 
 @table @asis
 @item Sort-based
-To avoid the problem of calculating @mymath{N\times M} Euclidean distances 
(that involve a square root), this method works like this:
+In this algorithm, we will use a moving window over the sorted datasets:
 
 @enumerate
 @item
 Sort the two datasets by their first coordinate.
-Therefore @mymath{A_i<A_j} (only in first coordinate; when @mymath{i<j}), and 
similarly, sort the elements of B based on the first coordinate.
+Therefore @mymath{A_i<A_j} (when @mymath{i<j}; only in first coordinate), and 
similarly, sort the elements of B based on the first coordinate.
 @item
 Use the radial distance threshold to define the width of a moving interval 
over both A and B.
-Therefore, with a single parsing of A, you can find all the elements in A that 
are sufficiently near every element of B (while rejecting the ones that are too 
distant when taking other dimensions into account).
-@item
-The nearest A within the subset that is nearest to @mymath{B_j} will be 
considered as the match.
+Therefore, with a single parsing of both simultaneously, for each A-point, we 
can find all the elements in B that are sufficiently near to it (within the 
requested aperture).
 @end enumerate
 
 This method has some caveats:
-1) It requires sorting, which can again be slow.
+1) It requires sorting, which can again be slow on large numbers.
 2) It can only be done on a single CPU thread! So it can't benefit from the 
modern CPUs with many threads.
 3) There is no way to preserve intermediate information for future matches, 
for example this can greatly help when one of the matched datasets is always 
the same.
+To use this sorting method in Match, use @option{--kdtree=disable}.
 
 @item k-d tree based
-The k-d tree concept is much more abstract, but powerful (addressing all the 
caveats of the sort-based method).
-In short a k-d tree is a partitioning of a k-dimentional space.
-The k-d tree of table A is another table with the same number of rows, but 
only two integer columns: the integers contain the row indexs (counting from 
zero) of the left and right branch of that row.
-For more on the k-d tree and Gnuastro's implementation of it, please see 
@ref{K-d tree}.
+The k-d tree concept is much more abstract, but powerful (addressing all the 
caveats of the sort-based method described above.).
+In short a k-d tree is a partitioning of a k-dimentional space (``k'' is just 
a place-holder, so together with ``d'' for dimension, ``k-d'' means ``any 
number of dimensions''!).
+The k-d tree of table A is another table with the same number of rows, but 
only two integer columns: the integers contain the row indexs (counting from 
zero) of the left and right ``branch'' (in the ``tree'') of that row.
+With a k-d tree we can find the nearest point with much fewer (statistically) 
checks, compared to always parsing from the top-down.
+For more on the k-d tree concept and Gnuastro's implementation, please see 
@ref{K-d tree}.
 
-When given two catalogs (like the command below), Gnuastro's Match will 
internally construct a k-d tree for catalog A (the first catalog given to it) 
and use it for matching with catalog B.
+When given two catalogs (like the command below), Gnuastro's Match will 
internally construct a k-d tree for catalog A (the first catalog given to it) 
and use the k-d tree of A, for finding the nearest B-point(s) to each A-point 
(this is done in parallel on all available CPU threads, unless you specify a 
certain number of threads to use with @option{--numthreads}, see 
@ref{Multi-threaded operations})
 @example
-$ astmatch A.fits --ccol1=ra,dec B.fits --ccol2=ra,dec \
+$ astmatch A.fits --ccol1=ra,dec B.fits --ccol2=RA,DEC \
            --aperture=1/3600
 @end example
-However, optionally, you can also build the k-d tree of A and save it into a 
file, with a separate call to Match (with @option{--kdtree=build} and 
@option{--output=mykdtree.fits}).
-This external k-d tree can be fed to Match later (to avoid having to 
reconstruct it every time you want to match with the same first catalog; using 
@option{--kdtree=mykdtree.fits}).
-
-For each point in catalog B, parsing the k-d tree will enable discovery of the 
nearest point in catalog A.
-But the advantage (compared to parsing from the first element of A to the 
last, for the nearest point) is that the k-d tree will find the nearest item 
with much fewer (statistically) checks.
+However, optionally, you can also build the k-d tree of A and save it into a 
file, with a separate call to Match, like below
+@example
+$ astmatch A.fits --ccol1=ra,dec --kdtree=build \
+           --output=A-kdtree.fits
+@end example
+This external k-d tree (@file{A-kdtree.fits}) can be fed to Match later (to 
avoid having to reconstruct it every time you want to match a new catalog with 
A) like below for matching both @file{B.fits} and @file{C.fits} with 
@file{A.fits}.
+Note that the same @option{--kdtree} option above, is now given the file name 
of the k-d tree, instead of @code{build}.
+@example
+$ astmatch A.fits --ccol1=ra,dec --kdtree=A-kdtree.fits \
+           B.fits --ccol2=RA,DEC --aperture=1/3600 \
+           --output=A-B.fits
+$ astmatch A.fits --ccol1=ra,dec --kdtree=A-kdtree.fits \
+           C.fits --ccol2=RA,DEC --aperture=1/3600 \
+           --output=A-C.fits
+@end example
 
-Therefore, the rows of catalog B will be distributed within the T threads of 
your CPU.
-The k-d tree is parsed independently for each row of B in each thread, to find 
the nearest element of A to that row in B.
-Once all threads are finished the inverse checking process mentioned above 
will be implemented to find the best matchs.
+Irrespective of how the k-d tree is made ready (by importing or by 
constructing internally), it will be used to find the nearest A-point to each 
B-point.
+The k-d tree is parsed independently (on different CPU threads) for each row 
of B.
 
-There is just one technical issue however: when there is no neighbor within 
the acceptable distance of the k-d tree, it is forced to parse all elements.
-Therefore if one catalog only overlaps with the larger catalog in a small 
area, the algorithm will be forced to parse the full k-d tree for teh majority 
of points!
+There is just one technical issue however: when there is no neighbor within 
the acceptable distance of the k-d tree, it is forced to parse all elements to 
confirm that there is no match!
+Therefore if one catalog only covers a small portion (in the coordinate space) 
of the other catalog, the k-d tree algorithm will be forced to parse the full 
k-d tree for the majority of points!
 This will dramatically decrease the running speed of Match.
-Therefore, Match first divides the range of the first input in all its 
dimensions into 1000 bins (similar to a histogram), and will only do the k-d 
tree based search when the point in catalog B actually falls within that bin.
+Therefore, Match first divides the range of the first input in all its 
dimensions into bins that have a width of the requested aperture (similar to a 
histogram), and will only do the k-d tree based search when the point in 
catalog B actually falls within a bin that has at least one element in A.
 @end table
 
+Above, we described different ways of finding the @mymath{A_i} that is nearest 
to each @mymath{B_j}.
+But this isn't the whole matching process!
+Let's go ahead with a ``basic'' description of what happens next...
+You may be tempted to remove @mymath{A_i} from the search of matches for 
@mymath{B_k} (where @mymath{k>j}).
+Therefore, as you go down B (and more matches are found), you have to 
calculate less distances (there are fewer elements in A that remain to be 
checked).
+However, this will introduce an important bias: @mymath{A_i} may actually be 
closer to @mymath{B_k} than to @mymath{B_j}!
+But because @mymath{B_j} happened to be before @mymath{B_k} in your table, 
@mymath{A_i} was removed from the potential search domain of @mymath{B_k}.
+The good match (@mymath{B_k} with @mymath{A_i} will therefore be lost, and 
replaced by a false match between @mymath{B_j} and @mymath{A_i}!
+
+In a single-dimentional match, this bias depends on the sorting of your two 
datasets (leading to different matches if you shuffle your datasets).
+But it will get more complex as you add dimensionality.
+For example catalogs dervied from 2D images or 3D cubes, where you have 2 and 
3 different coordinates for each point.
+
+To address this problem, in Gnuastro (the Match program, or the matching 
functions of the library) similar to above, we first parse over the elements of 
B.
+But we won't associate the first nearest-neighbor with a match!
+Instead, we will use an array (with the number of rows in A, let's call it 
``B-in-A'') to keep the list of all nearest element(s) in B that match each 
A-point.
+Once all the points in B are parsed, each A-point in B-in-A will (possibly) 
have a sorted list of B-points (there may be multiple B-points that fall within 
the acceptable aperture of each A-point).
+In the previous example, the @mymath{i} element (corresponding to 
@mymath{A_i}) of B-in-A will contain the following list of B-points: 
@mymath{B_j} and @mymath{B_k}.
+
+A new array (with the number of points in B, let's call it A-in-B) is then 
used to find the final match.
+We parse over B-in-A (that was completed above), and from it extract the 
nearest B-point to each A-point (@mymath{B_k} for @mymath{A_i} in the example 
above).
+If this is the first A-point that is found for this B-point, then we put this 
A-point into A-in-B (in the example above, element @mymath{k} is filled with 
@mymath{A_k}).
+If another A-point was previously found for this B-point, then the distance of 
the two A-points to that B-point are compared, and the A-point with the smaller 
distance is kept in A-in-B.
+This will give us the best match between the two catalogs, independent of any 
sorting issues.
+Both the B-in-A and A-in-B will also keep the distances, so distances are only 
measured once.
+
 @noindent
-In summary, here are the take-home messages.
+In summary, here are the points to consider when selecting an algorithm, or 
the order of your inputs (for optimal speed, the match will the the same):
 @itemize
 @item
 For larger datasets, the k-d tree based method (when running on all threads 
possible) is much more faster than the classical sort-based method.
 @item
 The k-d tree is constructed for the first input table and the multi-threading 
is done on the rows of the second input table.
 The construction of a larger dataset's k-d tree will take longer, but 
multi-threading will work better when you have more rows.
-As a result, the optimal way to run match is to give the smaller table as the 
first argument (so its k-d tree is constructed), and the larger table as the 
second argument (so its rows are checked in parallel).
+As a result, the optimal way to place your inputs is to give the smaller input 
table (with fewer rows) as the first argument (so its k-d tree is constructed), 
and the larger table as the second argument (so its rows are checked in 
parallel).
 @item
 If you always need to match against one catalog (that is large!), the k-d tree 
construction itself can take a significant fraction of the running time.
-Therefore you can save its k-d tree into a file and simply give it to later 
calls.
+Therefore you can save its k-d tree into a file and simply give it to later 
calls, like the example given in the description of the k-d algorithm mentioned 
above.
 @end itemize
 
 @node Invoking astmatch,  , Matching algorithms, Match
@@ -20061,17 +20087,22 @@ One line examples:
 ## The wavelengths are in the 5th and 10th columns respectively.
 $ astmatch --aperture=5e-10 --ccol1=5 --ccol2=10 in1.fits in2.txt
 
-## Match the two catalogs with a circular aperture of width 2.
-## (Units same as given positional columns).
-## (By default two columns are given for `--ccol1' and `--ccol2',
-##  The number of values to these determines the dimensions).
-$ astmatch --aperture=2 input1.txt input2.fits
+## Find the row that is closest to (RA,DEC) of (12.3456,6.7890)
+## with a maximum distance of 1 arcseconds (1/3600 degrees).
+$ astmatch input1.txt --ccol1=ra,dec --coord=12.3456,6.7890 \
+           --aperture=1/3600
+
+## Find matching rows of two catalogs with a circular aperture
+## of width 2 (same unit as position columns: pixels in this case).
+$ astmatch input1.txt input2.fits --aperture=2 \
+           --ccol1=X,Y --ccol2=IMG_X,IMG_Y
 
 ## Similar to before, but the output is created by merging various
 ## columns from the two inputs: columns 1, RA, DEC from the first
 ## input, followed by all columns starting with `MAG' and the `BRG'
-## column from second input and finally the 10th from first input.
-$ astmatch --aperture=2 input1.txt input2.fits                   \
+## column from second input and the 10th column from first input.
+$ astmatch input1.txt input2.fits --aperture=1/3600 \
+           --ccol1=ra,dec --ccol2=RAJ2000,DEJ2000 \
            --outcols=a1,aRA,aDEC,b/^MAG/,bBRG,a10
 
 ## Assuming both inputs have the same column metadata (same name
@@ -20097,26 +20128,43 @@ $ astmatch --ccol1=2,3,4 --ccol2=2,3,4 
-a0.5/3600,0.5/3600,5e-10 \
 @end example
 
 Match will find the rows that are nearest to each other in two catalogs (given 
some coordinate columns).
-Therefore two catalogs are necessary for input.
-However, they don't necessarily have to be files:
-1) the first catalog can also come from the standard input (for example a 
pipe, see @ref{Standard input});
-2) when only one point is needed, you can use the @option{--coord} option to 
avoid creating a file for the second catalog.
-When the inputs are files, they can be plain text tables or FITS tables, for 
more see @ref{Tables}.
+Alternatively, it can construct the k-d tree of one catalog to save in a FITS 
file for future matching of the same catalog with many others.
+To understand the inner working of Match and its algorithms, see @ref{Matching 
algorithms}.
+
+When matching, two catalogs are necessary for input.
+But for constructing a k-d tree, only a single catalog should be given.
+The input tables can be plain text tables or FITS tables, for more see 
@ref{Tables}.
+But other ways of feeding inputs area also supported:
+@itemize
+@item
+The @emph{first} catalog can also come from the standard input (for example a 
pipe that feeds the output of a previous command to Match, see @ref{Standard 
input});
+@item
+When you only want to match one point with another catalog, you can use the 
@option{--coord} option to avoid creating a file for the @emph{second} input 
catalog.
+@end itemize
 
 Match follows the same basic behavior of all Gnuastro programs as fully 
described in @ref{Common program behavior}.
 If the first input is a FITS file, the common @option{--hdu} option (see 
@ref{Input output options}) should be used to identify the extension.
 When the second input is FITS, the extension must be specified with 
@option{--hdu2}.
 
-When @option{--quiet} is not called, Match will print the number of matches 
found in standard output (on the command-line).
-When matches are found, by default, the output file(s) will be the re-arranged 
input tables such that the rows match each other: both output tables will have 
the same number of rows which are matched with each other.
-If @option{--outcols} is called, the output is a single table with rows chosen 
from either of the two inputs in any order.
-If the @option{--logasoutput} option is called, the output will be a single 
table with the contents of the log file, see below.
+When @option{--quiet} is not called, Match will print its various processing 
phases (including the number of matches found) in standard output (on the 
command-line).
+When matches are found, by default, two tables will be output (if in FITS 
format, as two HDUs).
+Each output table will contain the re-arranged rows of the respective input 
table.
+In other words, both tables will have the same number of rows, and row N in 
both corresponds to the 10th match between the two.
 If no matches are found, the columns of the output table(s) will have zero 
rows (with proper meta-data).
+The output format can be changed with the following options:
+@itemize
+@item
+@option{--outcols}: The output will be a single table with rows chosen from 
either of the two inputs in any order.
+@item
+@option{--notmatched}: The output tables will contain the rows that didn't 
match between the two tables.
+If called with @option{--outcols}, the output will be a single table with all 
non-matched rows of both tables.
+@item
+@option{--logasoutput}: The output will be a single table with the contents of 
the log file, see below.
+@end itemize
 
 If no output file name is given with the @option{--output} option, then 
automatic output @ref{Automatic output} will be used to determine the output 
name(s).
-Depending on @option{--tableformat} (see @ref{Input output options}), the 
output will then be a (possibly multi-extension) FITS file or (possibly two) 
plain text file(s).
-When the output is a FITS file, the default re-arranged inputs will be two 
extensions of the output FITS file.
-With @option{--outcols} and @option{--logasoutput}, the FITS output will be a 
single table (in one extension).
+Depending on @option{--tableformat} (see @ref{Input output options}), the 
output will be a (possibly multi-extension) FITS file or (possibly two) plain 
text file(s).
+Generally, giving a filename to @option{--output} is recommended.
 
 When the @option{--log} option is called (see @ref{Operating mode options}), 
and there was a match, Match will also create a file named @file{astmatch.fits} 
(or @file{astmatch.txt}, depending on @option{--tableformat}, see @ref{Input 
output options}) in the directory it is run in.
 This log table will have three columns.
@@ -20131,6 +20179,7 @@ In this case, the output file (possibly given by the 
@option{--output} option) w
 @strong{@option{--log} isn't thread-safe}: As described above, when 
@option{--logasoutput} is not called, the Log file has a fixed name for all 
calls to Match.
 Therefore if a separate log is requested in two simultaneous calls to Match in 
the same directory, Match will try to write to the same file.
 This will cause problems like unreasonable log file, undefined behavior, or a 
crash.
+Remember that @option{--log} is mainly intended for debuging purposes, if you 
want the log file with a specific name, simply use @option{--logasoutput} 
(which will also be faster, since no arranging of the input columns is 
necessary).
 @end cartouche
 
 @table @option
@@ -20140,6 +20189,29 @@ The extension/HDU of the second input if it is a FITS 
file.
 When it isn't a FITS file, this option's value is ignored.
 For the first input, the common option @option{--hdu} must be used.
 
+@item -k STR
+@itemx --kdtree=STR
+Select the algorithm and/or the way to construct or import the k-d tree.
+A summary of the four acceptable strings for this option are described here 
for completeness.
+However, for a much more detailed discussion on Match's algorithms with 
examples, see @ref{Matching algorithms}.
+@table @code
+@item internal
+Construct a k-d tree for the first input internally (within the same run of 
Match), and parallelize over the rows of the second to find the nearest points.
+This is the default algorithm/method used by Match (when this option isn't 
called).
+@item build
+Only construct a k-d tree of a single input and abort.
+The name of the k-d tree is value to @option{--output}.
+@item CUSTOM-FITS-FILE
+Use the given FITS file as a k-d tree (that was previously constructed with 
Match itself) of the first input, and don't construct any k-d tree internally.
+The FITS file should have two columns with an unsigned 32-bit integer data 
type and a @code{KDTROOT} keyword that contains the index of the root of the 
k-d tree.
+For more on Gnuastro's k-d tree format, see @ref{K-d tree}.
+@item disable
+Don't use the k-d tree algorithm for finding the nearest neighbor, intead, use 
the sort-based method.
+@end table
+
+@item --kdtreehdu=STR
+The HDU of the FITS file, when a FITS file is given to the @option{--kdtree} 
option that was described above.
+
 @item --outcols=STR[,STR,[...]]
 Columns (from both inputs) to write into a single matched table output.
 The value to @code{--outcols} must be a comma-separated list of column 
identifiers (number or name, see @ref{Selecting table columns}).
diff --git a/lib/binary.c b/lib/binary.c
index ba0568f..e311223 100644
--- a/lib/binary.c
+++ b/lib/binary.c
@@ -47,6 +47,41 @@ along with Gnuastro. If not, see 
<http://www.gnu.org/licenses/>.
 /*****************      Erosion and dilation      ********************/
 /*********************************************************************/
 static void
+binary_erode_dilate_1d(gal_data_t *input, int dilate0_erode1)
+{
+  size_t i, nr=input->dsize[0];
+  uint8_t f, b, *pt, *fpt, *byt=input->array;
+
+  /* Do a sanity check: */
+  if(dilate0_erode1!=1 && dilate0_erode1!=0)
+    error(EXIT_FAILURE, 0, "%s: a bug! Please contact us at %s so we can "
+          "fix this problem. The value to 'dilate0_erode1' is %u while it "
+          "should be 0 or 1", __func__, PACKAGE_BUGREPORT, dilate0_erode1);
+
+  /* Set the foreground and background values. */
+  if(dilate0_erode1==0) {f=1; b=0;}
+  else                  {f=0; b=1;}
+
+  /* Check the two extremes. */
+  if(byt[0]==b && byt[1]==f ) byt[0]=GAL_BINARY_TMP_VALUE;
+  if(nr>2)
+    if(byt[nr-1]==b && byt[nr-2]==f) byt[nr-1] = GAL_BINARY_TMP_VALUE;
+
+  /* Go over the full array. */
+  for(i=1;i<nr;++i)
+    if( byt[i]==b && ( byt[i-1]==f || byt[i+1]==f ) )
+      byt[i]=GAL_BINARY_TMP_VALUE;
+
+  /* Set all the changed pixels to the proper values: */
+  fpt=(pt=byt)+nr;
+  do *pt = *pt==GAL_BINARY_TMP_VALUE ? f : *pt; while(++pt<fpt);
+}
+
+
+
+
+
+static void
 binary_erode_dilate_2d_4con(gal_data_t *input, int dilate0_erode1)
 {
   uint8_t f, b, *pt, *fpt, *byt=input->array;
@@ -290,6 +325,10 @@ binary_erode_dilate(gal_data_t *input, size_t num, int 
connectivity,
   /* Go over every element and do the erosion. */
   switch(binary->ndim)
     {
+    case 1:
+      binary_erode_dilate_1d(binary, d0e1);
+      break;
+
     case 2:
       for(counter=0;counter<num;++counter)
         switch(connectivity)
diff --git a/lib/match.c b/lib/match.c
index bc6bf40..f62c07d 100644
--- a/lib/match.c
+++ b/lib/match.c
@@ -34,6 +34,7 @@ along with Gnuastro. If not, see 
<http://www.gnu.org/licenses/>.
 #include <gnuastro/box.h>
 #include <gnuastro/list.h>
 #include <gnuastro/blank.h>
+#include <gnuastro/binary.h>
 #include <gnuastro/kdtree.h>
 #include <gnuastro/pointer.h>
 #include <gnuastro/threads.h>
@@ -287,10 +288,9 @@ match_rearrange(gal_data_t *A, gal_data_t *B, struct 
match_sfll **bina)
   size_t ai, ar=A->size, br=B->size;
 
   /* Allocate the space for 'ainb' and initialize it to NaN (since zero is
-     meaningful) in this context (both for indexs and also for
-     floats). This is a two column array that will keep the distance and
-     index of the closest element in catalog 'a' for each element in
-     catalog b. */
+     meaningful in this context; both for indexs and also for floats). This
+     is a two column array that will keep the distance and index of the
+     closest element in catalog 'a' for each element in catalog b. */
   errno=0; ainb=calloc(2*br, sizeof *ainb);
   if(ainb==NULL)
     error(EXIT_FAILURE, errno, "%s: %zu bytes for 'ainb'", __func__,
@@ -300,11 +300,12 @@ match_rearrange(gal_data_t *A, gal_data_t *B, struct 
match_sfll **bina)
   /* Go over each object in catalog 'a' and re-distribute the near objects,
      to find which ones in catalog 'a' are within the search radius of
      catalog b in a sorted manner. Note that we only need the 'ai' with the
-     minimum distance to 'bi', the rest are junk.*/
+     minimum distance to 'bi', the rest are junk. */
   for( ai=0; ai<ar; ++ai )
     while( bina[ai] )  /* As long as its not NULL.            */
       {
-       /* Pop out a 'bi' and its distance to this 'ai' from 'bina'. */
+       /* Pop out a 'bi' and its distance to this 'ai' from 'bina' (this
+           is the nearest B item to this A element). */
        match_pop_from_sfll(&bina[ai], &bi, &r);
 
        /* If nothing has been put here (the 'isnan' condition below is
@@ -988,8 +989,7 @@ struct match_kdtree_params
   double              *a[3];  /* Direct pointers to column arrays.    */
   double              *b[3];  /* Direct pointers to column arrays.    */
   struct match_sfll  **bina;  /* Second cat. items in first.          */
-  gal_data_t         *Abins;  /* Bins along each dimension of A.      */
-  gal_data_t        *Aexist;  /* Bins along each dimension of A.      */
+  gal_data_t        *Aexist;  /* If any element of A exists in bins.  */
   double         *Abinwidth;  /* Width of bins along each dimension.  */
   double              *Amin;  /* Minimum value of A along each dim.   */
   double              *Amax;  /* Maximum value of A along each dim.   */
@@ -999,27 +999,166 @@ struct match_kdtree_params
 
 
 
+/* Find the "coverage" of A along each dimension to help in rejecting
+   non-matches without even calling the k-d tree function.
+
+   The 'MATCH_KDTREE_COVERAGE_MAXBINS' is currently just a place-holder to
+   get the other parts of the algorithm going. But most probably there is a
+   way to optimally select the maximum number automatically. */
+#define MATCH_KDTREE_COVERAGE_MAXBINS 10000
+static void
+match_kdtree_A_coverage(struct match_kdtree_params *p)
+{
+  double *d, min, max;
+  size_t *s, *sf, dim, two=2, numbins;
+  gal_data_t *tmp, *stat, *hist, *range=NULL, *bins=NULL;
+
+  /* Allocate the space to keep the range of first input dimensions. */
+  p->Amin=gal_pointer_allocate(GAL_TYPE_FLOAT64, p->ndim, 0,
+                               __func__, "p->Amin");
+  p->Amax=gal_pointer_allocate(GAL_TYPE_FLOAT64, p->ndim, 0,
+                               __func__, "p->Amax");
+  p->Abinwidth=gal_pointer_allocate(GAL_TYPE_FLOAT64, p->ndim, 0,
+                                    __func__, "p->Abinwidth");
+
+  /* Set the coverage along each dimension. */
+  dim=0;
+  p->Aexist=NULL;
+  for(tmp=p->A; tmp!=NULL; tmp=tmp->next)
+    {
+      /* Find the number of bins based on the range and aperture size. */
+      stat=gal_statistics_minimum(tmp);
+      min=((double *)(stat->array))[0];
+      gal_data_free(stat);
+      stat=gal_statistics_maximum(tmp);
+      max=((double *)(stat->array))[0];
+      gal_data_free(stat);
+
+      /* Set the generic constants. */
+      p->Amin[dim] = min - p->aperture[0];
+      p->Amax[dim] = max + p->aperture[0];
+      numbins=(p->Amax[dim] - p->Amin[dim])/p->aperture[0];
+      if(numbins>MATCH_KDTREE_COVERAGE_MAXBINS)
+        numbins=MATCH_KDTREE_COVERAGE_MAXBINS;
+      if(numbins==0) numbins=1;
+
+      /*************************/
+      //numbins=1;
+      /*************************/
+
+      /* Generate the 'Aexist' list for this dimension. Note that if we
+         have a single bin in this dimension, we can just set everything
+         automatically. */
+      if(numbins==1)
+        {
+          /* We only have one bin, so set the width and a single-element
+             histogram. */
+          p->Abinwidth[dim] = p->Amax[dim] - p->Amin[dim];
+          hist=gal_data_alloc(NULL, GAL_TYPE_UINT8, 1, &numbins, NULL,
+                              0, -1, 1, NULL, NULL, NULL);
+          ((uint8_t *)(hist->array))[0]=1;
+        }
+      else
+        {
+          /* Set the 'range' for the bins. */
+          range=gal_data_alloc(NULL, GAL_TYPE_FLOAT64, 1, &two, NULL,
+                               0, -1, 1, NULL, NULL, NULL);
+          d=range->array;
+          d[0]=p->Amin[dim];
+          d[1]=p->Amax[dim];
+
+          /* Generate the histogram of elements in this dimension. */
+          bins=gal_statistics_regular_bins(tmp, range, numbins, NAN);
+          hist=gal_statistics_histogram(tmp, bins, 0, 0);
+
+          /* Set all histograms with atleast one element to 1 and convert
+             it to 8-bit unsigned integer. */
+          sf = (s=hist->array) + hist->size; do *s=*s>0; while(++s<sf);
+          hist=gal_data_copy_to_new_type_free(hist, GAL_TYPE_UINT8);
+
+          /* Dilate the binary histogram to avoid bin-edge-effect (missing
+             a match because the two points are on opposite sides of the
+             bin boundary). Note that we know that the bins are
+             equal/larger than ther user's given aperture and that these
+             bins are only for rejecting points before the k-d tree (they
+             aren't used within the k-d tree matching). */
+          gal_binary_dilate(hist, 1, 1, 1);
+
+          /* Set the general bin properties along this dimension. */
+          d=bins->array;
+          p->Abinwidth[dim] = d[1]-d[0];
+          p->Amin[dim]      = d[ 0            ] - p->Abinwidth[dim]/2;
+          p->Amax[dim]      = d[ bins->size-1 ] + p->Abinwidth[dim]/2;
+        }
+
+      /* For a check.
+      {
+        size_t i;
+        double *d;
+        uint8_t *u=hist->array;
+        printf("\ndim: %zu\n", dim);
+        printf("min: %g\n", p->Amin[dim]);
+        printf("max: %g\n", p->Amax[dim]);
+        printf("binwidth: %g\n", p->Abinwidth[dim]);
+        printf("----------------\n");
+        if(bins)
+          {
+            d=bins->array;
+            for(i=0;i<bins->size;++i)
+              printf("%zu: %-15.8f%-15.8f%u\n", i, d[i]-p->Abinwidth[dim]/2,
+                     d[i]+p->Abinwidth[dim]/2, u[i]);
+          }
+        else
+          printf("0: %-15.8f%-15.8f%u\n", p->Amin[dim],
+                 p->Amax[dim], u[0]);
+        printf("----------------\n");
+      } */
+
+      /* Add the histogram to the list and increment the dimensionality. */
+      gal_list_data_add(&p->Aexist, hist);
+      ++dim;
+
+      /* Clean up (these are done here in case the 'For a check' is
+         uncommented, and we want to debug things). */
+      if(bins)  { gal_data_free(bins);  bins=NULL;  }
+      if(range) { gal_data_free(range); range=NULL; }
+    }
+
+  //printf("%s: Good!\n", __func__); exit(0);
+
+  /* Reverse the list to be in the proper dimensional order. */
+  gal_list_data_reverse(&p->Aexist);
+}
+
+
+
+
+
 static void
 match_kdtree_sanity_check(struct match_kdtree_params *p)
 {
-  double *d;
-  gal_data_t *tmp, *hist, *bins;
-  size_t *s, *sf, dim, numbins=1000;
+  gal_data_t *tmp;
 
   /* Make sure all coordinates and the k-d tree have the same number of
      rows. */
   p->ndim=gal_list_data_number(p->A);
-  if( (   p->ndim != gal_list_data_number(p->B))
-      || (p->ndim != gal_list_data_number(p->A_kdtree)) )
-    error(EXIT_FAILURE, 0, "%s: the 'coord1', 'coord2' and "
-          "'coord1_kdtree' arguments should have the same number "
-          "of nodes/columns (elements in a simply linked list). "
-          "But they each respectively have %zu, %zu and %zu "
-          "nodes/columns", __func__, p->ndim,
+  if( p->ndim != gal_list_data_number(p->B) )
+    error(EXIT_FAILURE, 0, "%s: the 'coord1' and 'coord2' arguments "
+          "should have the same number of nodes/columns (elements "
+          "in a simply linked list). But they each respectively "
+          "have %zu, %zu and %zu nodes/columns", __func__, p->ndim,
           gal_list_data_number(p->B),
           gal_list_data_number(p->A_kdtree));
 
-  /* Make sure the coordinates have a 'double' type. */
+  /* Make sure that the k-d tree only has two columns. */
+  if( gal_list_data_number(p->A_kdtree)!=2 )
+    error(EXIT_FAILURE, 0, "%s: the 'kdtree' argument should only "
+          "two nodes/columns (elements in a simply linked list), "
+          "but it has %zu nodes/columns", __func__,
+          gal_list_data_number(p->A_kdtree));
+
+  /* Make sure the coordinates have a 'double' type and that the k-d tree
+     has an unsigned 32-bit integer type.*/
   for(tmp=p->A; tmp!=NULL; tmp=tmp->next)
     if( tmp->type!=GAL_TYPE_FLOAT64 )
       error(EXIT_FAILURE, 0, "%s: the type of all columns in 'coord1' "
@@ -1060,57 +1199,11 @@ match_kdtree_sanity_check(struct match_kdtree_params *p)
       if( p->B->next->next ) p->b[2]=p->B->next->next->array;
     }
 
-  /* Allocate the space to keep the range of first input dimensions. */
-  p->Amin=gal_pointer_allocate(GAL_TYPE_FLOAT64, p->ndim, 0,
-                               __func__, "p->Amin");
-  p->Amax=gal_pointer_allocate(GAL_TYPE_FLOAT64, p->ndim, 0,
-                               __func__, "p->Amax");
-  p->Abinwidth=gal_pointer_allocate(GAL_TYPE_FLOAT64, p->ndim, 0,
-                                    __func__, "p->Abinwidth");
-
   /* Find the bins of the first input along all its dimensions and select
      those that contain data. This is very important in optimal k-d tree
      based matching because confirming a non-match in a k-d tree is very
      computationally expensive. */
-  dim=0;
-  p->Abins=p->Aexist=NULL;
-  for(tmp=p->A; tmp!=NULL; tmp=tmp->next)
-    {
-      /* Generate the histogram of elements in this dimension. */
-      bins=gal_statistics_regular_bins(tmp, NULL, numbins, NAN);
-      hist=gal_statistics_histogram(tmp, bins, 0, 0);
-
-      /* Set all histograms with atleast one element to 1 and convert it to
-         8-bit unsigned integer. */
-      sf = (s=hist->array) + hist->size; do *s=*s>0; while(++s<sf);
-      hist=gal_data_copy_to_new_type_free(hist, GAL_TYPE_UINT8);
-
-      /* Set the general bin properties along this dimension. */
-      d=bins->array;
-      p->Abinwidth[dim] = d[1]-d[0];
-      p->Amin[dim]      = d[ 0            ] - p->Abinwidth[dim]/2;
-      p->Amax[dim]      = d[ bins->size-1 ] + p->Abinwidth[dim]/2;
-
-      /* For a check:
-      size_t i;
-      double *d=bins->array;
-      uint8_t *u=hist->array;
-      for(i=0;i<bins->size;++i)
-        printf("%-15.8f%-15.8f%u\n", d[i]-p->Abinwidth[dim]/2,
-               d[i]+p->Abinwidth[dim]/2, u[i]);
-      printf("\n"); */
-
-      /* Add the hist and bins to the list for later. */
-      gal_list_data_add(&p->Abins, bins);
-      gal_list_data_add(&p->Aexist, hist);
-
-      /* Increment the dimensionality. */
-      ++dim;
-    }
-
-  /* Reverse the list to be in the proper dimensional order. */
-  gal_list_data_reverse(&p->Abins);
-  gal_list_data_reverse(&p->Aexist);
+  match_kdtree_A_coverage(p);
 }
 
 
@@ -1126,7 +1219,7 @@ match_kdtree_worker(void *in_prm)
   struct match_kdtree_params *p=(struct match_kdtree_params *)tprm->params;
 
   /* High level definitions. */
-  int inrange;
+  int iscovered;
   uint8_t *existA;
   double r, delta[3];
   size_t i, j, ai, bi, h_i;
@@ -1142,18 +1235,18 @@ match_kdtree_worker(void *in_prm)
      thread. */
   for(i=0; tprm->indexs[i] != GAL_BLANK_SIZE_T; ++i)
     {
-      /* Set the easy-to-read indexs: note that this is the index in the
-         second catalog, hence 'bi'. */
+      /* Set the easy-to-read indexs: this is the index in the second
+         catalog, hence 'bi'. */
       bi = tprm->indexs[i];
 
-      /* Fill the 'point' for this thread, and check if each of its
+      /* Fill the 'point' for this thread. But first, check if each of its
          dimensions fall within the coverage of A. */
       j=0;
-      inrange=1;
+      iscovered=1;
       Aexist=p->Aexist;
       for(ccol=p->B; ccol!=NULL; ccol=ccol->next)
         {
-          if(inrange)
+          if(iscovered)
             {
               /* Fill the point location in this dimension and set the
                  pointer. */
@@ -1162,15 +1255,14 @@ match_kdtree_worker(void *in_prm)
 
               /* Make sure it covers the range of A (following the same set
                  of tests as in 'gal_statistics_histogram'). */
-              if( (    po >= p->Amin[j] - p->aperture[0] )
-                  && ( po <= p->Amax[j] + p->aperture[0] ) )
+              if( po >= p->Amin[j] && po <= p->Amax[j] )
                 {
                   h_i=(po-p->Amin[j])/p->Abinwidth[j];
                   if( existA[ h_i - (h_i==p->Aexist->size ? 1 : 0) ] == 0 )
-                    inrange=0;
+                    iscovered=0;
                 }
               else
-                inrange=0;
+                iscovered=0;
             }
 
           /* Increment the dimensionality counter. */
@@ -1179,7 +1271,7 @@ match_kdtree_worker(void *in_prm)
         }
 
       /* Continue with the match if the point is in-range. */
-      if(inrange)
+      if(iscovered)
         {
           /* Find the index of the nearest neighbor in the first catalog to
              this point in the second catalog. */
@@ -1187,21 +1279,28 @@ match_kdtree_worker(void *in_prm)
                                             p->kdtree_root, point,
                                             &least_dist);
 
-          /* Make sure the matched point is within the given aperture
-             (which may be elliptical). If it isn't, then the reported
-             index for this element will be 'GAL_BLANK_SIZE_T'. */
-          for(j=0;j<p->ndim;++j)
-            delta[j]=p->b[j][bi] - p->a[j][ai];
-          r=match_distance(delta, p->iscircle, p->ndim, p->aperture,
-                           p->c, p->s);
-
-          /* If the radial distance is smaller than the radial measure,
-             then add this item to a match with 'ai'. */
-          if(r<p->aperture[0])
-            match_add_to_sfll(&p->bina[ai], bi, r);
+          /* If nothing was found within the least distance, then the 'ai'
+             will be 'GAL_BLANK_SIZE_T'. */
+          if(ai!=GAL_BLANK_SIZE_T)
+            {
+              /* Make sure the matched point is within the given aperture
+                 (which may be elliptical). */
+              for(j=0;j<p->ndim;++j)
+                delta[j]=p->b[j][bi] - p->a[j][ai];
+              r=match_distance(delta, p->iscircle, p->ndim, p->aperture,
+                               p->c, p->s);
+
+              /* If the radial distance is smaller than the radial measure,
+                 then add this item to a match with 'ai'. */
+              if(r<p->aperture[0])
+                match_add_to_sfll(&p->bina[ai], bi, r);
+            }
 
           /* For a check:
-             printf("first[%zu] matched with second[%zu]\n", ai, bi);
+          if(ai==GAL_BLANK_SIZE_T)
+            printf("second[%zu] matched with first[%zu].\n", bi);
+          else
+            printf("second[%zu] DIDN'T match with first.\n", bi, bi);
           */
         }
     }
@@ -1258,7 +1357,8 @@ gal_match_kdtree(gal_data_t *coord1, gal_data_t *coord2,
   /* Basic sanity checks. */
   match_kdtree_sanity_check(&p);
 
-  /* Find all of the items in the second catalog items in the first. */
+  /* Find all of the second catalog points that are within the acceptable
+     radius of the first. */
   match_kdtree_second_in_first(&p, numthreads, minmapsize, quietmmap);
 
   /* Find the best match for each item (from possibly multiple matches). */
@@ -1276,7 +1376,6 @@ gal_match_kdtree(gal_data_t *coord1, gal_data_t *coord2,
   free(p.Amin);
   free(p.Amax);
   free(p.Abinwidth);
-  gal_list_data_free(p.Abins);
   gal_list_data_free(p.Aexist);
   return out;
 }
diff --git a/lib/statistics.c b/lib/statistics.c
index e0720e9..0fbe9d8 100644
--- a/lib/statistics.c
+++ b/lib/statistics.c
@@ -1653,15 +1653,15 @@ gal_statistics_regular_bins(gal_data_t *input, 
gal_data_t *inrange,
       /* Set the minimum and maximum of the bins. */
       ra=range->array;
       if( (range->size)%2 )
-        error(EXIT_FAILURE, 0, "%s: quantile ranges are not implemented yet",
-              __func__);
+        error(EXIT_FAILURE, 0, "%s: quantile ranges are not "
+              "implemented yet", __func__);
       else
         {
           /* If the minimum isn't set (is blank), find it. */
           if( isnan(ra[0]) )
             {
               tmp=gal_data_copy_to_new_type_free(
-                            gal_statistics_minimum(input), GAL_TYPE_FLOAT64);
+                          gal_statistics_minimum(input), GAL_TYPE_FLOAT64);
               min=*((double *)(tmp->array));
               gal_data_free(tmp);
             }
@@ -1681,9 +1681,10 @@ gal_statistics_regular_bins(gal_data_t *input, 
gal_data_t *inrange,
           else max=ra[1];
         }
 
-      /* Clean up: if 'range' was allocated. */
+      /* Clean up: if 'range' was allocated within this function. */
       if(range!=inrange) gal_data_free(range);
     }
+
   /* No range was given, find the minimum and maximum. */
   else
     {
@@ -1781,6 +1782,9 @@ gal_statistics_histogram(gal_data_t *input, gal_data_t 
*bins, int normalize,
      functionality. */
   if(bins==NULL)
     error(EXIT_FAILURE, 0, "%s: 'bins' is NULL", __func__);
+  if(bins->size==1)
+    error(EXIT_FAILURE, 0, "%s: 'bins' has to have more than "
+          "one element", __func__);
   if(bins->status!=GAL_STATISTICS_BINS_REGULAR)
     error(EXIT_FAILURE, 0, "%s: the input bins are not regular. Currently "
           "it is only implemented for regular bins", __func__);
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 7cf7cbf..3b2112d 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -126,12 +126,13 @@ if COND_FITS
   fits/copyhdu.sh: fits/write.sh.log mkprof/mosaic2.sh.log
 endif
 if COND_MATCH
-  MAYBE_MATCH_TESTS = match/positions.sh match/merged-cols.sh \
-  match/kdtree.sh
+  MAYBE_MATCH_TESTS = match/sort-based.sh match/merged-cols.sh \
+  match/kdtree-internal.sh match/kdtree-separate.sh
 
-  match/kdtree.sh: prepconf.sh.log
-  match/positions.sh: prepconf.sh.log
+  match/sort-based.sh: prepconf.sh.log
   match/merged-cols.sh: prepconf.sh.log
+  match/kdtree-internal.sh: prepconf.sh.log
+  match/kdtree-separate.sh: prepconf.sh.log
 endif
 if COND_MKCATALOG
   MAYBE_MKCATALOG_TESTS = mkcatalog/detections.sh mkcatalog/simple-3d.sh   \
diff --git a/tests/match/kdtree.sh b/tests/match/kdtree-internal.sh
similarity index 81%
copy from tests/match/kdtree.sh
copy to tests/match/kdtree-internal.sh
index 4a76150..0dcd44c 100755
--- a/tests/match/kdtree.sh
+++ b/tests/match/kdtree-internal.sh
@@ -1,4 +1,5 @@
-# Match the two input catalogs based on kdtree matching
+# Match the two input catalogs based on k-d tree matching (the k-d tree is
+# constructed and used internally).
 #
 # See the Tests subsection of the manual for a complete explanation
 # (in the Installing gnuastro section).
@@ -6,6 +7,7 @@
 # Original author:
 #     Sachin Kumar Singh <sachinkumarsingh092@gmail.com>
 # Contributing author(s):
+#     Mohammad Akhlaghi <mohammad@akhlaghi.org>
 # Copyright (C) 2021, Free Software Foundation, Inc.
 #
 # Copying and distribution of this file, with or without modification,
@@ -54,6 +56,6 @@ if [ ! -f $execname ]; then echo "$execname not created."; 
exit 77; fi
 # 'check_with_program' can be something like Valgrind or an empty
 # string. Such programs will execute the command if present and help in
 # debugging when the developer doesn't have access to the user's system.
-$check_with_program $execname $cat1 $cat2 --aperture=0.5 --ccol1=2,3   \
-                               --ccol2=2,3 --kdtree=internal           \
-                               --output=match-kdtree.fits
+$check_with_program $execname $cat1 $cat2 --aperture=0.5 \
+                              --ccol1=2,3 --ccol2=2,3 \
+                              --output=match-kdtree-internal.fits
diff --git a/tests/match/kdtree.sh b/tests/match/kdtree-separate.sh
similarity index 76%
rename from tests/match/kdtree.sh
rename to tests/match/kdtree-separate.sh
index 4a76150..7272de5 100755
--- a/tests/match/kdtree.sh
+++ b/tests/match/kdtree-separate.sh
@@ -1,10 +1,11 @@
-# Match the two input catalogs based on kdtree matching
+# Match the two input catalogs based on k-d tree matching (the k-d tree is
+# constructed first as a separate file, then used in a later call).
 #
 # See the Tests subsection of the manual for a complete explanation
 # (in the Installing gnuastro section).
 #
 # Original author:
-#     Sachin Kumar Singh <sachinkumarsingh092@gmail.com>
+#     Mohammad Akhlaghi <mohammad@akhlaghi.org>
 # Contributing author(s):
 # Copyright (C) 2021, Free Software Foundation, Inc.
 #
@@ -54,6 +55,8 @@ if [ ! -f $execname ]; then echo "$execname not created."; 
exit 77; fi
 # 'check_with_program' can be something like Valgrind or an empty
 # string. Such programs will execute the command if present and help in
 # debugging when the developer doesn't have access to the user's system.
-$check_with_program $execname $cat1 $cat2 --aperture=0.5 --ccol1=2,3   \
-                               --ccol2=2,3 --kdtree=internal           \
-                               --output=match-kdtree.fits
+$check_with_program $execname $cat1 --ccol1=2,3 --kdtree=build \
+                              --output=match-kdtree.fits
+$check_with_program $execname $cat1 $cat2 --aperture=0.5 --ccol1=2,3 \
+                              --ccol2=2,3 --kdtree=match-kdtree.fits \
+                              --output=match-kdtree-separate.fits
diff --git a/tests/match/merged-cols.sh b/tests/match/merged-cols.sh
index 01fd125..f0153ba 100755
--- a/tests/match/merged-cols.sh
+++ b/tests/match/merged-cols.sh
@@ -56,4 +56,4 @@ if [ ! -f $execname ]; then echo "$execname not created."; 
exit 77; fi
 # debugging when the developer doesn't have access to the user's system.
 $check_with_program $execname $cat1 $cat2 --aperture=0.5 --ccol1=2,3   \
                                --ccol2=2,3 -omatch-merged-cols.txt     \
-                              --outcols=a1,aEFGH,bACCU1,aIJKL,bACCU2
+                               --outcols=a1,aEFGH,bACCU1,aIJKL,bACCU2
diff --git a/tests/match/positions.sh b/tests/match/sort-based.sh
similarity index 84%
rename from tests/match/positions.sh
rename to tests/match/sort-based.sh
index dc76677..2bb013f 100755
--- a/tests/match/positions.sh
+++ b/tests/match/sort-based.sh
@@ -1,4 +1,4 @@
-# Match the two input catalogs
+# Match the two input catalogs (using the classical sort-based method).
 #
 # See the Tests subsection of the manual for a complete explanation
 # (in the Installing gnuastro section).
@@ -54,5 +54,6 @@ if [ ! -f $execname ]; then echo "$execname not created."; 
exit 77; fi
 # 'check_with_program' can be something like Valgrind or an empty
 # string. Such programs will execute the command if present and help in
 # debugging when the developer doesn't have access to the user's system.
-$check_with_program $execname $cat1 $cat2 --aperture=0.5 --log --ccol1=2,3 \
-                              --ccol2=2,3 --output=match-positions.fits
+$check_with_program $execname $cat1 $cat2 --aperture=0.5 --log \
+                              --ccol1=2,3 --ccol2=2,3 --kdtree=disable \
+                              --output=match-sort-based.fits



reply via email to

[Prev in Thread] Current Thread [Next in Thread]