gnuastro-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[gnuastro-commits] master 7354e33 09/19: Library (match.h): match_coordi


From: Mohammad Akhlaghi
Subject: [gnuastro-commits] master 7354e33 09/19: Library (match.h): match_coordinate_ replaced by match_sort_based_
Date: Sun, 14 Nov 2021 20:40:59 -0500 (EST)

branch: master
commit 7354e3361899e4c0481b48ba09123cc1767f77a4
Author: Sachin Kumar Singh <sachinkumarsingh092@gmail.com>
Commit: Mohammad Akhlaghi <mohammad@akhlaghi.org>

    Library (match.h): match_coordinate_ replaced by match_sort_based_
    
    Until now, there was only a single match algorithm in Gnuastro, so the name
    of the respective functions in the library had a 'match_coordinate_'
    prefix. However, in this branch we are adding a new k-d tree based
    matching, so that name for the initial algorithm could cause confusion
    (because k-d tree also uses coordinates!).
    
    With this commit, those same function names are now prefixed with
    'match_sort_based_'. This change is done in all the relevant places:
    bin/match/match.c, lib/gnuastro/match.h, lib/match.c and in
    doc/gnuastro.texi.
    
    Furthermore, a 'make check' test has been added for the k-dtree based
    matching which does simple matching based on the predefined inputs used for
    the old matching algorithm.
    
    Finally, some scripts have been added in the temporary directory for
    benchmarking tests to see the efficiency of k-d tree over sort-based
    matching. Two scripts are written in '/during-dev-test-data/scripts'
    (namely 'kdtree-gen.py', which generates the pseudo random output tests,
    and 'benchmark.py') which will calculate the run time for the scripts and
    generate a graph for number of times the program is executed vs. time
    taken. The script for benchmarking is simple and might take long time for
    >10 executions. A possible efficient method is multithreading, which will
    be implemented soon. After this we can implement the --kdtree=automatic
    option based on the result.
---
 bin/match/match.c                          |  7 +--
 doc/gnuastro.texi                          |  2 +-
 during-dev-test-data/match-query.txt       | 13 ++++--
 during-dev-test-data/scripts/benchmark.py  | 69 ++++++++++++++++++++++++++++++
 during-dev-test-data/scripts/kdtree-gen.py | 31 ++++++++++++++
 lib/gnuastro/match.h                       |  3 +-
 lib/match.c                                | 31 +++++++-------
 tests/Makefile.am                          |  4 +-
 tests/match/kdtree-match.sh                | 59 +++++++++++++++++++++++++
 9 files changed, 195 insertions(+), 24 deletions(-)

diff --git a/bin/match/match.c b/bin/match/match.c
index f421e0a..c398146 100644
--- a/bin/match/match.c
+++ b/bin/match/match.c
@@ -5,6 +5,7 @@ Match is part of GNU Astronomy Utilities (Gnuastro) package.
 Original author:
      Mohammad Akhlaghi <mohammad@akhlaghi.org>
 Contributing author(s):
+     Sachin Kumar Singh <sachinkumarsingh092@gmail.com>
 Copyright (C) 2017-2021, Free Software Foundation, Inc.
 
 Gnuastro is free software: you can redistribute it and/or modify it
@@ -497,7 +498,7 @@ match_catalog_kdtree_auto(struct matchparams *p)
 
 
 /* Wrapper over the k-d tree library to return an output in the same format
-   as 'gal_match_coordinates'. */
+   as 'gal_match_sort_based'. */
 static gal_data_t *
 match_catalog_kdtree(struct matchparams *p, size_t *nummatched)
 {
@@ -563,7 +564,7 @@ match_catalog(struct matchparams *p)
      place. Incase it was decided not to use a k-d tree at all
      (in 'automatic' mode), then we need to use the classic mode. */
   if(mcols==NULL)
-    mcols=gal_match_coordinates(p->cols1, p->cols2, p->aperture->array,
+    mcols=gal_match_sort_based(p->cols1, p->cols2, p->aperture->array,
                                 0, 1, p->cp.minmapsize, p->cp.quietmmap,
                                 &nummatched);
 
@@ -618,7 +619,7 @@ match_catalog(struct matchparams *p)
       mcols=tmp;
 
       /* We also want everything to be incremented by one. In a C
-         program, counting starts with zero, so 'gal_match_coordinates'
+         program, counting starts with zero, so 'gal_match_sort_based'
          will return indexs starting from zero. But outside a C
          program, on the command-line people expect counting to start
          from 1 (for example with AWK). */
diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index 5ddd4bb..cb1accb 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -29375,7 +29375,7 @@ When the aperture is an ellipse, distances between the 
points are also calculate
 
 
 
-@deftypefun {gal_data_t *} gal_match_coordinates (gal_data_t @code{*coord1}, 
gal_data_t @code{*coord2}, double @code{*aperture}, int @code{sorted_by_first}, 
int @code{inplace}, size_t @code{minmapsize}, int @code{quietmmap}, size_t 
@code{*nummatched})
+@deftypefun {gal_data_t *} gal_match_sort_based (gal_data_t @code{*coord1}, 
gal_data_t @code{*coord2}, double @code{*aperture}, int @code{sorted_by_first}, 
int @code{inplace}, size_t @code{minmapsize}, int @code{quietmmap}, size_t 
@code{*nummatched})
 
 Use a basic sort-based match to find the matching points of two input 
coordinates.
 See the descriptions above on the format of the inputs and outputs.
diff --git a/during-dev-test-data/match-query.txt 
b/during-dev-test-data/match-query.txt
index f88c129..394bf51 100644
--- a/during-dev-test-data/match-query.txt
+++ b/during-dev-test-data/match-query.txt
@@ -1,3 +1,10 @@
-7.2      2.1
-7.1      2.01
-4.2      7.01
+14.55495528995644      13.952740903340256      14.512879995960738      
+14.156180015003525     12.80845342418684       14.826212427784835      
+13.505267742341026     13.013443271481133      11.183241471767612      
+13.09715467421369      10.936844869289065      10.692733635319755      
+11.855001056969742     14.100516383294387      11.839440238220615      
+11.766844757741728     10.752701002494069      12.958678702895487      
+12.455862588902148     10.943892162638429      11.946732314414476      
+13.386374627834787     11.49741032889326       13.000603833689892      
+14.49239538261452      11.133665168041762      12.959051224814937      
+10.968887621940603     13.55609990177711       14.863758607257214      
diff --git a/during-dev-test-data/scripts/benchmark.py 
b/during-dev-test-data/scripts/benchmark.py
new file mode 100755
index 0000000..830e642
--- /dev/null
+++ b/during-dev-test-data/scripts/benchmark.py
@@ -0,0 +1,69 @@
+#! /usr/bin/env python
+# Do benchmark test and generate resulting graphs.
+# TODO: Use multithreading for faster parallel execution.
+
+import time
+import subprocess
+import matplotlib.pyplot as plt
+
+
+######## Set the variable and execute the script. #########
+script = "/home/sachin/gnuastro_dev/gnuastro/tests/during-dev.sh"
+maximum_executions = 10
+maximun_jump_factor = 10
+graph_name = "KD-tree"
+###########################################################
+
+
+# Wrapper function for calculating runtime.
+def timed_execution(function):
+    def wrapper(args):
+        start = time.time()
+        function(args)
+        end = time.time()
+        return (end-start)
+    return wrapper
+
+
+@timed_execution
+def execute_script(script_):
+    subprocess.call([script_], stdout=subprocess.DEVNULL, 
stderr=subprocess.STDOUT)
+
+
+# Make the list of number of times the script is to be executed.
+def make_num_execution_list(num, jump_factor):
+    num_execution_list = []
+    i = 1
+    while(i <= num):
+        num_execution_list.append(i)
+        i *= jump_factor
+
+    return num_execution_list
+
+
+# Make the list for the time taken by the script to run for a particular 
number of times.
+def make_time_list(num_execution_list):
+    time_list = []
+    for num in num_execution_list:
+        time_taken = 0
+        for _ in range(int(num)):
+            t = execute_script(script)
+            time_taken += t
+        time_list.append(time_taken)
+
+    return time_list
+
+
+# Make a graph between the number of times the script is executed and the time 
taken for it.
+def make_graph(name, time_list, num_execution_list):
+    plt.plot(num_execution_list, time_list, marker='o', markerfacecolor='red', 
markersize=10)
+    plt.xlabel("number of executions")
+    plt.ylabel("time required for executions")
+    plt.title(name)
+    plt.show()
+
+
+# Interface
+if __name__ == "__main__":
+    num_execution_list = make_num_execution_list(maximum_execution, 
maximum_jump_factor)
+    make_graph(graph_name, make_time_list(num_execution_list), 
num_execution_list)
diff --git a/during-dev-test-data/scripts/kdtree-gen.py 
b/during-dev-test-data/scripts/kdtree-gen.py
new file mode 100755
index 0000000..8583506
--- /dev/null
+++ b/during-dev-test-data/scripts/kdtree-gen.py
@@ -0,0 +1,31 @@
+#! /usr/bin/env python
+# Generate random floating numbers to make a kd-tree.
+# Usage: ./kdtree-gen.py [filename] [number of rows] [number of columns]
+
+import sys
+import random
+
+
+if len(sys.argv) == 1:
+    row_num = 10
+    col_num = 2
+else:
+    row_num = sys.argv[2]
+    col_num = sys.argv[3]
+
+
+def format_row(ncols=col_num, seed=10):
+    row = ""
+    for _ in range(int(ncols)):
+        col = random.uniform(seed, seed+5)
+        row += f"{col}\t"
+    row += "\n"
+
+    return row
+
+
+if __name__ == "__main__":
+    with open(f"../{sys.argv[1]}", "w+") as input_file:
+        for _ in range(int(row_num)):
+            input_file.write(format_row())
+
diff --git a/lib/gnuastro/match.h b/lib/gnuastro/match.h
index 1921464..c0b3c44 100644
--- a/lib/gnuastro/match.h
+++ b/lib/gnuastro/match.h
@@ -5,6 +5,7 @@ This is part of GNU Astronomy Utilities (Gnuastro) package.
 Original author:
      Mohammad Akhlaghi <mohammad@akhlaghi.org>
 Contributing author(s):
+     Sachin Kumar Singh <sachinkumarsingh092@gmail.com>
 Copyright (C) 2017-2021, Free Software Foundation, Inc.
 
 Gnuastro is free software: you can redistribute it and/or modify it
@@ -45,7 +46,7 @@ along with Gnuastro. If not, see 
<http://www.gnu.org/licenses/>.
 
 
 gal_data_t *
-gal_match_coordinates(gal_data_t *coord1, gal_data_t *coord2,
+gal_match_sort_based(gal_data_t *coord1, gal_data_t *coord2,
                       double *aperture, int sorted_by_first,
                       int inplace, size_t minmapsize, int quietmmap,
                       size_t *nummatched);
diff --git a/lib/match.c b/lib/match.c
index 6a0f9fb..dbea82d 100644
--- a/lib/match.c
+++ b/lib/match.c
@@ -5,6 +5,7 @@ This is part of GNU Astronomy Utilities (Gnuastro) package.
 Original author:
      Mohammad Akhlaghi <mohammad@akhlaghi.org>
 Contributing author(s):
+     Sachin Kumar Singh <sachinkumarsingh092@gmail.com>
 Copyright (C) 2017-2021, Free Software Foundation, Inc.
 
 Gnuastro is free software: you can redistribute it and/or modify it
@@ -47,7 +48,7 @@ along with Gnuastro. If not, see 
<http://www.gnu.org/licenses/>.
 
 
 /**********************************************************************/
-/*****************   Coordinate match custom list   *******************/
+/*****************   Sort-Based match custom list   *******************/
 /**********************************************************************/
 struct match_sfll
 {
@@ -501,12 +502,12 @@ gal_match_output(gal_data_t *A, gal_data_t *B, size_t 
*A_perm,
 
 
 /********************************************************************/
-/*************            Coordinate matching           *************/
+/*************            Sort-Based matching           *************/
 /********************************************************************/
 /* Since these checks are repetative, its easier to have a separate
    function for both inputs. */
 static void
-match_coordinate_sanity_check_columns(gal_data_t *coord, char *info,
+match_sort_based_sanity_check_columns(gal_data_t *coord, char *info,
                                       int inplace, int *allf64)
 {
   gal_data_t *tmp;
@@ -545,7 +546,7 @@ match_coordinate_sanity_check_columns(gal_data_t *coord, 
char *info,
 
 /* To keep the main function clean, we'll do the sanity checks here. */
 static void
-match_coordinates_sanity_check(gal_data_t *coord1, gal_data_t *coord2,
+match_sort_based_sanity_check(gal_data_t *coord1, gal_data_t *coord2,
                                double *aperture, int inplace, int *allf64)
 {
   size_t ncoord1=gal_list_data_number(coord1);
@@ -565,8 +566,8 @@ match_coordinates_sanity_check(gal_data_t *coord1, 
gal_data_t *coord2,
           "maximum of 3 dimensions", __func__, ncoord1);
 
   /* Check the column properties. */
-  match_coordinate_sanity_check_columns(coord1, "first", inplace, allf64);
-  match_coordinate_sanity_check_columns(coord2, "second", inplace, allf64);
+  match_sort_based_sanity_check_columns(coord1, "first", inplace, allf64);
+  match_sort_based_sanity_check_columns(coord2, "second", inplace, allf64);
 
   /* Check the aperture values. */
   if(aperture[0]<=0)
@@ -607,7 +608,7 @@ match_coordinates_sanity_check(gal_data_t *coord1, 
gal_data_t *coord2,
 /* To keep things clean, the sorting of each input array will be done in
    this function. */
 static size_t *
-match_coordinates_prepare_sort(gal_data_t *coords, size_t minmapsize)
+match_sort_based_prepare_sort(gal_data_t *coords, size_t minmapsize)
 {
   size_t i;
   double *darr;
@@ -662,7 +663,7 @@ match_coordinates_prepare_sort(gal_data_t *coords, size_t 
minmapsize)
 
 /* Do the preparations for matching of coordinates. */
 static void
-match_coordinates_prepare(gal_data_t *coord1, gal_data_t *coord2,
+match_sort_based_prepare(gal_data_t *coord1, gal_data_t *coord2,
                           int sorted_by_first, int inplace, int allf64,
                           gal_data_t **A_out, gal_data_t **B_out,
                           size_t **A_perm, size_t **B_perm,
@@ -714,8 +715,8 @@ match_coordinates_prepare(gal_data_t *coord1, gal_data_t 
*coord2,
         }
 
       /* Sort each dataset by the first coordinate. */
-      *A_perm = match_coordinates_prepare_sort(*A_out, minmapsize);
-      *B_perm = match_coordinates_prepare_sort(*B_out, minmapsize);
+      *A_perm = match_sort_based_prepare_sort(*A_out, minmapsize);
+      *B_perm = match_sort_based_prepare_sort(*B_out, minmapsize);
     }
 }
 
@@ -727,7 +728,7 @@ match_coordinates_prepare(gal_data_t *coord1, gal_data_t 
*coord2,
    catalog (catalog b) are within the acceptable distance of each record in
    the first (a). */
 static void
-match_coordinates_second_in_first(gal_data_t *A, gal_data_t *B,
+match_sort_based_second_in_first(gal_data_t *A, gal_data_t *B,
                                   double *aperture,
                                   struct match_sfll **bina)
 {
@@ -883,7 +884,7 @@ match_coordinates_second_in_first(gal_data_t *A, gal_data_t 
*B,
        Node 2: Second catalog index (counting from zero).
        Node 3: Distance between the match.                    */
 gal_data_t *
-gal_match_coordinates(gal_data_t *coord1, gal_data_t *coord2,
+gal_match_sort_based(gal_data_t *coord1, gal_data_t *coord2,
                       double *aperture, int sorted_by_first,
                       int inplace, size_t minmapsize, int quietmmap,
                       size_t *nummatched)
@@ -895,9 +896,9 @@ gal_match_coordinates(gal_data_t *coord1, gal_data_t 
*coord2,
 
   /* Do a small sanity check and make the preparations. After this point,
      we'll call the two arrays 'a' and 'b'.*/
-  match_coordinates_sanity_check(coord1, coord2, aperture, inplace,
+  match_sort_based_sanity_check(coord1, coord2, aperture, inplace,
                                  &allf64);
-  match_coordinates_prepare(coord1, coord2, sorted_by_first, inplace,
+  match_sort_based_prepare(coord1, coord2, sorted_by_first, inplace,
                             allf64, &A, &B, &A_perm, &B_perm,
                             minmapsize);
 
@@ -914,7 +915,7 @@ gal_match_coordinates(gal_data_t *coord1, gal_data_t 
*coord2,
 
 
   /* All records in 'b' that match each 'a' (possibly duplicate). */
-  match_coordinates_second_in_first(A, B, aperture, bina);
+  match_sort_based_second_in_first(A, B, aperture, bina);
 
 
   /* Two re-arrangings will fix the issue. */
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 31182ff..0240bb4 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -126,10 +126,12 @@ if COND_FITS
   fits/copyhdu.sh: fits/write.sh.log mkprof/mosaic2.sh.log
 endif
 if COND_MATCH
-  MAYBE_MATCH_TESTS = match/positions.sh match/merged-cols.sh
+  MAYBE_MATCH_TESTS = match/positions.sh match/merged-cols.sh \
+  match/kdtree-match.sh
 
   match/positions.sh: prepconf.sh.log
   match/merged-cols.sh: prepconf.sh.log
+  match/kdtree-match.sh: prepconf.sh.log
 endif
 if COND_MKCATALOG
   MAYBE_MKCATALOG_TESTS = mkcatalog/detections.sh mkcatalog/simple-3d.sh   \
diff --git a/tests/match/kdtree-match.sh b/tests/match/kdtree-match.sh
new file mode 100644
index 0000000..2e8ac0e
--- /dev/null
+++ b/tests/match/kdtree-match.sh
@@ -0,0 +1,59 @@
+# Match the two input catalogs based on kdtree matching
+#
+# See the Tests subsection of the manual for a complete explanation
+# (in the Installing gnuastro section).
+#
+# Original author:
+#     Sachin Kumar Singh <sachinkumarsingh092@gmail.com>
+# Contributing author(s):
+# Copyright (C) 2015-2021, Free Software Foundation, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved.  This file is offered as-is,
+# without any warranty.
+
+
+
+
+
+# Preliminaries
+# =============
+#
+# Set the variables (The executable is in the build tree). Do the
+# basic checks to see if the executable is made or if the defaults
+# file exists (basicchecks.sh is in the source tree).
+prog=match
+execname=../bin/$prog/ast$prog
+cat1=$topsrc/tests/$prog/positions-1.txt
+cat2=$topsrc/tests/$prog/positions-2.txt
+
+
+
+
+
+# Skip?
+# =====
+#
+# If the dependencies of the test don't exist, then skip it. There are two
+# types of dependencies:
+#
+#   - The executable was not made (for example due to a configure option),
+#
+#   - The input data was not made (for example the test that created the
+#     data file failed).
+if [ ! -f $execname ]; then echo "$execname not created."; exit 77; fi
+
+
+
+
+
+# Actual test script
+# ==================
+#
+# 'check_with_program' can be something like Valgrind or an empty
+# string. Such programs will execute the command if present and help in
+# debugging when the developer doesn't have access to the user's system.
+$check_with_program $execname $cat1 $cat2 --aperture=0.5 --ccol1=2,3   \
+                               --ccol2=2,3 --kdtree=internal           \
+                               --output=match-kdtree.fits



reply via email to

[Prev in Thread] Current Thread [Next in Thread]