[gnuastro-commits] master aaa7993: Query: default dataset, using random
From: Mohammad Akhlaghi
Subject: [gnuastro-commits] master aaa7993: Query: default dataset, using random string when no --output
Date: Sat, 9 Jan 2021 23:22:01 -0500 (EST)
branch: master
commit aaa7993f64014efdb22dfa622184760a320b525e
Author: Mohammad Akhlaghi <mohammad@akhlaghi.org>
Commit: Mohammad Akhlaghi <mohammad@akhlaghi.org>
Query: default dataset, using random string when no --output
Until now, the Query program required the '--dataset' option. However,
in the case of Gaia (and possibly many other databases), it is safe to
assume a certain dataset as default (to use when people don't explicitly
ask for a dataset).
Another issue with the Query program was that when no '--output' was given
by the user, Query would calculate the DATASUM of the downloaded table and
use that as the default name. But this had multiple problems: 1) it only
worked for FITS tables, 2) it needed extra processing, 3) it needed a full
download of the dataset before assigning a name, 4) it would write over a
previously downloaded file with the same DATASUM (but differing metadata), etc.
With this commit, a default dataset is assumed for Gaia ('edr3'), allowing
users to run Query, without having to give '--dataset=edr3' if this is what
they want (which is the most probable scenario). In this case, Query will
first print an extra line saying which dataset it will be using.
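
The fallback described above can be sketched in isolation as follows. This is a minimal illustration, not the code of this commit: the helper names 'dataset_or_default' and 'dataset_official_name' are hypothetical, and only the 'edr3' default and the two name mappings are taken from the diff and manual text further down.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Gnuastro function): return the dataset
   the user asked for, or fall back to 'edr3' and say so on stderr. */
static const char *
dataset_or_default(const char *requested)
{
  if(requested==NULL)
    {
      fprintf(stderr, "using 'edr3' dataset since no dataset was "
              "explicitly requested (with '--dataset')\n");
      return "edr3";
    }
  return requested;
}

/* Hypothetical helper: map a simplified dataset name to the official
   table name; any other name is passed through unchanged. */
static const char *
dataset_official_name(const char *name)
{
  assert(name!=NULL);
  if( !strcmp(name, "edr3") ) return "gaiaedr3.gaia_source";
  if( !strcmp(name, "dr2")  ) return "gaiadr2.gaia_source";
  return name;
}
```

Calling 'dataset_official_name(dataset_or_default(NULL))' resolves to the EDR3 source table, while an explicitly given name is never overridden.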
To address the DATASUM issue, when no output name is given, Query simply
uses a random 6-character string and appends it to the name of the database
to create the default FITS file name.
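
As a rough sketch of this naming scheme (an assumption-laden illustration, not the commit's implementation, which goes through 'gal_checkset_make_unique_suffix'), the hypothetical function 'default_output_name' below builds a name of the form NAME-STRING.fits from a database name and six random alphanumeric characters. It uses the C library's 'rand' for brevity and does not check whether the file already exists.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SUFFIX_LEN 6   /* Number of random characters in the name. */

/* Hypothetical helper (not a Gnuastro function): return a newly
   allocated "<database>-<6 random alphanumerics>.fits" string, or
   NULL if allocation fails. The caller must free the result. */
static char *
default_output_name(const char *database)
{
  static const char chars[]="abcdefghijklmnopqrstuvwxyz"
                            "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
  size_t i, dblen, nchars=sizeof chars - 1;
  char *name;

  assert(database!=NULL);
  dblen=strlen(database);

  /* Space for: database name, '-', random part, ".fits" and '\0'. */
  name=malloc(dblen + 1 + SUFFIX_LEN + 5 + 1);
  if(name==NULL) return NULL;

  /* Write the fixed prefix, then the random characters. */
  memcpy(name, database, dblen);
  name[dblen]='-';
  for(i=0; i<SUFFIX_LEN; ++i)
    name[dblen+1+i]=chars[ (size_t)rand() % nchars ];

  /* Copy the suffix together with its terminating '\0'. */
  memcpy(name+dblen+1+SUFFIX_LEN, ".fits", 6);
  return name;
}
```

A caller would normally seed the generator once (for example with 'srand') before the first call, otherwise every run produces the same sequence of names. Unlike the DATASUM approach, nothing here depends on the downloaded table, so the name can be fixed before the download starts.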
Also, Query now prints the dimensions of the downloaded table just before
printing the final statement and finishing.
The issue with DATASUM was recommended by Francois Ochsenbein.
---
bin/query/gaia.c | 13 +++++++++----
bin/query/main.h | 1 +
bin/query/query.c | 32 ++++++++++++--------------------
bin/query/ui.c | 10 ++++++----
doc/gnuastro.texi | 53 +++++++++++++++++++++++++++++------------------------
5 files changed, 57 insertions(+), 52 deletions(-)
diff --git a/bin/query/gaia.c b/bin/query/gaia.c
index 328920e..10427dc 100644
--- a/bin/query/gaia.c
+++ b/bin/query/gaia.c
@@ -55,10 +55,15 @@ gaia_sanitycheck(struct queryparams *p)
error(EXIT_FAILURE, 0, "the '--radius' ('-r') or '--width' ('-w') "
"options are necessary with the '--center' ('-C') option");
- /* Make sure a dataset is also given. */
+ /* If no dataset is explicitly given, then use default one and let
+ the user know. */
if( p->datasetstr==NULL)
- error(EXIT_FAILURE, 0, "the '--dataset' ('-s') option is necessary "
- "with the '--center' ('-C') option");
+ {
+ gal_checkset_allocate_copy("edr3", &p->datasetstr);
+ error(EXIT_SUCCESS, 0, "using '%s' dataset since no dataset "
+ "was explicitly requested (with '--dataset')",
+ p->datasetstr);
+ }
/* Use simpler names for the commonly used datasets. */
if( !strcmp(p->datasetstr, "edr3") )
@@ -194,7 +199,7 @@ gaia_query(struct queryparams *p)
/* Print the calling command for the user to know. */
if(p->cp.quiet==0)
- printf("Running: %s\n", command);
+ error(EXIT_SUCCESS, 0, "running: %s", command);
/* Run the command. */
if(system(command))
diff --git a/bin/query/main.h b/bin/query/main.h
index 857fcde..0de0511 100644
--- a/bin/query/main.h
+++ b/bin/query/main.h
@@ -58,6 +58,7 @@ struct queryparams
char *databasestr; /* Name of input database. */
char *downloadname; /* Temporary output name. */
char *processedname; /* Temporary output name. */
+ size_t outtableinfo[2]; /* To print in output. */
/* Output: */
time_t rawtime; /* Starting time of the program. */
diff --git a/bin/query/query.c b/bin/query/query.c
index 2eb32c7..54aed11 100644
--- a/bin/query/query.c
+++ b/bin/query/query.c
@@ -47,7 +47,6 @@ query_check_download(struct queryparams *p)
char *logname;
fitsfile *fptr;
gal_data_t *table;
- unsigned long datasum;
/* Open the FITS file and if the status value is still zero, it means
everything worked properly. */
@@ -70,25 +69,14 @@ query_check_download(struct queryparams *p)
remove(p->downloadname);
free(p->downloadname);
- /* If no output name was specified, calculate the 'datasum' of the
- table and put that after the file name. */
+ /* If no output name was specified, use the 'processedname'. */
if(p->cp.output==NULL)
- {
- /* Calculate the extension's datasum. */
- datasum=gal_fits_hdu_datasum(p->processedname, "1");
-
- /* Allocate the output name. */
- if( asprintf(&p->cp.output, "%s-%lu.fits", p->databasestr, datasum)<0 )
- error(EXIT_FAILURE, 0, "%s: asprintf allocation", __func__);
-
- /* Make sure the desired output name doesn't exist. */
- gal_checkset_writable_remove(p->cp.output, p->cp.keep,
- p->cp.dontdelete);
-
- /* Rename the processed name to the desired output. */
- rename(p->processedname, p->cp.output);
- free(p->processedname);
- }
+ p->cp.output=p->processedname;
+
+ /* Get basic information about the table and free it. */
+ p->outtableinfo[0]=gal_list_data_number(table);
+ p->outtableinfo[1]=table->size;
+ gal_list_data_free(table);
}
else
{
@@ -136,5 +124,9 @@ query(struct queryparams *p)
/* Let the user know that things went well. */
if(p->cp.quiet==0)
- printf("Query output written to: %s\n", p->cp.output);
+ {
+ printf("Query resulted in %zu columns and %zu rows.\n",
+ p->outtableinfo[0], p->outtableinfo[1]);
+ printf("Query output written to: %s\n", p->cp.output);
+ }
}
diff --git a/bin/query/ui.c b/bin/query/ui.c
index 0a2cf3f..bebb73a 100644
--- a/bin/query/ui.c
+++ b/bin/query/ui.c
@@ -250,11 +250,12 @@ static void
ui_read_check_only_options(struct queryparams *p)
{
size_t i;
+ char *basename;
gal_data_t *tmp;
/* See if database has been specified. */
if(p->databasestr==NULL)
- error(EXIT_FAILURE, 0, "no input dataset.\n\n"
+ error(EXIT_FAILURE, 0, "no input database.\n\n"
"Please use the '--database' ('-d') option to specify your "
"desired database, see manual ('info gnuastro astquery' "
"command) for the current databases");
@@ -266,7 +267,6 @@ ui_read_check_only_options(struct queryparams *p)
"For the full list of recognized databases, please see the "
"documentation (with the command 'info astquery')", p->databasestr);
-
/* Make sure that '--query' and '--center' are not called together. */
if(p->query && (p->center || p->overlapwith) )
error(EXIT_FAILURE, 0, "the '--query' option cannot be called together "
@@ -309,7 +309,7 @@ ui_read_check_only_options(struct queryparams *p)
i==1 ? "st" : i==2 ? "nd" : i==3 ? "rd" : "th");
}
- /* Sanity checks on width (if we are in the center-mode). */
+ /* Sanity checks on width (if we are in the center-mode). */
if(p->width && p->center)
{
/* Width should have the same number of elements as the center
@@ -337,11 +337,13 @@ ui_read_check_only_options(struct queryparams *p)
/* Set the name for the downloaded and processed files. These are due to
an internal low-level processing that will be done on the raw
downloaded file. */
+ basename=gal_checkset_malloc_cat(p->databasestr, ".fits");
p->processedname=gal_checkset_make_unique_suffix(p->cp.output
? p->cp.output
- : "query.fits",
+ : basename,
".fits");
p->downloadname=gal_checkset_make_unique_suffix(p->processedname, NULL);
+ free(basename);
}
diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index 70ae474..9efe5cb 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -10317,8 +10317,8 @@ One line examples:
@example
## Import all the columns of all entries in the Gaia early DR3
-## catalog within 20 arc-minutes of the given coordinate.
-$ astquery --database=gaia --dataset=edr3 --output=my-gaia.fits \
+## catalog (default) within 20 arc-minutes of the given coordinate.
+$ astquery --database=gaia --output=my-gaia.fits \
--center=113.8729761,31.9027152 --radius=20/60
## Similar to above, but return all objects within a square box of
@@ -10327,18 +10327,18 @@ $ astquery --database=gaia --dataset=edr3 --output=my-gaia.fits \
--center=113.8729761,31.9027152 --width=30/60
## Similar to above, but for objects with magnitude range 10 to 15,
-$ astquery --database=gaia --dataset=edr3 --output=my-gaia.fits \
+$ astquery --database=gaia --output=my-gaia.fits \
--center=113.8729761,31.9027152 --width=30/60 \
--range=phot_g_mean_mag,10:15
## Similar to first example, but only import the ID, RA, Dec and G-band
## magnitude of the sources (not all the columns).
-$ astquery --database=gaia --dataset=edr3 --output=my-gaia.fits \
+$ astquery --database=gaia --output=my-gaia.fits \
--center=113.8729761,31.9027152 --radius=0.1 \
--column=source_id,ra,dec,phot_g_mean_mag
## Find the ID, RA and Dec of all Gaia sources within an image.
-$ astquery --database=gaia --dataset=edr3 --overlapwith=image.fits
+$ astquery --database=gaia --overlapwith=image.fits
-csource_id,ra,dec
## Use a custom query to extract entries in the Gaia early DR3 catalog.
@@ -10346,27 +10346,31 @@ $ astquery --database=gaia --dataset=edr3 --overlapwith=image.fits
$ astquery --database=gaia --query="XXXX YYYY" --output=my-gaia.fits
@end example
-Query doesn't take any input argument, because the main goal is to retreive data from external sources.
-The main input to Query is the @option{--database} option which specifies which database should be contacted for submitting the query.
-The name of the downloaded output file can optionally be set with @option{--output} (see below for when @option{--output} is not specified).
+The name of the downloaded output file can be set with @option{--output}.
+The requested output format can any of the @ref{Recognized table formats} (currently @file{.txt} or @file{.fits}).
Like all Gnuastro programs, if the output is a FITS file, the zero-th/first HDU of the output will contain all the command-line options given to Query.
-
-There are two methods to query the database: 1) with @option{--query} you can directly give a raw query statement that is recognized by the database, 2) with the @option{--center} and @option{--radius}, the low-level query will constructed automatically for the particular database.
-The former is very low level and will require some knowledge of the database's query language, but of course, it is much more powerful.
-The latter is much more limited in terms of capabilities (the only constraint is the location of the objects compared to the given center), but doesn't require any knowledge of the database's query language.
-
-@cindex @code{DATASUM}: FITS keyword
-If @option{--output} is not set, the output name will be in the format of @file{STRING-NUMBER.fits}, where @file{STRING} is the name of the database, and @file{NUMBER} is the @code{DATASUM} of the downloaded table, which will be unique for a given table's data (For more on @code{DATASUM} in the FITS standard, see @ref{Keyword manipulation}, under the @code{checksum} component of @option{--write}).
-With this feature, a second run of @command{astquery} that isn't called with @option{--output}, and downloads the same final table data from the same database will have the same output name.
-However, a query (that is again not called with @option{--output}) which results in a different downloaded table (even differing by a single number) will have a different output name to avoid overriding a previously downloaded dataset.
+If @option{--output} is not set, the output name will be in the format of @file{NAME-STRING.fits}, where @file{NAME} is the name of the database (same value given to @option{--database}), and @file{STRING} is a randomly selected 6-character set of numbers and alphabetic characters.
+With this feature, a second run of @command{astquery} that isn't called with @option{--output} will not over-write an already downloaded one.
Generally, when calling Query more than once, it is recommended to set an output name for each call based on your project's context.
-@cartouche
-@noindent
+Query doesn't take any input argument, because the main goal is to retreive data from external sources.
+The main input to Query is the @option{--database} option which specifies which database should be contacted for submitting the query.
+There are two methods to query the database, each is more fully discussed in its option's description below.
+@itemize
+@item
+With @option{--query} you can directly give a raw query statement that is recognized by the database.
+This is very low level and will require some knowledge of the database's query language, but of course, it is much more powerful.
+If this option is given, the raw string is directly passed to the server and all other constraints/options are ignored.
+@item
+With the @option{--center}, @option{--radius} and other constraining options below, the low-level query will be constructed automatically for the particular database.
+This method is only limited to the generic capabilities that Query provides for all servers.
+So @option{--query} is more powerful, however, in this mode, you don't need any knowledge of the database's query language.
+When query is run, before contacting the server, it will print the full command that it executes which contains the raw server query that is constructed.
+@end itemize
+
@strong{Under development, request for feedback:} Query is a new member of the Gnuastro family of programs.
It currently requires that the @command{curl} executable (for the cURL downloading program) to be present on the host and the number of databases it supports is still limited, see the list under the @option{--database} option below.
More downloader tools, and databases will be added in the near future as it is used more often, so please don't hesitate to suggest any that you may need.
-@end cartouche
@table @option
@@ -10391,9 +10395,10 @@ The queries will generally contain space and other meta-characters, so we recomm
@item -s STR
@itemx --dataset=STR
-The dataset to query within the database for the automatically generated query (not compatible with @option{--query}).
+The dataset to query within the database (not compatible with @option{--query}).
The reason for this is that many databases have different types of datasets, for example different data releases (DRs), or various high-level calculations on subsets of the database elements.
-For example when @option{--database=gaia}, you can set @option{--dataset=edr3} to only select objects within the early data release 3.
+If not excplicity given, a particular dataset will be internally set as default for each database, and its name will be printed on the command-line.
+The default dataset is highlighted under each database listed below.
You can either use the database's official name of the datasets, for example @code{gaiaedr3.gaia_source} for the early data release 3 the Gaia database, or a simplified version that maps to it (@code{edr3}) for easy typing on the command-line.
Below is a list of the simplified names for the databases that have them.
@@ -10402,7 +10407,7 @@ Below is a list of the simplified names for the databases that have them.
@item gaia
@itemize
@item
-@code{edr3 --> gaiaedr3.gaia_source}
+@code{edr3 --> gaiaedr3.gaia_source} (the default dataset)
@item
@code{dr2 --> gaiadr2.gaia_source}
@item
@@ -25966,7 +25971,7 @@ The number of rows (or the number of elements in each @code{gal_data_t}) in the
All these functions will all be satisfied if you use @code{gal_table_read} to read the two coordinate columns, see @ref{Table input output}.
@cindex Permutation
-The functions below return a simply-linked list of three 1D datasets (see @ref{List of @code{gal_data_t}}), let's call the returned dataset @code{ret}.
+The functions below return a simply-linked list of three 1D datasets (see @ref{List of gal_data_t}), let's call the returned dataset @code{ret}.
The first two (@code{ret} and @code{ret->next}) are permutaitons.
In other words, the @code{array} elements of both have a type of @code{size_t}, see @ref{Permutations}.
The third node (@code{ret->next->next}) is the calculated distance for that match and its array has a type of @code{double}.