[gnuastro-commits] master acad34c 049/125: Sanity checks for reading txt tables


From: Mohammad Akhlaghi
Subject: [gnuastro-commits] master acad34c 049/125: Sanity checks for reading txt tables
Date: Sun, 23 Apr 2017 22:36:35 -0400 (EDT)

branch: master
commit acad34ca262cc4940c321cdf59931bccf4b9ab9f
Author: Mohammad Akhlaghi <address@hidden>
Commit: Mohammad Akhlaghi <address@hidden>

    Sanity checks for reading txt tables
    
    Several checks were added in the library and also the Table program when
    reading plain text tables. The manual was also correspondingly
    updated. They are listed below:
    
     - Previously, the `NAME' argument in the meta-data column was
       (unintentionally) mandatory. But now, when the name isn't specified the
       whole line will be ignored.
    
     - The checks for the column number (being readable, and not being
       repeated) are also now moved to before we start parsing the contents
       within the brackets.
    
     - In parsing the meta-data comment, there was no check if `typestr' was
       present or not! So now there is a check to change the type of the column
       (from the default which is double) only if this string is given.
    
     - While parsing the first row to check for the actual number of columns in
       the file, the (possibly) extra columns in the meta-data will also be
       removed. This is important if the table has `# Column 20: ...' but the
       table only has 5 columns for example.
    
     - When there are no rows in the table, the table information and writing
       functions will return a NULL pointer.
    
     - When there aren't enough columns in the subsequent rows of the table, an
       error will be printed.
    
    Finally, to make an invalid type for `gal_data_t' more systematic, a new
    `GAL_DATA_TYPE_INVALID' macro is now defined in `data.h'.
---
 bin/table/ui.c      |  11 +++
 doc/gnuastro.texi   | 115 +++++++++++++----------
 lib/data.c          |   4 +-
 lib/gnuastro/data.h |   1 +
 lib/table.c         |  16 +++-
 lib/txt.c           | 263 ++++++++++++++++++++++++++++++++++------------------
 6 files changed, 265 insertions(+), 145 deletions(-)
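
To make the new order of checks in the log above easier to follow, here is a
minimal, self-contained C sketch of parsing one
`# Column N: NAME [UNIT, TYPE, BLANK] COMMENT' line. It is not the code in
`lib/txt.c': `struct sketch_colinfo', `sketch_trim' and
`sketch_parse_column_comment' are hypothetical names used only for this
illustration, the type is kept as a string instead of a type code, and the
repeated-column check and the UNIT/BLANK/COMMENT fields are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for one column's meta-data (not `gal_data_t'). */
struct sketch_colinfo
{
  long number;       /* Column number, counting from 1.            */
  char name[64];     /* Trimmed NAME field.                        */
  char type[16];     /* Trimmed TYPE field, "double" when absent.  */
};

/* Trim leading and trailing white space in place. Return NULL when the
   input is NULL or nothing but white space. */
static char *
sketch_trim(char *s)
{
  char *end;
  if(s==NULL) return NULL;
  while(isspace((unsigned char)*s)) ++s;
  if(*s=='\0') return NULL;
  end=s+strlen(s)-1;
  while(end>s && isspace((unsigned char)*end)) *end--='\0';
  return s;
}

/* Parse a writable comment line. Return 1 when the line gives usable
   meta-data and 0 when it should be ignored. */
int
sketch_parse_column_comment(char *line, struct sketch_colinfo *out)
{
  static const char prefix[]="# Column ";
  char *number, *name, *lbr, *rbr, *comma, *tailptr, *typestr=NULL;

  assert(line!=NULL && out!=NULL);
  if(strncmp(line, prefix, strlen(prefix))) return 0;

  /* The column number is everything up to the first `:'. */
  number=line+strlen(prefix);
  name=strchr(number, ':');
  if(name==NULL) return 0;
  *name++='\0';

  /* Check 1: the number must be a readable, positive integer. */
  out->number=strtol(number, &tailptr, 0);
  if(tailptr==number || *tailptr!='\0' || out->number<=0) return 0;

  /* Separate the bracketed part (if any) from the name. The TYPE field
     is whatever sits between the first and second commas. */
  lbr=strchr(name, '[');
  if(lbr)
    {
      *lbr++='\0';
      rbr=strchr(lbr, ']');
      if(rbr) *rbr='\0';
      comma=strchr(lbr, ',');
      if(comma)
        {
          typestr=comma+1;
          comma=strchr(typestr, ',');
          if(comma) *comma='\0';
        }
    }

  /* Check 2: a line without a name is ignored. */
  name=sketch_trim(name);
  if(name==NULL) return 0;

  /* Check 3: only change the default (double) when a type was given. */
  typestr=sketch_trim(typestr);
  snprintf(out->type, sizeof out->type, "%s", typestr ? typestr : "double");
  snprintf(out->name, sizeof out->name, "%s", name);
  return 1;
}
```

With this sketch, a line holding only `# Column 5:' is rejected by the name
check, and `# Column 3: ID' keeps the default type because no TYPE string is
present.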

diff --git a/bin/table/ui.c b/bin/table/ui.c
index ff9a1d7..4bfd1b9 100644
--- a/bin/table/ui.c
+++ b/bin/table/ui.c
@@ -388,6 +388,11 @@ preparearrays(struct tableparams *p)
       allcols=gal_table_info(p->up.filename, p->cp.hdu, &numcols,
                              &numrows, &tabletype);
 
+      /* If there was no actual data in the file, then inform the user */
+      if(allcols==NULL)
+        error(EXIT_FAILURE, 0, "%s: no usable data rows (non-commented and "
+              "non-blank lines)", p->up.filename);
+
       /* Free the information from all the columns. */
       for(i=0;i<numcols;++i)
         gal_data_free(&allcols[i], 1);
@@ -409,6 +414,12 @@ preparearrays(struct tableparams *p)
   p->table=gal_table_read(p->up.filename, p->cp.hdu, p->columns,
                           p->searchin, p->ignorecase, p->cp.minmapsize);
 
+  /* If there was no actual data in the file, then inform the user and
+     abort. */
+  if(p->table==NULL)
+    error(EXIT_FAILURE, 0, "%s: no usable data rows (non-commented and "
+          "non-blank lines)", p->up.filename);
+
   /* Now that the data columns are ready, we can free the string linked
      list. */
   gal_linkedlist_free_stll(p->columns, 1);
diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index f5a5a15..fe4677e 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -5150,14 +5150,14 @@ also contain non-numerical columns.
 @node Gnuastro text table format,  , Table formats, Table formats
 @subsection Gnuastro text table format
 
-Plain text files are most generic and portable way to manually create,
-visually inspect, or manually edit a table. In this format, the ending of a
-row is defined by the new-line character (a line on a text editor). So when
-you view it on a text editor, every row will occupy one line. The
-delimiters (or characters separating the columns) are white space
+Plain text files are most generic, portable, and easiest way to (manually)
+create, (visually) inspect, or (manually) edit a table. In this format, the
+ending of a row is defined by the new-line character (a line on a text
+editor). So when you view it on a text editor, every row will occupy one
+line. The delimiters (or characters separating the columns) are white space
 characters (space, horizontal tab, vertical tab) and a comma (@key{,}). The
-only further requirement is that all rows must have the same number of
-columns.
+only further requirement is that all rows/lines must have the same number
+of columns.
 
 The columns don't have to be exactly under each other and the rows can be
 arbitrarily long with different lengths. For example the following contents
@@ -5171,55 +5171,70 @@ each element interpretted as a @code{double} type (see @ref{Data types}).
 
 However, the example above has no other information about the columns. For
 example, Gnuastro's programs/libraries, you aren't limited to using the
-column's number/index. If the columns have names, units, or comments you
-can also select your columns based on searches/matches in these fields, for
-example see @ref{Table}. Also, in this manner, you can't guide the program
-reading the table on how to read the numbers. As an example, the first and
-third columns above can be read as integer types: for example, the first
-column can be an ID and the third can be the number of pixels it
-occupies. So there is no need to read it as a @code{double} type (which
-takes more memory, and is also slower).
+column's number. If the columns have names, units, or comments you can also
+select your columns based on searches/matches in these fields, for example
+see @ref{Table}. It is also bad for sending to a colleague, because they
+will find it hard to remember/use the columns properly. Also, in this
+manner, you can't guide the program reading the table on how to read the
+numbers. As an example, the first and third columns above can be read as
+integer types: the first column might be an ID and the third can be the
+number of pixels it occupies. So there is no need to read it as a
+@code{double} type (which takes more memory, and is slower).
 
 In this bare-minimum example, you also can't use strings of characters, for
 example the names of filters, or some other identifier that includes
 non-numerical characters. In the absence of any information, only numbers
-can be read. Assuming we read columns with non-numerical characters as
-string, there would still be the problem that the strings might contain
-space (or any delimiter) character for some rows. So, each `word' will be
-interpretted as a column and the program will abort with an error that the
-rows don't have the same number of columns.
+can be read robustly. Assuming we read columns with non-numerical
+characters as string, there would still be the problem that the strings
+might contain space (or any delimiter) character for some rows. So, each
+`word' in the string will be interpretted as a column and the program will
+abort with an error that the rows don't have the same number of columns.
 
 To correct for these limitations, Gnuastro defines the following convention
-for guiding the program reading the text table on how to read/interpret
-it. When the first non-white character in a line is @key{#}, or there are
-no non-white characters in it, then the line will be ignored. In the former
-case, the line is interpretted as a @emph{comment}. If the comment line
-starts with @code{# Column N:}, then it is assumed to contain information
-about column @code{N} (counting from 1). Comment lines that don't start
-with this pattern are ignored and you can use them to include any further
-information you want to store with the table in the text file. A column
-information comment is assumed to have the following format (which was
-primarily defined for ease of reading by eye):
+for storing the table meta-data along with the plain text file to guide a
+human, or program, on how to read/interpret it. When the first non-white
+character in a line is @key{#}, or there are no non-white characters in it,
+then the line will be ignored (this is a pretty standard convention in many
+programs, and higher level languages). In the former case, the line is
+interpretted as a @emph{comment}. If the comment line starts with `@code{#
+Column N:}', then it is assumed to contain information about column
+@code{N} (a number, counting from 1). Comment lines that don't start with
+this pattern are ignored and you can use them to include any further
+information you want to store with the table in the text file.
+
+The format is primarily defined for ease of reading/writing by eye/fingers,
+but is also structured enough to be read by a program. A column information
+comment is assumed to have the following format:
 
 @example
 # Column N: NAME [UNIT, TYPE, BLANK] COMMENT
 @end example
 
 @cindex NaN
address@hidden
 Any sequence of characters between `@key{:}' and `@key{[}' will be
 interpretted as the column name (so it can contain anything except the
 `@key{[}' character). Anything between the `@key{]}' and the end of the
 line is defined as a comment. Within the brackets, anything before the
 first `@key{,}' is the units (physical units, for example km/s, or erg/s),
 anything before the second `@key{,}' is the short type identifier (see
-below), finally, any non-white characters after the second `@key{,}' within
-the brackets are interpretted as the blank value for that column (see
-@ref{Blank pixels}). Note that blank values will be stored in the same type
-as the column, not as address@hidden floating point types, the
-@code{nan}, or @code{inf} strings (both not case-sensitive) refer to IEEE
-NaN (not a number) and infinity values respectively and will be stored as a
-floating point, so they are acceptable.}. The leading and ending white
-space characters will be stripped from all of these strings. For example in
+below, and @ref{Data types}). Finally (still within the brackets), any
+non-white characters after the second `@key{,}' are interpretted as the
+blank value for that column (see @ref{Blank pixels}). Note that blank
+values will be stored in the same type as the column, not as a
address@hidden floating point types, the @code{nan}, or @code{inf}
+strings (both not case-sensitive) refer to IEEE NaN (not a number) and
+infinity values respectively and will be stored as a floating point, so
+they are acceptable.}.
+
+When a formatting problem occurs (for example you have specified the wrong
+type code, see below), or the column was already given meta-data in a
+previous comment, or the column number is larger than the actual number of
+columns in the table (the non-commented or empty lines), then the comment
+information line will be ignored.
+
+When a comment information line can be used, the leading and trailing white
+space characters will be stripped from all of the elements. For example in
 this line:
 
 @example
@@ -5228,17 +5243,19 @@ this line:
 
 The @code{NAME} field will be address@hidden name}', or @code{TYPE} will be
 address@hidden'. Note how all the white space characters before and after
-strtings are not used, but those in the middle remained. Also, the lack of
-space characters is also acceptable, so in the example above @code{BLANK}
-will be address@hidden'.
-
-Except for the column number (@code{N}), the rest of the fields are not
-mandatory and the column information doesn't have to be in order. Also, you
-don't have to specify information for all columns. Those without
-information will be interpretted with the default settings (like the case
-above: all types are double, with no name, units, or comments). So these
-lines are all acceptable (the first one, with nothing but the column number
-is redundant):
+strings are not used, but those in the middle remained. Also, white space
+characters aren't mandatory, so in the example above @code{BLANK} will be
address@hidden'.
+
+Except for the column number (@code{N}), the rest of the fields are
+optional and the column information comments don't have to be in order. In
+other words, the information for column @mymath{N+m} (@mymath{m>0}) can be
+given before column @mymath{N}. Also, you don't have to specify information
+for all columns. Those columns that don't have this information will be
+interpretted with the default settings (like the case above: values are
+double precision floating point, and the column has no name, unit, or
+comment). So these lines are all acceptable for any table (the first one,
+with nothing but the column number is redundant):
 
 @example
 # Column 5:
diff --git a/lib/data.c b/lib/data.c
index b6d6990..c626842 100644
--- a/lib/data.c
+++ b/lib/data.c
@@ -512,7 +512,7 @@ gal_data_calloc_dataarray(size_t size)
   for(i=0;i<size;++i)
     {
       out[i].array      = NULL;
-      out[i].type       = -1;
+      out[i].type       = GAL_DATA_TYPE_INVALID;
       out[i].ndim       = 0;
       out[i].dsize      = NULL;
       out[i].nwcs       = 0;
@@ -2061,7 +2061,7 @@ gal_data_string_as_type(char *str)
     return GAL_DATA_TYPE_DCOMPLEX;
 
   else
-    return -1;
+    return GAL_DATA_TYPE_INVALID;
 
   /* Any of the cases above should return this function, so if control
      reaches here, there is a bug. */
diff --git a/lib/gnuastro/data.h b/lib/gnuastro/data.h
index 7b38695..f5cd3cb 100644
--- a/lib/gnuastro/data.h
+++ b/lib/gnuastro/data.h
@@ -84,6 +84,7 @@ __BEGIN_C_DECLS  /* From C++ preparations */
 
 /* Macros to identify the type of data. The macros in the comment
    parenthesis is the equivalent macro in CFITSIO. */
+#define GAL_DATA_TYPE_INVALID     -1
 enum gal_data_types
 {
   GAL_DATA_TYPE_BIT,       /* Bit              (TBIT).        */
diff --git a/lib/table.c b/lib/table.c
index ca2febe..f28a919 100644
--- a/lib/table.c
+++ b/lib/table.c
@@ -303,6 +303,12 @@ gal_table_read_blank(gal_data_t *col, char *blank)
      blank value. */
   if(blank==NULL) return;
 
+  /* Just for a sanity check, the ndim element should be zero. */
+  if(col->ndim)
+    error(EXIT_FAILURE, 0, "A bug! The number of dimensions in the data "
+          "structure passed to `gal_table_read_blank' must be zero, but it "
+          "is %zu", col->ndim);
+
   /* Allocate space to keep the blank value and initialize the dimension
      and size parameters appropriately. */
   errno=0;
@@ -358,8 +364,8 @@ gal_table_read_blank(gal_data_t *col, char *blank)
           }
     }
 
-  /* If the blank value couldn't be read, then set the initialized
-     lengths to zero again. */
+  /* If the blank value couldn't be read, then set the initialized lengths
+     to zero again, as if this function wasn't called at all. */
   if(col->array==NULL)
     {
       col->ndim=col->size=0;
@@ -697,13 +703,13 @@ gal_table_read(char *filename, char *hdu, struct gal_linkedlist_stll *cols,
   /* First get the information of all the columns. */
   allcols=gal_table_info(filename, hdu, &numcols, &numrows, &tabletype);
 
+  /* If there was no actual data in the file, then return NULL. */
+  if(allcols==NULL) return NULL;
+
   /* Get the list of indexs in the same order as the input list */
   indexll=make_list_of_indexs(cols, allcols, numcols, searchin,
                               ignorecase, filename, hdu);
 
-  /* If no columns could be selected, just return NULL. */
-  if(indexll==NULL) return NULL;
-
   /* Depending on the table type, read the columns into the output
      structure. Note that the functions here pop each index, read/store the
      desired column and pop the next, so after these functions, the output
diff --git a/lib/txt.c b/lib/txt.c
index fdc31ac..1a4becc 100644
--- a/lib/txt.c
+++ b/lib/txt.c
@@ -149,7 +149,8 @@ txt_info_from_comment(char *line, gal_data_t **colsll)
 {
   char *tailptr;
   gal_data_t *tmp;
-  int index, type, strw=0;
+  int index, strw=0;
+  int type=GAL_DATA_TYPE_DOUBLE; /* Default type. */
   char *number=NULL, *name=NULL, *comment=NULL;
   char *inbrackets=NULL, *unit=NULL, *typestr=NULL, *blank=NULL;
 
@@ -182,6 +183,24 @@ txt_info_from_comment(char *line, gal_data_t **colsll)
           ++line;
         }
 
+      /* Read the column number as an integer. If it can't be read as an
+         integer, or is zero or negative then just return without adding
+         anything to this line. */
+      index=strtol(number, &tailptr, 0);
+      if(*tailptr!='\0' || index<=0) return;
+
+      /* If there was no name (the line is just `# Column N:'), then ignore
+         the line. Relying on the column count from the first line is more
+         robust and less prone to human error, for example typing a number
+         larger than the total number of columns.  */
+      name=txt_trim_space(name);
+      if(name==NULL) return;
+
+      /* If this is a repeated index, ignore it. */
+      for(tmp=*colsll; tmp!=NULL; tmp=tmp->next)
+        if(tmp->status==index)
+          return;
+
       /* If there were brackets, then break it up. */
       if(inbrackets)
         {
@@ -198,41 +217,37 @@ txt_info_from_comment(char *line, gal_data_t **colsll)
             }
         }
 
-      /* Read the column number as an integer. If it can't be read as an
-         integer, or is zero or negative then just return without adding
-         anything to this line. */
-      index=strtol(number, &tailptr, 0);
-      if(*tailptr!='\0' || index<=0) return;
-
-      /* See if the type is a standard type, if so, then set the type,
-         otherwise, return and ignore this line. Just note that if we are
-         dealing with the string type, we have to pull out the number part
-         first. If there is no number, there will be an error.*/
-      typestr=txt_trim_space(typestr);
-      if( !strncmp(typestr, "str", 3) )
+      /* If `typestr' was given, then check if this is a standard type. If
+         `typestr' wasn't specified, then the default double type code will
+         be used (see the variable definitions above). If the given type
+         isn't a standard type then ignore the line. Just note that if we
+         are dealing with the string type, we have to pull out the number
+         part first. If there is no number for a string type, then ignore
+         the line. */
+      if(typestr)
         {
-          type=GAL_DATA_TYPE_STRING;
-          strw=strtol(typestr+3, &tailptr, 0);
-          if(*tailptr!='\0' || strw<0) return;
-        }
-      else
-        {
-          type=gal_data_string_as_type(typestr);
-          if(type==-1) return;
+          typestr=txt_trim_space(typestr);
+          if( !strncmp(typestr, "str", 3) )
+            {
+              type=GAL_DATA_TYPE_STRING;
+              strw=strtol(typestr+3, &tailptr, 0);
+              if(*tailptr!='\0' || strw<0) return;
+            }
+          else
+            {
+              type=gal_data_string_as_type(typestr);
+              if(type==GAL_DATA_TYPE_INVALID) return;
+            }
         }
 
-      /* If this is a repeated index, ignore it. */
-      for(tmp=*colsll; tmp!=NULL; tmp=tmp->next)
-        if(tmp->status==index)
-          return;
-
       /* Add this column's information into the columns linked list. We
-         will define the array to have one element to keep the blank
-         value. To keep the name, unit, and comment strings, trim the white
-         space before and after each before using them here.  */
-      gal_data_add_to_ll(colsll, NULL, type, 0, NULL, NULL, 0, -1,
-                         txt_trim_space(name), txt_trim_space(unit),
-                         txt_trim_space(comment) );
+         will define the data structure's array to have zero dimensions (no
+         array) by default. If there is a blank value its value will be put
+         into the array by `gal_table_read_blank'. To keep the name, unit,
+         and comment strings, trim the white space before and after each
+         before using them here.  */
+      gal_data_add_to_ll(colsll, NULL, type, 0, NULL, NULL, 0, -1, name,
+                         txt_trim_space(unit), txt_trim_space(comment) );
 
       /* Put the number of this column into the status variable of the data
          structure. If the type is string, then also copy the width into
@@ -254,10 +269,10 @@ txt_info_from_comment(char *line, gal_data_t **colsll)
    the information might not have been complete. So we need to go through
    the first row of data also. */
 void
-txt_info_from_row(char *line, gal_data_t **colsll)
+txt_info_from_first_row(char *line, gal_data_t **colsll)
 {
-  size_t n=0;
-  gal_data_t *col;
+  size_t n=0, maxcnum=0;
+  gal_data_t *col, *prev, *tmp;
   char *token, *end=line+strlen(line);
 
   /* Remove the new line character from the end of the line. If the last
@@ -266,6 +281,11 @@ txt_info_from_row(char *line, gal_data_t **colsll)
      character. Its better for it to actually be shorter than the space. */
   *(end-1)='\0';
 
+  /* Get the maximum number of columns read from the comment
+     information. */
+  for(col=*colsll; col!=NULL; col=col->next)
+    maxcnum = maxcnum>col->status ? maxcnum : col->status;
+
   /* Go over the line check/fill the column information. */
   while(++n)
     {
@@ -312,7 +332,6 @@ txt_info_from_row(char *line, gal_data_t **colsll)
           /* Make sure a token exists in this undefined column. */
           token=strtok_r(n==1?line:NULL, GAL_TXT_DELIMITERS, &line);
           if(token==NULL) break;
-          /* printf(" col %zu: *%s*\n", i, token); */
 
           /* A token exists, so set this column to the default double type
              with no information, then set its status value to the column
@@ -322,6 +341,43 @@ txt_info_from_row(char *line, gal_data_t **colsll)
           (*colsll)->status=n;
         }
     }
+
+  /* If the number of columns given by the comments is larger than the
+     actual number of columns, remove those that have larger numbers from the
+     linked list before things get complicated outside of this function. */
+  if(maxcnum>n)
+    {
+      prev=NULL;
+      col=*colsll;
+      while(col!=NULL)
+        {
+          if(col->status > n)   /* Column has no data (was only in comments) */
+            {
+              /* This column has to be removed/freed. But we have to make
+                 some corrections before freeing it:
+
+                  - When `prev==NULL', then we still haven't got to the
+                    first valid element yet and must free this one, but if
+                    we do that, then the main pointer to the start of the
+                    list will be lost (we will lose all connections with
+                    the chain after leaving this loop). So we need to set
+                    that to the next element.
+
+                  - When there actually was a previous element
+                    (`prev!=NULL'), then we must correct its next
+                    pointer. Otherwise we will break up the chain.*/
+              if(prev) prev->next=col->next; else *colsll=col->next;
+              tmp=col->next;
+              gal_data_free(col, 0);
+              col=tmp;
+            }
+          else                  /* Column has data.                          */
+            {
+              prev=col;
+              col=col->next;
+            }
+        }
+    }
 }
 
 
@@ -379,7 +435,8 @@ txt_infoll_to_array(gal_data_t *colsll, size_t *numcols)
 
 
 
-/* Return the information about a text file table. */
+/* Return the information about a text file table. If there were no
+   readable rows, it will return NULL.*/
 gal_data_t *
 gal_txt_table_info(char *filename, size_t *numcols, size_t *numrows)
 {
@@ -425,23 +482,31 @@ gal_txt_table_info(char *filename, size_t *numcols, size_t *numrows)
           if(firstlinedone==0)
             {
               firstlinedone=1;
-              txt_info_from_row(line, &colsll);
+              txt_info_from_first_row(line, &colsll);
             }
         }
     }
 
 
-  /* Write the unorganized gathered information (linked list) into an
-     organized array for easy processing by later steps. */
-  allcols=txt_infoll_to_array(colsll, numcols);
+  /* If there were rows in the file, then write the unorganized gathered
+     information (linked list) into an organized array for easy processing
+     by later steps.  */
+  allcols = *numrows ? txt_infoll_to_array(colsll, numcols) : NULL;
 
-  /* Clean up and close the file. */
+
+  /* Clean up. Note that even if there were no usable columns, there might
+     have been meta-data comments, so we need to free `colsll' in any
+     case. If the list is indeed empty, then `gal_data_free_ll' won't do
+     anything. */
+  free(line);
+  gal_data_free_ll(colsll);
+
+
+  /* Close the file. */
   errno=0;
   if(fclose(fp))
     error(EXIT_FAILURE, errno, "%s: couldn't close file after reading ASCII "
           "table information", filename);
-  gal_data_free_ll(colsll);
-  free(line);
 
 
   /* Return the array of column information. */
@@ -472,10 +537,11 @@ gal_txt_table_info(char *filename, size_t *numcols, size_t *numrows)
 /************************************************************************/
 static void
 txt_fill_columns(char *line, char **tokens, size_t maxcolnum,
-                 gal_data_t *colinfo, gal_data_t *out, size_t lineind,
+                 gal_data_t *colinfo, gal_data_t *out, size_t rowind,
                  size_t lineno, char *filename)
 {
   size_t n=0;
+  int notenoughcols=0;
   gal_data_t *col;
   char *tailptr, *end=line+strlen(line);
 
@@ -499,23 +565,44 @@ txt_fill_columns(char *line, char **tokens, size_t maxcolnum,
      one. So we need column `maxcolnum'.*/
   while(++n)
     {
-      /* Break out of the parsing if we don't need the columns any more. */
+      /* Break out of the parsing if we don't need the columns any
+         more. The table might contain many more columns, but when they
+         aren't needed, there is no point in tokenizing them. */
       if(n>maxcolnum) break;
 
       /* Set the pointer to the start of this token/column. See
          explanations in `txt_info_from_row'. */
       if( colinfo[n-1].type == GAL_DATA_TYPE_STRING )
         {
+          /* Remove any delimiters and stop at the first non-delimiter. If
+             we have reached the end of the line then its an error, because
+             we were expecting a column here. */
           while(isspace(*line) || *line==',') ++line;
+          if(*line=='\0') {notenoughcols=1; break;}
+
+          /* Everything is good, set the pointer and increment the line to
+             the end of the allocated space for this string. */
           line = (tokens[n]=line) + colinfo[n-1].disp_width;
           if(line<end) *line++='\0';
         }
       else
-        tokens[n]=strtok_r(n==1?line:NULL, GAL_TXT_DELIMITERS, &line);
+        {
+          /* If we have reached the end of the line, then `strtok_r' will
+             return a NULL pointer. */
+          tokens[n]=strtok_r(n==1?line:NULL, GAL_TXT_DELIMITERS, &line);
+          if(tokens[n]==NULL) {notenoughcols=1; break;}
+        }
     }
 
+  /* Report an error if there weren't enough columns. */
+  if(notenoughcols)
+    error_at_line(EXIT_FAILURE, 0, filename, lineno, "not enough columns in "
+                  "this line. Previous (uncommented) lines in this file had "
+                  "%zu columns, but this line has %zu columns", maxcolnum,
+                  n-1); /* This must be `n-1' (since n starts from 1). */
+
   /* For a sanity check:
-  printf("row: %zu: ", lineind+1);
+  printf("row: %zu: ", rowind+1);
   for(n=1;n<=maxcolnum;++n) printf("-%s-, ", tokens[n]);
   printf("\n");
   */
@@ -532,77 +619,77 @@ txt_fill_columns(char *line, char **tokens, size_t maxcolnum,
         case GAL_DATA_TYPE_STRING:
           str=col->array;
           gal_checkset_allocate_copy(txt_trim_space(tokens[col->status]),
-                                     &str[lineind]);
+                                     &str[rowind]);
           if( (strb=colinfo[col->status-1].array)
-              && !strcmp( *strb, str[lineind] ) )
+              && !strcmp( *strb, str[rowind] ) )
             {
-              free(str[lineind]);
+              free(str[rowind]);
               gal_checkset_allocate_copy(GAL_DATA_BLANK_STRING,
-                                         &str[lineind]);
+                                         &str[rowind]);
             }
           break;
 
         case GAL_DATA_TYPE_UCHAR:
           uc=col->array;
-          uc[lineind]=strtol(tokens[col->status], &tailptr, 0);
-          if( (ucb=colinfo[col->status-1].array) && *ucb==uc[lineind] )
-            uc[lineind]=GAL_DATA_BLANK_UCHAR;
+          uc[rowind]=strtol(tokens[col->status], &tailptr, 0);
+          if( (ucb=colinfo[col->status-1].array) && *ucb==uc[rowind] )
+            uc[rowind]=GAL_DATA_BLANK_UCHAR;
           break;
 
         case GAL_DATA_TYPE_CHAR:
           c=col->array;
-          c[lineind]=strtol(tokens[col->status], &tailptr, 0);
-          if( (cb=colinfo[col->status-1].array) && *cb==c[lineind] )
-            c[lineind]=GAL_DATA_BLANK_CHAR;
+          c[rowind]=strtol(tokens[col->status], &tailptr, 0);
+          if( (cb=colinfo[col->status-1].array) && *cb==c[rowind] )
+            c[rowind]=GAL_DATA_BLANK_CHAR;
           break;
 
         case GAL_DATA_TYPE_USHORT:
           us=col->array;
-          us[lineind]=strtol(tokens[col->status], &tailptr, 0);
-          if( (usb=colinfo[col->status-1].array) && *usb==us[lineind] )
-            us[lineind]=GAL_DATA_BLANK_USHORT;
+          us[rowind]=strtol(tokens[col->status], &tailptr, 0);
+          if( (usb=colinfo[col->status-1].array) && *usb==us[rowind] )
+            us[rowind]=GAL_DATA_BLANK_USHORT;
           break;
 
         case GAL_DATA_TYPE_SHORT:
           s=col->array;
-          s[lineind]=strtol(tokens[col->status], &tailptr, 0);
-          if( (sb=colinfo[col->status-1].array) && *sb==s[lineind] )
-            s[lineind]=GAL_DATA_BLANK_SHORT;
+          s[rowind]=strtol(tokens[col->status], &tailptr, 0);
+          if( (sb=colinfo[col->status-1].array) && *sb==s[rowind] )
+            s[rowind]=GAL_DATA_BLANK_SHORT;
           break;
 
         case GAL_DATA_TYPE_UINT:
           ui=col->array;
-          ui[lineind]=strtol(tokens[col->status], &tailptr, 0);
-          if( (uib=colinfo[col->status-1].array) && *uib==ui[lineind] )
-            ui[lineind]=GAL_DATA_BLANK_UINT;
+          ui[rowind]=strtol(tokens[col->status], &tailptr, 0);
+          if( (uib=colinfo[col->status-1].array) && *uib==ui[rowind] )
+            ui[rowind]=GAL_DATA_BLANK_UINT;
           break;
 
         case GAL_DATA_TYPE_INT:
           i=col->array;
-          i[lineind]=strtol(tokens[col->status], &tailptr, 0);
-          if( (ib=colinfo[col->status-1].array) && *ib==i[lineind] )
-            i[lineind]=GAL_DATA_BLANK_INT;
+          i[rowind]=strtol(tokens[col->status], &tailptr, 0);
+          if( (ib=colinfo[col->status-1].array) && *ib==i[rowind] )
+            i[rowind]=GAL_DATA_BLANK_INT;
           break;
 
         case GAL_DATA_TYPE_ULONG:
           ul=col->array;
-          ul[lineind]=strtoul(tokens[col->status], &tailptr, 0);
-          if( (ulb=colinfo[col->status-1].array) && *ulb==ul[lineind] )
-            ul[lineind]=GAL_DATA_BLANK_ULONG;
+          ul[rowind]=strtoul(tokens[col->status], &tailptr, 0);
+          if( (ulb=colinfo[col->status-1].array) && *ulb==ul[rowind] )
+            ul[rowind]=GAL_DATA_BLANK_ULONG;
           break;
 
         case GAL_DATA_TYPE_LONG:
           l=col->array;
-          l[lineind]=strtol(tokens[col->status], &tailptr, 0);
-          if( (lb=colinfo[col->status-1].array) && *lb==l[lineind] )
-            l[lineind]=GAL_DATA_BLANK_LONG;
+          l[rowind]=strtol(tokens[col->status], &tailptr, 0);
+          if( (lb=colinfo[col->status-1].array) && *lb==l[rowind] )
+            l[rowind]=GAL_DATA_BLANK_LONG;
           break;
 
         case GAL_DATA_TYPE_LONGLONG:
           L=col->array;
-          L[lineind]=strtoll(tokens[col->status], &tailptr, 0);
-          if( (Lb=colinfo[col->status-1].array) && *Lb==L[lineind] )
-            L[lineind]=GAL_DATA_BLANK_LONGLONG;
+          L[rowind]=strtoll(tokens[col->status], &tailptr, 0);
+          if( (Lb=colinfo[col->status-1].array) && *Lb==L[rowind] )
+            L[rowind]=GAL_DATA_BLANK_LONGLONG;
           break;
 
         /* For the blank value of floating point types, we need to make
@@ -611,18 +698,18 @@ txt_fill_columns(char *line, char **tokens, size_t maxcolnum,
            compare the values. */
         case GAL_DATA_TYPE_FLOAT:
           f=col->array;
-          f[lineind]=strtod(tokens[col->status], &tailptr);
+          f[rowind]=strtod(tokens[col->status], &tailptr);
           if( (fb=colinfo[col->status-1].array)
-              && ( (isnan(*fb) && isnan(f[lineind])) || *fb==f[lineind] ) )
-            f[lineind]=GAL_DATA_BLANK_FLOAT;
+              && ( (isnan(*fb) && isnan(f[rowind])) || *fb==f[rowind] ) )
+            f[rowind]=GAL_DATA_BLANK_FLOAT;
           break;
 
         case GAL_DATA_TYPE_DOUBLE:
           d=col->array;
-          d[lineind]=strtod(tokens[col->status], &tailptr);
+          d[rowind]=strtod(tokens[col->status], &tailptr);
           if( (db=colinfo[col->status-1].array)
-              && ( (isnan(*db) && isnan(d[lineind])) || *db==d[lineind] ) )
-            d[lineind]=GAL_DATA_BLANK_DOUBLE;
+              && ( (isnan(*db) && isnan(d[rowind])) || *db==d[rowind] ) )
+            d[rowind]=GAL_DATA_BLANK_DOUBLE;
           break;
 
         default:
@@ -653,10 +740,9 @@ gal_txt_table_read(char *filename, size_t numrows, gal_data_t *colinfo,
   char **tokens;
   gal_data_t *out=NULL;
   struct gal_linkedlist_sll *ind;
-  size_t maxcolnum=0, lineind=0, lineno=0;
+  size_t maxcolnum=0, rowind=0, lineno=0;
   size_t linelen=10; /* `linelen' will be increased by `getline'. */
 
-
   /* Open the file. */
   errno=0;
   fp=fopen(filename, "r");
@@ -689,7 +775,6 @@ gal_txt_table_read(char *filename, size_t numrows, gal_data_t *colinfo,
       out->status=ind->v+1;
     }
 
-
   /* Allocate the space to keep the pointers to each token in the
      line. This is done here to avoid having to allocate/free this array
      for each line in `txt_fill_columns'. Note that the column numbers are
@@ -707,7 +792,7 @@ gal_txt_table_read(char *filename, size_t numrows, gal_data_t *colinfo,
     {
       ++lineno;
       if( get_line_stat(line) == TXT_LINESTAT_DATAROW )
-        txt_fill_columns(line, tokens, maxcolnum, colinfo, out, lineind++,
+        txt_fill_columns(line, tokens, maxcolnum, colinfo, out, rowind++,
                          lineno, filename);
     }
 

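The list clean-up added to `txt_info_from_first_row' above can be tried in
isolation with the sketch below. It assumes a bare singly linked list:
`struct sketch_node', `sketch_push' and `sketch_prune' are hypothetical names
for this illustration only, `number' stands in for the column number that the
patch keeps in `status', and `free' stands in for `gal_data_free'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node: one entry per column found in the comments. */
struct sketch_node
{
  size_t number;               /* Column number, counting from 1. */
  struct sketch_node *next;
};

/* Push a new node onto the head of the list. */
void
sketch_push(struct sketch_node **list, size_t number)
{
  struct sketch_node *node;
  assert(list!=NULL);
  node=malloc(sizeof *node);
  if(node==NULL) abort();
  node->number=number;
  node->next=*list;
  *list=node;
}

/* Remove every node whose number is larger than `ncols' (the number of
   columns actually present in the first data row). Return the number of
   removed nodes. */
size_t
sketch_prune(struct sketch_node **list, size_t ncols)
{
  size_t removed=0;
  struct sketch_node *prev=NULL, *col, *tmp;

  assert(list!=NULL);
  col=*list;
  while(col!=NULL)
    {
      if(col->number > ncols)
        {
          /* Re-link around this node first: update the head pointer when
             there is no previous node, the previous node's `next'
             otherwise. Only then is it safe to free it. */
          if(prev) prev->next=col->next; else *list=col->next;
          tmp=col->next;
          free(col);
          col=tmp;
          ++removed;
        }
      else
        {
          prev=col;
          col=col->next;
        }
    }
  return removed;
}
```

Keeping `prev' unchanged after a removal is what lets several out-of-range
nodes in a row be dropped without losing the rest of the chain.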

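The "not enough columns" check in `txt_fill_columns' relies on `strtok_r'
returning NULL once a line runs out of tokens. A minimal sketch of that idea
follows. `sketch_count_tokens' and `SKETCH_DELIMITERS' are hypothetical; the
real value of `GAL_TXT_DELIMITERS' is not shown in this patch, so the
delimiter set here (space, comma, tabs, new-line) is an assumption based on
the manual text above. Fixed-width string columns are not handled.

```c
#define _POSIX_C_SOURCE 200809L   /* For `strtok_r'. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed delimiter set, not copied from Gnuastro. */
#define SKETCH_DELIMITERS " ,\t\v\n"

/* Tokenize a writable line, stopping as soon as `needed' tokens have
   been found. Return the number of tokens found, which is smaller than
   `needed' when the row is too short. */
size_t
sketch_count_tokens(char *line, size_t needed)
{
  size_t n=0;
  char *saveptr=NULL;

  assert(line!=NULL);
  while(n<needed)
    {
      /* Only the first call gets the line, later calls continue from
         `saveptr'. A NULL return means no tokens are left. */
      if( strtok_r(n==0 ? line : NULL, SKETCH_DELIMITERS, &saveptr)==NULL )
        break;
      ++n;
    }
  return n;
}
```

A caller would compare the return value with the number of columns it needs
and report the file name and line number when the row falls short, as the
patch does with `error_at_line'.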
