
Re: [bug-recutils] Comments on the support for lazy parsing of rsets


From: Michał Masłowski
Subject: Re: [bug-recutils] Comments on the support for lazy parsing of rsets
Date: Fri, 10 Aug 2012 22:43:12 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.1 (gnu/linux)

Hi.

> - The enumerated type rec_parser_lazy_e could have a better name:
>   rec_parser_laziness_e, or maybe rec_parser_mode_e.

Unless we later add other kinds of "parser modes", rec_parser_mode_e
seems better (REC_PARSER_FINISH clearly isn't a "laziness").  Changed.

> - The comment preceding the definition of enum rec_parser_lazy_e
>   is not very helpful.  Please expand it to explain: 1) what the
>   enumerated type is used for and 2) the meaning of the existing
>   values EAGER, LAZY and FINISH.

I did not find the existing comment helpful either when making the
change above, so I rewrote it.

> - The enumerated type rec_parser_lazy_e is defined, but functions
>   getting values of that type are using int instead:
>
>     rec_parse_rset_common (rec_parser_t parser,
>                            rec_rset_t *rset,
>                            int laziness)
>
>     rec_parse_rset_continue (rec_parser_t parser,
>                              rec_rset_t rset,
>                              int laziness)

Fixed.

> - In the documentation of rec_parse_rset_finish (in rec.h) the
>   sentence "Parse additional records into RSET." is a bit
>   confusing.  What it does is to parse all the records of the
>   given record set.

One record might already be parsed before it is called, so the rest
would be "additional".  Changed: the documentation now says that the
expected result is to have all records parsed.

> - The documentation of rec_parse_rset_finish (in rec.h) must
>   clearly specify what RSET will contain in case an error arises:
>   only the record descriptor? undefined?  Can the user still use
>   the data in the RSET?

It suggested, unclearly, that the rset would keep some data from before
the error.  The documentation now states this explicitly.

> - Please put an empty line between compound statements.  Don't do
>   this:
>
>     else
>       {
>         rec_record_set_container (record, rset);
>         rec_mset_append (rec_rset_mset_raw (rset), MSET_RECORD, (void *) record, MSET_ANY);
>       }
>     if (laziness == REC_PARSER_LAZY)
>       {
>
>   Please follow the existing coding style.

Reading this example I realized why this is done.  Changed.

> - When parsing the default record set (w/o a record descriptor) a lazy
>   parser will always store the first record in the parsed record set,
>   isn't it?

Yes.  The parser already has to parse this record to know whether it is
a descriptor.  The alternative would be to discard the record and parse
it again with the other records, so storing it seemed simpler.

> - Please follow the existing coding style when adding new members to
>   structures.  For example, in:
>
>    struct rec_rset_s
>    {
>      long parser_pos; /* Position of the parser on the start of this
>                          record set. */
>      size_t parser_line; /* Line number at that position. */
>      rec_parser_t parser; /* The parser for the record set; NULL for
>                              wholly parsed record sets. */
>      rec_record_t descriptor;
>
>   Why are the new fields just dropped at the beginning of the
>   structure?  The proper first field to have in the struct is the
>   descriptor.  Add the parser stuff in a dedicated section at the end
>   of the struct.

Changed.

> - In the rec_rset_s struct the comment:
>
>     /* Storage for records and comments.  Undefined when parser is not
>      NULL (call rec_rset_finish_parsing before accessing it). */
>     int record_type;
>     int comment_type;
>     rec_mset_t mset;
>
>   is misleading.  Why is mset undefined in that case?  It must be NULL
>   instead.

It must not be NULL when it contains a record.  Even if it contained
none, the mset would have to be allocated somewhere before
rec_rset_mset_raw is used (by rec_rset_finish_parsing), and allocating it
up front seemed simpler.  I changed the comment to clarify what the mset
contains.

> - The functions rec_rset_* returning properties of the record set,
>   such as rec_rset_num_elems, are not returning error codes when the
>   parser fails.  This must be fixed.

Changed to return SIZE_MAX on error.  Alternatively, we could change the
return type to a signed one and return -1.
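
A minimal sketch of what the SIZE_MAX convention means for callers,
assuming a hypothetical toy_rset type in place of rec_rset_t (none of
the toy_* names exist in librec).  The point is that the sentinel has to
be checked before the count reaches any arithmetic, such as the size
computation for a per-record map:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a lazily parsed record set.  */
struct toy_rset
{
  bool parse_ok;      /* Whether finishing the parse succeeds.  */
  size_t num_records; /* Meaningful only when parse_ok is true.  */
};

/* Return the number of records, or SIZE_MAX if the delayed parse
   failed.  */
size_t
toy_rset_num_records (const struct toy_rset *rset)
{
  if (!rset->parse_ok)
    {
      return SIZE_MAX;
    }

  return rset->num_records;
}

/* Compute the size of a map with one bool per record.  The sentinel
   is checked first, so it never takes part in the multiplication.
   Returns false on error and leaves *size untouched.  */
bool
toy_map_size (const struct toy_rset *rset, size_t *size)
{
  size_t n = toy_rset_num_records (rset);

  if (n == SIZE_MAX)
    {
      return false;
    }

  *size = sizeof (bool) * n;
  return true;
}
```

With a signed return type the check would become (n < 0) instead; either
way every caller that previously could not fail gains an error path.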

> - In the functions rec_rset_* returning the number of elements of
>   several types stored in the record set you are checking for SIZE_MAX
>   when the parser is not NULL.  Why?  These values are initialized to
>   0, not to SIZE_MAX.  Besides, rec_rset_finish_parsing must be called
>   in any case when parser != NULL in order to determine the result.
>   So why testing for SIZE_MAX?

This is a separate optimization so that recinf (and recsel's random
selection) do not need to parse the rset to count the records.  I used
SIZE_MAX because 0 is a valid count for rsets without records or
comments.  That other patch changed the initialization, so it seemed to
work.  (I did not include it in this patch, since it is not needed for
the other changes, except for saving this data from the index file.)

> - The possibility of having infinite recursive calls to
>   rec_rset_finish_parsing must be handled in a different way: all the
>   calls to the rec_rset_* functions in rec_parse_* functions must be
>   listed and analyzed.  The only potential risk that I can find in
>   rec_parse_rset_continue is a call to rec_rset_num_records, but that
>   is fixed with the (laziness != REC_PARSER_FINISH) condition.  Is
>   this the only reason for that check, to avoid the undesired
>   recursive loop?

This check is not needed unless an error occurred earlier (the
condition would be satisfied anyway if it were computed differently).
I left it in because future changes could reintroduce the problem.  I
think it is similar to the error message printed when a parser has no
backend: working out that REC_PARSER_FINISH was needed was easier with
the abort in place.

I made this clearer in the documentation of rec_rset_store_parser.
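
The recursion guard can be sketched in isolation as follows.  This is an
illustration under assumptions, not the code from the patch: the toy_*
names and the buggy_parser switch are hypothetical, and the guard
returns false where the patch calls abort, so that the failure can be
observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a record set with delayed parsing.  */
struct toy_rset
{
  bool pending;      /* True while records remain unparsed.  */
  bool finishing;    /* True while toy_finish is running.  */
  bool buggy_parser; /* Simulate a parser using the public accessor.  */
  size_t count;      /* Records stored so far.  */
};

bool toy_finish (struct toy_rset *rset);

/* Raw accessor: never triggers parsing.  */
size_t
toy_num_records_raw (const struct toy_rset *rset)
{
  return rset->count;
}

/* Public accessor: finishes parsing first, SIZE_MAX on error.  */
size_t
toy_num_records (struct toy_rset *rset)
{
  if (!toy_finish (rset))
    {
      return SIZE_MAX;
    }

  return rset->count;
}

/* The "parser": it must use only the raw accessor.  */
bool
toy_parse_rest (struct toy_rset *rset)
{
  size_t seen = rset->buggy_parser
    ? toy_num_records (rset)      /* Re-enters toy_finish.  */
    : toy_num_records_raw (rset); /* Safe.  */

  if (seen == SIZE_MAX)
    {
      return false;
    }

  rset->count = seen + 2; /* Pretend two more records were parsed.  */
  return true;
}

bool
toy_finish (struct toy_rset *rset)
{
  bool ok;

  if (!rset->pending)
    {
      return true; /* Already done.  */
    }

  if (rset->finishing)
    {
      return false; /* Recursive call detected.  */
    }

  rset->finishing = true;
  ok = toy_parse_rest (rset);
  rset->finishing = false;

  if (ok)
    {
      rset->pending = false;
    }

  return ok;
}
```

Without the finishing flag the buggy variant would recurse until the
stack is exhausted; with it, the mistake is reported on the first
re-entry.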

> - Your changes introduce a big change in the API semantics: when a
>   user uses a given parser to parse record sets, she must _not_
>   destroy the parser until the corresponding record sets or database
>   are destroyed.  This must be documented in rec.h, and is probably a
>   bit problematic.  How to avoid this?  What about creating an
>   internal parser when rec_rset_finish_parsing is used?

Changing this would require copying the parser and its data source.
It's unclear from the existing API when the memory buffer backend would
need copying.

Changed the comment.  We cannot avoid having to change code when
replacing rec_parse_rset with rec_parse_rset_lazy, since parser errors
can occur in functions that had no errors before, so an internal
parser won't prevent all API semantics changes.
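
Both points can be seen in a toy model.  The sketch below is not librec
code: toy_lazy and its functions are hypothetical, and "records" are
simplified to single lines that contain a colon.  The set borrows the
input buffer, which plays the role of the parser and so must outlive the
set, and a syntax error after the first record is reported by the
counting function rather than by the initial parse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy lazy record set over a borrowed, NUL-terminated buffer.  */
struct toy_lazy
{
  const char *rest; /* Unparsed input, or NULL once fully parsed.  */
  size_t count;     /* Records parsed so far.  */
};

/* Consume one line starting at *pos.  A line is a valid record if it
   contains ':'.  Returns false on a syntax error.  */
static bool
toy_parse_line (const char **pos)
{
  const char *end = strchr (*pos, '\n');
  size_t len = end ? (size_t) (end - *pos) : strlen (*pos);
  bool ok = memchr (*pos, ':', len) != NULL;

  *pos += len + (end ? 1 : 0);
  return ok;
}

/* First stage: parse only the first record and remember where the
   rest of the input starts.  SRC must stay valid while SET is used.  */
bool
toy_parse_lazy (const char *src, struct toy_lazy *set)
{
  set->rest = src;
  set->count = 0;

  if (*src == '\0' || !toy_parse_line (&set->rest))
    {
      return false;
    }

  set->count = 1;
  return true;
}

/* Second stage, run on first use: parse the remaining records.  The
   state is committed only on success.  */
bool
toy_count (struct toy_lazy *set, size_t *count)
{
  const char *pos = set->rest;
  size_t n = set->count;

  while (pos != NULL && *pos != '\0')
    {
      if (!toy_parse_line (&pos))
        {
          return false; /* The error surfaces here, not earlier.  */
        }
      n++;
    }

  set->rest = NULL;
  set->count = n;
  *count = n;
  return true;
}
```

With the input "a: 1\nb 2" the first stage succeeds and the failure only
shows up in toy_count, which is the kind of new error path callers have
to handle after switching to the lazy variant.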

The lazy-rset-2 branch of git://elderthing.mtjm.eu/mtjm-recutils.git has
the patch included below; it is based on the three original patches from
the index-file branch.


From ff8b4db9b396cb53f5874bb0ef2821a3c9ec9ac8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Mas=C5=82owski?= <address@hidden>
Date: Fri, 10 Aug 2012 22:09:16 +0200
Subject: [PATCH] src,torture: support lazy parsing of record sets.

The aim is to use indexes and other optimizations to avoid parsing the
whole record set in some cases, while easily using the existing code
in other cases.
---
 ChangeLog                                |  34 ++++
 src/rec-parser.c                         | 287 ++++++++++++++++++++-----------
 src/rec-rset.c                           | 109 +++++++++++-
 src/rec.h                                |  55 +++++-
 torture/Makefile.am                      |   1 +
 torture/rec-parser/rec-parse-rset-lazy.c | 130 ++++++++++++++
 torture/rec-parser/tsuite-rec-parser.c   |   2 +
 7 files changed, 505 insertions(+), 113 deletions(-)
 create mode 100644 torture/rec-parser/rec-parse-rset-lazy.c

diff --git a/ChangeLog b/ChangeLog
index 14e98f1..3c06c37 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,37 @@
+2012-08-10  Michał Masłowski  <address@hidden>
+
+       src,torture: support lazy parsing of record sets.
+       * src/rec-parser.c (rec_parser_lazy_e): New enum.
+       (rec_parse_rset): Split the work into rec_parse_rset_common and
+       rec_parse_rset_continue adding support for lazy loading.
+       (rec_parse_rset_lazy): New function.
+       (rec_parse_rset_finish): Likewise.
+       (rec_parse_rset_common): Likewise.
+       (rec_parse_rset_continue): Likewise.
+       * src/rec-rset.c (rec_rset_t): New fields parser_pos, parser_line,
+       parser and finishing.
+       * src/rec-rset.c (rec_rset_dup): Call rec_rset_finish_parsing.
+       (rec_rset_mset): Likewise.
+       (rec_rset_num_elements): Likewise.
+       (rec_rset_num_records): Likewise.
+       (rec_rset_num_comments): Likewise.
+       (rec_rset_sort): Likewise.
+       (rec_rset_group): Likewise.
+       (rec_rset_add_auto_fields): Likewise.
+       (rec_rset_mset_raw): New function, copied from rec_rset_mset.
+       (rec_rset_store_parser): New function.
+       (rec_rset_finish_parsing): Likewise.
+       * src/rec.h (rec_parser_t): Move declaration before use.
+       * src/rec.h: Add prototypes for rec_rset_mset_raw,
+       rec_rset_store_parser, rec_parse_rset_lazy and
+       rec_parse_rset_finish.
+
+       * torture/Makefile.am (REC_PARSER_TSUITE): rec-parse-rset-lazy.c
+       added.
+       * torture/rec-parser/rec-parse-rset-lazy.c: New file.
+       * torture/rec-parser/tsuite-rec-parser.c (tsuite_rec_parser):
+       Added test_rec_parse_rset_lazy.
+
 2012-08-05  Jose E. Marchesi  <address@hidden>
 
        src,doc: support for dot notation in simple fexes.
diff --git a/src/rec-parser.c b/src/rec-parser.c
index e6c3b00..f2a0ab2 100644
--- a/src/rec-parser.c
+++ b/src/rec-parser.c
@@ -35,6 +35,9 @@
 #include <rec.h>
 #include <rec-utils.h>
 
+/* Forward declaration. */
+enum rec_parser_mode_e;
+
 /*
  * Static functions defined in this file
  */
@@ -50,6 +53,8 @@ static bool rec_parse_comment (rec_parser_t parser, rec_comment_t *comment);
 static bool rec_parser_digit_p (char c);
 static bool rec_parser_letter_p (char c);
 static bool rec_parser_init_common (rec_parser_t parser, const char *source);
+static bool rec_parse_rset_common (rec_parser_t parser, rec_rset_t *rset, enum rec_parser_mode_e mode);
+static bool rec_parse_rset_continue (rec_parser_t parser, rec_rset_t rset, enum rec_parser_mode_e mode);
 
 /*
  * Parser Data Structure
@@ -106,6 +111,26 @@ const char *rec_parser_error_strings[] =
 
 #define FNAME(id) rec_std_field_name ((id))
 
+/* There are two ways to handle parsing record sets: parse it
+   completely, or delay it until the records are accessed.  The second
+   way is needed to make indexes useful and could be useful even
+   without them if not all rsets of a database are accessed by a
+   query.
+
+   Parsing the record completely uses one call of
+   rec_parse_rset_common with the REC_PARSER_EAGER mode, lazy
+   (delayed) parsing uses one call with REC_PARSER_LAZY to parse only
+   the record descriptor (or the first record of default rset) and
+   later, if needed, with REC_PARSER_FINISH.
+*/
+
+enum rec_parser_mode_e
+{
+  REC_PARSER_EAGER,
+  REC_PARSER_LAZY,
+  REC_PARSER_FINISH
+};
+
 /*
  * Public functions.
  */
@@ -522,112 +547,20 @@ bool
 rec_parse_rset (rec_parser_t parser,
                 rec_rset_t *rset)
 {
-  bool ret;
-  int ci;
-  char c;
-  rec_rset_t new;
-  rec_record_t record;
-  rec_comment_t comment;
-  size_t comments_added = 0;
-
-  ret = false;
-
-  if ((new = rec_rset_new ()) == NULL)
-    {
-      /* Out of memory */
-      parser->error = REC_PARSER_ENOMEM;
-      return false;
-    }
-
-  /* Set the descriptor for this record set.  */
-  rec_rset_set_descriptor (new, parser->prev_descriptor);
-  parser->prev_descriptor = NULL;
-
-  while ((ci = rec_parser_getc (parser)) != EOF)
-    {
-      c = (char) ci;
-
-      /* Skip newline characters and blanks.  */
-      if ((c == '\n') || (c == ' ') || (c == '\t'))
-        {
-          continue;
-        }
-      /* Skip comments */
-      else if (c == '#')
-        {
-          rec_parser_ungetc (parser, c);
-          rec_parse_comment (parser, &comment);
-
-          /* Add the comment to the record set.  */
-          rec_mset_append (rec_rset_mset (new), MSET_COMMENT, (void *) comment, MSET_ANY);
-
-          comments_added++;
-        }
-      else
-        {
-          /* Try to parse a record */
-          rec_parser_ungetc (parser, c);
-          if (rec_parse_record (parser, &record))
-            {
-              /* Check if the parsed record is a descriptor.  In that
-                 case, set it as the previous descriptor in the parser
-                 state and stop parsing.  In the special case where
-                 the previous descriptor is NULL (we did not find a
-                 descriptor yet) then record the position of the
-                 descriptor as well.
-
-                 Otherwise, add the record to the current record
-                 set. */
-              if (rec_record_field_p (record, FNAME(REC_FIELD_REC)))
-                {
-                  if ((rec_rset_num_records (new) == 0) &&
-                      (!rec_rset_descriptor (new)))
-                    {
-                      /* Special case: the first record found in the
-                         input stream is a descriptor. */
-                      rec_rset_set_descriptor (new, record);
-                      rec_rset_set_descriptor_pos (new, comments_added);
-                    }
-                  else
-                    {
-                      parser->prev_descriptor = record;
-                      ret = true;
-                      break;
-                    }
-                }
-              else
-                {
-                  rec_record_set_container (record, new);
-                  rec_mset_append (rec_rset_mset (new), MSET_RECORD, (void *) record, MSET_ANY);
-                }
-            }
-          else
-            {
-              /* Parse error */
-              parser->error = REC_PARSER_ERECORD;
-              break;
-            }
-        }
-    }
-
-  if ((parser->error == REC_PARSER_NOERROR)
-      && (rec_rset_descriptor (new)
-          || (rec_rset_num_records (new) > 0)))
-    {
-      ret = true;
-    }
+  return rec_parse_rset_common (parser, rset, REC_PARSER_EAGER);
+}
 
-  if (ret)
-    {
-      *rset = new;
-    }
-  else
-    {
-      rec_rset_destroy (new);
-      *rset = NULL;
-    }
+bool
+rec_parse_rset_lazy (rec_parser_t parser,
+                     rec_rset_t *rset)
+{
+  return rec_parse_rset_common (parser, rset, REC_PARSER_LAZY);
+}
 
-  return ret;
+bool
+rec_parse_rset_finish (rec_parser_t parser, rec_rset_t rset)
+{
+  return rec_parse_rset_continue (parser, rset, REC_PARSER_FINISH);
 }
 
 bool
@@ -1227,4 +1160,150 @@ rec_parser_init_common (rec_parser_t parser,
   return true;
 }
 
+static bool
+rec_parse_rset_common (rec_parser_t parser,
+                       rec_rset_t *rset,
+                       enum rec_parser_mode_e mode)
+{
+  bool ret;
+  rec_rset_t new;
+
+  if ((new = rec_rset_new ()) == NULL)
+    {
+      /* Out of memory */
+      parser->error = REC_PARSER_ENOMEM;
+      return false;
+    }
+
+  /* Set the descriptor for this record set.  */
+  rec_rset_set_descriptor (new, parser->prev_descriptor);
+  parser->prev_descriptor = NULL;
+
+  ret = rec_parse_rset_continue (parser, new, mode);
+
+  if (ret)
+    {
+      *rset = new;
+    }
+  else
+    {
+      rec_rset_destroy (new);
+      *rset = NULL;
+    }
+
+  return ret;
+}
+
+/* Parse records into an rset or prepare it for delayed parsing. */
+static bool
+rec_parse_rset_continue (rec_parser_t parser,
+                         rec_rset_t rset,
+                         enum rec_parser_mode_e mode)
+{
+  bool ret;
+  int ci;
+  char c;
+  rec_record_t record;
+  rec_comment_t comment;
+  size_t comments_added = 0;
+  long position = -1;
+
+  ret = false;
+
+  while ((ci = rec_parser_getc (parser)) != EOF)
+    {
+      c = (char) ci;
+
+      /* Skip newline characters and blanks.  */
+      if ((c == '\n') || (c == ' ') || (c == '\t'))
+        {
+          continue;
+        }
+      /* Skip comments */
+      else if (c == '#')
+        {
+          rec_parser_ungetc (parser, c);
+          rec_parse_comment (parser, &comment);
+
+          /* Add the comment to the record set.  */
+          rec_mset_append (rec_rset_mset_raw (rset), MSET_COMMENT, (void *) comment, MSET_ANY);
+
+          comments_added++;
+        }
+      else
+        {
+          /* Try to parse a record */
+          rec_parser_ungetc (parser, c);
+          if (rec_parse_record (parser, &record))
+            {
+              /* Check if the parsed record is a descriptor.  In that
+                 case, set it as the previous descriptor in the parser
+                 state and stop parsing.  In the special case where
+                 the previous descriptor is NULL (we did not find a
+                 descriptor yet) then record the position of the
+                 descriptor as well.
+
+                 Otherwise, add the record to the current record
+                 set. */
+              if (rec_record_field_p (record, FNAME(REC_FIELD_REC)))
+                {
+                  if ((mode != REC_PARSER_FINISH)
+                      && (rec_rset_num_records (rset) == 0) &&
+                      (!rec_rset_descriptor (rset)))
+                    {
+                      /* Special case: the first record found in the
+                         input stream is a descriptor. */
+                      rec_rset_set_descriptor (rset, record);
+                      rec_rset_set_descriptor_pos (rset, comments_added);
+                    }
+                  else
+                    {
+                      parser->prev_descriptor = record;
+                      ret = true;
+                      break;
+                    }
+                }
+              else
+                {
+                  rec_record_set_container (record, rset);
+                  rec_mset_append (rec_rset_mset_raw (rset), MSET_RECORD, (void *) record, MSET_ANY);
+                }
+
+              if (mode == REC_PARSER_LAZY)
+                {
+                  position = rec_parser_tell (parser);
+                  if (position >= 0)
+                    {
+                      /* Requested to not parse the record set
+                         completely yet, so stop now. */
+                      rec_rset_store_parser (rset, parser, position, parser->line);
+                      ret = true;
+                      break;
+                    }
+                }
+            }
+          else
+            {
+              /* Parse error */
+              parser->error = REC_PARSER_ERECORD;
+              break;
+            }
+        }
+    }
+
+  /* The first stage of lazy parsing would check for a record before,
+     the second stage would never be called without a record.  The
+     check for the number of records must be avoided in these cases,
+     since it would parse the record set recursively.  */
+  if ((parser->error == REC_PARSER_NOERROR)
+      && mode != REC_PARSER_LAZY
+      && (mode == REC_PARSER_FINISH || rec_rset_descriptor (rset)
+          || (rec_rset_num_records (rset) > 0)))
+    {
+      ret = true;
+    }
+
+  return ret;
+}
+
 /* End of rec-parser.c */
diff --git a/src/rec-rset.c b/src/rec-rset.c
index 1d211df..9ee3212 100644
--- a/src/rec-rset.c
+++ b/src/rec-rset.c
@@ -96,10 +96,24 @@ struct rec_rset_s
   rec_sex_t *constraints;
   size_t num_constraints;
 
-  /* Storage for records and comments.  */
+  /* Storage for records and comments.  Call rec_rset_finish_parsing
+     before accessing the mset, since a lazily parsed rset otherwise
+     will have at most the first record stored there.  */
   int record_type;
   int comment_type;
   rec_mset_t mset;
+
+  /* Storage for parser and its state, used for lazy rsets.  */
+  long parser_pos; /* Position of the parser on the start of this
+                      record set. */
+  size_t parser_line; /* Line number at that position. */
+  rec_parser_t parser; /* The parser for the record set; NULL for
+                          wholly parsed record sets. */
+
+  /* Set while rec_rset_finish_parsing is running.  Code called from
+     there may use the record set; if it requested finishing again, a
+     non-obvious infinite loop would result.  */
+  bool finishing;
 };
 
 /* Static functions implemented below.  */
@@ -151,6 +165,7 @@ static int rec_rset_compare_typed_records (rec_rset_t rset,
                                            rec_record_t record2,
                                            rec_fex_t fields);
                                            
+static bool rec_rset_finish_parsing (rec_rset_t rset);
 
 /* The following macro is used by some functions to reduce
    verbosity.  */
@@ -268,6 +283,11 @@ rec_rset_dup (rec_rset_t rset)
 {
   rec_rset_t new = NULL;
 
+  if (!rec_rset_finish_parsing (rset))
+    {
+      return NULL;
+    }
+
   new = malloc (sizeof (struct rec_rset_s));
   if (new)
     {
@@ -310,24 +330,46 @@ rec_rset_dup (rec_rset_t rset)
 rec_mset_t
 rec_rset_mset (rec_rset_t rset)
 {
+  if (!rec_rset_finish_parsing (rset))
+    {
+      return NULL;
+    }
+  return rset->mset;
+}
+
+rec_mset_t
+rec_rset_mset_raw (rec_rset_t rset)
+{
   return rset->mset;
 }
 
 size_t
 rec_rset_num_elems (rec_rset_t rset)
 {
+  if (!rec_rset_finish_parsing (rset))
+    {
+      return SIZE_MAX;
+    }
   return rec_mset_count (rset->mset, MSET_ANY);
 }
 
 size_t
 rec_rset_num_records (rec_rset_t rset)
 {
+  if (!rec_rset_finish_parsing (rset))
+    {
+      return SIZE_MAX;
+    }
   return rec_mset_count (rset->mset, rset->record_type);
 }
 
 size_t
 rec_rset_num_comments (rec_rset_t rset)
 {
+  if (!rec_rset_finish_parsing (rset))
+    {
+      return SIZE_MAX;
+    }
   return rec_mset_count (rset->mset, rset->comment_type);
 }
 
@@ -719,6 +761,11 @@ rec_rset_t
 rec_rset_sort (rec_rset_t rset,
                rec_fex_t sort_by)
 {
+  if (!rec_rset_finish_parsing (rset))
+    {
+      return NULL;
+    }
+
   if (sort_by)
     {
       rec_rset_set_order_by_fields (rset, sort_by);
@@ -755,6 +802,11 @@ rec_rset_group (rec_rset_t rset,
   bool *deletion_map;
   size_t num_record;
 
+  if (!rec_rset_finish_parsing (rset))
+    {
+      return NULL;
+    }
+
   /* Create and initialize the deletion map.  */
 
   map_size = sizeof(bool) * rec_rset_num_records (rset);
@@ -841,6 +893,11 @@ rec_rset_add_auto_fields (rec_rset_t rset,
   rec_type_t type;
   size_t i;
 
+  if (!rec_rset_finish_parsing (rset))
+    {
+      return NULL;
+    }
+
   if ((auto_fields = rec_rset_auto (rset)))
     {
       size_t num_auto_fields = rec_fex_size (auto_fields);
@@ -923,6 +980,17 @@ rec_rset_sex_constraint (rec_rset_t rset,
   return rset->constraints[index];
 }
 
+void
+rec_rset_store_parser (rec_rset_t rset,
+                       rec_parser_t parser,
+                       size_t position,
+                       size_t line)
+{
+  rset->parser = parser;
+  rset->parser_pos = position;
+  rset->parser_line = line;
+}
+
 /*
  * Private functions
  */
@@ -1819,4 +1887,43 @@ rec_rset_compare_typed_records (rec_rset_t rset,
   return result;
 }
 
+/* Make fields requiring parsing the whole rset available.  Returns
+   'false' on error. */
+static bool
+rec_rset_finish_parsing (rec_rset_t rset)
+{
+  if (!rset->parser)
+    {
+      /* Already done. */
+      return true;
+    }
+
+  if (rset->finishing)
+    {
+      fprintf (stderr, "rec_rset_finish_parsing: recursive call. This is a bug.\
+  Please report it.");
+      abort ();
+    }
+
+  rset->finishing = true;
+
+  if (!rec_parser_seek (rset->parser, rset->parser_line, rset->parser_pos))
+    {
+      return false;
+    }
+
+  if (rec_parse_rset_finish (rset->parser, rset))
+    {
+      /* Don't parse it again. */
+      rset->parser = NULL;
+      return true;
+    }
+  else
+    {
+      /* Report failure.  No following rset operation will be correct,
+         so don't treat it as already parsed. */
+      return false;
+    }
+}
+
 /* End of rec-rset.c */
diff --git a/src/rec.h b/src/rec.h
index 5345982..49e4074 100644
--- a/src/rec.h
+++ b/src/rec.h
@@ -1047,6 +1047,10 @@ typedef struct rec_sex_s *rec_sex_t;
 
 #define MSET_RECORD 1
 
+/* Opaque data type representing a parser.  */
+
+typedef struct rec_parser_s *rec_parser_t;
+
 /************ Creating and destroying record sets **************/
 
 /* Create a new empty record set and return a reference to it.  NULL
@@ -1071,20 +1075,30 @@ rec_rset_t rec_rset_dup (rec_rset_t rset);
 /********* Getting and Setting record set properties *************/
 
 /* Return the multi-set containing the elements stored by the given
-   record set.  */
+   record set.  Returns NULL on error if the rset was lazily
+   parsed.  */
 
 rec_mset_t rec_rset_mset (rec_rset_t rset);
 
+/* Like above, but don't parse the record if it was set to be parsed
+   later.  This function is probably not needed outside of the parser
+   code. */
+
+rec_mset_t rec_rset_mset_raw (rec_rset_t rset);
+
 /* Return the number of elements stored in the given record set, of
-   any type.  */
+   any type or SIZE_MAX on error (only for lazily parsed record
+   sets).  */
 
 size_t rec_rset_num_elems (rec_rset_t rset);
 
-/* Return the number of records stored in the given record set.  */
+/* Return the number of records stored in the given record set or
+   SIZE_MAX on error (only for lazily parsed record sets).  */
 
 size_t rec_rset_num_records (rec_rset_t rset);
 
-/* Return the number of comments stored in the given record set.  */
+/* Return the number of comments stored in the given record set or
+   SIZE_MAX on error (only for lazily parsed record sets).  */
 
 size_t rec_rset_num_comments (rec_rset_t rset);
 
@@ -1261,6 +1275,18 @@ rec_rset_t rec_rset_group (rec_rset_t rset, rec_fex_t group_by);
 
 rec_rset_t rec_rset_add_auto_fields (rec_rset_t rset, rec_record_t record);
 
+/* Set a parser to get records on next use of the record set.  The
+   parser will be seeked to specified position and line number before
+   parsing the records.  Any later rset operation might change the
+   parser position.  The parser must not be destroyed before the
+   record set.
+
+   The rec_parse_rset_finish function might be called during a later
+   record set operation.  It, or functions it uses, must not access
+   the record set using other functions than rec_rset_mset_raw.  */
+
+void rec_rset_store_parser (rec_rset_t rset, rec_parser_t parser, size_t position, size_t line);
+
 /*
  * DATABASES
  *
@@ -1812,10 +1838,6 @@ bool rec_int_check_field_type (rec_db_t db,
  * entire record sets from a file stream or a memory buffer.
  */
 
-/* Opaque data type representing a parser.  */
-
-typedef struct rec_parser_s *rec_parser_t;
-
 /**************** Creating and destroying parsers ******************/
 
 /* Create a parser associated with a given file stream that will be
@@ -1880,6 +1902,23 @@ rec_record_t rec_parse_record_str (const char *str);
 
 bool rec_parse_rset (rec_parser_t parser, rec_rset_t *rset);
 
+/* Return a record set to be parsed on first use.  Unless the rset is
+   accessed later, only the record descriptor (or first record if
+   there is no descriptor) will be parsed.  Returns 'false' on error
+   or if the parser is not seekable. */
+
+bool rec_parse_rset_lazy (rec_parser_t parser, rec_rset_t *rset);
+
+/* Parse all records of an RSET created by rec_parse_rset_lazy.
+   Calling rec_parse_rset_lazy and then rec_parse_rset_finish should
+   be equivalent to calling rec_parse_rset, except that it won't
+   delete the partial rset in case of errors after the descriptor.
+   This function returns 'false' if a parse error is found, leaving
+   the rset as if it ended with the last complete record or comment
+   from before the error and was successfully parsed.  */
+
+bool rec_parse_rset_finish (rec_parser_t parser, rec_rset_t rset);
+
 /* Parse a database and return it in DB.  This function returns
    'false' and the value in DB is undefined if a parse error is
    found.  */
diff --git a/torture/Makefile.am b/torture/Makefile.am
index 38e4e77..4d53b9b 100644
--- a/torture/Makefile.am
+++ b/torture/Makefile.am
@@ -110,6 +110,7 @@ REC_PARSER_TSUITE = rec-parser/rec-parser-new.c \
                     rec-parser/rec-parse-record.c \
                     rec-parser/rec-parse-record-str.c \
                     rec-parser/rec-parse-rset.c \
+                    rec-parser/rec-parse-rset-lazy.c \
                     rec-parser/rec-parse-db.c \
                     rec-parser/rec-parser-eof.c \
                     rec-parser/rec-parser-error.c \
diff --git a/torture/rec-parser/rec-parse-rset-lazy.c b/torture/rec-parser/rec-parse-rset-lazy.c
new file mode 100644
index 0000000..ff42141
--- /dev/null
+++ b/torture/rec-parser/rec-parse-rset-lazy.c
@@ -0,0 +1,130 @@
+/* -*- mode: C -*-
+ *
+ *       File:         rec-parse-rset-lazy.c
+ *       Date:         Sat Nov 13 21:30:44 2010
+ *
+ *       GNU recutils - rec_parse_rset_lazy unit tests.
+ *
+ */
+
+/* Copyright (C) 2010 Jose E. Marchesi */
+/* Copyright (C) 2012 Michał Masłowski */
+
+/* This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <config.h>
+#include <string.h>
+#include <stdio.h>
+#include <check.h>
+
+#include <rec.h>
+
+/*-
+ * Test: rec_parse_rset_lazy_nominal
+ * Unit: rec_parse_rset_lazy
+ * Description:
+ * + Parse lazily valid record sets.
+ */
+START_TEST(rec_parse_rset_lazy_nominal)
+{
+  rec_parser_t parser;
+  rec_rset_t rset;
+  char *str;
+
+  str = "foo1: bar1\n\nfoo2: bar2\n\nfoo3: bar3";
+  parser = rec_parser_new_str (str, "dummy");
+  fail_if (!rec_parse_rset_lazy (parser, &rset));
+  fail_if (rec_rset_num_records (rset) != 3);
+  rec_parser_destroy (parser);
+
+  str = "%rec: foo\n\nfoo1: bar1\n\nfoo2: bar2\n\nfoo3: bar3";
+  parser = rec_parser_new_str (str, "dummy");
+  fail_if (!rec_parse_rset_lazy (parser, &rset));
+  fail_if (rec_rset_num_records (rset) != 3);
+  rec_parser_destroy (parser);
+
+  str = "%rec: foo\n\nfoo1: bar1\n\n#foo2: bar2\n\nfoo3: bar3";
+  parser = rec_parser_new_str (str, "dummy");
+  fail_if (!rec_parse_rset_lazy (parser, &rset));
+  fail_if (rec_rset_num_elems (rset) != 3);
+  fail_if (rec_rset_num_comments (rset) != 1);
+  fail_if (rec_rset_num_records (rset) != 2);
+  rec_parser_destroy (parser);
+}
+END_TEST
+
+/*-
+ * Test: rec_parse_rset_lazy_seeked
+ * Unit: rec_parse_rset_lazy
+ * Description:
+ * + Parse lazily valid record sets remembering parser position.
+ */
+START_TEST(rec_parse_rset_lazy_seeked)
+{
+  rec_parser_t parser;
+  rec_rset_t rset;
+  char *str;
+
+  str = "%rec: foo\n\nfoo1: bar1\n\nfoo2: bar2\n\nfoo3: bar3";
+  parser = rec_parser_new_str (str, "dummy");
+  fail_if (!rec_parse_rset_lazy (parser, &rset));
+  fail_if (!rec_parser_seek (parser, 10, strlen (str)));
+  fail_if (rec_rset_num_records (rset) != 3);
+  rec_parser_destroy (parser);
+}
+END_TEST
+
+/*-
+ * Test: rec_parse_rset_lazy_invalid
+ * Unit: rec_parse_rset_lazy
+ * Description:
+ * + Try to parse invalid record sets.
+ */
+START_TEST(rec_parse_rset_lazy_invalid)
+{
+  rec_parser_t parser;
+  rec_rset_t rset;
+  char *str;
+
+  str = " ";
+  parser = rec_parser_new_str (str, "dummy");
+  fail_if (parser == NULL);
+  fail_if (rec_parse_rset_lazy (parser, &rset));
+  rec_parser_destroy (parser);
+
+  /* A record set with a syntax error in its last record.  */
+  str = "%rec: test\n\nfoo1: bar1\n\nfoo1 bar1";
+  parser = rec_parser_new_str (str, "dummy");
+  fail_if (!rec_parse_rset_lazy (parser, &rset));
+  fail_if (rec_rset_mset (rset));
+  rec_parser_destroy (parser);
+}
+END_TEST
+
+/*
+ * Test creation function
+ */
+TCase *
+test_rec_parse_rset_lazy (void)
+{
+  TCase *tc = tcase_create ("rec_parse_rset_lazy");
+  tcase_add_test (tc, rec_parse_rset_lazy_nominal);
+  tcase_add_test (tc, rec_parse_rset_lazy_seeked);
+  tcase_add_test (tc, rec_parse_rset_lazy_invalid);
+
+  return tc;
+}
+
+/* End of rec-parse-rset-lazy.c */
diff --git a/torture/rec-parser/tsuite-rec-parser.c b/torture/rec-parser/tsuite-rec-parser.c
index eee2fb3..26d5adc 100644
--- a/torture/rec-parser/tsuite-rec-parser.c
+++ b/torture/rec-parser/tsuite-rec-parser.c
@@ -36,6 +36,7 @@ extern TCase *test_rec_parse_field (void);
 extern TCase *test_rec_parse_record (void);
 extern TCase *test_rec_parse_record_str (void);
 extern TCase *test_rec_parse_rset (void);
+extern TCase *test_rec_parse_rset_lazy (void);
 extern TCase *test_rec_parse_db (void);
 extern TCase *test_rec_parser_eof (void);
 extern TCase *test_rec_parser_error (void);
@@ -59,6 +60,7 @@ tsuite_rec_parser ()
   suite_add_tcase (s, test_rec_parse_record ());
   suite_add_tcase (s, test_rec_parse_record_str ());
   suite_add_tcase (s, test_rec_parse_rset ());
+  suite_add_tcase (s, test_rec_parse_rset_lazy ());
   suite_add_tcase (s, test_rec_parse_db ());
   suite_add_tcase (s, test_rec_parser_eof ());
   suite_add_tcase (s, test_rec_parser_error ());
-- 
1.7.11.4
