bug-gnu-utils

[PATCH] [RFC] add support for input/output count of lines


From: Roberto Nibali
Subject: [PATCH] [RFC] add support for input/output count of lines
Date: Fri, 01 Oct 2004 09:48:47 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20040913

Hello,

[I'm not subscribed, so please cc to all addresses if you would like to get feedback]

The attached patch adds support for printing input/output line counters on stderr through two new options, '-j' and '-g'. The rationale is as follows:

As dumbfounded as we seem to be, we simply couldn't find a way to reduce the 3 fork()'s needed to achieve the following:

1. get the number of lines in the file being grep'd.
2. get the number of lines where the pattern matched.
3. display the matching lines.

So, here is a 'theoretical trace' of what we do [3 fork()'s]:

$ wc -l < /var/log/really_big_file
$ grep -c foobar /var/log/really_big_file
$ grep foobar /var/log/really_big_file

With the new patch you can do it with one fork:

$ grep -j -g foobar /var/log/really_big_file

or the long opt version:

$ grep --input-lines --output-lines foobar /var/log/really_big_file

Et voilà, it gives you the matching lines and prints two lines to stderr, input_lines=... and output_lines=...
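
To illustrate the single-pass idea outside of grep itself, here is a minimal sketch in plain C. It is not taken from the patch: the names scan_once and struct line_counts are hypothetical, matching is a plain substring search with strstr() rather than a regular expression, and lines are assumed to be shorter than the 4096-byte buffer. A final line without a trailing newline is counted as a line, which differs from what wc -l reports.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result type: lines read and lines that matched.  */
struct line_counts
{
  unsigned long input;
  unsigned long output;
};

/* Hypothetical helper, not part of the patch.  Reads IN exactly once,
   copies every line containing NEEDLE to OUT, and reports both counters
   on ERR as input_lines=N and output_lines=N.
   Assumption: every line fits in the 4096-byte buffer.  */
static struct line_counts
scan_once (FILE *in, const char *needle, FILE *out, FILE *err)
{
  struct line_counts c = { 0, 0 };
  char buf[4096];

  while (fgets (buf, sizeof buf, in) != NULL)
    {
      c.input++;                        /* every line read counts as input */
      if (strstr (buf, needle) != NULL)
        {
          fputs (buf, out);             /* matching line goes to OUT */
          c.output++;
        }
    }

  fprintf (err, "input_lines=%lu\n", c.input);
  fprintf (err, "output_lines=%lu\n", c.output);
  return c;
}
```

Called as scan_once (stdin, "foobar", stdout, stderr), this would print the matching lines on stdout and the two counters on stderr, so the file is read only once.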

Normally I wouldn't really care about 3 processes and the resulting fork()'s, but if you're processing a really big file (>2GB) there is a significant overhead, not really in the fork() and exec() but in read()'ing and lseek()'ing the file three times. Doing it in one go would be preferable in many situations.

We did of course maintain 100% backwards compatibility with the existing feature set and we also tested the added functionality. So if you guys are not against including such a thing for the next drop of grep, I'd be grateful if you could merge this little non-intrusive patch with your CVS tree. For us it's one patch less to maintain :).

Thanks and best regards,
Roberto Nibali, ratz
--
-------------------------------------------------------------
addr://Rathausgasse 31, CH-5001 Aarau  tel://++41 62 823 9355
http://www.terreactive.com             fax://++41 62 823 9356
-------------------------------------------------------------
terreActive AG                         We secure your success
-------------------------------------------------------------
Visit us on 29 and 30 September at security-zone.info
Further information is available at www.security-zone.info
We look forward to welcoming you at the fair.
-------------------------------------------------------------
diff -ur grep-2.5-uclibc/src/grep.c grep-2.5-uclibc-fixed_DESTDIR-countpatch/src/grep.c
--- grep-2.5-uclibc/src/grep.c  2002-03-13 07:49:52.000000000 -0700
+++ grep-2.5-uclibc-fixed_DESTDIR-countpatch/src/grep.c 2004-09-29 06:32:43.000000000 -0600
@@ -80,7 +80,7 @@
 static struct exclude *included_patterns;
 /* Short options.  */
 static char const short_options[] =
-"0123456789A:B:C:D:EFGHIPUVX:abcd:e:f:hiKLlm:noqRrsuvwxyZz";
+"0123456789A:B:C:D:EFGHIPUVX:abcd:e:f:ghijKLlm:noqRrsuvwxyZz";
 
 /* Non-boolean long options that have no corresponding short equivalents.  */
 enum
@@ -143,6 +143,8 @@
   {"version", no_argument, NULL, 'V'},
   {"with-filename", no_argument, NULL, 'H'},
   {"word-regexp", no_argument, NULL, 'w'},
+  {"input-lines", no_argument, NULL, 'j'},
+  {"output-lines", no_argument, NULL, 'g'},
   {0, 0, 0, 0}
 };
 
@@ -439,6 +441,8 @@
 static int out_invert;         /* Print nonmatching stuff. */
 static int out_file;           /* Print filenames. */
 static int out_line;           /* Print line numbers. */
+static int stderr_input;       /* Print input line count on stderr. */
+static int stderr_output;      /* Print output line count on stderr. */
 static int out_byte;           /* Print byte offsets. */
 static int out_before;         /* Lines of leading context. */
 static int out_after;          /* Lines of trailing context. */
@@ -514,11 +518,13 @@
 {
   if (out_file)
     printf ("%s%c", filename, sep & filename_mask);
+
   if (out_line)
     {
       nlscan (beg);
       totalnl = add_count (totalnl, 1);
-      print_offset_sep (totalnl, sep);
+      if (!stderr_input && !stderr_output)
+        print_offset_sep (totalnl, sep);
       lastnl = lim;
     }
   if (out_byte)
@@ -945,6 +951,12 @@
     status = count + 2;
   else
     {
+      if (stderr_input)
+       fprintf (stderr, _("input_lines=%lu\n"), (unsigned long) totalnl);
+
+      if (stderr_output)
+       fprintf (stderr, _("output_lines=%d\n"), count);
+
       if (count_matches)
        {
          if (out_file)
@@ -1100,6 +1112,8 @@
   -L, --files-without-match only print FILE names containing no match\n\
   -l, --files-with-matches  only print FILE names containing matches\n\
   -c, --count               only print a count of matching lines per FILE\n\
+  -j, --input-lines         print total number of lines on STDERR\n\
+  -g, --output-lines        print count of matching lines on STDERR\n\
   -Z, --null                print 0 byte after FILE name\n"));
       printf (_("\
 \n\
@@ -1470,10 +1484,20 @@
          keys[keycc++] = '\n';
        break;
 
+      case 'g':
+        out_line = 1;
+        stderr_output = 1;
+        break;
+
       case 'h':
        no_filenames = 1;
        break;
 
+      case 'j':
+        out_line = 1;
+        stderr_input = 1;
+        break;
+
       case 'i':
       case 'y':                        /* For old-timers . . . */
        match_icase = 1;
@@ -1622,7 +1646,6 @@
       default:
        usage (2);
        break;
-
       }
 
   /* POSIX.2 says that -q overrides -l, which in turn overrides the
