groff-commit
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[groff] 14/14: [grog]: Heavily refactor.


From: G. Branden Robinson
Subject: [groff] 14/14: [grog]: Heavily refactor.
Date: Mon, 28 Jun 2021 00:44:53 -0400 (EDT)

gbranden pushed a commit to branch master
in repository groff.

commit 2ba599980081fb700e09c10abcc136657ae93070
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
AuthorDate: Mon Jun 28 14:28:32 2021 +1000

    [grog]: Heavily refactor.
    
    * src/utils/grog/grog.pl:
      - Drop import of unused module `Data::Dumper`.
      - Drop unused scalars `Sp` and `correct_tmac`.
      - Simplify determination of version number.  Drop hash `at_at` which
        only stored one key, `GROFF_VERSION`.  Initialize scalar
        `groff_version` to "DEVELOPMENT".  Rename scalar `before_make` to
        `in_source_tree` and initialize to zero.  Update `groff_version`
        with Automake-determined version variable if it is defined (i.e.,
        grog is not running in an unbuilt source tree).
      - Drop unused `Mparams` list.  Replace it with new list
        `requested_package`, which stores the arguments to any grog `-m`
        options specified by the user.
      - Rename many objects so that I, and others, can better comprehend
        their purpose, and for consistent letter casing.
        . @Command -> @command
        . @devices -> @device
        . $Prog -> $program_name
        . %macros -> %user_macro
        . $have_any_valid_args -> $have_any_valid_arguments
        . &handle_args -> &process_arguments
        . &handle_whole_files -> &process_input
        . @preprograms -> @preprocessor
        . &make_groff_device -> &infer_device
        . &make_groff_preproc -> &infer_preprocessors
        . &make_groff_tmac_man_ms -> &infer_man_or_ms_package
        . &make_groff_line_rest -> &construct_command
      - Drop many unused keys in `Groff` hash.
      - Add new lists, `macro_ms`, `macro_man`, and `macro_man_or_ms` to
        support new scoring technique to disambiguate input documents
        between these two packages.
      - Append the foregoing 3 lists to new list `standard_macro`, and add
        these as keys to the `Groff` hash.
      - Add new list `main_package` to keep track of full-service package
        names.
      - Add new scalars `man_score`, `ms_score`, and `inside_tbl_table` to
        aid disambiguation of .TH macro calls and the many macro names
        shared between man(7) and ms(7).
    
      (process_arguments): Strip '-m' off of argument before storing the
      remainder in `@requested_package`.
    
      (do_line): Detect .TH macro call even if white space occurs between
      the control character and the macro name.
    
      (do_line): Inflate `$man_score` by 100 if .TH is the first macro call
      seen in a document.
    
      (do_line): Fix bug; clear `$before_first_command` in correct
      scope--after any macro call, not just if we saw a .TH as the first
      macro call.
    
      (do_line): Set `$inside_tbl_table` when we see a .TS call.
    
      (do_line): Clear `$inside_tbl_table` when we see a .TE call.  Also
      increment `$Groff{'tbl')' again, increasing the "score" of tbl(1)
      usage evidence.
    
      (do_line): Drop a lot of code that manually increments %Groff keys
      corresponding to man and ms macros.  This is now done differently and
      elsewhere.
    
      (do_line): Drop "P" from list of characteristic mm(7) macros.
    
      (do_line): Simplify matching of mom(7) macros (match $command, not
      $line).  Extend list of characteristic mom(7) macros.
    
      (do_line): Increment $Groff{$key} if $key is in @standard_macro.
    
      (infer_man_or_ms_package): Rewrite.  Compute a score for each package
      by counting occurrences of their characteristic macros.  If both have
      a score of zero, assume that the input is a raw roff document.  If the
      scores are equal (doc/webpage.ms, startlingly, comes within 1 point of
      a tied score), infer ms(7) if 'TH' was never called, and if it was,
      issue a diagnostic advising user to supply a disambiguating `-m`
      option.  Otherwise, the scores are unequal, and infer the package of
      the winner.  Set scalar `inferred_main_package` instead of pushing
      `-m` options onto `@m`.
    
      (infer_macro_packages): Set scalar `inferred_main_package` instead of
      pushing `-m` options onto `@m`.  Explicitly return 0 if we fall off
      the end of the function.
    
      (construct_command): Rewrite handling of -m options.  Add new list
      `msupp` to store supplementary (non-main) macro package arguments.  If
      a full-service package was explicitly requested, it had better not
      clash with what we inferred.  If it does, explicitly unset
      $inferred_main_package so that the -m arguments are placed in the same
      order that the user gave them; caveat dictator.  If `--run` option was
      given, just print the command; don't preface it with __FILE__ and
      __LINE__ noise.
    
      - Remove comments documenting shared variables used by subroutines.
        These are far from useless but too tedious to keep up to date while
        the code is in flux.
      - Note several places for further code review or refactoring with
        "XXX" comments.
      - Add Vim modeline.
    
    grog now passes all its tests and correctly infers arguments for all
    in-tree groff documents (except for a known, and already documented in
    grog(1), false positive detection of soelim in soelim(1)).  This
    refactor also obviates or resolves several outstanding Savannah tickets.
    
    Fixes <https://savannah.gnu.org/bugs/?44707> by obviating it; grog no
    longer cares about file name extensions on man pages (or any other
    input).
    
    Fixes <https://savannah.gnu.org/bugs/?55302>; same.  The quality of
    diagnostic messages has been improved as well.
    
    Fixes <https://savannah.gnu.org/bugs/?59753>; same.
    
    Fixes <https://savannah.gnu.org/bugs/?59664>.  The attached patch was a
    less aggressive refactor of &do_line and %Groff.  Its author made the
    following claim for it: "With this patch, all 'man', 'me', 'mom, and
    'ms' files in the repository are correctly identified.  The only example
    of a 'mm'-file is "letter.mm", which is not recognized correctly."  As
    noted above, the present refactor achieves correct recognition of all of
    the files including letter.mm.
---
 ChangeLog              | 120 ++++++++++
 src/utils/grog/grog.pl | 588 ++++++++++++++++++++-----------------------------
 2 files changed, 357 insertions(+), 351 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 48c51da..95633c5 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,125 @@
 2021-06-28  G. Branden Robinson <g.branden.robinson@gmail.com>
 
+       [grog]: Heavily refactor.
+
+       * src/utils/grog/grog.pl:
+         - Drop import of unused module `Data::Dumper`.
+         - Drop unused scalars `Sp` and `correct_tmac`.
+         - Simplify determination of version number.  Drop hash `at_at`
+           which only stored one key, `GROFF_VERSION`.  Initialize
+           scalar `groff_version` to "DEVELOPMENT".  Rename scalar
+           `before_make` to `in_source_tree` and initialize to zero.
+           Update `groff_version` with Automake-determined version
+           variable if it is defined (i.e., grog is not running in an
+           unbuilt source tree).
+         - Drop unused `Mparams` list.  Replace it with new list
+           `requested_package`, which stores the arguments to any grog
+           `-m` options specified by the user.
+         - Rename many objects so that I, and others, can better
+           comprehend their purpose, and for consistent letter casing.
+           . @Command -> @command
+           . @devices -> @device
+           . $Prog -> $program_name
+           . %macros -> %user_macro
+           . $have_any_valid_args -> $have_any_valid_arguments
+           . &handle_args -> &process_arguments
+           . &handle_whole_files -> &process_input
+           . @preprograms -> @preprocessor
+           . &make_groff_device -> &infer_device
+           . &make_groff_preproc -> &infer_preprocessors
+           . &make_groff_tmac_man_ms -> &infer_man_or_ms_package
+           . &make_groff_line_rest -> &construct_command
+         - Drop many unused keys in `Groff` hash.
+         - Add new lists, `macro_ms`, `macro_man`, and
+           `macro_man_or_ms` to support new scoring technique to
+           disambiguate input documents between these two packages.
+         - Append the foregoing 3 lists to new list `standard_macro`,
+           and add these as keys to the `Groff` hash.
+         - Add new list `main_package` to keep track of full-service
+           package names.
+         - Add new scalars `man_score`, `ms_score`, and
+           `inside_tbl_table` to aid disambiguation of .TH macro calls
+           and the many macro names shared between man(7) and ms(7).
+         (process_arguments): Strip '-m' off of argument before storing
+         the remainder in `@requested_package`.
+         (do_line): Detect .TH macro call even if white space occurs
+         between the control character and the macro name.
+         (do_line): Inflate `$man_score` by 100 if .TH is the first
+         macro call seen in a document.
+         (do_line): Fix bug; clear `$before_first_command` in correct
+         scope--after any macro call, not just if we saw a .TH as the
+         first macro call.
+         (do_line): Set `$inside_tbl_table` when we see a .TS call.
+         (do_line): Clear `$inside_tbl_table` when we see a .TE call.
+         Also increment `$Groff{'tbl')' again, increasing the "score"
+         of tbl(1) usage evidence.
+         (do_line): Drop a lot of code that manually increments %Groff
+         keys corresponding to man and ms macros.  This is now done
+         differently and elsewhere.
+         (do_line): Drop "P" from list of characteristic mm(7) macros.
+         (do_line): Simplify matching of mom(7) macros (match $command,
+         not $line).  Extend list of characteristic mom(7) macros.
+         (do_line): Increment $Groff{$key} if $key is in
+         @standard_macro.
+         (infer_man_or_ms_package): Rewrite.  Compute a score for each
+         package by counting occurrences of their characteristic
+         macros.  If both have a score of zero, assume that the input
+         is a raw roff document.  If the scores are equal
+         {doc/webpage.ms, startlingly, comes within 1 point of a tied
+         score}, infer ms(7) if 'TH' was never called, and if it was,
+         issue a diagnostic advising user to supply a disambiguating
+         `-m` option.  Otherwise, the scores are unequal, and infer the
+         package of the winner. Set scalar `inferred_main_package`
+         instead of pushing `-m` options onto `@m`.
+         (infer_macro_packages): Set scalar `inferred_main_package`
+         instead of pushing `-m` options onto `@m`.  Explicitly return
+         0 if we fall off the end of the function.
+         (construct_command): Rewrite handling of -m options.  Add new
+         list `msupp` to store supplementary (non-main) macro package
+         arguments.  If a full-service package was explicitly
+         requested, it had better not clash with what we inferred.  If
+         it does, explicitly unset $inferred_main_package so that the
+         -m arguments are placed in the same order that the user gave
+         them; caveat dictator.  If `--run` option was given, just
+         print the command; don't preface it with __FILE__ and __LINE__
+         noise.
+         - Remove comments documenting shared variables used by
+           subroutines.  These are far from useless but too tedious to
+           keep up to date while the code is in flux.
+         - Note several places for further code review or refactoring
+           with "XXX" comments.
+         - Add Vim modeline.
+
+       grog now passes all its tests and correctly infers arguments for
+       all in-tree groff documents (except for a known, and already
+       documented in grog(1), false positive detection of soelim in
+       soelim(1)).  This refactor also obviates or resolves several
+       outstanding Savannah tickets.
+
+       Fixes <https://savannah.gnu.org/bugs/?44707> by obviating it;
+       grog no longer cares about file name extensions on man pages (or
+       any other input).
+
+       Fixes <https://savannah.gnu.org/bugs/?55302>; same.  The quality
+       of diagnostic messages has been improved as well.
+
+       Fixes <https://savannah.gnu.org/bugs/?59753>; same.
+
+       Fixes <https://savannah.gnu.org/bugs/?59664>.  The attached
+       patch was a less aggressive refactor of &do_line and %Groff.
+       Its author made the following claim for it: "With this patch,
+       all 'man', 'me', 'mom, and 'ms' files in the repository are
+       correctly identified.  The only example of a 'mm'-file is
+       "letter.mm", which is not recognized correctly."  As noted
+       above, the present refactor achieves correct recognition of all
+       of the files including letter.mm.
+
+       Fixes <https://savannah.gnu.org/bugs/?60833>.
+
+       Fixes <https://savannah.gnu.org/bugs/?60834>.
+
+2021-06-28  G. Branden Robinson <g.branden.robinson@gmail.com>
+
        * src/utils/grog/tests/smoke-test.sh: Perform whole-line
        matches.  Apply DRY principle to expected output.  In
        anticipation of pending changes to grog.pl, uncomment and add
diff --git a/src/utils/grog/grog.pl b/src/utils/grog/grog.pl
index f2a1e9a..25e77bb 100644
--- a/src/utils/grog/grog.pl
+++ b/src/utils/grog/grog.pl
@@ -3,11 +3,12 @@
 # Inspired by doctype script in Kernighan & Pike, Unix Programming
 # Environment, pp 306-8.
 
-# Copyright (C) 1993-2020 Free Software Foundation, Inc.
+# Copyright (C) 1993-2021 Free Software Foundation, Inc.
 # Written by James Clark.
 # Rewritten with Perl by Bernd Warken <groff-bernd.warken-72@web.de>.
 # The macros for identifying the devices were taken from Ralph
 # Corderoy's 'grog.sh' of 2006.
+# Hacked up by G. Branden Robinson, 2021.
 
 # This file is part of 'grog', which is part of 'groff'.
 
@@ -32,39 +33,61 @@ use strict;
 
 use File::Spec;
 
-# printing of hashes: my %hash = ...; print Dumper(\%hash);
-use Data::Dumper;
-
 # for running shell based programs within Perl; use `` instead of
 # use IPC::System::Simple qw(capture capturex run runx system systemx);
 
 $\ = "\n";
 
-# my $Sp = "[\\s\\n]";
-# my $Sp = qr([\s\n]);
-# my $Sp = '' if $arg eq '-C';
-my $Sp = '';
+my $groff_version = 'DEVELOPMENT';
 
 # from 'src/roff/groff/groff.cpp' near 'getopt_long'
 my $groff_opts =
   'abcCd:D:eEf:F:gGhiI:jJkK:lL:m:M:n:No:pP:r:RsStT:UvVw:W:XzZ';
 
-my @Command = ();              # stores the final output
-my @Mparams = ();              # stores the options '-m*'
-my @devices = ();              # stores -T
+my @command = ();              # the constructed groff command
+my @device = ();               # stores -T
+my @requested_package = ();    # arguments to '-m' grog options
 
 my $do_run = 0;                        # run generated 'groff' command
 my $pdf_with_ligatures = 0;    # '-P-y -PU' for 'pdf' device
-my $with_warnings = 0;
+my $with_warnings = 0;         # XXX: more like "hints"  --GBR
 
-my $Prog = $0;
+my $program_name = $0;
 {
-  my ($v, $d, $f) = File::Spec->splitpath($Prog);
-  $Prog = $f;
+  my ($v, $d, $f) = File::Spec->splitpath($program_name);
+  $program_name = $f;
 }
 
-
-my %macros;
+my @macro_ms = ('RP', 'TL', 'AU', 'AI', 'DA', 'ND', 'AB', 'AE',
+               'QP', 'QS', 'QE', 'XP',
+               'NH',
+               'R',
+               'CW',
+               'BX', 'UL', 'LG', 'NL',
+               'KS', 'KF', 'KE', 'B1', 'B2',
+               'DS', 'DE', 'LD', 'ID', 'BD', 'CD', 'RD',
+               'FS', 'FE',
+               'OH', 'OF', 'EH', 'EF', 'P1',
+               'TA', '1C', '2C', 'MC',
+               'XS', 'XE', 'XA', 'TC', 'PX',
+               'IX', 'SG');
+
+my @macro_man = ('BR', 'IB', 'IR', 'RB', 'RI', 'P', 'TH', 'TP', 'SS',
+                'HP', 'PD',
+                'AT', 'UC',
+                'SB',
+                'EE', 'EX',
+                'OP',
+                'MT', 'ME', 'SY', 'YS', 'TQ', 'UR', 'UE');
+
+my @macro_man_or_ms = ('B', 'I', 'BI',
+                      'DT',
+                      'RS', 'RE',
+                      'SH',
+                      'SM',
+                      'IP', 'LP', 'PP');
+
+my %user_macro;
 my %Groff =
   (
    # preprocessors
@@ -88,41 +111,6 @@ my %Groff =
    'soelim' => 0,
    'tbl' => 0,
 
-   # tmacs
-#   'man' => 0,
-#   'mandoc' => 0,
-#   'mdoc' => 0,
-#   'mdoc_old' => 0,
-#   'me' => 0,
-#   'mm' => 0,
-#   'mom' => 0,
-#   'ms' => 0,
-
-   # requests
-   'AB' => 0,          # ms
-   'AE' => 0,          # ms
-   'AI' => 0,          # ms
-   'AU' => 0,          # ms
-   'NH' => 0,          # ms
-   'TH_later' => 0,    # TH not 1st command is ms
-   'TL' => 0,          # ms
-   'UL' => 0,          # ms
-   'XP' => 0,          # ms
-
-   'IP' => 0,          # man and ms
-   'LP' => 0,          # man and ms
-   'P' => 0,           # man and ms
-   'PP' => 0,          # man and ms
-   'SH' => 0,          # man and ms
-
-   'OP' => 0,          # man
-   'SS' => 0,          # man
-   'SY' => 0,          # man
-   'TH_first' => 0,    # TH as 1st command is man
-   'TP' => 0,          # man
-   'UR' => 0,          # man
-   'YS' => 0,          # man
-
    # for mdoc and mdoc-old
    # .Oo and .Oc for modern mdoc, only .Oo for mdoc-old
    'Oo' => 0,          # mdoc and mdoc-old
@@ -130,6 +118,11 @@ my %Groff =
    'Dd' => 0,          # mdoc
   ); # end of %Groff
 
+my @standard_macro = ();
+push(@standard_macro, @macro_ms, @macro_man, @macro_man_or_ms);
+for my $key (@standard_macro) {
+  $Groff{$key} = 0;
+}
 
 # for first line check
 my %preprocs_tmacs =
@@ -161,37 +154,42 @@ my %preprocs_tmacs =
 
 my @filespec;
 
+my @main_package = ('an', 'doc', 'doc-old', 'e', 'm', 'om', 's');
 my $inferred_main_package = '';
+# man(7) and ms(7) use many of the same macro names; do extra checking.
+my $man_score = 0;
+my $ms_score = 0;
+# .TH is both a man(7) macro and often used with tbl(1).
+my $inside_tbl_table = 0;
 
 my $had_inference_problem = 0;
 my $had_processing_problem = 0;
-my $have_any_valid_args = 0;
+my $have_any_valid_arguments = 0;
 
 
 sub fail {
   my $text = shift;
-  print STDERR "$Prog: error: $text";
+  print STDERR "$program_name: error: $text";
   $had_processing_problem = 1;
 }
 
 
 sub warn {
   my $text = shift;
-  print STDERR "$Prog: warning: $text";
+  print STDERR "$program_name: warning: $text";
 }
 
 
-sub handle_args {
+sub process_arguments {
   my $no_more_options = 0;
   my $was_minus = 0;
   my $was_T = 0;
   my $optarg = 0;
-  # globals: @filespec, @Command, @devices, @Mparams
 
   foreach my $arg (@ARGV) {
 
     if ( $optarg ) {
-      push @Command, $arg;
+      push @command, $arg;
       $optarg = 0;
       next;
     }
@@ -202,7 +200,7 @@ sub handle_args {
     }
 
     if ( $was_T ) {
-      push @devices, $arg;
+      push @device, $arg;
       $was_T = 0;
       next;
     }
@@ -227,6 +225,7 @@ sub handle_args {
       next;
     }
 
+    # XXX: Stop matching these sloppily.  --GBR
     &version() if $arg =~ /^--?v/;     # --version, with exit
     &help() if $arg  =~ /--?h/;                # --help, with exit
 
@@ -247,7 +246,9 @@ sub handle_args {
     }
 
     if ($arg =~ /^-m/) {
-      push @Mparams, $arg;
+      my $package = $arg;
+      $package =~ s/-m//;
+      push @requested_package, $package;
       next;
     }
 
@@ -257,7 +258,7 @@ sub handle_args {
     }
 
     if ($arg =~ s/^-T(\w+)$/$1/) {
-      push @devices, $1;
+      push @device, $1;
       next;
     }
 
@@ -267,15 +268,15 @@ sub handle_args {
       my $others = $2;
       if ( $groff_opts =~ /$opt_char_with_arg/ ) {     # groff optarg
        if ( $others ) {        # optarg is here
-         push @Command, '-' . $opt_char;
-         push @Command, '-' . $others;
+         push @command, '-' . $opt_char;
+         push @command, '-' . $others;
          next;
        }
        # next arg is optarg
        $optarg = 1;
        next;
       } elsif ( $groff_opts =~ /$opt_char/ ) { # groff no optarg
-       push @Command, '-' . $opt_char;
+       push @command, '-' . $opt_char;
        if ( $others ) {        # $others is now an opt collection
          $arg = '-' . $others;
          redo;
@@ -284,24 +285,22 @@ sub handle_args {
        next;
       } else {         # not a groff opt
        &warn("unrecognized groff option: $arg");
-       push(@Command, $arg);
+       push(@command, $arg);
        next;
       }
     }
   }
   @filespec = ('-') unless (@filespec);
-} # handle_args()
-
+} # process_arguments()
 
-sub handle_whole_files {
-  # globals: @filespec
 
+sub process_input {
   foreach my $file ( @filespec ) {
     unless ( open(FILE, $file eq "-" ? $file : "< $file") ) {
       &fail("cannot open '$file': $!");
       next;
     }
-    $have_any_valid_args = 1;
+    $have_any_valid_arguments = 1;
     my $line = <FILE>; # get single line
 
     unless ( defined($line) ) {
@@ -326,7 +325,7 @@ sub handle_whole_files {
     }
     close(FILE);
   } # end foreach
-} # handle_whole_files()
+} # process_input()
 
 
 # As documented for the 'man' program, the first line can be
@@ -336,11 +335,17 @@ sub handle_whole_files {
 # - a word using the following characters can be appended: 'egGjJpRst'.
 #     Each of these characters means an option for the generated
 #     'groff' command line, e.g. '-t'.
+#
+# XXX: The above is not accurate; man(7)'s preprocessor encoding
+# convention does not map perfectly to groff(1) command-line options.
+# The letter for 'refer' is 'r', not 'R', and there is also the
+# historical legacy of vgrind ('v') to consider.  In any case, why
+# should that comment line override what we can infer from actual macro
+# calls within the document?  Contemplate getting rid of this subroutine
+# and %preprocs_tmacs altogether.  --GBR
 sub do_first_line {
   my ( $line, $file ) = @_;
 
-  # globals: %preprocs_tmacs
-
   # For a leading groff options line use only [egGjJpRst]
   if  ( $line =~ /^[.']\\"[\segGjJpRst]+&/ ) {
     # this is a groff options leading line
@@ -430,13 +435,11 @@ sub do_line {
   return if ( $line =~ /^\.\.$/ );     # ignore ..
 
   if ( $before_first_command ) { # so far without 1st command
-    if ( $line =~ /^\.TH/ ) {
-      # check if .TH is 1st command for man
-      $Groff{'TH_first'} = 1 if ( $line =~ /^\.\s*TH/ );
-    }
-    if ( $line =~ /^\./ ) {
-      $before_first_command = 0;
+    if ( $line =~ /^\.\s*TH/ ) {
+      # .TH as the first macro call in a document screams man(7).
+      $man_score += 100;
     }
+    $before_first_command = 0;
   }
 
   # split command
@@ -461,20 +464,31 @@ sub do_line {
   }
 
   ######################################################################
-  # macros
+  # user-defined macros
 
+  # XXX: Macros can also be defined with .am, .am1.  Handle that.  And
+  # with .dei{,1}, ami{,1} as well, but supporting that would be a heavy
+  # lift for the benefit of users that probably don't require grog's
+  # help.  --GBR
   if ( $line =~ /^\.de1?\W?/ ) {
-    # this line is a macro definition, add it to %macros
-    my $macro = $line;
-    $macro =~ s/^\.de1?\s+(\w+)\W*/.$1/;
-    return if ( exists $macros{$macro} );
-    $macros{$macro} = 1;
+    # this line is a macro definition, add it to %user_macro
+    my $macro_name = $line;
+    # Strip off any end macro.
+    $macro_name =~ s/^\.de1?\s+(\w+)\W*/.$1/;
+    # XXX: If the macro name shadows a standard macro name, maybe we
+    # should delete the latter from our lists and hashes.  This might
+    # depend on whether the document is trying to remain compatibile
+    # with an existing interface, or simply colliding with names they
+    # don't care about (consider a raw roff document that defines 'PP').
+    # --GBR
+    return if ( exists $user_macro{$macro_name} );
+    $user_macro{$macro_name} = 1;
     return;
   }
 
 
   # if line command is a defined macro, just ignore this line
-  return if ( exists $macros{$command} );
+  return if ( exists $user_macro{$command} );
 
 
   ######################################################################
@@ -541,11 +555,17 @@ sub do_line {
   }
   if ( $command =~ /^TS$/ ) {
     $Groff{'tbl'}++;           # for tbl
+    $inside_tbl_table = 1;
+    return;
+  }
+  if ( $command =~ /^TE$/ ) {
+    $Groff{'tbl'}++;           # for tbl
+    $inside_tbl_table = 0;
     return;
   }
   if ( $command =~ /^TH$/ ) {
-    unless ( $Groff{'TH_first'} ) {
-      $Groff{'TH_later'}++;            # for tbl
+    if ($inside_tbl_table) {
+      $Groff{'tbl'}++;         # for tbl
     }
     return;
   }
@@ -583,102 +603,10 @@ sub do_line {
   ##########
   # old mdoc
   if ( $command =~ /^(Tp|Dp|De|Cx|Cl)$/ ) {
-    $Groff{'mdoc_old'}++;      # true for old mdoc
-    return;
-  }
-
-
-  ##########
-  # for ms
-
-  if ( $command =~ /^AB$/ ) {
-    $Groff{'AB'}++;            # for ms
-    return;
-  }
-  if ( $command =~ /^AE$/ ) {
-    $Groff{'AE'}++;            # for ms
-    return;
-  }
-  if ( $command =~ /^AI$/ ) {
-    $Groff{'AI'}++;            # for ms
-    return;
-  }
-  if ( $command =~ /^AU$/ ) {
-    $Groff{'AU'}++;            # for ms
-    return;
-  }
-  if ( $command =~ /^NH$/ ) {
-    $Groff{'NH'}++;            # for ms
-    return;
-  }
-  if ( $command =~ /^TL$/ ) {
-    $Groff{'TL'}++;            # for ms
-    return;
-  }
-  if ( $command =~ /^XP$/ ) {
-    $Groff{'XP'}++;            # for ms
-    return;
-  }
-
-
-  ##########
-  # for man and ms
-
-  if ( $command =~ /^IP$/ ) {
-    $Groff{'IP'}++;            # for man and ms
-    return;
-  }
-  if ( $command =~ /^LP$/ ) {
-    $Groff{'LP'}++;            # for man and ms
-    return;
-  }
-  if ( $command =~ /^P$/ ) {
-    $Groff{'P'}++;             # for man and ms
-    return;
-  }
-  if ( $command =~ /^PP$/ ) {
-    $Groff{'PP'}++;            # for man and ms
-    return;
-  }
-  if ( $command =~ /^SH$/ ) {
-    $Groff{'SH'}++;            # for man and ms
-    return;
-  }
-  if ( $command =~ /^UL$/ ) {
-    $Groff{'UL'}++;            # for man and ms
+    $Groff{'mdoc-old'}++;      # true for old mdoc
     return;
   }
 
-
-  ##########
-  # for man only
-
-  if ( $command =~ /^OP$/ ) {  # for man
-    $Groff{'OP'}++;
-    return;
-  }
-  if ( $command =~ /^SS$/ ) {  # for man
-    $Groff{'SS'}++;
-    return;
-  }
-  if ( $command =~ /^SY$/ ) {  # for man
-    $Groff{'SY'}++;
-    return;
-  }
-  if ( $command =~ /^TP$/ ) {  # for man
-    $Groff{'TP'}++;
-    return;
-  }
-  if ( $command =~ /^UR$/ ) {
-    $Groff{'UR'}++;            # for man
-    return;
-  }
-  if ( $command =~ /^YS$/ ) {  # for man
-   $Groff{'YS'}++;
-    return;
-  }
-
-
   ##########
   # me
 
@@ -700,7 +628,6 @@ sub do_line {
                      LO|
                      LT|
                      NCOL|
-                     P\$|
                      PH|
                      SA
                    )$/x ) {
@@ -720,40 +647,59 @@ sub do_line {
   ##########
   # mom
 
-  if ( $line =~ /^\.(
+  if ( $command =~ /^(
                   ALD|
+                  AUTHOR|
+                  CHAPTER|
+                  CHAPTER_TITLE|
+                  COLLATE|
+                  DOC_COVER|
+                  DOCHEADER|
+                  DOCTITLE|
                   DOCTYPE|
                   FAMILY|
                   FT|
                   FAM|
+                  LEFT|
                   LL|
                   LS|
                   NEWPAGE|
+                  NO_TOC_ENTRY|
                   PAGE|
+                  PAGENUMBER|
+                  PAGINATION|
                   PAPER|
                   PRINTSTYLE|
                   PT_SIZE|
-                  T_MARGIN
+                  SP|
+                  START|
+                  T_MARGIN|
+                  TITLE|
+                  TOC|
+                  TOC_AFTER_HERE
                 )$/x ) {
     $Groff{'mom'}++;           # for mom
     return;
   }
 
+  for my $key (@standard_macro) {
+    $Groff{$key}++ if ($command eq $key);
+  }
 } # do_line()
 
-
 my @m = ();
-my @preprograms = ();
-my $correct_tmac = '';
-
-sub make_groff_device {
-  # globals: @devices
+my @supplemental_package = ();
+my @preprocessor = ();
 
+sub infer_device {
   # default device is 'ps' when without '-T'
+  # XXX: No, that depends on how the 'configure' script was called (but
+  # most people don't seem to change it).  Also we should check
+  # GROFF_TYPESETTER.  --GBR
   my $device;
-  push @devices, 'ps' unless ( @devices );
+  push @device, 'ps' unless ( @device );
 
-  for my $d ( @devices ) {
+  for my $d (@device) {
     if ( $d =~ /^(             # suitable devices
                  dvi|
                  html|
@@ -774,15 +720,15 @@ sub make_groff_device {
 
 
     if ( $device ) {
-      push @Command, '-T';
-      push @Command, $device;
+      push @command, '-T';
+      push @command, $device;
     }
   }
 
   if ( $device eq 'pdf' ) {
     if ( $pdf_with_ligatures ) {       # with --ligature argument
-      push( @Command, '-P-y' );
-      push( @Command, '-PU' );
+      push( @command, '-P-y' );
+      push( @command, '-PU' );
     } else {   # no --ligature argument
       if ( $with_warnings ) {
        print STDERR <<EOF;
@@ -795,21 +741,19 @@ EOF
       }        # end of warning
     }  # end of ligature
   }    # end of pdf device
-} # make_groff_device()
+} # infer_device()
 
 
-sub make_groff_preproc {
-  # globals: %Groff, @preprograms, @Command
-
+sub infer_preprocessors {
   # preprocessors without 'groff' option
   if ( $Groff{'lilypond'} ) {
-    push @preprograms, 'glilypond';
+    push @preprocessor, 'glilypond';
   }
   if ( $Groff{'gperl'} ) {
-    push @preprograms, 'gperl';
+    push @preprocessor, 'gperl';
   }
   if ( $Groff{'gpinyin'} ) {
-    push @preprograms, 'gpinyin';
+    push @preprocessor, 'gpinyin';
   }
 
   # preprocessors with 'groff' option
@@ -825,211 +769,159 @@ sub make_groff_preproc {
   if ( $Groff{'chem'} || $Groff{'eqn'} ||  $Groff{'gideal'} ||
        $Groff{'grap'} || $Groff{'grn'} || $Groff{'pic'} ||
        $Groff{'refer'} || $Groff{'tbl'} ) {
-    push(@Command, '-s') if $Groff{'soelim'};
+    push(@command, '-s') if $Groff{'soelim'};
 
-    push(@Command, '-R') if $Groff{'refer'};
+    push(@command, '-R') if $Groff{'refer'};
 
-    push(@Command, '-t') if $Groff{'tbl'};     # tbl before eqn
-    push(@Command, '-e') if $Groff{'eqn'};
+    push(@command, '-t') if $Groff{'tbl'};     # tbl before eqn
+    push(@command, '-e') if $Groff{'eqn'};
 
-    push(@Command, '-j') if $Groff{'chem'};    # chem produces pic code
-    push(@Command, '-J') if $Groff{'gideal'};  # gideal produces pic
-    push(@Command, '-G') if $Groff{'grap'};
-    push(@Command, '-g') if $Groff{'grn'};     # gremlin files for -me
-    push(@Command, '-p') if $Groff{'pic'};
+    push(@command, '-j') if $Groff{'chem'};    # chem produces pic code
+    push(@command, '-J') if $Groff{'gideal'};  # gideal produces pic
+    push(@command, '-G') if $Groff{'grap'};
+    push(@command, '-g') if $Groff{'grn'};     # gremlin files for -me
+    push(@command, '-p') if $Groff{'pic'};
 
   }
-} # make_groff_preproc()
-
+} # infer_preprocessors()
 
-sub make_groff_tmac_man_ms {
-  # globals: @filespec, $inferred_main_package, %Groff
 
-  # 'man' requests, not from 'ms'
-  if ( $Groff{'SS'} || $Groff{'SY'} || $Groff{'OP'} ||
-       $Groff{'TH_first'} || $Groff{'TP'} || $Groff{'UR'} ) {
-    $Groff{'man'} = 1;
-    push(@m, '-man');
-
-    $inferred_main_package = 'man' unless ( $inferred_main_package );
-    &warn("man macro calls found, but file name extension was '"
-         . $inferred_main_package . "'")
-       unless ( $inferred_main_package eq 'man' );
-    $inferred_main_package = 'man';
-    return 1;  # true
+# Return true (1) if a main/full-service/exclusive package is inferred.
+sub infer_man_or_ms_package {
+  # Compute a score for each package by counting occurrences of their
+  # characteristic macros.
+  foreach my $key (@macro_man_or_ms) {
+    $man_score += $Groff{$key};
+    $ms_score += $Groff{$key};
   }
 
-  # 'ms' requests, not from 'man'
-  if (
-      $Groff{'1C'} || $Groff{'2C'} ||
-      $Groff{'AB'} || $Groff{'AE'} || $Groff{'AI'} || $Groff{'AU'} ||
-      $Groff{'BX'} || $Groff{'CD'} || $Groff{'DA'} || $Groff{'DE'} ||
-      $Groff{'DS'} || $Groff{'ID'} || $Groff{'LD'} || $Groff{'NH'} ||
-      $Groff{'TH_later'} ||
-      $Groff{'TL'} || $Groff{'UL'} || $Groff{'XP'}
-     ) {
-    $Groff{'ms'} = 1;
-    push(@m, '-ms');
-
-    $inferred_main_package = 'ms' unless ( $inferred_main_package );
-    &warn("ms macro calls found, but file name extension was '"
-         . $inferred_main_package . "'")
-       unless ( $inferred_main_package eq 'ms' );
-    $inferred_main_package = 'ms';
-    return 1;  # true
+  foreach my $key (@macro_man) {
+    $man_score += $Groff{$key};
   }
 
+  foreach my $key (@macro_ms) {
+    $ms_score += $Groff{$key};
+  }
 
-  # both 'man' and 'ms' requests
-  if ( $Groff{'P'} || $Groff{'IP'}  ||
-       $Groff{'LP'} || $Groff{'PP'} || $Groff{'SH'} ) {
-    if ( $inferred_main_package eq 'man' ) {
-      $Groff{'man'} = 1;
-      push(@m, '-man');
-      return 1;        # true
-    } elsif ( $inferred_main_package eq 'ms' ) {
-      $Groff{'ms'} = 1;
-      push(@m, '-ms');
-      return 1;        # true
+  if (!$ms_score && !$man_score) {
+    # The input may be a "raw" roff document; this is not a problem.
+    # Do nothing special.
+  } elsif ($ms_score == $man_score) {
+    # If there was no TH call, it's not a (valid) man(7) document.
+    if (!$Groff{'TH'}) {
+      $inferred_main_package = 's';
+    } else {
+      &warn("document ambiguous; disambiguate with -man or -ms option");
+      $had_inference_problem = 1;
     }
     return 0;
+  } elsif ($ms_score > $man_score) {
+    $inferred_main_package = 's';
+  } else {
+    $inferred_main_package = 'an';
   }
-} # make_groff_tmac_man_ms()
 
+  return 1;
+} # infer_man_or_ms_package()
 
-sub make_groff_tmac_others {
-  # globals: @filespec, $inferred_main_package, %Groff
 
+# Return true (1) if a main/full-service/exclusive package is inferred.
+sub infer_macro_packages {
   # mdoc
   if ( ( $Groff{'Oo'} && $Groff{'Oc'} ) || $Groff{'Dd'} ) {
     $Groff{'Oc'} = 0;
     $Groff{'Oo'} = 0;
-    push(@m, '-mdoc');
+    $inferred_main_package = 'doc';
     return 1;  # true
   }
-  if ( $Groff{'mdoc_old'} || $Groff{'Oo'} ) {
-    push(@m, '-mdoc_old');
+  if ( $Groff{'mdoc-old'} || $Groff{'Oo'} ) {
+    $inferred_main_package = 'doc';
     return 1;  # true
   }
 
   # me
   if ( $Groff{'me'} ) {
-    push(@m, '-me');
+    $inferred_main_package = 'e';
     return 1;  # true
   }
 
   # mm and mmse
   if ( $Groff{'mm'} ) {
-    push(@m, '-mm');
+    $inferred_main_package = 'm';
     return 1;  # true
   }
+  # XXX: Is this necessary?  mmse .mso's mm, but we probably already
+  # detected mm macro calls anyway.  --GBR
   if ( $Groff{'mmse'} ) {      # Swedish mm
-    push(@m, '-mmse');
     return 1;  # true
   }
 
   # mom
   if ( $Groff{'mom'} ) {
-    push(@m, '-mom');
+    $inferred_main_package = 'om';
     return 1;  # true
   }
-} # make_groff_tmac_others()
+
+  return 0;
+} # infer_macro_packages()
 
 
-sub make_groff_line_rest {
+sub construct_command {
   my $file_args_included;      # file args now only at 1st preproc
-  unshift @Command, 'groff';
-  if ( @preprograms ) {
+  unshift @command, 'groff';
+  if (@preprocessor) {
     my @progs;
-    $progs[0] = shift @preprograms;
+    $progs[0] = shift @preprocessor;
     push(@progs, @filespec);
-    for ( @preprograms ) {
+    for (@preprocessor) {
       push @progs, '|';
       push @progs, $_;
     }
     push @progs, '|';
-    unshift @Command, @progs;
+    unshift @command, @progs;
     $file_args_included = 1;
   } else {
     $file_args_included = 0;
   }
 
-  foreach (@Command) {
+  foreach (@command) {
     next unless /\s/;
     # when one argument has several words, use accents
     $_ = "'" . $_ . "'";
   }
 
+  my @msupp = ();
 
-  ##########
-  # -m arguments
-  my $nr_m_guessed = scalar @m;
-  if ( $nr_m_guessed > 1 ) {
-    print STDERR __FILE__ . ' ' .  __LINE__ . ': ' .
-      'argument for -m found: ' . @m;
+  # If a full-service package was explicitly requested, it had better
+  # not clash with what we inferred.  If it does, explicitly unset
+  # $inferred_main_package so that the -m arguments are placed in the
+  # same order that the user gave them; caveat dictator.
+  for my $pkg (@requested_package) {
+    if (grep(/$pkg/, @main_package)
+       && ($pkg ne $inferred_main_package)) {
+      &warn("overriding inferred package 'm$inferred_main_package'"
+           . " with requested package 'm$pkg'");
+      $inferred_main_package = '';
+    }
+    push @msupp, '-m' . $pkg;
   }
 
+  push @m, '-m' . $inferred_main_package if ($inferred_main_package);
 
-  my $nr_m_args = scalar @Mparams;     # m-arguments for grog
-  my $last_m_arg = ''; # last provided -m option
-  if ( $nr_m_args > 1 ) {
-    # take the last given -m argument of grog call,
-    # ignore other -m arguments and the found ones
-    $last_m_arg = $Mparams[-1];        # take the last -m argument
-    print STDERR __FILE__ . ' ' .  __LINE__ . ': ' .
-      $Prog . ": more than 1 '-m' argument: @Mparams";
-    print STDERR __FILE__ . ' ' .  __LINE__ . ': ' .
-      'We take the last one: ' . $last_m_arg;
-  } elsif ( $nr_m_args == 1 ) {
-    $last_m_arg = $Mparams[0];
-  }
-
-  my $final_m = '';
-  if ( $last_m_arg ) {
-    my $is_equal = 0;
-    for ( @m ) {
-      if ( $_ eq $last_m_arg ) {
-       $is_equal = 1;
-       last;
-      }
-      next;
-    }  # end for @m
-    if ( $is_equal ) {
-      $final_m = $last_m_arg;
-    } else {
-      print STDERR __FILE__ . ' ' .  __LINE__ . ': ' .
-       'Provided -m argument ' . $last_m_arg .
-         ' differs from guessed -m args: ' . @m;
-      print STDERR __FILE__ . ' ' .  __LINE__ . ': ' .
-       'The argument is taken.';
-      $final_m = $last_m_arg;
-    }
-  } else {     # no -m arg provided
-    if ( $nr_m_guessed > 1 ) {
-      print STDERR __FILE__ . ' ' .  __LINE__ . ': ' .
-       'More than 1 -m arguments were guessed: ' . @m;
-      print STDERR __FILE__ . ' ' .  __LINE__ . ': ' . 'Guessing stopped.';
-      $had_inference_problem = 1;
-    } elsif ( $nr_m_guessed == 1 ) {
-      $final_m = $m[0];
-    } else {
-      # no -m provided or guessed
-    }
-  }
-  push @Command, $final_m if ( $final_m );
+  push @command, @m, @msupp;
 
-  push(@Command, @filespec) unless ( $file_args_included );
+  push(@command, @filespec) unless ( $file_args_included );
 
   #########
   # execute the 'groff' command here with option '--run'
   if ( $do_run ) { # with --run
-    print STDERR __FILE__ . ' ' .  __LINE__ . ': ' . "@Command";
-    my $cmd = join ' ', @Command;
+    print STDERR "@command";
+    my $cmd = join ' ', @command;
     system($cmd);
   } else {
-    print "@Command";
+    print "@command";
   }
-} # make_groff_line_rest()
+} # construct_command()
 
 
 sub help {
@@ -1062,36 +954,29 @@ EOF
 
 
 sub version {
-  our %at_at;
-  print "$Prog (groff) " . $at_at{'GROFF_VERSION'};
+  print "$program_name (groff) $groff_version";
   exit 0;
 } # version()
 
 
 # initialize
 
-my $before_make;       # script before run of 'make'
+my $in_source_tree = 0;
 {
   my $at = '@';
-  $before_make = 1 if '@VERSION@' eq "${at}VERSION${at}";
+  $in_source_tree = 1 if '@VERSION@' eq "${at}VERSION${at}";
 }
 
-our %at_at;
-
-if ($before_make) {
-  $at_at{'GROFF_VERSION'} = "DEVELOPMENT";
-} else {
-  $at_at{'GROFF_VERSION'} = '@VERSION@';
-}
+$groff_version = '@VERSION@' unless ($in_source_tree);
 
-&handle_args();
-&handle_whole_files();
+&process_arguments();
+&process_input();
 
-if ($have_any_valid_args) {
-  &make_groff_device();
-  &make_groff_preproc();
-  &make_groff_tmac_man_ms() || &make_groff_tmac_others();
-  &make_groff_line_rest();
+if ($have_any_valid_arguments) {
+  &infer_device();
+  &infer_preprocessors();
+  &infer_macro_packages() || &infer_man_or_ms_package();
+  &construct_command();
 }
 
 exit 2 if ($had_processing_problem);
@@ -1102,3 +987,4 @@ exit 0;
 # Local Variables:
 # mode: CPerl
 # End:
+# vim: set autoindent textwidth=72:



reply via email to

[Prev in Thread] Current Thread [Next in Thread]