Re: Let's Play: Use the Source, Luke! (was: .ie as target of .if)
From: John Gardner
Subject: Re: Let's Play: Use the Source, Luke! (was: .ie as target of .if)
Date: Sun, 27 Sep 2020 18:15:47 +1000
I'm assuming the logic you've described applies to `.while` requests too?
(IIRC, this is the only other conditional that shares the semantics of
`.ie` and `.if`).
> The next conditional handles input in...non-copy mode, a thing that no
> *roff documentation I have ever seen has a name for. (This irritates me.
> Inside me there is an Aristotle or a Linnaeus struggling to get out.)
CSTR #54 § 7.2 defines "copy mode" as *"[input] copied without
interpretation"*, so a more accurate name for *"non-copy mode"* might
be *"interpreted mode"*. Corrections welcome.
On Sun, 27 Sep 2020 at 17:44, G. Branden Robinson <
g.branden.robinson@gmail.com> wrote:
> Hi, Dave!
>
> At 2020-09-17T12:03:31-0500, Dave Kemper wrote:
> > Consider the much simpler example:
> >
> > .if 0 .if 1 \{\
> > .tm foo
> > .\}
> > .tm bar
> >
> > Following your explanation, the interpreter would evaluate ".if 0",
> > decide it was false, and ignore the rest of the line, thus missing
> > that the line ends in a \{. Therefore it would go to the next line,
> > and -- unaware that it's inside an opening brace, since it never "saw"
> > it -- execute the ".tm foo" request. Proceeding to the next line, it
> > encounters an unbalanced closing brace, which it silently ignores (you
> > can verify that it doesn't care about mismatched closing braces by
> > duplicating that line as many times as you please in the input file).
> > Finally, it hits the last line and emits "bar" on stderr.
> >
> > But that's not what happens. Groff does not print "foo" to stderr,
> > which can only happen if it does in fact process the opening brace --
> > which is associated with a request (the second .if) that it never
> > looks at. This implies that, at least in some circumstances, the
> > interpreter recognizes opening braces as flow-control structures, and
> > scans for them even in code it would otherwise never examine.
> >
> > The .ie request is just as much a language flow-control element as an
> > opening brace, yet (per my original question) the interpreter does not
> > treat them the same, ignoring the .ie request in a position (after a
> > false conditional) where it does not ignore an opening brace. And the
> > opening brace is associated with the ".if 1", not the ".if 0", so it's
> > not as simple as a special case of looking for such a brace
> > immediately following a false conditional. It is, in fact, looking
> > BEYOND where it would have needed to look just to find the .ie request
> > of my first example.
> >
> > Again, if this is considered "working as designed," it should be
> > documented as such, but it's not clear to me just how to document it.
> > Tadziu's suggestion does not account for the opening-brace exception.
> >
> > And are there other exceptions? And why are there exceptions at all?
>
> I'm far from an expert on the groff parser, but I have studied it a bit
> and made _small_ changes.
>
> I can think of two reasons there are exceptions to your model:
>
> (1) Ease of maintenance of a hand-written recursive-descent parser; and
> (2) No lookahead. troff has to operate as a Unix filter. It can store
> all the state it wants but it must act on the most recent character it
> has read.
>
> > It seems like a more consistent (and, not incidentally, easier to
> > document) language design to handle all flow-control constructs the
> > same way: it either unilaterally ignores them after an .if that
> > evaluates to false, or unilaterally scans ahead to see whether any
> > occur later on the line. Instead, the behavior seems arbitrary and
> > capricious -- which *can* be documented, but still isn't a good
> > language design.
>
> Well, let's go to the source. What we need is a few functions from
> src/roff/troff/input.cpp:
>
> do_if_request() (by far the longest)
> if_else_request()
> if_request()
> else_request()
>
> The reason we have two handlers for "if" is that the actual if-handling
> logic has two call sites; one, if_request(), is dispatched when an ".if"
> request is seen on the input. The other is called by if_else_request().
>
> A key difference between these two functions is that if_request has no
> return value (returns void, in C parlance)--just like all *roff request
> handlers in GNU troff. do_if_request() returns an integer.
>
> Another key design feature is a data structure called "int_stack", which
> as you may have guessed is simply a stack for integers. The one of
> interest here is called "if_else_stack".
>
> static int_stack if_else_stack;
>
> Let us consider the short, easy functions first.
>
> void if_request()
> {
>   do_if_request();
> }
>
> ...as simple as you can get.
>
> void if_else_request()
> {
>   if_else_stack.push(do_if_request());
> }
>
> This is more revealing. If we have an .ie request, call do_if_request()
> _but push its return value onto the integer stack we set up_.
>
> What about the "else" part of our "if-then-else"?
>
> void else_request()
> {
>   if (if_else_stack.is_empty()) {
>     warning(WARN_EL, "unbalanced .el request");
>     skip_alternative();
>   }
>
> The above is pretty obvious. If we hit an .el, we'd better have seen an
> .ie first.
>
>   else {
>     if (if_else_stack.pop())
>       skip_alternative();
>     else
>       begin_alternative();
>   }
> }
>
> I think we're getting closer to the heart of the discussion here.
>
> In a well-formed groff document, an .el is only encountered after an
> .ie, which as seen above pushed the result of the if-conditional onto
> the stack. So when we see .el, we pop that integer value and test its
> truthiness.
>
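The pairing described above can be modelled outside of groff. What follows is a minimal sketch, not groff code: `Event` and `run_bodies` are hypothetical names, each `.ie` is assumed to arrive with its condition already evaluated, and only requests that are actually dispatched appear as events. A request sitting inside a skipped body would never reach the stack in this model.

```cpp
#include <cassert>
#include <stack>
#include <vector>

// Hypothetical model (not from groff): one dispatched request.
// is_ie == true  -> an .ie request whose condition evaluated to `condition`
// is_ie == false -> an .el request (`condition` is ignored)
struct Event {
  bool is_ie;
  bool condition;
};

// For each event, report whether its body would be executed.
std::vector<bool> run_bodies(const std::vector<Event> &events)
{
  std::stack<int> if_else_stack;  // stands in for groff's int_stack
  std::vector<bool> ran;
  for (const Event &e : events) {
    if (e.is_ie) {
      // .ie: remember the result for the matching .el
      if_else_stack.push(e.condition ? 1 : 0);
      ran.push_back(e.condition);
    }
    else if (if_else_stack.empty()) {
      // .el with nothing on the stack: its body is skipped
      ran.push_back(false);
    }
    else {
      // .el: run the body only if the remembered .ie result was false
      bool ie_was_true = if_else_stack.top() != 0;
      if_else_stack.pop();
      ran.push_back(!ie_was_true);
    }
  }
  assert(ran.size() == events.size());
  return ran;
}
```

In this model an `.el` can only pair with an `.ie` that was really dispatched, which is one way to read the behaviour that started the thread.
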
> If the condition was FALSE, we call begin_alternative:
>
> static void begin_alternative()
> {
>   while (tok.space() || tok.left_brace())
>     tok.next();
> }
>
> This just throws away space and left brace tokens until it can return.
> But that makes sense: if the condition was FALSE, we want to execute the
> "body" of the .el.
>
> skip_alternative() has the harder job. It has to consume the body of
> the ELSE in a semi-interpreted way; enough to syntactically find the
> end of it, but not actually change the state of the engine with respect
> to anything it sees.
>
> Recall that we entered this function from an .el whose body is being
> skipped either because the .el was invalid (.el without .ie) or because
> the "if" part of an if-else (.ie) was true. There's one[1] other call
> site as we'll get to in a moment.
>
> This is the second-longest function we'll examine in today's excursion.
> And it's only 40 lines!
>
> static void skip_alternative()
> {
>   int level = 0;
>
> We're going to keep track of how many \{ \} escapes are nested.
>
>   // ensure that ".if 0\{" works as expected
>   if (tok.left_brace())
>     level++;
>
> The above is a special case, as noted.
>
>   int c;
>   for (;;) {
>     c = input_stack::get(0);
>     if (c == EOF)
>       break;
>
> That's more malformed input handling.
>
>     if (c == ESCAPE_LEFT_BRACE)
>       ++level;
>     else if (c == ESCAPE_RIGHT_BRACE)
>       --level;
>
> I _think_ the above refer to the quasi-interned form in which, for
> instance, macro definitions are stored. In other words, if we see
> these, we're reading something that was stored in "copy mode". We're seeing
> it because someone called a macro, and its body has been interpolated
> into the input stream for us.
>
> The next conditional handles input in...non-copy mode, a thing that no
> *roff documentation I have ever seen has a name for. (This irritates
> me. Inside me there is an Aristotle or a Linnaeus struggling to get
> out.)
>
>     else if (c == escape_char && escape_char > 0)
>       switch(input_stack::get(0)) {
>       case '{':
>         ++level;
>         break;
>       case '}':
>         --level;
>         break;
>
> At any rate, the last four cases we've seen do obvious things: increase
> the nesting level if we've seen some form of open-brace, and decrease it
> if we've seen some form of close-brace.
>
>       case '"':
>         while ((c = input_stack::get(0)) != '\n' && c != EOF)
>           ;
>
> We're still inside that "else if (c == escape_char)", so this is handling
> a traditional-style roff comment: \" foo. It runs until the next
> newline.
>
> I don't know why \# isn't handled here. Someone want to try to break
> the parser with a test case before I get around to it?
>
>       }
>     /*
>       Note that the level can properly be < 0, e.g.
>
>         .if 1 \{\
>         .if 0 \{\
>         .\}\}
>
>       So don't give an error message in this case.
>     */
>     if (level <= 0 && c == '\n')
>       break;
>
> The DevTeam thinks of everything!
>
> More importantly, this break takes us out of the for loop when we leave
> more scopes than we entered, or see the newline at the end of the
> current braceless scope.
>
>   }
>   tok.next();
>
> And there's the magic. The "for (;;)" loop above just eats input
> characters until forced to break out; tok.next() then reads the next
> token, so normal processing resumes after the skipped text.
>
> }
>
> End of function.
>
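To check the walkthrough against Dave's example, here is a rough, self-contained sketch of the same scanning loop run over a plain string. It is an approximation under stated assumptions, not groff's code: `skip_alternative_sketch` is a hypothetical name, the input is raw text rather than groff's input stack, and both the internal ESCAPE_LEFT_BRACE / ESCAPE_RIGHT_BRACE codes and the initial tok.left_brace() special case are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of groff): starting at `pos`, discard text
// the way the loop in skip_alternative() appears to, and return the index
// of the first character that would be processed normally afterwards.
std::size_t skip_alternative_sketch(const std::string &in, std::size_t pos,
                                    char escape_char = '\\')
{
  assert(pos <= in.size());
  int level = 0;  // nesting depth of \{ ... \}
  while (pos < in.size()) {
    char c = in[pos++];
    if (c == escape_char && pos < in.size()) {
      char d = in[pos++];  // the escaped character is always consumed
      if (d == '{')
        ++level;
      else if (d == '}')
        --level;
      else if (d == '"') {
        // \" comment: discard through the end of the line
        while (pos < in.size() && (c = in[pos++]) != '\n')
          ;
      }
      // Anything else, including an escaped newline, is simply dropped,
      // so "\{\" followed by a newline does not end the scan.
    }
    // Stop at an unescaped newline once no brace group is open.
    if (level <= 0 && c == '\n')
      break;
  }
  return pos;
}
```

Fed the text that follows `.if 0 ` in Dave's example, the scan sees `\{` inside the second `.if`, raises the level, and keeps discarding until the newline after `.\}`, so only `.tm bar` remains. That matches the observed behaviour: the opening brace is counted as a character sequence, while the `.if 1` it belongs to is never interpreted as a request.
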
> At this point I'm finding myself wanting dinner, so I'll be a bit of a
> dick and leave the ~140 line do_if_request() as an exercise for the
> reader. But actually I think the above answered the question on point.
>
> Also, a lot of the following function is tied up with implementing the
> *roff conditionals, ".if d", ".if r", and so on, so it's not interesting
> from the perspective of resolving when GNU troff fully interprets
> conditional input versus when it doesn't. Skip to the end for the good
> bits.
>
> int do_if_request()
> {
>   int invert = 0;
>   while (tok.space())
>     tok.next();
>   while (tok.ch() == '!') {
>     tok.next();
>     invert = !invert;
>   }
>   int result;
>   unsigned char c = tok.ch();
>   if (c == 't') {
>     tok.next();
>     result = !nroff_mode;
>   }
>   else if (c == 'n') {
>     tok.next();
>     result = nroff_mode;
>   }
>   else if (c == 'v') {
>     tok.next();
>     result = 0;
>   }
>   else if (c == 'o') {
>     result = (topdiv->get_page_number() & 1);
>     tok.next();
>   }
>   else if (c == 'e') {
>     result = !(topdiv->get_page_number() & 1);
>     tok.next();
>   }
>   else if (c == 'd' || c == 'r') {
>     tok.next();
>     symbol nm = get_name(1);
>     if (nm.is_null()) {
>       skip_alternative();
>       return 0;
>     }
>     result = (c == 'd'
>               ? request_dictionary.lookup(nm) != 0
>               : number_reg_dictionary.lookup(nm) != 0);
>   }
>   else if (c == 'm') {
>     tok.next();
>     symbol nm = get_long_name(1);
>     if (nm.is_null()) {
>       skip_alternative();
>       return 0;
>     }
>     result = (nm == default_symbol
>               || color_dictionary.lookup(nm) != 0);
>   }
>   else if (c == 'c') {
>     tok.next();
>     tok.skip();
>     charinfo *ci = tok.get_char(1);
>     if (ci == 0) {
>       skip_alternative();
>       return 0;
>     }
>     result = character_exists(ci, curenv);
>     tok.next();
>   }
>   else if (c == 'F') {
>     tok.next();
>     symbol nm = get_long_name(1);
>     if (nm.is_null()) {
>       skip_alternative();
>       return 0;
>     }
>     result = check_font(curenv->get_family()->nm, nm);
>   }
>   else if (c == 'S') {
>     tok.next();
>     symbol nm = get_long_name(1);
>     if (nm.is_null()) {
>       skip_alternative();
>       return 0;
>     }
>     result = check_style(nm);
>   }
>   else if (tok.space())
>     result = 0;
>   else if (tok.delimiter()) {
>     token delim = tok;
>     int delim_level = input_stack::get_level();
>     environment env1(curenv);
>     environment env2(curenv);
>     environment *oldenv = curenv;
>     curenv = &env1;
>     suppress_push = 1;
>     for (int i = 0; i < 2; i++) {
>       for (;;) {
>         tok.next();
>         if (tok.newline() || tok.eof()) {
>           warning(WARN_DELIM, "missing closing delimiter");
>           tok.next();
>           curenv = oldenv;
>           return 0;
>         }
>         if (tok == delim
>             && (compatible_flag
>                 || input_stack::get_level() == delim_level))
>           break;
>         tok.process();
>       }
>       curenv = &env2;
>     }
>     node *n1 = env1.extract_output_line();
>     node *n2 = env2.extract_output_line();
>     result = same_node_list(n1, n2);
>     delete_node_list(n1);
>     delete_node_list(n2);
>     curenv = oldenv;
>     have_input = 0;
>     suppress_push = 0;
>     tok.next();
>   }
>   else {
>     units n;
>     if (!get_number(&n, 'u')) {
>       skip_alternative();
>       return 0;
>     }
>     else
>       result = n > 0;
>   }
>   if (invert)
>     result = !result;
>   if (result)
>     begin_alternative();
>   else
>     skip_alternative();
>   return result;
> }
>
> Regards,
> Branden
>
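For anyone skimming the listing: the first and last few lines of do_if_request() carry the part that matters here, namely `!` toggling an inversion flag and the final result choosing between running and skipping the body. Below is a heavily reduced sketch of just that, assuming a plain string as input. `eval_condition_sketch` is a hypothetical name, and only `!`, `t`, `n`, and a bare decimal number are handled; the real function evaluates full numeric expressions and many more operators.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical, reduced stand-in for the start and end of do_if_request().
// Returns the truth value that would select begin_alternative() (true)
// or skip_alternative() (false).
bool eval_condition_sketch(const std::string &cond, bool nroff_mode)
{
  std::size_t i = 0;
  bool invert = false;
  // Leading spaces are ignored.
  while (i < cond.size() && cond[i] == ' ')
    ++i;
  // Each '!' flips the sense once, so "!!" cancels out.
  while (i < cond.size() && cond[i] == '!') {
    invert = !invert;
    ++i;
  }
  assert(i <= cond.size());
  bool result;
  if (i < cond.size() && cond[i] == 't')
    result = !nroff_mode;  // true when formatting for a typesetter
  else if (i < cond.size() && cond[i] == 'n')
    result = nroff_mode;   // true when formatting for a terminal
  else
    // Simplification: a bare decimal number; true only if it is > 0.
    result = std::strtol(cond.c_str() + i, nullptr, 10) > 0;
  return invert ? !result : result;
}
```

With this reduction, `eval_condition_sketch("!0", false)` is true and `eval_condition_sketch("0", false)` is false, mirroring the `result = n > 0` and `if (invert)` lines quoted above.
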