guile-user

question about PEG parsing - mutually recursive definitions


From: Zelphir Kaltstahl
Subject: question about PEG parsing - mutually recursive definitions
Date: Fri, 4 Oct 2024 00:44:53 +0000

Hello Guile Users,

I have a question regarding Guile's PEG parsing library. Is it possible to have mutually recursive definitions of grammar rules? I am asking because I am trying to write a grammar for org-mode files, and that requires handling nested inline markup: something bold that contains something emphasized, which in turn contains something else, and so on.

After some tinkering, I decided to start with only bold and emphasized text: https://codeberg.org/ZelphirKaltstahl/guile-examples/src/commit/1575441cf2fdf0f35535db1e6b1986606fa5b8b0/parsing/peg-parsing/nested-inline-elements.scm:

~~~~
(define-module (negation-grammar))


(use-modules
 (ice-9 peg)
 (ice-9 pretty-print)
 (peg-tree-utils))


(define-peg-pattern LITERAL-ASTERISK body "*")
(define-peg-pattern LITERAL-SLASH body "/")

(define-peg-pattern LITERALLY-SPACE-BODY body " ")
(define-peg-pattern LITERALLY-TAB-BODY body "\t")
(define-peg-pattern LITERALLY-ASTERISK-BODY body "*")
(define-peg-pattern LITERALLY-NEWLINE-BODY body "\n")

(define-peg-pattern WHITESPACE all
  (or LITERALLY-SPACE-BODY
      LITERALLY-TAB-BODY
      LITERALLY-NEWLINE-BODY))

(define-peg-pattern BOLD-DELIMITER all LITERAL-ASTERISK)
(define-peg-pattern EMPHASIS-DELIMITER body LITERAL-SLASH)

(define-peg-pattern NOT-ASTERISK body
  (and (not-followed-by LITERAL-ASTERISK)
       peg-any))
(define-peg-pattern NOT-SLASH body
  (and (not-followed-by LITERAL-SLASH)
       peg-any))

(define-peg-pattern BOLD-CONTENT all
  (or EMPHASIS
      (* (and (not-followed-by "*")
              peg-any))))

(define-peg-pattern BOLD all
  (and BOLD-DELIMITER
       BOLD-CONTENT
       BOLD-DELIMITER))

(define-peg-pattern EMPHASIS-CONTENT all
  (or BOLD
      ;; Stop at the closing "/" so that EMPHASIS-DELIMITER can still match it.
      (* (and (not-followed-by "/")
              peg-any))))

;; NOTE: this rule is named EMPHASISIS, while BOLD-CONTENT refers to EMPHASIS.
(define-peg-pattern EMPHASISIS all
  (and EMPHASIS-DELIMITER
       EMPHASIS-CONTENT
       EMPHASIS-DELIMITER))

(define-peg-pattern DOCUMENT all
  (+ BOLD))

(define input "*bold /emphasized/ bold*")

(simple-format #t "input: ~s\n" input)

(define peg-record (match-pattern DOCUMENT input))
(define parse-tree (peg:tree peg-record))

(simple-format
 #t "tree:\n~a"
 (call-with-output-string
   (λ (port)
     (print-tree parse-tree port))))

(simple-format #t "match?: ~a\n" (peg-record? peg-record))
(simple-format #t "matched substring: ~a\n" (matched-substring DOCUMENT input))
(simple-format #t "exhausting-match?: ~a\n" (exhausting-match? DOCUMENT input))
~~~~

The problem with this one is that Guile errors, because EMPHASIS is not yet defined where it is first used, in the definition of BOLD-CONTENT. However, if I move EMPHASIS above BOLD-CONTENT, then EMPHASIS-CONTENT is not yet defined at the point where EMPHASIS uses it. And if I move that up too, then BOLD is used in EMPHASIS-CONTENT before it is defined, and so on. The same problem would of course occur for all the other inline markup definitions that are yet to come: strikethrough, underline, verbatim, code, ...

So now I am wondering whether such a definition is impossible, or how to resolve the problem. The pattern definitions do not seem to work like normal function definitions, which can reference each other mutually recursively. Instead, they seem to depend on the order of definition.

How can I achieve parsing nested markup? How can I circumvent this problem of mutually recursive definitions? Can the PEG library do it, or do I perhaps need to switch to some other parsing library?
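
A reduced test case may help separate the ordering question from the rest of the grammar. The following is only a minimal sketch, not the grammar from the repository: the rule names ITALIC, BOLD-TEXT and ITALIC-TEXT are made up for illustration. It rests on the assumption that (ice-9 peg) looks up a nonterminal only when the parser actually runs, so that a rule may refer to another rule defined further down, provided that name is eventually defined with exactly the same spelling.

```scheme
(use-modules (ice-9 peg))

;; BOLD refers to ITALIC, which is only defined further down.
(define-peg-pattern BOLD all
  (and (ignore "*")
       (* (or ITALIC BOLD-TEXT))
       (ignore "*")))

;; Plain text inside bold: one or more characters that are
;; neither "*" nor "/". Using + (not *) keeps the enclosing
;; repetition from looping on an empty match.
(define-peg-pattern BOLD-TEXT body
  (+ (and (not-followed-by (or "*" "/"))
          peg-any)))

;; ITALIC refers back to BOLD, closing the cycle.
(define-peg-pattern ITALIC all
  (and (ignore "/")
       (* (or BOLD ITALIC-TEXT))
       (ignore "/")))

;; Plain text inside italic, same idea as BOLD-TEXT.
(define-peg-pattern ITALIC-TEXT body
  (+ (and (not-followed-by (or "*" "/"))
          peg-any)))

(define sketch-input "*bold /emphasized/ bold*")

;; match-pattern returns a match record, or #f if nothing matched.
(define sketch-record (match-pattern BOLD sketch-input))

(simple-format #t "match?: ~a\n" (and sketch-record #t))
(when sketch-record
  (simple-format #t "tree: ~s\n" (peg:tree sketch-record)))
```

If this parses, forward references as such are not the obstacle, and it is worth comparing the spelling of every rule name at its definition against the places where it is used.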

Best regards,
Zelphir

--
repositories: https://notabug.org/ZelphirKaltstahl, https://codeberg.org/ZelphirKaltstahl

