[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Pandoc and nested emhases

From: Max Nikulin
Subject: Re: Pandoc and nested emhases
Date: Fri, 18 Feb 2022 19:06:35 +0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.0

On 18/02/2022 07:47, Juan Manuel Macías wrote:

Otherwise, if you export to LaTeX with pandoc (v. 2.14.2), the result is
(to my surprise) correct:

str="/lorem /ipsum/ dolor/"
pandoc -f org -t latex <<< $str
\emph{lorem \emph{ipsum} dolor}

2.5-3build2 from Ubuntu-20.04 works in the same way.

I like such behavior:

echo "/lorem =ip/ sum= dolor/" | pandoc -f org -t latex
\emph{lorem \texttt{ip/\ sum} dolor}

I know at least one more persons who will be happy as well:
https://list.orgmode.org/87pmtqp79s.fsf@web.de/T/#u mid:87pmtqp79s.fsf@web.de
(tracked as a confirmed bug at https://updates.orgmode.org/)

printf '/lorem\nipsum [[https://orgmode.org/,service][dolor]] ipsum/\n' | pandoc -f org -t latex
\emph{lorem ipsum \href{https://orgmode.org/,service}{dolor} ipsum}

Another (more abstract) doubt that arises, although I am not an expert
in matters of grammar and specifications. If nested emphases of the same
category are not possible in Org, should this be understood as a bug or
a feature? What implication does it have if a external parser, like
Pandoc, parses them just "fine"?

Nicolas Goaziou explicitly stated that current behavior is correct, see "[Patch] to correctly sort the items with emphasis marks in a list". Tue, 20 Apr 2021 22:37:31 +0200. mid:874kg0ae0k.fsf@nicolasgoaziou.fr

Nicolas confirmed it when I posted a similar example later in the following discussion:

Ihor Radchenko. c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’
Thu, 18 Nov 2021 20:25:33 +0800. mid:87tug93b2a.fsf@localhost
My intuition says that the current parser behaviour is not correct. It
would make more sense to prioritise link over italics. However, it would
require a major change in the parser - instead of a single pass, the
parser may parse different types of objects sequentially.

Nicolas Goaziou. c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’
Thu, 18 Nov 2021 13:35:19 +0100. mid:87y25l8wvs.fsf@nicolasgoaziou.fr
I disagree. Priority should be given to the first object being started.
This is, IMO, the only sane way to handle syntax.

And once more in response to my message:

Nicolas Goaziou. org parser and priorities of inline elements.
Sat, 27 Nov 2021 20:02:31 +0100. mid:87mtlppgl4.fsf@nicolasgoaziou.fr
I don't see any incentive to change the order objects are parsed, once
you know how Org does it. This is just a red herring. What is useful,
however, is to fontify them the way Org sees them.

So formally this feature of pandoc is a bug (due to different kind of parser). It is the reason why a corpus of tests should exist in a format that can be easily imported from various programming languages.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]