[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [O] [RFC] Fixing link encoding once and for all

From: Neil Jerram
Subject: Re: [O] [RFC] Fixing link encoding once and for all
Date: Sun, 24 Feb 2019 23:04:27 +0000

I'm not sure how much freedom you have here, but I think it would be
both clearer - by avoiding confusion with URL-escaping - and easier to
type, to use an entirely different form of escaping in the Org syntax;
probably just this:

\[ and \] to include a square bracket in a link
\\ to include a backslash


On Sun, 24 Feb 2019 at 01:18, Nicolas Goaziou <address@hidden> wrote:
> Hello,
> Recently[1], issues about link escaping have resurfaced. I'd like to
> solve this once and for all.
> As a reminder, the initial issue is that bracket links, i.e., "[[path]]"
> or "[[path][description]]", cannot contain square brackets, for obvious
> reasons. Therefore, they need to be escaped somehow. For some historical
> reason, the "somehow" settled, for the path part[2], on URL encoding.
> Therefore [ and ] in a link must appear as, respectively, "%5B" and
> "%5D". Of course, the initial link could already contain any of these
> strings, so percent signs also need to be escaped, as "%25". Eventually,
> consecutive spaces are not very handled very gracefully by
> `fill-paragraph' function, so it is also useful, but not mandatory, to
> be able to escape white spaces, with "%20". It can sadly be confusing
> when Org encoding is applied on top an already encoded URI.
> To sum it up, `org-link-escape', by default, URL encodes only square
> brackets, percent signs and white spaces. Note that, however,
> `org-link-unescape' is not its reciprocal function, despite its
> docstring. It URL decodes every percent encoded combination.
> Anyway, square brackets in a bracket link almost looks like a solved
> problem. Alas, if some links are inserted by helper functions, such as
> `org-insert-link', others could have been typed right into the buffer.
> Therefore, there is usually no way to know if a link is already
> Org-encoded or not. Consequently, there is usually no way to know when
> a link needs to be Org-decoded. This is the root of all evil, or at
> least, all bugs encountered so far. Some links end up being encoded or
> decoded once too many.
> To solve this, we must assume that every bracket link is properly
> Org-encoded in a buffer. In other words, when typing, or yanking,
> a bracket link right into a buffer, users are required to use %5B, %5D,
> and %25 in the path part of the link, if necessary. I understand it will
> bite some users, but using `org-insert-link' would mitigate the pain. It
> is also limited to square brackets, which, I assume, is not the type of
> link you usually yank.
> With that assumption, the parser can safely Org-decode links
> appropriately, and store paths in their decoded form. Consumers, like
> export back-ends, need not call `org-link-unescape' anymore. In fact,
> the only situation where `org-link-unescape' is still needed is when
> extracting the path part of a bracket link from the buffer, e.g.,
> through regexp matching.
> Of course, the manual should mention this assumption, if we agree on it.
> Thoughts?
> Regards,
> Footnotes:
> [1] E.g., <http://lists.gnu.org/r/emacs-orgmode/2019-02/msg00265.html>
> or <http://lists.gnu.org/r/emacs-orgmode/2019-02/msg00292.html>.
> [2] There is no clear mechanism for the description part.
> `org-insert-link' will replace square brackets with curly ones. We could
> also use entities, but none of them appears as a square bracket. Anyway,
> I'll ignore this issue for the time being.
> --
> Nicolas Goaziou

reply via email to

[Prev in Thread] Current Thread [Next in Thread]