Re: [BUG] Trailing dash is not included in link [9.7.3 (9.7.3-2f1844 @ /
From: Max Nikulin
Subject: Re: [BUG] Trailing dash is not included in link [9.7.3 (9.7.3-2f1844 @ /home/mwillcock/.emacs.d/elpa/org-9.7.3/)]
Date: Thu, 20 Jun 2024 19:15:58 +0700
User-agent: Mozilla Thunderbird

On 16/06/2024 22:59, Ihor Radchenko wrote:
> Max Nikulin writes:
>> I suspect it worked prior to v9.5. Without a unit test it may be
>> accidentally broken again.
>
> No, it did not work.
> If you can, please do not make such assertions without testing.

I am sorry, I had no intention to offend you. I missed that the removed
line with the explicit list of punctuation characters was commented out. I
have tried the regexp used before (a part of v6.34)

    facedba05 2009-12-09 15:13:50 +0100 Carsten Dominik: Use John
    Gruber's regular expression for URL's

and it seems a trailing dash was allowed.
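
To make the trailing-dash question concrete, here is a minimal sketch in Python rather than Emacs Lisp. It is not org-link-plain-re and not John Gruber's regexp. The helper names (plain_url_re, first_link) are hypothetical, and string.punctuation is only an ASCII approximation of [:punct:]. It compares a pattern that rejects every punctuation character at the end of a link with one that makes an exception for the dash.

```python
import re
import string

# ASCII-only stand-in for a punctuation class. This is an assumption of the
# sketch: Emacs' [:punct:] is not limited to these characters.
PUNCT = string.punctuation


def plain_url_re(allowed_trailing=""):
    """Build a toy plain-URL pattern (hypothetical helper).

    The last character of a match must be neither whitespace nor
    punctuation, except for characters listed in allowed_trailing.
    """
    excluded = "".join(c for c in PUNCT if c not in allowed_trailing)
    return re.compile(r"https?://[^\s<>]*[^\s" + re.escape(excluded) + "]")


def first_link(pattern, text):
    """Return the first match of pattern in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


STRICT = plain_url_re()                          # no punctuation may end a link
WITH_DASH = plain_url_re(allowed_trailing="-")   # a dash may end a link

text = "See https://domain/test- for details"
print(first_link(STRICT, text))     # https://domain/test
print(first_link(WITH_DASH, text))  # https://domain/test-
```

With the exception in place, a sentence-final period is still left out: "Visit http://example.org/dash-." yields http://example.org/dash- in this sketch.
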

>>> +: https://domain/test-
>>
>> example.org, example.net, example.com are domains reserved for usage in
>> examples:
>> <https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml>
>
> And so?

http://example.org/dash- may be a bit better for docs. (For IPv6
addresses the difference should be more noticeable, but I do not
remember what range is reserved for usage in examples there.)

>> I have realized that some Org regexps use the [:punct:] *regexp class* and
>> others a *syntax class*, see the LaTeX math regexp. I am unsure whether the
>> discrepancy is intentional.
>
> It is not intentional, but using syntax classes can sometimes be
> fragile.

Do you mean that the result depends on the current buffer? I do not have a
strong opinion on which variant should be used. What I do not like is that
in the case of $n$-th the character after the second "$" is tested against a
syntax class, while a regexp class is used for links. This subtle difference
is almost certainly ignored in alternative implementations of the parser.
However, I am not sure which characters besides dash and apostrophe are
affected and whether it depends on locale.

>> 09ced6d2c 2024-02-03 15:15:46 +0100 Ihor Radchenko: org-link-plain-re:
>> Improve regexp heuristics
>> [...]
>> (link http://example.org/a<b)
>> [...]
>
> It is heuristics. We cannot be 100% right. So, it is what it is.

From my point of view it is at least close to a regression. I do not
have any argument against http://example.org/a<b>, but the regexp should
not match the whole "http://example.org/a<b)".

>> [...]
>> Nowadays it is likely better to inspect
>> autolinking code for GitHub/GitLab or widely used Python packages.
>
> If you have concrete proposals, please share them.

Not yet. I consider inspecting Mozilla's code a kind of negative
result from the point of view of usefulness for Org. Expanding the test
suite by gathering examples of failed heuristics from bug reports
requires enough reports. https://wpt.live/url/resources/urltestdata.json
(https://github.com/web-platform-tests/wpt) is too specific to browsers
and HTML/JS.
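
A table of cases collected from reports could look roughly like the Python sketch below. It is only an illustration: CANDIDATE is a hypothetical stand-in pattern, not org-link-plain-re, and the expected values merely encode the expectations voiced in this thread, not any specification.

```python
import re

# Hypothetical stand-in for a plain-link regexp; NOT Org's org-link-plain-re.
CANDIDATE = re.compile(r"https?://[^\s<>()]*[^\s<>().,;:!?]")

# (text, expected link) pairs built from examples mentioned in this thread.
CASES = [
    ("+: https://domain/test-", "https://domain/test-"),
    ("see http://example.org/dash- for docs", "http://example.org/dash-"),
]

# (text, forbidden link) pairs: the match must not be the forbidden string.
FORBIDDEN = [
    ("(link http://example.org/a<b)", "http://example.org/a<b)"),
]


def failures(pattern):
    """Return a list of (text, wanted, got) tuples for every failed case."""
    failed = []
    for text, expected in CASES:
        m = pattern.search(text)
        got = m.group(0) if m else None
        if got != expected:
            failed.append((text, expected, got))
    for text, forbidden in FORBIDDEN:
        m = pattern.search(text)
        got = m.group(0) if m else None
        if got == forbidden:
            failed.append((text, "anything but " + forbidden, got))
    return failed


for failure in failures(CANDIDATE):
    print("FAILED:", failure)
```

Each new bug report would add one row to CASES or FORBIDDEN, so a heuristic that regresses on an old report is caught as soon as the table is run again.
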

>> I would consider [:space:] or \s-.
>
> Do you mean "[^[:punct:][:space:]\t\n]"?

I believe it might be an improvement ([:space:] includes \t).