Re: Another issue with thingatpt

From: Bob Rogers
Subject: Re: Another issue with thingatpt
Date: Sat, 30 Dec 2006 22:08:29 -0500

   From: Piet van Oostrum <address@hidden>
   Date: Fri, 29 Dec 2006 22:23:55 +0100

   >>>>> Bob Rogers <address@hidden> (BR) wrote:

   >BR>    From: Werner LEMBERG <address@hidden>
   >BR>    Date: Wed, 27 Dec 2006 11:50:42 +0100 (CET)

   >BR>    . . .

   >BR>    thingatpt ignores the final `;'.

   >BR>        Werner

   >BR> According to RFC3986 (aka STD066), this is wrong; ";" is legitimate
   >BR> anywhere in a path or query part, including the end.  So are "." and
   >BR> ",", but thing-at-point-url-path-regexp also refuses to match these
   >BR> characters at the end of the string.  Doing (ffap-string-at-point 'url)
   >BR> drops these characters plus ":", "!", and (questionably) "?".

   >BR>    It may not be possible to find a tradeoff between RFC compliance and
   >BR> parsing dwimmery that would satisfy everybody.  Since stripping off
   >BR> trailing punctuation is useful behavior (ISTR it's worked this way for a
   >BR> while now), I would recommend against changing it now.  However, a case
   >BR> could be made for making thing-at-point and ffap-string-at-point
   >BR> consistent.  Perhaps "!:;.," would be best?  This is just the union of
   >BR> the two sets but without the dubious inclusion of "?".

   The way to reconcile these would be to customize it, I think. For example
   have a string variable that contains the punctuation characters to be
   included at the end. Or a regexp.
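A minimal sketch of that suggestion follows. It assumes the variable lists the characters to *strip* from the end, which is the complement of the characters to keep. Both my-url-trailing-punctuation and my-strip-trailing-chars are hypothetical names and are not part of ffap.el or thingatpt.el. The default value is the "!:;.," union suggested above.

```elisp
;; Hypothetical option: not defined by ffap.el or thingatpt.el.
(defcustom my-url-trailing-punctuation "!:;.,"
  "Characters stripped from the end of a URL found at point.
Hypothetical option; not defined by ffap.el or thingatpt.el."
  :type 'string
  :group 'convenience)

(defun my-strip-trailing-chars (string chars)
  "Return STRING with any trailing characters that occur in CHARS removed."
  (let ((end (length string))
        (strip (append chars nil)))     ; CHARS as a list of characters
    ;; Walk backwards while the last remaining character is in the set.
    (while (and (> end 0)
                (memq (aref string (1- end)) strip))
      (setq end (1- end)))
    (substring string 0 end)))

;; Usage:
(my-strip-trailing-chars "http://example.com/a;b;" my-url-trailing-punctuation)
;; => "http://example.com/a;b"
```

Because "?" is not in the default set, a trailing "?" survives. That is the "dubious" case left out of the proposed union.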

Both interfaces (ffap and thing-at-point) are already customizable,
though in different ways.  ffap-string-at-point uses
ffap-string-at-point-mode-alist, which maps a thing type symbol or mode
name symbol to a list of three character sets; the last string in each
alist entry is the set of characters to strip from the end.  On the
other hand, thing-at-point uses pure regexps, but some of them are
built by concatenating others when thingatpt.el is loaded (e.g.
thing-at-point-url-regexp includes thing-at-point-url-path-regexp), so
changing one afterwards does not update the regexps derived from it.
That makes thing-at-point harder to customize.
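For illustration, here is a minimal sketch of changing the END set of one entry without editing ffap.el. It assumes each entry has the shape (KEY CHARS BEG END) described above. The helper my-ffap-replace-end-chars is hypothetical and not part of ffap.el. It returns a fresh alist rather than modifying the shared default value in place.

```elisp
(require 'ffap)

(defun my-ffap-replace-end-chars (alist key end-chars)
  "Return a copy of ALIST in which the entry for KEY has END-CHARS as END.
Entries are assumed to look like (KEY CHARS BEG END), as in
`ffap-string-at-point-mode-alist'.  ALIST itself is not modified."
  (mapcar (lambda (entry)
            (if (eq (car entry) key)
                ;; Rebuild the matching entry with a new END field.
                (list (nth 0 entry) (nth 1 entry) (nth 2 entry) end-chars)
              entry))
          alist))

;; Usage: strip "!:;.," from the end of URLs, but keep a trailing "?".
(setq ffap-string-at-point-mode-alist
      (my-ffap-replace-end-chars ffap-string-at-point-mode-alist
                                 'url "!:;.,"))
```

If the alist has no entry for KEY, the copy is returned unchanged.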

   Note that neither of these implementations is really mode-sensitive,
AFAICS; ffap-string-at-point-mode-alist is poorly named.  If editing
something XML-like, for example, you would want the attribute in

        <tag attr='http://...'>

to be parsed without dropping ANY characters at the end -- and any
embedded '&apos;' to be translated to a literal apostrophe.  But even if
this is TRT, it is clearly too risky to attempt now.
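Here is a minimal sketch of the decoding half of that. It assumes the attribute value has already been extracted from between its quotes, so no trailing characters need to be stripped. my-xml-decode-attribute is a hypothetical name. It handles only the five predefined XML entities, not numeric character references such as &#39;.

```elisp
(defun my-xml-decode-attribute (value)
  "Decode the five predefined XML entities in attribute VALUE.
\"&amp;\" is decoded last, so \"&amp;apos;\" becomes \"&apos;\", not \"'\"."
  (dolist (pair '(("&apos;" . "'")
                  ("&quot;" . "\"")
                  ("&lt;" . "<")
                  ("&gt;" . ">")
                  ("&amp;" . "&")))
    ;; FIXEDCASE and LITERAL are both t: insert the replacement verbatim.
    (setq value (replace-regexp-in-string
                 (regexp-quote (car pair)) (cdr pair) value t t)))
  value)

;; Usage, for <tag attr='http://example.com/it&apos;s'>:
(my-xml-decode-attribute "http://example.com/it&apos;s")
;; => "http://example.com/it's"
```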

   But is there any objection to unifying these two implementations
after the release?  If not, which is the better implementation?  I
believe the difference is only historical; ffap.el is much older than
thingatpt.el (IIRC).

   By the way, thing-at-point-url-path-regexp also disallows : inside a url.
   These would be necessary to accept IPv6 IP addresses.

It works for me (though in an Emacs built two weeks ago):

        (string-match thing-at-point-url-path-regexp "http://::1/foo/bar.html")
            => 0
        (string-match thing-at-point-url-regexp "http://::1/foo/bar.html")
            => 0

Do you have an example of failure?

                                        -- Bob
