bug#21055: Info reader fails to follow xrefs to anchors

bug-gnu-emacs

From:	Eli Zaretskii
Subject:	bug#21055: Info reader fails to follow xrefs to anchors
Date:	Wed, 15 Jul 2015 18:09:56 +0300

> From: Juri Linkov <juri@jurta.org>
> Cc: ludo@gnu.org (Ludovic Courtès),
>   bug-gnu-emacs@gnu.org
> Date: Wed, 15 Jul 2015 02:16:32 +0300
> 
> I'm attaching here all the files that I used to fix bug#14125,
> so you could compare the output of different makeinfo versions
> and see the problem.  The command line used to translate
> Texinfo files was: makeinfo --split-size=2000 test.texi

Thanks.

I see the problem now.  It happens only with makeinfo 5.0 and 5.1, and
has been fixed since 5.2.  Furthermore, it only rears its ugly head if
the Texinfo source has an @ifnottex block before the Top node; the
other blurbs usually put there, such as @copying and @direntry, don't
trigger the problem even in those 2 versions of makeinfo.  Moreover,
when this problem happens, it affects only the 1st subfile; the rest
have their offsets set correctly.  So it's a pretty rare combination
of conditions.

Therefore, I think we should fix the anchor use case by making the
value returned from Info-read-subfile as accurate as possible, and
then cater to the problematic output of makeinfo 5.0 and 5.1 by
attempting another search for a node with a larger slop value.
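
The retry idea in the paragraph above can be sketched roughly as follows.
This is only an illustration, not code from the patch: the function name
`demo-find-near-guess' is hypothetical, the first slop of 1000 is an
assumption, and only the 10000 value comes from the patch itself.

```elisp
;; Hypothetical helper, for illustration only; not part of the patch.
(defun demo-find-near-guess (regexp guess slops)
  "Search forward for REGEXP, starting a little before GUESS.
SLOPS is a list of back-off distances, tried in order.  Return the
start of the first match found, or nil if every attempt fails.
Point is moved as a side effect."
  (catch 'found
    (dolist (slop slops)
      ;; Back off by SLOP characters, but never before the buffer start.
      (goto-char (max (point-min) (- guess slop)))
      (when (re-search-forward regexp nil t)
        (throw 'found (match-beginning 0))))
    nil))

;; A guess that overshoots the real position by 3000 characters:
(with-temp-buffer
  (insert "0123456789NODE" (make-string 5000 ?x))
  (list (demo-find-near-guess "NODE" 3011 '(1000))
        (demo-find-near-guess "NODE" 3011 '(1000 10000))))
```

With the small slop alone the search starts past the target and finds
nothing; adding the larger slop lets the second attempt start early
enough to find it.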

So any objections to the patch below?  It introduces a new
infrastructure function, bufferpos-to-filepos, and then uses it to get
the file byte offset corresponding to the first node in a subfile.

--- lisp/international/mule-util.el~0   2015-06-21 06:45:33.000000000 +0300
+++ lisp/international/mule-util.el     2015-07-15 18:00:57.053036400 +0300
@@ -412,6 +412,79 @@
                        (decode-coding-region (point-min)
                                              (min (point-max) (+ pm byte))
                                              coding-system t))))))))))))
+;;;###autoload
+(defun bufferpos-to-filepos (position &optional quality coding-system)
+  "Try to return the file byte corresponding to a particular buffer POSITION.
+Value is the file position given as a (0-based) byte count.
+The function presumes the file is encoded with CODING-SYSTEM, which defaults
+to `buffer-file-coding-system'.
+QUALITY can be:
+  `approximate', in which case we may cut some corners to avoid
+    excessive work.
+  `exact', in which case we may end up re-(en/de)coding a large
+    part of the file/buffer.
+  nil, in which case we may return nil rather than an approximation."
+  (unless coding-system (setq coding-system buffer-file-coding-system))
+  (let* ((eol (coding-system-eol-type coding-system))
+         (lineno (if (= eol 1) (1- (line-number-at-pos position)) 0))
+         (type (coding-system-type coding-system))
+         (base (coding-system-base coding-system))
+         byte)
+    (and (eq type 'utf-8)
+         ;; Any post-read/pre-write conversions mean it's not really UTF-8.
+         (not (null (coding-system-get coding-system :post-read-conversion)))
+         (setq type 'not-utf-8))
+    (and (memq type '(charset raw-text undecided))
+         ;; The following are all of type 'charset', but they are
+         ;; actually variable-width encodings.
+         (not (memq base '(chinese-gbk chinese-gb18030 euc-tw euc-jis-2004
+                                       korean-iso-8bit chinese-iso-8bit
+                                       japanese-iso-8bit chinese-big5-hkscs
+                                       japanese-cp932 korean-cp949)))
+         (setq type 'single-byte))
+    (pcase type
+      (`utf-8
+       (setq byte (position-bytes position))
+       (when (null byte)
+         (if (<= position 0)
+             (setq byte 1)
+           (setq byte (position-bytes (point-max)))))
+       (setq byte (1- byte))
+       (+ byte
+          ;; Account for BOM, if any.
+          (if (coding-system-get coding-system :bom) 3 0)
+          ;; Account for CR in CRLF pairs.
+          lineno))
+      (`single-byte
+       (+ position -1 lineno))
+      ((and `utf-16
+            ;; FIXME: For utf-16, we could use the same approach as used for
+            ;; dos EOLs (counting the number of non-BMP chars instead of the
+            ;; number of lines).
+            (guard (not (eq quality 'exact))))
+       ;; In approximate mode, assume all characters are within the
+       ;; BMP, i.e. each one takes up 2 bytes.
+       (+ (* (1- position) 2)
+          ;; Account for BOM, if any.
+          (if (coding-system-get coding-system :bom) 2 0)
+          ;; Account for CR in CRLF pairs.
+          lineno))
+      (_
+       (pcase quality
+         (`approximate (+ (position-bytes position) -1 lineno))
+         (`exact
+          ;; Rather than assume that the file exists and still holds the right
+          ;; data, we reconstruct its relevant portion.
+          (let ((buf (current-buffer)))
+            (with-temp-buffer
+              (set-buffer-multibyte nil)
+              (let ((tmp-buf (current-buffer)))
+                (with-current-buffer buf
+                  (save-restriction
+                    (widen)
+                    (encode-coding-region (point-min) (min (point-max) position)
+                                          coding-system tmp-buf)))
+                (1- (point-max)))))))))))
 
 (provide 'mule-util)
 
--- lisp/info.el~0      2015-06-16 10:34:22.000000000 +0300
+++ lisp/info.el        2015-07-15 18:08:58.585385400 +0300
@@ -1217,6 +1217,18 @@
                  (goto-char pos)
                  (throw 'foo t)))
 
+              ;; If the Texinfo source had an @ifnottex block of text
+              ;; before the Top node, makeinfo 5.0 and 5.1 mistakenly
+              ;; omitted that block's size from the starting position
+              ;; of the 1st subfile, which makes GUESSPOS overshoot
+              ;; the correct position by the length of that text.  So
+              ;; we try again with a larger slop.
+              (goto-char (max (point-min) (- guesspos 10000)))
+             (let ((pos (Info-find-node-in-buffer regexp strict-case)))
+               (when pos
+                 (goto-char pos)
+                 (throw 'foo t)))
+
               (when (string-match "\\([^.]+\\)\\." nodename)
                 (let (Info-point-loc)
                   (Info-find-node-2
@@ -1553,10 +1565,13 @@
     (if (looking-at "\^_")
        (forward-char 1)
       (search-forward "\n\^_"))
-    ;; Don't add the length of the skipped summary segment to
-    ;; the value returned to `Info-find-node-2'.  (Bug#14125)
     (if (numberp nodepos)
-       (- nodepos lastfilepos))))
+        ;; Our caller ('Info-find-node-2') wants the (zero-based) byte
+        ;; offset corresponding to NODEPOS, from the beginning of the
+        ;; subfile.  This is especially important if NODEPOS is for an
+        ;; anchor reference, because for those the position is all we
+        ;; have.
+       (+ (- nodepos lastfilepos) (bufferpos-to-filepos (point) 'exact)))))
 
 (defun Info-unescape-quotes (value)
   "Unescape double quotes and backslashes in VALUE."
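
As a rough illustration of why the patch converts a buffer position into
a file byte offset, the sketch below compares the two for text that
contains a non-ASCII character.  It is not taken from the patch: the
function name `demo-char-and-byte-offsets' is hypothetical, and the
sketch assumes UTF-8 text with Unix line endings and no BOM, so that the
byte offset is simply `position-bytes' minus one.

```elisp
;; Hypothetical helper, for illustration only; not part of the patch.
(defun demo-char-and-byte-offsets (text marker)
  "Return (CHAR-OFFSET BYTE-OFFSET) of MARKER within TEXT, both 0-based.
Assumes UTF-8 with Unix EOLs and no BOM, so the byte offset is
`position-bytes' minus one."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (search-forward marker)
    (let ((pos (match-beginning 0)))    ; 1-based character position
      (list (1- pos) (1- (position-bytes pos))))))

;; The "ï" occupies two bytes, so the two offsets differ by one.
(demo-char-and-byte-offsets "naïve\n\^_\nFile: demo, Node: Top\n" "\^_")
;; => (6 7)
```

For pure-ASCII text the two numbers coincide, which is why the
difference only shows up in Info files with multibyte characters or
with CRLF line endings.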
