bug-groff

[bug #62142] [refer] reports incorrect line numbers in diagnostics


From: G. Branden Robinson
Subject: [bug #62142] [refer] reports incorrect line numbers in diagnostics
Date: Sat, 5 Mar 2022 10:06:24 -0500 (EST)

URL:
  <https://savannah.gnu.org/bugs/?62142>

                 Summary: [refer] reports incorrect line numbers in
diagnostics
                 Project: GNU troff
            Submitted by: gbranden
            Submitted on: Sat 05 Mar 2022 03:06:22 PM UTC
                Category: Preprocessor refer
                Severity: 3 - Normal
              Item Group: Warning/Suspicious behaviour
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None

    _______________________________________________________

Details:

When processing the attached file (from bug #60657), refer(1) issues
diagnostics with incorrect line numbers.


$ refer -e -p ./test.ref testref2.mm >/dev/null
refer:test.ref:579: invalid input character code 136
refer:test.ref:579: invalid input character code 137
refer:test.ref:616: invalid input character code 136
refer:test.ref:616: invalid input character code 137
refer:test.ref:701: invalid input character code 136
refer:test.ref:701: invalid input character code 137
refer:test.ref:725: invalid input character code 136
refer:test.ref:725: invalid input character code 137
$ for ln in 579 616 701 725; do sed -n ${ln}p ./test.ref; done
%I Hermann, "puis" Presses universitaires de France, "puis" Dunod
%Q Ecole d'\*'et\*'e de physique th\*'eorique ()
%D [1979]
%C  Paris


If we subtract 3 from each of these line numbers, we get some lines with UTF-8
sequences.


$ for ln in 579 616 701 725; do ln=$((ln - 3)); sed -n ${ln}p ./test.ref | xxd | grep '[89a-f][89a-f]'; done
00000000: 2554 20c2 8855 6e69 7665 7273 6974 5c2a  %T ..Universit\*
00000010: 2765 2064 6520 4772 656e 6f62 6c65 2e20  'e de Grenoble. 
00000030: 5c2a 2765 2064 6520 7068 7973 6971 7565  \*'e de physique
00000050: 8943 6f75 7273 2064 6f6e 6e5c 2a27 6573  .Cours donn\*'es
000000b0: 7061 7220 432e 205b 435c 2a27 6563 696c  par C. [C\*'ecil
00000000: 2554 20c2 884c 6520 c289 6861 7361 7264  %T ..Le ..hasard
00000050: 5e75 7420 3139 3836 205c 2a27 6564 2e20  ^ut 1986 \*'ed. 
00000060: 7061 7220 4a65 616e 2053 6f75 6c65 7469  par Jean Souleti
00000070: 652c 204a 6561 6e20 5661 6e6e 696d 656e  e, Jean Vannimen
00000000: 2554 20c2 884c 6573 20c2 8950 6574 6974  %T ..Les ..Petit
00000020: 666f 726d 6174 6971 7565 7320 6465 2074  formatiques de t
00000030: 7261 6974 656d 656e 7420 6465 2074 6578  raitement de tex
00000000: 2554 20c2 8849 424d 2046 7261 6e63 652e  %T ..IBM France.
00000010: 2045 6475 6361 7469 6f6e 2063 6f6d 6d65   Education comme
00000040: 2e2e 20c2 8953 7570 706f 7274 2064 6520  .. ..Support de 


(My regex catches some false positives: it also matches letter pairs such as
"de" and "ea" in xxd's ASCII column.)

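We can see the UTF-8 sequences "c2 88" and "c2 89" multiple times.  Their
trailing bytes, 0x88 and 0x89, are decimal 136 and 137, the character codes
named in the diagnostics.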
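
The affected lines can also be listed by searching for the raw bytes directly,
which avoids the false positives from grepping xxd output.  The following is
only a sketch: the function name is made up for illustration, and it assumes a
grep that accepts -a (GNU and BSD grep do; POSIX does not specify it).

```shell
# Hypothetical helper: print the numbers of the lines in FILE that contain
# byte 0x88 or 0x89 (octal 210 and 211, decimal 136 and 137).
lines_with_bytes_136_137() {
    # LC_ALL=C makes grep match byte-by-byte rather than by multibyte
    # character.  -a (an extension, assumed available) forces text mode
    # so that matching lines are printed even if grep considers the file
    # binary.  -n prefixes each match with "LINE:", which cut isolates.
    LC_ALL=C grep -an "[$(printf '\210\211')]" "$1" | LC_ALL=C cut -d: -f1
}
```

Running "lines_with_bytes_136_137 ./test.ref" and comparing the result with
the line numbers in the diagnostics should show the same offset as the sed
loop above, if the observation holds for every affected line.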

Possibly related: the input file is a mixture of ISO-8859 (-1 or -15) and
UTF-8.


$ isutf8 test2.ref
test2.ref: line 407, char 20, byte 12283: After a first byte between E1 and
EC, expecting the 2nd byte between 80 and BF.
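
isutf8 stops at the first problem it reports here.  To see every line that is
not valid UTF-8, each line can be passed through iconv on its own.  This is a
rough sketch with a made-up function name; it assumes an iconv that exits with
nonzero status on invalid input (glibc's does), and it is slow because it
starts one iconv process per line.

```shell
# Hypothetical helper: print the number of every line in FILE that is not
# valid UTF-8.  The body runs in a subshell so that "n" and "line" do not
# leak into the caller's environment.
non_utf8_lines() (
    n=0
    # "|| [ -n "$line" ]" also handles a final line with no newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        # Converting UTF-8 to UTF-8 serves purely as a validity check;
        # iconv's exit status is the status of the pipeline.
        printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1 ||
            echo "$n"
    done < "$1"
)
```

Lines reported by this check are the candidates for ISO-8859 content; lines
holding "c2 88" or "c2 89" are well-formed UTF-8 and will not be listed.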


So, let's try to track down what is frotzing the line counter.




    _______________________________________________________

File Attachments:


-------------------------------------------------------
Date: Sat 05 Mar 2022 03:06:22 PM UTC  Name: test.ref  Size: 37KiB   By:
gbranden

<http://savannah.gnu.org/bugs/download.php?file_id=52957>
-------------------------------------------------------
Date: Sat 05 Mar 2022 03:06:22 PM UTC  Name: testref2.mm  Size: 64B   By:
gbranden

<http://savannah.gnu.org/bugs/download.php?file_id=52958>

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?62142>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



