bug-gnu-emacs

bug#44173: 28.0.50; gdb-mi mangles strings with octal escapes


From: Eli Zaretskii
Subject: bug#44173: 28.0.50; gdb-mi mangles strings with octal escapes
Date: Fri, 23 Oct 2020 16:19:42 +0300

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Fri, 23 Oct 2020 14:41:02 +0200
> Cc: 44173@debbugs.gnu.org
> 
> On 23 Oct 2020, at 14:01, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > I'm okay with writing a GDB/MI parser, but I'm not sure I understand
> > how would that help to solve this particular conundrum.  AFAIR,
> > there's a genuine ambiguity there regarding non-ASCII characters
> > reported from GDB.
> 
> Would you mind explaining the ambiguity? Do you mean what coding system 
> should be used for "\303\266" -- whether it should be interpreted as a string 
> of those two bytes, the string "Ã¶", the string "ö", or something else?

My memory is imperfect, but luckily I was wise enough to summarize the
problems in a comment to gdb-mi-decode, which you mentioned.  Let me
now quote it:

  ;; FIXME: This is fragile: it relies on the assumption that all the
  ;; non-ASCII strings output by GDB, including names of the source
  ;; files, values of string variables in the inferior, etc., are all
  ;; encoded in the same encoding.  It also assumes that the \nnn
  ;; sequences are not split between chunks of output of the GDB process
  ;; due to buffering, and arrive together.  Finally, if some string
  ;; included literal \nnn strings (as opposed to non-ASCII characters
  ;; converted by GDB/MI to octal escapes), this decoding will mangle
  ;; those strings.  When/if GDB acquires the ability to not
  ;; escape-protect non-ASCII characters in its MI output, this kludge
  ;; should be removed.
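
To illustrate the buffering assumption in that comment, here is a minimal
sketch (in Python rather than Emacs Lisp, purely for illustration) of one
way to hold back a possibly incomplete \nnn sequence at the end of a chunk
until more output arrives.  The name split_complete is hypothetical and is
not part of gdb-mi.el, and the sketch assumes that escapes always consist
of a backslash followed by exactly three octal digits.

```python
import re

# A backslash followed by fewer than three octal digits at the very end of
# the buffer may be the start of an escape whose remainder has not arrived.
_INCOMPLETE_TAIL = re.compile(r'\\[0-7]{0,2}\Z')


def split_complete(buffer):
    """Return (ready, pending): text safe to decode now, and a held-back tail."""
    match = _INCOMPLETE_TAIL.search(buffer)
    if match:
        return buffer[:match.start()], buffer[match.start():]
    return buffer, ""


# Two chunks that split one escape in the middle.
ready, pending = split_complete('name="\\30')
# The held-back tail is prepended to the next chunk before decoding it.
ready2, pending2 = split_complete(pending + '3\\266"')
```

Holding back a lone trailing backslash is conservative: it only delays
decoding of that tail until the next chunk is appended.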

The basic ambiguity, AFAIR, is what is described last here: a string
reported by GDB could include literal \nnn sequences, which are not
non-ASCII characters that GDB/MI converts to octal escapes.  The
information about which is which is lost once we receive the GDB/MI output.
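
As a rough illustration of that point, here is a sketch (in Python, not
the actual gdb-mi-decode code) of a decoder that replaces every \nnn
sequence with the byte it names and then decodes the bytes in one assumed
encoding.  The name naive_decode and its encoding parameter are
hypothetical; the input is assumed to be plain ASCII apart from the
escapes.

```python
import re

# Backslash followed by exactly three octal digits.
_OCTAL = re.compile(r'\\([0-7]{3})')


def naive_decode(mi_text, encoding="utf-8"):
    """Turn each \\nnn into a byte, then decode all bytes with ENCODING."""
    # Map each escape to the code point equal to its byte value, so that
    # encoding as Latin-1 afterwards yields exactly those bytes.
    with_bytes = _OCTAL.sub(lambda m: chr(int(m.group(1), 8)), mi_text)
    return with_bytes.encode("latin-1").decode(encoding, errors="replace")


escaped = r'\303\266'                        # eight ASCII characters
as_utf8 = naive_decode(escaped)              # bytes C3 B6 read as UTF-8
as_latin1 = naive_decode(escaped, "latin-1") # same bytes read as Latin-1
```

Here as_utf8 is the single character "ö" while as_latin1 is the
two-character string "Ã¶".  The decoder sees only the text: it treats
every \nnn it finds the same way, whatever produced it, and the result
depends entirely on the encoding it was told to assume.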

> This bug is not about the encoding; it's about not interpreting the string as 
> "303266".

AFAIU, this bug's root cause is the way we solved the ambiguity, which
basically assumes one of the possible interpretations should be
preferred to another, because it is more popular/useful.

Let me turn the tables and ask how you got the string you show
in the original report.  What kind of application were you debugging,
and what did that string mean in that application?

> >  Could you tell how will this be solved by a
> > different parser?
> 
> Again, I'm not sure what you mean. The bug arises because we feed incorrectly 
> translated data into a JSON parser. If we parse the string ourselves instead 
> of going via JSON, that particular problem goes away.

And what will then happen to non-ASCII strings and file names reported
by GDB?  How will our parser solve that?

> > P.S. Btw: gdb-mi.el already has a BNF parser for GDB/MI.
> 
> It doesn't parse the lower parts of the grammar -- 'result', 'value' and so 
> on. JSON is used for that.

Do you intend to extend the existing parser or write a new one from
scratch?




