My next patch needs to convert text from an untrusted input into an
output representation that is suitable for display on a terminal is
useful to more than just the json-writer; the text should normally be
UTF-8, but blindly allowing all Unicode code points (including ASCII
ESC) through to a terminal risks remote-code-execution attacks on some
terminals. Extract the existing body of json-writer's quoted_strinto
a new helper routine mod_utf8_sanitize, and generalize it to also work
on data that is length-limited rather than NUL-terminated. [I was
actually surprised that glib does not have such a sanitizer already -
Google turns up lots of examples of rolling your own string
sanitizer.]
If desired in the future, we may want to tweak whether the output is
guaranteed to be ASCII (using lots of \u escape sequences, including
surrogate pairs for code points outside the BMP) or if we are okay
passing printable Unicode through (we still need to escape control
characters). But for now, I went for minimal code churn, including
the fact that the resulting function allows a non-UTF-8 2-byte synonym
for U+0000.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
include/qemu/unicode.h | 3 ++
qobject/json-writer.c | 47 +----------------------
util/unicode.c | 84 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 88 insertions(+), 46 deletions(-)