[Bug-readline] Search with multibyte characters broken with custom rl_ge

From: Ulf Magnusson
Subject: [Bug-readline] Search with multibyte characters broken with custom rl_getc_function() (with analysis)
Date: Tue, 24 Feb 2015 05:13:08 +0100

text.c has the following comment:

/* Bytes too short to compose character, try to wait for next byte.
   Restore the state of the byte sequence, because in this case the
   effect of mbstate is undefined. */

However, there doesn't seem to be corresponding handling for the search case.
If, for example, the first byte of a UTF-8 'ö' (0xC3 0xB6) is input during
incremental search, it ends up in the following code in input.c:

if (_rl_get_char_len (mb, &ps) == -2)
    /* Read more for multibyte character */
    c = rl_read_key ();

rl_read_key() will in turn call rl_getc_function() -- without checking if
there's any input available. The end result for the test case below is that
'mb' gets set to { 0xC3, 0xC3 } (twice the first byte), and at this point
things are clearly broken. (What happens next is that the second byte (0xB6) is
recognized as a meta character, which aborts the search.)
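
To make the failure mode concrete, here is a small standalone sketch that
does not use readline at all. It models the one-byte buffer from the test
case below and compares a read loop that never checks for pending input with
one that does. The names (one_byte_src, src_getc, collect_unchecked,
collect_checked) are hypothetical and exist only for illustration; this is
not readline's actual code, just a simulation of the behaviour described
above.

```c
#include <stddef.h>

/* Hypothetical one-byte input buffer, mirroring the test case's globals. */
struct one_byte_src {
    unsigned char byte; /* last byte handed to the buffer */
    int avail;          /* nonzero while 'byte' has not been read yet */
};

/* Like the test case's readline_getc(): returns the buffered byte
   whether or not it is fresh, and marks it as consumed. */
static int src_getc(struct one_byte_src *s) {
    s->avail = 0;
    return s->byte;
}

/* Collect 'want' bytes without ever checking for pending input.
   A stale byte gets read again when nothing new has arrived. */
static size_t collect_unchecked(struct one_byte_src *s,
                                unsigned char *mb, size_t want) {
    size_t n = 0;
    while (n < want)
        mb[n++] = (unsigned char)src_getc(s);
    return n;
}

/* Collect up to 'want' bytes, stopping as soon as nothing is pending,
   so the caller can resume once the next byte arrives. */
static size_t collect_checked(struct one_byte_src *s,
                              unsigned char *mb, size_t want) {
    size_t n = 0;
    while (n < want && s->avail)
        mb[n++] = (unsigned char)src_getc(s);
    return n;
}
```

With only 0xC3 buffered, collect_unchecked() fills a two-byte array with
{ 0xC3, 0xC3 }, which matches the corrupted 'mb' described above, while
collect_checked() stores a single byte and returns 1.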

Below is a test case that triggers the bug. Without sending Ctrl-R, 'ö' is
displayed properly. When sending Ctrl-R, we end up with an aborted search and
broken output.

#include <locale.h>
#include <stdio.h> /* FILE; must come before readline.h */
#include <readline/readline.h>
#include <unistd.h>

static unsigned char input;
static int input_avail = 0;

/* Custom rl_getc_function: hands readline the single buffered byte. */
static int readline_getc(FILE *dummy) {
    input_avail = 0;
    return input;
}

/* rl_input_available_hook: nonzero while a byte is buffered. */
static int readline_input_avail(void) {
    return input_avail;
}

static void got_command(char *line) {}

/* Buffer one byte and let readline process it. */
static void feed_to_readline(char c) {
    input = c;
    input_avail = 1;
    rl_callback_read_char();
}

int main(void) {
    setlocale(LC_ALL, "");

    rl_getc_function = readline_getc;
    rl_input_available_hook = readline_input_avail;

    rl_callback_handler_install("> ", got_command);
    feed_to_readline('\x12'); // Ctrl-R (comment out to display 'ö' properly)
    feed_to_readline('\xC3'); // First byte of UTF-8 'ö'
    feed_to_readline('\xB6'); // Second byte of UTF-8 'ö'

    return 0;
}
(The actual code I discovered this in is
by the way. There it suffices to just press Ctrl-R and ö.)

Here are two unrelated nits I found:

 - "(*rl_redisplay_function) ();" at the end of rl_display_search() might be
   redundant. rl_message() already calls it.

 - _rl_isearch_callback() assigns 'c' but doesn't use it.

PS. If I'm not confused and this is a real bug, it would be nice if you
included my nick (Ulfalizer) in any attributions. Looks good when job hunting.

