Hello Sumedh,
On Fri, Oct 28, 2016, at 23:58, Sumedh Pendurkar wrote:
Typing <Enter> ch ^]
it says: ch�
It produces an invalid byte.
Your code is not entirely UTF-8 compatible.
I am new to UTF-8, so I haven't read much about it yet.
Please correct my mistakes if I make any.
I just looked into the code and ran it on paper.
1) Is "è" a single byte?
No. UTF-8 is a multibyte encoding. Anything that is not ASCII
takes up two or three or four bytes.
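
To make that concrete, here is a minimal sketch (not code from nano; the name utf8_seq_len is made up for this mail) of how the first byte of a sequence announces how many bytes the character occupies:

```c
#include <stddef.h>

/* Hypothetical helper, not part of nano: the length in bytes of the
 * UTF-8 sequence that the given first byte announces, or 0 when the
 * byte cannot start a sequence (a continuation byte or an invalid
 * lead byte). */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;    /* 0xxxxxxx: plain ASCII */
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;    /* 110xxxxx: two-byte sequence */
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;    /* 1110xxxx: three-byte sequence */
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;    /* 11110xxx: four-byte sequence */
    return 0;        /* 10xxxxxx, or 0xC0, 0xC1, 0xF5 and up */
}
```

For example, "è" is U+00E8 and is encoded as the two bytes 0xC3 0xA8: the first byte announces a two-byte sequence, and the second is a continuation byte that means nothing on its own.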
Then it checks whether the next byte is an mb_char or not (which,
surprisingly, returns true).
is_word_mbchar() does not check a byte; it checks whether the
string that starts at the given position begins with a valid
/multibyte/ character -- mb = multibyte. (But a valid single-byte
character is good too, of course.)
(Note: if it is two bytes, the second byte was not a word-forming
character; that's why it signaled the end of the word.)
You cannot check bytes for being word forming, you need to
check characters, which means that now and then you have
to skip a byte, or two, or three.
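
Roughly, stepping by character instead of by byte could look like the sketch below. This is not nano's code: char_len, is_word_char and word_end are names invented for this mail, and the rule "every multibyte character is word forming" is only a simplifying assumption for the example.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: the byte length of the character starting at s.
 * A stray or truncated sequence counts as one byte, so that a walk
 * over the string always advances. */
static size_t char_len(const char *s)
{
    unsigned char lead = (unsigned char)s[0];
    size_t len = 1, i;

    if (lead >= 0xC2 && lead <= 0xDF)
        len = 2;
    else if (lead >= 0xE0 && lead <= 0xEF)
        len = 3;
    else if (lead >= 0xF0 && lead <= 0xF4)
        len = 4;

    /* Every following byte must be a continuation byte (10xxxxxx);
     * this check also stops at the terminating NUL. */
    for (i = 1; i < len; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            return 1;

    return len;
}

/* Assumption for this sketch only: ASCII letters and digits are word
 * forming, and so is every well-formed multibyte character. */
static int is_word_char(const char *s, size_t len)
{
    return len > 1 || isalnum((unsigned char)s[0]);
}

/* Return the byte offset at which the word starting at s ends. */
size_t word_end(const char *s)
{
    size_t pos = 0;

    while (s[pos] != '\0') {
        size_t len = char_len(s + pos);

        if (!is_word_char(s + pos, len))
            break;
        pos += len;    /* skip the whole character, not one byte */
    }
    return pos;
}
```

With this, the word in "chè vre" ends at byte offset 4 (c, h, and the two bytes of è), whereas testing byte by byte would stop in the middle of the è and leave an invalid byte behind.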
Also,
./configure --enable-utf8
printed this on the terminal:
*** UTF-8 support was requested, but insufficient UTF-8 support was
*** detected [...]
You need to have libncurses5w or libncurses6w (note the "w")
and the corresponding libncurses5w-dev or libncurses6w-dev
packages installed. Please report what you needed to install
for configure to pass without error, so I can update README.GIT.