Hi,
I came across the UTF16 bug using an older version of binutils and noticed it has been fixed in the current tree. However, I am not sure the semantics are correct: I think it works on all little-endian machines now, but it is likely not to work on big-endian machines. Unfortunately, I don't have a BE machine to test with.
The reason is that the input file is converted to UTF16LE and then run through the lexer. The previous bug occurred because the lexer expected to work with host integers and, when fed big-endian UTF16, treated 000A as 0A00. A big-endian machine working on UTF16LE input is going to hit the same problem in reverse. Since the output binaries should be UTF16LE, I suppose the best solution is byte-reading: assemble each code unit from its two bytes explicitly instead of reinterpreting the buffer as host integers.
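To illustrate what I mean by byte-reading (a minimal sketch, not taken from my patch; the function name and buffer handling are made up), each UTF-16LE code unit is built from its two bytes, so the result is the same on little-endian and big-endian hosts:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical helper: read the UTF-16LE code unit starting at byte
       offset POS in BUF.  The low byte comes first in UTF-16LE, so the
       value is the same regardless of the host's byte order.  */
    static uint16_t
    read_utf16le_unit (const unsigned char *buf, size_t pos)
    {
      return (uint16_t) (buf[pos] | (buf[pos + 1] << 8));
    }

With something like this, a newline in the input is always seen as 0x000A, whether the lexer runs on a LE or BE host.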
I've included a patch, but I can't test it on a big-endian machine; it seems to work on a little-endian one. A better approach might be to convert the file to UTF8, work on it in that form, and only emit the UTF16 strings on output, but that would require some redesign of the parser and lexer.