From: Eli Zaretskii
Subject: Re: [elpa] 02/04: company-clang: handle multibyte chars between bol and point
Date: Thu, 20 Mar 2014 18:11:16 +0200

> Date: Thu, 20 Mar 2014 06:10:09 +0200
> From: Dmitry Gutov <address@hidden>
> CC: address@hidden, address@hidden
> "Doesn't work" usually means that it returns a different, much longer 
> list. So, with the above file saved in UTF-8, either approach works. But 
> when it's in UTF-16, only the current one succeeds.
> > The question is not what Clang uses, the question is how does it
> > expect the offsets to be supplied for files encoded in different
> > encodings.  That is something that should be described in the Clang
> > manuals.
> Either it isn't, or I don't know what to search for.
> > I assumed that it needs offsets in bytes, but that
> > assumption was not based on anything except looking at your code.
> The docstring for the relevant function 
> (http://clang.llvm.org/doxygen/group__CINDEX__CODE__COMPLET.html#ga50fedfa85d8d1517363952f2e10aa3bf)
> says "column", but apparently it has a special notion of columns. For 
> example, it considers any tab character as taking only one column.

I needed to look in their sources, but the information there isn't
clear-cut, either (or maybe I didn't understand the code ;-).  Some
functions that convert file offsets to columns count bytes from the
beginning of the line, others count characters, assuming a UTF-8
encoding.  But since you say the attempt to count characters in a
non-UTF-8 encoding failed, I guess clang needs byte counts of UTF-8
encoded text.

In any case, please note that UTF-8 and the internal encoding used by
Emacs are not exactly identical, so IMO you should encode into UTF-8
and then use 'length' to compute the "column".
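
A minimal sketch of that suggestion, assuming that Clang's "column" is
1-based and counts bytes of the UTF-8 encoded line up to point (an
inference from the behavior reported above, not something the Clang
documentation states).  The name `my-clang-column' is hypothetical and
not part of company-clang:

```elisp
;; Hypothetical helper illustrating the suggestion; not company-clang code.
(defun my-clang-column ()
  "Return a 1-based column for point, counted in UTF-8 bytes.
Assumes Clang expects byte offsets into UTF-8 encoded text."
  (let ((prefix (buffer-substring-no-properties
                 (line-beginning-position) (point))))
    ;; Encode explicitly into UTF-8: the result is a unibyte string,
    ;; so `length' returns its byte count rather than a character count.
    (1+ (length (encode-coding-string prefix 'utf-8)))))
```

Calling `string-bytes' on the unencoded text would instead count bytes
of the internal representation, which can differ from UTF-8 for
characters outside Unicode and for raw bytes.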