Android input methods (was: Re: textconv.c)

From: Po Lu
Subject: Android input methods (was: Re: textconv.c)
Date: Sun, 12 Feb 2023 23:27:00 +0800
User-agent: Gnus/5.13 (Gnus v5.13)

Eli Zaretskii <eliz@gnu.org> writes:

> I don't see why this is important, since you can switch the buffer
> temporarily.  We do this all over the place, since insdel.c always
> works on the current buffer.

The text is copied into a C ``char *'', not into another buffer.

> [lots of details omitted]
> And you intended to produce code which supports this without any
> discussions of the architecture and design?  I'm surprised, to say the
> least.  This has to be discussed, with the participation of everyone
> on board who knows about the Emacs internals related to these issues.
> We have here a significant amount of knowledge, expertise, and past
> experience with similar issues, and disregarding that and trying to
> solve this by your lone self is at the very least unwise.


> My suggestion is that you describe the problem(s), i.e. what these
> input methods expect from the client application, in enough detail
> that will allow people here think about it and suggest solutions.
> Please don't write even a single line of code before such a
> description is posted and people have enough time to respond with
> suggestions, ideas, and questions.  (I have already a couple of ideas,
> but will withhold them until I'm convinced that I understand the
> problems to be solved.)
> P.S. And please start a thread with a new, more meaningful name when
> you post those details.

Ok, now done.

Basically, today, on Android (but also on other platforms), input
methods desire fine-grained control over buffer contents, as they start
to provide more and more features aside from simply composing text.

Such input methods are mainly seen on Android, but they have appeared
on other systems as well, most notably GNOME on handheld devices.  What
is said below applies to those input methods as well.

In the past, input methods more or less worked like this: when a key
press arrives, the input method receives it first, performs
transformations, and either returns it to the application or inserts it
into a composition buffer.  Once the composition completes (through the
user pressing enter, or some other similar key), the text is sent to
Emacs, which converts each character inside into a key press, to be
inserted by self-insert-command.

On Android, input methods work the other way around.  They do the text
insertion and deletion themselves, all whilst querying the text editor
about the position of the caret (point) and the text around it for
reference.  Emacs is only told to insert or delete text at a specific
position in its buffer, and is obligated to inform the input method
about changes around the caret.
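
As a rough illustration of the queries and edits described above, here
is a minimal Python sketch of the editor side.  The class ToyEditor and
its method names are hypothetical; they are not taken from Emacs or from
Android.  Positions are 1-based, as in Emacs, and the sample text is the
one used in the example further down.

```python
class ToyEditor:
    """Hypothetical stand-in for an editor buffer (1-based positions)."""

    def __init__(self, text, point):
        self.text = text
        self.point = point  # position 1 is before the first character

    def text_before_point(self, n):
        """Return up to N characters immediately before point."""
        end = self.point - 1
        return self.text[max(0, end - n):end]

    def text_after_point(self, n):
        """Return up to N characters immediately after point."""
        start = self.point - 1
        return self.text[start:start + n]

    def insert(self, pos, string):
        """Insert STRING at POS, moving point if it is at or after POS."""
        i = pos - 1
        self.text = self.text[:i] + string + self.text[i:]
        if self.point >= pos:
            self.point += len(string)

    def delete(self, start, end):
        """Delete the characters between START and END (END exclusive)."""
        self.text = self.text[:start - 1] + self.text[end - 1:]
        if self.point >= end:
            self.point -= end - start
        elif self.point > start:
            self.point = start


TEXT = ("Why don't we both look through all the television channels tomorrow\n"
        "afternoon for offensive content we can complain about?")

editor = ToyEditor(TEXT, point=78)
print(repr(editor.text_before_point(9)))  # 'afternoon'
print(repr(editor.text_after_point(4)))   # ' for'
```

The input method drives everything here: the editor only answers
queries and applies the insertions and deletions it is given.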

If Emacs makes a change to the buffer outside the area in which the
input method expresses interest, then it is obligated to ``restart'' the
input method.  This takes a significant amount of time to complete.

Sometimes, the input method will also tell Emacs to mark a portion of
the buffer as ``preconversion text'' (or a ``composing span''), which is
an ephemeral region which may be replaced by the input method by some
other text, or deleted altogether.  The intention is that the input
method makes temporary edits within that ephemeral region in order to
show the user the contents of any ongoing composition.

Input methods on Android make extensive use of this functionality, even
for input in languages that utilize Latin or Cyrillic script.  Consider
a user who wants to delete the words ``tomorrow afternoon'' and replace
them with ``next Thursday'' in the following buffer:

  Why don't we both look through all the television channels tomorrow
  afternoon for offensive content we can complain about?

On a desktop, this would be simple; assume that point is already after
the words ``tomorrow afternoon''.  The user will press the delete key
enough times to delete ``tomorrow afternoon'', and then type in ``next
Thursday''.

On Android, this is completely different.  Once the input method (and on
screen keyboard) is displayed, it looks at the text surrounding the
point.  It sees the word:

  ``afternoon''

immediately before point, and the text:

  `` for''

immediately after.  Since the caret (point) is closer to the word
``afternoon'' than it is to the word ``for'', it now considers itself to
be editing the word ``afternoon''.

The input method then tells Emacs that ``afternoon'' is now the
ephemeral region, by issuing a request along the lines of:

  ``set the preconversion region to 69-78''

Emacs is now expected to indicate, by displaying an underline, that the
IME is now editing the word ``afternoon''.  As the user starts to press
delete in the input method, the input method starts to issue requests to
replace the contents of the preconversion region with something else:

  ``replace the preconversion region contents with afternoo''
  ``replace the preconversion region contents with afterno''
  ``replace the preconversion region contents with aftern''
  ``replace the preconversion region contents with after''
  ``replace the preconversion region contents with afte''
  ``replace the preconversion region contents with aft''
  ``replace the preconversion region contents with af''
  ``replace the preconversion region contents with a''
  ``remove the preconversion region entirely''

At that point, the input method asks for the contents of the buffer
before point again, and repeats the whole process.  Point is now 69,
immediately after a newline character, which cannot be meaningfully
composed.  Input methods have been observed to do one of two things:
either the input method will issue a request:

  ``delete one character before 69''

or it will say:

  ``set the preconversion region to 68-69''
  ``remove the preconversion region''
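
The request sequence above can be replayed against a toy model to check
that the positions work out.  This is only a sketch: ToyBuffer and its
method names are hypothetical, positions are 1-based with an exclusive
end, and it assumes that ``remove the preconversion region'' deletes the
text inside the region, as the 68-69 variant suggests.

```python
class ToyBuffer:
    """Hypothetical buffer with a single preconversion region."""

    def __init__(self, text):
        self.text = text
        self.region = None  # (start, end), 1-based, end exclusive

    def set_preconversion_region(self, start, end):
        self.region = (start, end)

    def replace_preconversion_text(self, new):
        """Replace the region's contents and resize the region to fit."""
        start, end = self.region
        self.text = self.text[:start - 1] + new + self.text[end - 1:]
        self.region = (start, start + len(new))

    def remove_preconversion_region(self):
        """Assumption: removing the region also deletes its text."""
        self.replace_preconversion_text("")
        self.region = None


TEXT = ("Why don't we both look through all the television channels tomorrow\n"
        "afternoon for offensive content we can complain about?")

buf = ToyBuffer(TEXT)
buf.set_preconversion_region(69, 78)      # ``afternoon''
word = "afternoon"
while len(word) > 1:                      # afternoo, afterno, ..., a
    word = word[:-1]
    buf.replace_preconversion_text(word)
buf.remove_preconversion_region()

# Point is now 69, right after the newline; remove that newline too.
buf.set_preconversion_region(68, 69)
buf.remove_preconversion_region()
print(buf.text)
```

After the replay, ``tomorrow'' is followed directly by `` for'', which
is the state from which the word-at-a-time deletion below would start.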

Sometimes, the input method will start to delete entire words at a time.
When that happens, the input method will look backwards and ask for the


and simply ask Emacs:

  ``delete 9 characters after the position 60''

or perhaps

  ``set the preconversion region to 60-69''
  ``remove the preconversion region''

or perhaps some other combination that I have yet to see in practice.
Now assume that the user changes his mind in the middle of the
operation, say immediately after ``afternoon'' has become ``aftern''.
The input method may display the text ``afternoon'' in a button, to
allow him to undo the change immediately.  If that is pressed, Emacs
might receive:

  ``replace the preconversion region contents with afternoon''
  ``stop preconverting text''

or alternatively:

  ``stop preconverting text''
  ``insert the text oon after 75''

or some other request.
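
To make the point concrete, the two undo sequences above can be checked
against a toy model: both leave the buffer in the same state, so Emacs
cannot rely on seeing one particular form.  The State class and the
helper functions below are hypothetical names used only for this
sketch; positions are 1-based with an exclusive end.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class State:
    text: str
    region: Optional[Tuple[int, int]] = None  # preconversion region


def replace_region(state, new):
    """Replace the preconversion region's contents with NEW."""
    start, end = state.region
    state.text = state.text[:start - 1] + new + state.text[end - 1:]
    state.region = (start, start + len(new))


def stop_preconverting(state):
    """Keep the text, but stop treating it as ephemeral."""
    state.region = None


def insert_at(state, pos, string):
    """Insert STRING at position POS."""
    state.text = state.text[:pos - 1] + string + state.text[pos - 1:]


def fresh_state():
    # ``afternoon'' has already been shortened to ``aftern'' (69-75).
    text = ("Why don't we both look through all the television channels "
            "tomorrow\naftern for offensive content we can complain about?")
    return State(text, (69, 75))


first = fresh_state()
replace_region(first, "afternoon")
stop_preconverting(first)

second = fresh_state()
stop_preconverting(second)
insert_at(second, 75, "oon")

print(first == second)
```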

All of this is behavior I have observed CJK and English input methods
perform.  An input method is not obligated to behave in any way like
what I have described above, as long as it constrains its edits to some
reasonable distance (600 characters) from the caret; if it makes edits
any further away from the caret than that, the behavior of the
application is undefined.  That is, it might also be valid for the input
method to say:

  ``replace 0-123 with <random string>''
  ``replace 0-123 with <random string>''
  ``replace 0-123 with <random string>''
  ``replace 0-123 with <random string>''
  ``replace 0-123 with <random string>''
  ``replace 0-123 with <random string>''
  ``replace 0-123 with <random string>''

over and over again, though I don't see the utility in that.  But the
input method will stop working properly until the next time it is reset
if it doesn't see the replacement reflected in Emacs's own buffer.

Sometimes, an input method will also monitor changes to the caret
position.  In that case, Emacs is obligated to report any changes to
the on-screen caret to the input method, so it knows where it should
begin to make edits from.
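
A minimal sketch of that obligation follows, assuming Emacs checks point
after each command and reports only actual changes.  CaretMonitor, its
methods, and the notify callback are hypothetical names, not an existing
interface.

```python
class CaretMonitor:
    """Hypothetical helper that reports caret movement while monitored."""

    def __init__(self, notify):
        self.notify = notify      # called with the new caret position
        self.monitoring = False
        self.last_point = None

    def set_monitoring(self, enabled):
        """The input method turns monitoring on or off."""
        self.monitoring = enabled

    def after_command(self, point):
        """Call after each command with the current value of point."""
        if self.monitoring and point != self.last_point:
            self.notify(point)
        self.last_point = point


reports = []
monitor = CaretMonitor(reports.append)
monitor.after_command(10)     # not monitoring yet, nothing is reported
monitor.set_monitoring(True)
monitor.after_command(10)     # caret did not move
monitor.after_command(12)     # caret moved, reported
monitor.after_command(12)     # caret did not move
print(reports)                # [12]
```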

An input method might also ask for a region of text to be ``extracted'',
which means Emacs must report each change to the buffer that modifies
said region to the input method, but is relieved of the obligation to
reset the input method as long as a ``major change'' (whatever that
means) has not happened to the buffer contents, or outside the extracted
text.  What I have observed is that the region of extracted text is wide
enough to perform actions such as refilling a paragraph or indenting a
line without resetting the input method, but not much more than that.
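
The decision Emacs has to make after each of its own edits could be
sketched as below.  Since what counts as a ``major change'' is not
defined, this sketch simply assumes that any edit not wholly contained
in the extracted region is major; the function name and the return
values are hypothetical.

```python
def classify_change(change_start, change_end, extract_start, extract_end):
    """Decide how to tell the input method about an edit.

    All positions are 1-based with exclusive ends.  Returns "report"
    when the edit lies entirely within the extracted text, and "reset"
    otherwise.  Assumption: anything outside the extracted text is
    treated as a major change.
    """
    if extract_start <= change_start and change_end <= extract_end:
        return "report"
    return "reset"


# Extracted text covers positions 1-200, e.g. the current paragraph.
print(classify_change(60, 78, 1, 200))    # report
print(classify_change(150, 260, 1, 200))  # reset
print(classify_change(500, 510, 1, 200))  # reset
```

A real implementation would presumably also have to reset on events
that are not edits at all, such as the selected window or buffer
changing.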

In any case, the conclusion is that Emacs must present a completely
correct view of the buffer contents of the selected window and the
location of its point to the input method, correctly report edits made
by the input method to the buffer contents and any edits made by Emacs
after that, and diligently report changes to extracted text and/or
reset the input method on ``major changes'' such as the selected buffer
or window changing, or edits happening outside extracted text.

Otherwise, the behavior of the input method becomes undefined (and

Now, it is sometimes possible to disable the input method and to simply
work with an on-screen keyboard (which is what the Android port
currently does), but that precludes entering any non-ASCII text, and is
a luxury which is only afforded by some input methods.  Also, it
wouldn't be out of character for GNOME to demand that applications
implement input method support their ``right way'' at some point in the
future, so we will have to implement this properly, if not now, then
eventually.

