Almost all the identifiers are ASCII, right? So maybe optimize for the
99.9% of use cases by storing such tags tables in a unibyte buffer, read
with insert-file-contents-literally?
All right, and that option is probably handled well enough already by
the user choosing (l) in the prompt when the tags file is very big.
Yes, but my idea was to do that automatically. After all, the size
threshold beyond which we prompt the user is customizable, so it could
be very large.
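
To make the "do that automatically" idea concrete, here is a rough sketch in Python (not Emacs code) of the kind of check it would need: scan the raw file contents and take the literal, unibyte route only when every byte is ASCII. The function names and the chunk size are made up for illustration; nothing here reflects how etags.el actually decides.

```python
def is_ascii_only(data: bytes) -> bool:
    """Return True when every byte in DATA is below 0x80."""
    return data.isascii()


def file_is_ascii_only(path: str, chunk_size: int = 1 << 16) -> bool:
    """Scan the file at PATH in chunks and stop at the first non-ASCII byte.

    Hypothetical helper: a caller could use the result to decide whether
    reading the file without any decoding would be safe.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True
            if not chunk.isascii():
                return False
```

Note that the scan itself is one more pass over the file, so it only pays off if it is cheaper than the decoding it avoids.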
My (apparently faulty) intuition was that if utf-8-emacs is the memory
representation of buffer text, converting it into that encoding can be
faster because it could be done by copying from memory rather than
having to do the work of recoding every character.
We don't recode characters when they are valid UTF-8 sequences, but
you forget the raw bytes: they are converted from internal multibyte
representation to single bytes, and that requires walking the buffer
one character at a time.
IOW, utf-8-emacs is the same as utf-8 for this purpose.
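
A small Python sketch (again, not Emacs code) of why raw bytes rule out a plain memory copy. It assumes, as far as I understand the internal representation, that a raw byte 0x80..0xFF is stored as a two-byte sequence whose lead byte is 0xC0 or 0xC1, which never occurs in valid UTF-8. Treat that layout, and the function name, as assumptions made for illustration. The sketch scans byte by byte, which is enough to show the point: the output cannot be produced without looking at the whole text.

```python
def collapse_raw_bytes(internal: bytes) -> bytes:
    """Turn assumed two-byte raw-byte sequences back into single bytes.

    Valid UTF-8 sequences are copied unchanged; only the assumed
    0xC0/0xC1-led pairs are rewritten.
    """
    out = bytearray()
    i = 0
    n = len(internal)
    while i < n:
        lead = internal[i]
        if lead in (0xC0, 0xC1) and i + 1 < n and 0x80 <= internal[i + 1] <= 0xBF:
            trail = internal[i + 1]
            # Bit 6 of the raw byte comes from the lead byte, the low six
            # bits from the trail byte; bit 7 is always set.
            out.append(0x80 | ((lead & 0x01) << 6) | (trail & 0x3F))
            i += 2
        else:
            out.append(lead)
            i += 1
    return bytes(out)
```

When the text contains no raw bytes the result is byte-for-byte identical to the input, but that is only known after the scan has finished.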