Re: [PATCH] Allow inserting non-BMP characters

Eli Zaretskii <address@hidden> schrieb am Di., 26. Dez. 2017 um 17:11 Uhr:

> From: Philipp Stephani <address@hidden>
> Date: Tue, 26 Dec 2017 10:35:42 +0000
> Cc: address@hidden, address@hidden
>
> Suggest to move surrogates_to_codepoint to coding.c, and then use the
> macros UTF_16_HIGH_SURROGATE_P and UTF_16_LOW_SURROGATE_P defined
> there.
>
> Hmm, I'd rather go the other way round and remove these macros later. They are macros, thus worse than
> functions,

I don't think we have a policy to prefer inline functions to macros,
and I don't think we should have such a policy. We use inline
functions when that's necessary, but we don't in general prefer them.
They have their own problems, see the comments in lisp.h for some of
that.

Thanks, the only discussion I saw there was about some performance issues:

Some operations are so commonly executed that they are implemented

as macros, not functions, because otherwise runtime performance would

suffer too much when compiling with GCC without optimization.

FIXME: Remove the lisp_h_OP macros, and define just the inline OP

functions, once "gcc -Og" (new to GCC 4.8) works well enough for

Emacs developers. Maybe in the year 2020. See Bug#11935.

That bug says that GCC 4.8 should be available in Debian stable to support -Og. Debian stable now already has 6.3 (https://wiki.debian.org/DebianStretch#Packages_.26_versions). So maybe it's time to compile development versions with -Og and try to reapply the original patch. 5 years have passed, and compilers have gotten a lot better.

In any case, the new functions are certainly not commonly executed. They are currently only executed when converting non-BMP keystrokes on macOS, which is rare enough.

> and don't seem to be correct either (what about a value such as 0x11DC00?).

??? They care correct for UTF-16 sequences, which are 16-bit numbers.
If you need to augment them by testing the high-order bits to be zero
in your case, that's okay, but I don't see any need for introducing
similar but different functionality.

I'd be OK with using the macros since they already exist, but I wouldn't want to touch them without converting them to functions first, and for using them in nsterm.m I'd have to move them around.

> No new macros please if we can avoid it. Functions are strictly better.

Sorry, I disagree. Each has its advantages, and on balance I find
macros to be slightly better, certainly not worse. There's no need to
avoid them in C.

I disagree, see e.g. https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html and many other sources.

Sometimes macros are unavoidable, but not here.

> I don't care much whether they are in character.h or coding.h, but char_surrogate_p is already in character.h.

char_surrogate_p should have used the coding.h macros as well.

> > + USE_SAFE_ALLOCA;
> > + unichar *utf16_buffer;
> > + SAFE_NALLOCA (utf16_buffer, 1, len);
>
> Maximum length of a UTF-16 sequence is known in advance, so why do you
> need SAFE_NALLOCA here? Couldn't you use a buffer of fixed length
> instead?
>
> The text being inserted can be arbitrarily long. Even single characters (i.e. extended grapheme clusters) can
> be arbitrarily long.

Yes, but why do you first copy the input into a separate buffer? Why
not convert each UTF-16 sequence separately, as you go through the
loop?

Message (method) invocations in Objective-C have high overhead because they are late-bound. Therefore it is advisable to minimize the number of messages sent. https://developer.apple.com/documentation/foundation/nsstring/1408720-getcharacters?language=objc also indicates that a (properly implemented) getCharacters call is faster than calling characterAtIndex in a loop.

From:	Philipp Stephani
Subject:	Re: [PATCH] Allow inserting non-BMP characters
Date:	Tue, 26 Dec 2017 18:50:46 +0000