bug#20499: C-x 8 shorthands for curved quotes, Euro, etc.
From: Ivan Shmakov
Subject: bug#20499: C-x 8 shorthands for curved quotes, Euro, etc.
Date: Thu, 07 May 2015 10:00:38 +0000
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
>>>>> Paul Eggert <eggert@cs.ucla.edu> writes:
[…]
>> … Also, did you consider generating this list automatically, based
>> on the codepoint properties already known to Emacs? Something along
>> the lines of the function MIMEd, which readily produces a list of
>> entries for the following 133 characters. (Three spaces added for
>> symmetry purposes.)
>> À Á Â Ã Ä È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ô Õ Ö Ù Ú Û Ü Ý
>> à á â ã ä è é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý
>> ÿ Ā ā Ć ć Ĉ ĉ Č č Ď ď Ē ē Ě ě Ĝ ĝ Ĥ ĥ Ĩ ĩ Ī ī Ĵ ĵ Ĺ ĺ
>> Ľ ľ Ń ń Ň ň Ō ō Ŕ ŕ Ř ř Ś ś Ŝ ŝ Š š Ť ť Ũ ũ Ū ū Ŵ ŵ Ŷ ŷ
>> Ÿ Ź ź Ž ž Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǧ ǧ Ǩ ǩ ǰ Ǵ ǵ Ǹ ǹ Ș ș Ț ț
>> Ȟ ȟ Ȳ ȳ
> Sorry, I don't really follow the code that you attached.
Which part, specifically?
It just iterates over the range given (or U+00A8 through U+02AF
by default) and maps “LATIN + COMBINING” decompositions to
'iso-transl entries. For example, it maps the (?g #x327)
decomposition (U+0327 being COMBINING CEDILLA) for U+0123 into
an (",g" . ģ) entry.
Or, rather, it /should/, for my code has an obvious typo:
  (`(,c #x30c) (string ?v c))
  (`(,c #x326) (string 59 c))
- (`(,c #x326) (string ?, c)))))
+ (`(,c #x327) (string ?, c)))))
Other possible additions (assuming we’ll agree on C-x 8 u,
C-x 8 .) are:
  (`(,c #x304) (string ?= c))
+ (`(,c #x306) (string ?u c))
+ (`(,c #x307) (string ?. c))
  (`(,c #x308) (string 34 c))
+ (`(,c #x30b) (string ?2 c))
  (`(,c #x30c) (string ?v c))
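The mapping described above can be sketched roughly as follows. This is not the attached code: every `my-` name is hypothetical, and the table lists only the marks and keys that appear as unchanged context in the fragments above. The result is a bare (KEYS . CHAR) pair; installing it into iso-transl is left out.

```elisp
;; Illustrative sketch only: the `my-' names are hypothetical and are
;; not taken from the code attached to the earlier message.

(defvar my-diacritic-keys
  '((#x304 . ?=)   ; COMBINING MACRON
    (#x308 . 34)   ; COMBINING DIAERESIS, key "
    (#x30c . ?v)   ; COMBINING CARON
    (#x326 . 59)   ; COMBINING COMMA BELOW, key ;
    (#x327 . ?,))  ; COMBINING CEDILLA
  "Alist mapping a combining mark to its `C-x 8' prefix key.")

(defun my-ascii-letter-p (char)
  "Return non-nil if CHAR is an ASCII Latin letter."
  (or (and (<= ?A char) (<= char ?Z))
      (and (<= ?a char) (<= char ?z))))

(defun my-decomposition-entry (char)
  "Return a (KEYS . CHAR) entry for CHAR, or nil if none applies.
CHAR qualifies when its decomposition is an ASCII letter followed
by exactly one mark listed in `my-diacritic-keys'."
  (let* ((decomp (get-char-code-property char 'decomposition))
         (base (car decomp))
         (key (and (= (length decomp) 2)
                   (cdr (assq (cadr decomp) my-diacritic-keys)))))
    ;; Compatibility decompositions start with a symbol, so BASE must
    ;; be checked to be a character before it is used.
    (when (and key (integerp base) (my-ascii-letter-p base))
      (cons (string key base) char))))

(defun my-decomposition-entries (&optional from to)
  "Collect entries for the characters FROM through TO.
The range defaults to U+00A8 through U+02AF."
  (let ((char (or from #x00A8))
        (end (or to #x02AF))
        entries)
    (while (<= char end)
      (let ((entry (my-decomposition-entry char)))
        (when entry (push entry entries)))
      (setq char (1+ char)))
    (nreverse entries)))
```

With these definitions, (my-decomposition-entry #x123) evaluates to (",g" . 291), that is, the ",g" entry for ģ.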
> Although I suppose it comes from a decomposition table, I don't know
> what the table was designed for, and it's not clear to me how it's
> relevant.
I hope someone more knowledgeable could comment on this. Still,
this (ab)use of the data seems to work well in practice.
> Anyway, most of those letters are either in iso-transl.el now,
The point is to /remove/ them from 'iso-transl, as these entries
duplicate, in a way, a part of the decomposition table already
present in Emacs.
[…]
>> Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǹ ǹ
> These are for toned Pinyin but this list is incomplete. If we wanted
> to cover toned Pinyin, we'd also need Ǖ ǖ Ǘ ǘ Ǚ ǚ Ǜ ǜ. Coming up
> with two-character abbreviations for all these might be tricky.
But are we actually limited to two-character abbreviations only?
Why not allow for, say, C-x 8 " ' u?
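A longer sequence of that kind could, in principle, be derived by following the decomposition recursively, since ǘ decomposes to ü plus COMBINING ACUTE ACCENT, and ü in turn to u plus COMBINING DIAERESIS. A minimal sketch, assuming hypothetical `my-` names and assuming the keys ` and ' for grave and acute:

```elisp
;; Illustrative sketch only; all `my-' names are hypothetical, and the
;; keys for grave and acute are assumptions.

(defvar my-mark-keys
  '((#x300 . 96)   ; COMBINING GRAVE ACCENT, key ` (assumed)
    (#x301 . 39)   ; COMBINING ACUTE ACCENT, key ' (assumed)
    (#x304 . ?=)   ; COMBINING MACRON
    (#x308 . 34)   ; COMBINING DIAERESIS, key "
    (#x30c . ?v))  ; COMBINING CARON
  "Alist mapping a combining mark to a `C-x 8' prefix key.")

(defun my-letter-p (char)
  "Return non-nil if CHAR is an ASCII Latin letter."
  (or (and (<= ?A char) (<= char ?Z))
      (and (<= ?a char) (<= char ?z))))

(defun my-full-decomposition (char)
  "Return (BASE MARK...) for CHAR, innermost mark first, or nil.
BASE is an ASCII letter; every MARK is listed in `my-mark-keys'."
  (let ((decomp (get-char-code-property char 'decomposition)))
    (cond
     ((my-letter-p char) (list char))
     ((and (= (length decomp) 2)
           (integerp (car decomp))
           (assq (cadr decomp) my-mark-keys))
      ;; Decompose the first element further, then append this mark.
      (let ((inner (my-full-decomposition (car decomp))))
        (and inner (append inner (list (cadr decomp)))))))))

(defun my-key-sequence (char)
  "Return the keys to type after `C-x 8' for CHAR, or nil."
  (let ((parts (my-full-decomposition char)))
    (when (cdr parts)                   ; at least one mark is needed
      (concat (mapcar (lambda (mark) (cdr (assq mark my-mark-keys)))
                      (cdr parts))
              (string (car parts))))))
```

Here (my-key-sequence #x1D8) gives the three-key string " ' u (without the spaces) for ǘ, and the remaining toned ü letters come out the same way with =, v and ` in place of '.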
[…]
>> ǰ
> What language uses this? I couldn't find one.
To quote NamesList.txt:
01F0 LATIN SMALL LETTER J WITH CARON
* IPA and many languages
>> Ǵ ǵ
> Good catch. These are used for transliteration from Serbian and
> Macedonian. We should also include Ḱ ḱ as they are also needed.
> Included in the attached patch.
The code I’ve suggested could be used to scan the U+1Exx range
just as well, thus resulting in the following set.
Ḑ ḑ Ḡ ḡ Ḧ ḧ Ḩ ḩ Ḱ ḱ Ḿ ḿ Ṕ ṕ Ṽ ṽ Ẁ ẁ Ẃ ẃ Ẅ ẅ Ẍ ẍ Ẑ ẑ ẗ Ẽ ẽ Ỳ ỳ Ỹ ỹ
[…]
> Anyway, part of what's going on here is that the proposed list
> doesn't cover every Latin character in the ISO 10646 repertoire
> (that'd be a large set), but instead is limited to what appear to be
> reasonably common letters. Admittedly this is not universal but
> one must cut things off somewhere, and it would be odd to add only
> partial coverage for toned Pinyin, Livonian, etc.
When it comes to the LATIN … LETTER WITH … letters, my proposal
for such a cut off would be to satisfy /both/ of the following
criteria:
• only cover specific Unicode ranges; such as, for instance,
U+00A8 … U+02AF, U+1E00 … U+1EFF, perhaps U+2C60 … U+2C7F;
• only cover the letters which can be represented with a
sufficiently general C-x 8 ⟨diacritic⟩+ ⟨ASCII-latin⟩ pattern.
Other characters deemed common may be added to the list.
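The two criteria could be combined into a single predicate along these lines. This is only a sketch: the `my-` names are hypothetical, the ranges are the ones named above, and "diacritic" is read here, as an assumption, to mean any character of general category Mn.

```elisp
;; Illustrative sketch only; all `my-' names are hypothetical.

(defconst my-covered-ranges
  '((#x00A8 . #x02AF) (#x1E00 . #x1EFF) (#x2C60 . #x2C7F))
  "Code point ranges (FROM . TO) considered for `C-x 8' entries.")

(defun my-in-covered-range-p (char)
  "Return non-nil if CHAR lies in one of `my-covered-ranges'."
  (let (found)
    (dolist (range my-covered-ranges found)
      (when (and (<= (car range) char) (<= char (cdr range)))
        (setq found t)))))

(defun my-marked-letter-p (char)
  "Return non-nil if CHAR is an ASCII letter plus one or more marks.
A mark is taken to be any character of general category Mn."
  (let ((decomp (get-char-code-property char 'decomposition)))
    (and (= (length decomp) 2)
         (integerp (car decomp))
         (eq (get-char-code-property (cadr decomp) 'general-category)
             'Mn)
         (let ((base (car decomp)))
           (or (and (<= ?A base) (<= base ?Z))
               (and (<= ?a base) (<= base ?z))
               ;; More than one mark: check the rest recursively.
               (my-marked-letter-p base))))))

(defun my-candidate-p (char)
  "Return non-nil if CHAR satisfies both criteria."
  (and (my-in-covered-range-p char)
       (my-marked-letter-p char)))
```

Under these assumptions ḱ (U+1E31) and ǘ (U+01D8) qualify, while æ (U+00E6) does not, as it has no decomposition.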
>>> --------------090904020002020306060104
>>> Content-Type: text/x-patch;
>>> name="0001-C-x-8-shorthands-for-curved-quotes-Euro-etc.patch"
>> This MIME part sure wants ‘; charset=UTF-8’. Otherwise, Gnus does
>> no decoding, and Emacs shows the contents with the likes of
>> \304\260.
> Hmm, it works for me. I use Thunderbird to read the top level
> message, and it spins off an Emacs to display the attachment with no
> problem.
I can “spin off” cat(1) to read the offending MIME part, too:
Emacs will feed it raw-text, and interpret the result as UTF-8
(the default).
It still does /not/ comply with the MIME specification.
Consider section 4.1.2 of RFC 2046:
RFC> […] The default character set, which must be assumed in the
RFC> absence of a charset parameter, is US-ASCII.
RFC 6657 updates this as follows:
RFC> Each subtype of the "text" media type that uses the "charset"
RFC> parameter can define its own default value for the "charset"
RFC> parameter, including the absence of any default.
However, given that ‘text/x-patch’ is not a /registered/ MIME
type, I believe the above does not apply.
> The web-site archive at <http://bugs.gnu.org/20499#60> also works for
> me with Firefox.
> It's common for people to send the output of "git send-email" as
> attachments;
If Thunderbird /knows/ the encoding (“character set”) of the
contents of the MIME part, it /should/ specify it in the MIME
part header. If said contents are strictly 7-bit, it /could/
omit that (given that it’s more than likely to be US-ASCII).
Otherwise, I guess Thunderbird should either ask the user for
the encoding /or/ send the part as application/octet-stream.
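The rule suggested above can be written down as a small decision function. It is a sketch of the reasoning only, not of anything Thunderbird or Gnus actually does: the function names are hypothetical, and the "ask the user" branch is left out.

```elisp
;; Illustrative sketch only; both function names are hypothetical.

(defun my-7bit-p (text)
  "Return non-nil if every element of the string TEXT is below 128."
  (let ((ok t))
    (dotimes (i (length text) ok)
      (when (> (aref text i) 127)
        (setq ok nil)))))

(defun my-patch-content-type (text &optional charset)
  "Return a Content-Type value for the patch TEXT.
CHARSET is the name of the known encoding, or nil if unknown."
  (cond
   ;; The encoding is known: state it.
   (charset (format "text/x-patch; charset=%s" charset))
   ;; Strictly 7-bit: the parameter may be omitted.
   ((my-7bit-p text) "text/x-patch")
   ;; Unknown encoding, 8-bit data: do not claim it is text.
   (t "application/octet-stream")))
```

For instance, a patch containing İ (the \304\260 above, once decoded) with a known encoding of UTF-8 yields "text/x-patch; charset=UTF-8", and the same patch with no known encoding yields "application/octet-stream".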
[…]
--
FSF associate member #7257 np. Satellite one — Purple Motion B6A0 230E 334A