bug#24603: [PATCHv6 5/6] Support casing characters which map into multip
From: Michal Nazarewicz
Subject: bug#24603: [PATCHv6 5/6] Support casing characters which map into multiple code points (bug#24603)
Date: Mon, 03 Apr 2017 11:01:40 +0200
On Wed, Mar 22 2017, Eli Zaretskii wrote:
>> From: Michal Nazarewicz <mina86@mina86.com>
>> Date: Tue, 21 Mar 2017 02:27:08 +0100
>>
>> Implement unconditional special casing rules defined in Unicode standard.
>
> Thanks. A few comments below.
Diff with fixes attached. The rest of the patchset stays unchanged.
I figured that posting just the fixes is most readable (rather than
sending the full patch again).
Unless there are more comments, I’ll push the commits in a couple of
days.
>> diff --git a/admin/unidata/unidata-gen.el b/admin/unidata/unidata-gen.el
>> index 3c5119a8a3d..32b05eacce6 100644
>> --- a/admin/unidata/unidata-gen.el
>> +++ b/admin/unidata/unidata-gen.el
>> @@ -268,6 +268,33 @@ unidata-prop-alist
>> The value nil means that the actual property value of a character
>> is the character itself."
>> string)
>> + (special-uppercase
>> + 2 unidata-gen-table-special-casing "uni-special-uppercase.el"
>> + "Unicode unconditional special casing mapping.
>> +
>> +Property value is nil, denoting no special rules, or a string, denoting
>> +characters maps into given sequence of characters.
>
> Something is wrong with the last sentence. (This problem repeats in
> other similar sentences in the patch.)
>
>> +The mapping includes only unconditional casing rules defined by Unicode."
>
> This begs for clarification: what is meant by "unconditional casing"?
> I think a sentence or two of explanation are due.
@@ -272,28 +272,37 @@ unidata-prop-alist
2 unidata-gen-table-special-casing "uni-special-uppercase.el"
"Unicode unconditional special casing mapping.
-Property value is nil, denoting no special rules, or a string, denoting
-characters maps into given sequence of characters. The string may be empty.
+Property value is (possibly empty) string or nil. The value nil denotes that
+`uppercase' property should be consulted instead. A string denotes what
+sequence of characters given character maps into.
-The mapping includes only unconditional casing rules defined by Unicode."
+This mapping includes language- and context-independent special casing rules
+defined by Unicode only. It also does not include association which would
+duplicate information from `uppercase' property."
nil)
(special-lowercase
0 unidata-gen-table-special-casing "uni-special-lowercase.el"
"Unicode unconditional special casing mapping.
-Property value is nil, denoting no special rules, or a string, denoting
-characters maps into given sequence of characters. The string may be empty.
+Property value is (possibly empty) string or nil. The value nil denotes that
+`lowercase' property should be consulted instead. A string denotes what
+sequence of characters given character maps into.
-The mapping includes only unconditional casing rules defined by Unicode."
+This mapping includes language- and context-independent special casing rules
+defined by Unicode only. It also does not include association which would
+duplicate information from `lowercase' property."
nil)
(special-titlecase
1 unidata-gen-table-special-casing "uni-special-titlecase.el"
"Unicode unconditional special casing mapping.
-Property value is nil, denoting no special rules, or a string, denoting
-characters maps into given sequence of characters. The string may be empty.
+Property value is (possibly empty) string or nil. The value nil denotes that
+`titlecase' property should be consulted instead. A string denotes what
+sequence of characters given character maps into.
-The mapping includes only unconditional casing rules defined by Unicode."
+This mapping includes language- and context-independent special casing rules
+defined by Unicode only. It also does not include association which would
+duplicate information from `titlecase' property."
nil)
(mirroring
unidata-gen-mirroring-list unidata-gen-table-character "uni-mirrored.el"
>> +@item special-uppercase
>> +Corresponds to Unicode unconditional special upper-casing rules. The value
>
> Likewise here: the "unconditional" part should be explained.
>
>> +is @code{"SS"}. For unassigned codepoints, the value is @code{nil}
>> +which means @code{uppercase} property needs to be consulted instead.
>
> When you say "unassigned codepoints", do you mean codepoints that
> don't have characters defined for them in Unicode? Because that's the
> usual meaning of this term in the context of Unicode. If you mean
> something else, please use some other term. (I think you mean
> something else, since properties of unassigned codepoints are not
> really interesting for Lisp programmers.)
>
>> +mapping for @code{U+0130} (@sc{latin capital letter i with dot above})
>> +the value is @code{"i\u0307"}. For unassigned codepoints, the value is
>
> Instead of using "i\u0307", in the hope that the reader will
> understand it's a string made of 2 characters, I would say that
> explicitly.
@@ -621,26 +621,27 @@ Character Properties
is @code{nil}, which means the character itself.
@item special-uppercase
-Corresponds to Unicode unconditional special upper-casing rules. The value
-of this property is a string (which may be empty). For example
-mapping for @code{U+00DF} (@sc{latin smpall letter sharp s}) the value
-is @code{"SS"}. For unassigned codepoints, the value is @code{nil}
+Corresponds to Unicode language- and context-independent special upper-casing
+rules. The value of this property is a string (which may be empty). For
+example mapping for @code{U+00DF} (@sc{latin small letter sharp s}) is
+@code{"SS"}. For characters with no special mapping, the value is @code{nil}
which means @code{uppercase} property needs to be consulted instead.
@item special-lowercase
-Corresponds to Unicode unconditional special lower-casing rules. The
-value of this property is a string (which may be empty). For example
-mapping for @code{U+0130} (@sc{latin capital letter i with dot above})
-the value is @code{"i\u0307"}. For unassigned codepoints, the value is
-@code{nil} which means @code{lowercase} property needs to be consulted
-instead.
+Corresponds to Unicode language- and context-independent special lower-casing
+rules. The value of this property is a string (which may be empty). For
+example mapping for @code{U+0130} (@sc{latin capital letter i with dot above})
+the value is @code{"i\u0307"} (i.e. 2-character string consisting of @sc{latin
+small letter i} followed by @sc{combining dot above}). For characters with no
+special mapping, the value is @code{nil} which means @code{lowercase} property
+needs to be consulted instead.
@item special-titlecase
-Corresponds to Unicode unconditional special title-casing rules. The
-value of this property is a string (which may be empty). For example
-mapping for @code{U+FB01} (@sc{latin small ligature fi}) the value is
-@code{"Fi"}. For unassigned codepoints, the value is @code{nil} which
-means @code{titlecase} property needs to be consulted instead.
+Corresponds to Unicode unconditional special title-casing rules. The value of
+this property is a string (which may be empty). For example mapping for
+@code{U+FB01} (@sc{latin small ligature fi}) the value is @code{"Fi"}. For
+characters with no special mapping, the value is @code{nil} which means
+@code{titlecase} property needs to be consulted instead.
@end table
@defun get-char-code-property char propname
>> DEFUN ("upcase", Fupcase, Supcase, 1, 1, 0,
>> doc: /* Convert argument to upper case and return that.
>> The argument may be a character or string. The result has the same type.
>> -The argument object is not altered--the value is a copy.
>> +The argument object is not altered--the value is a copy. If argument
>> +is a character, characters which map to multiple code points when
>> +cased, e.g. ﬁ, are returned unchanged.
>> See also `capitalize', `downcase' and `upcase-initials'. */)
>
> Using non-ASCII characters here requires adding a 'coding' cookie to
> the file's first line. (C sources are not by default decoded as
> UTF-8, unlike Lisp files.)
@@ -1,3 +1,4 @@
+/* -*- coding: utf-8 -*- */
/* GNU Emacs case conversion functions.
Copyright (C) 1985, 1994, 1997-1999, 2001-2017 Free Software Foundation,
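
For illustration, here is a minimal sketch of how the new properties could
be queried from Lisp. It assumes an Emacs built with this patchset applied
and the property names staying as in the hunks above. The values in the
comments are the ones given in the documentation hunks, and
`my-special-upcase' is a hypothetical helper that is not part of the patch.

```elisp
;; Sketch only: assumes the special-* properties from this patchset exist.

;; U+00DF LATIN SMALL LETTER SHARP S: documented above as "SS".
(get-char-code-property #xDF 'special-uppercase)

;; U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE: documented above as a
;; 2-character string, "i" followed by U+0307 COMBINING DOT ABOVE.
(get-char-code-property #x130 'special-lowercase)

;; U+FB01 LATIN SMALL LIGATURE FI: documented above as "Fi".
(get-char-code-property #xFB01 'special-titlecase)

;; Hypothetical helper, not part of the patch: return the upper-case form
;; of CHAR as a string.  A nil special-uppercase value means "consult the
;; `uppercase' property instead"; a nil `uppercase' value in turn means
;; the character maps to itself.  An empty string is non-nil, so an empty
;; special mapping is returned as is.
(defun my-special-upcase (char)
  (or (get-char-code-property char 'special-uppercase)
      (string (or (get-char-code-property char 'uppercase) char))))
```

The fallback order in the helper mirrors the wording of the docstrings:
the special mapping wins when present, and the simple one-to-one mapping
is used otherwise.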
--
Best regards
ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»