bug-gnu-emacs

bug#24603: [PATCHv6 5/6] Support casing characters which map into multip


From: Michal Nazarewicz
Subject: bug#24603: [PATCHv6 5/6] Support casing characters which map into multiple code points (bug#24603)
Date: Mon, 03 Apr 2017 11:01:40 +0200

On Wed, Mar 22 2017, Eli Zaretskii wrote:
>> From: Michal Nazarewicz <mina86@mina86.com>
>> Date: Tue, 21 Mar 2017 02:27:08 +0100
>> 
>> Implement unconditional special casing rules defined in Unicode standard.
>
> Thanks.  A few comments below.

Diff with fixes attached.  The rest of the patchset stays unchanged.
I figured that posting just the fixes would be more readable than
sending the full patch again.

Unless there are more comments, I’ll push the commits in a couple of
days.

>> diff --git a/admin/unidata/unidata-gen.el b/admin/unidata/unidata-gen.el
>> index 3c5119a8a3d..32b05eacce6 100644
>> --- a/admin/unidata/unidata-gen.el
>> +++ b/admin/unidata/unidata-gen.el
>> @@ -268,6 +268,33 @@ unidata-prop-alist
>>  The value nil means that the actual property value of a character
>>  is the character itself."
>>       string)
>> +    (special-uppercase
>> +     2 unidata-gen-table-special-casing "uni-special-uppercase.el"
>> +     "Unicode unconditional special casing mapping.
>> +
>> +Property value is nil, denoting no special rules, or a string, denoting
>> +characters maps into given sequence of characters.
>
> Something is wrong with the last sentence.  (This problem repeats in
> other similar sentences in the patch.)
>
>> +The mapping includes only unconditional casing rules defined by Unicode."
>
> This begs for clarification: what is meant by "unconditional casing"?
> I think a sentence or two of explanation are due.

@@ -272,28 +272,37 @@ unidata-prop-alist
      2 unidata-gen-table-special-casing "uni-special-uppercase.el"
      "Unicode unconditional special casing mapping.
 
-Property value is nil, denoting no special rules, or a string, denoting
-characters maps into given sequence of characters.  The string may be empty.
+Property value is a (possibly empty) string or nil.  The value nil means that
+the `uppercase' property should be consulted instead.  A string gives the
+sequence of characters that the character maps into.
 
-The mapping includes only unconditional casing rules defined by Unicode."
+This mapping includes only language- and context-independent special casing
+rules defined by Unicode.  It also omits associations which would duplicate
+information from the `uppercase' property."
      nil)
     (special-lowercase
      0 unidata-gen-table-special-casing "uni-special-lowercase.el"
      "Unicode unconditional special casing mapping.
 
-Property value is nil, denoting no special rules, or a string, denoting
-characters maps into given sequence of characters.  The string may be empty.
+Property value is a (possibly empty) string or nil.  The value nil means that
+the `lowercase' property should be consulted instead.  A string gives the
+sequence of characters that the character maps into.
 
-The mapping includes only unconditional casing rules defined by Unicode."
+This mapping includes only language- and context-independent special casing
+rules defined by Unicode.  It also omits associations which would duplicate
+information from the `lowercase' property."
      nil)
     (special-titlecase
      1 unidata-gen-table-special-casing "uni-special-titlecase.el"
      "Unicode unconditional special casing mapping.
 
-Property value is nil, denoting no special rules, or a string, denoting
-characters maps into given sequence of characters.  The string may be empty.
+Property value is a (possibly empty) string or nil.  The value nil means that
+the `titlecase' property should be consulted instead.  A string gives the
+sequence of characters that the character maps into.
 
-The mapping includes only unconditional casing rules defined by Unicode."
+This mapping includes only language- and context-independent special casing
+rules defined by Unicode.  It also omits associations which would duplicate
+information from the `titlecase' property."
      nil)
     (mirroring
      unidata-gen-mirroring-list unidata-gen-table-character "uni-mirrored.el"
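
The nil-fallback rule described in these docstrings can be sketched as follows.  This is an illustration in Python, not the Emacs implementation: the table and function names are hypothetical, the two special entries are taken from Unicode's SpecialCasing data, and the handling of a lone character is an assumption based on the `upcase' docstring quoted later in this message.

```python
# Hypothetical stand-ins for the `special-uppercase' and `uppercase' tables.
SPECIAL_UPPERCASE = {
    "\u00df": "SS",  # LATIN SMALL LETTER SHARP S
    "\ufb01": "FI",  # LATIN SMALL LIGATURE FI
}
SIMPLE_UPPERCASE = {"a": "A", "z": "Z"}


def lookup_special_uppercase(ch):
    """Return the special mapping (a string) or None, mirroring nil."""
    return SPECIAL_UPPERCASE.get(ch)


def upcase_in_string(ch):
    """Upcase one character of a string; the result may be several characters."""
    special = lookup_special_uppercase(ch)
    if special is not None:  # an empty string would still count as a mapping
        return special
    # None (nil) means: consult the simple `uppercase' mapping instead.
    return SIMPLE_UPPERCASE.get(ch, ch)


def upcase_char(ch):
    """Upcase a lone character.

    A single character cannot hold a multi-code-point result, so in this
    sketch such characters are returned unchanged (an assumption).
    """
    result = upcase_in_string(ch)
    return result if len(result) == 1 else ch
```

Keeping the special table separate from the simple one is what lets nil act as "no special rule here": only the few characters with multi-character mappings need an entry.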

>> +@item special-uppercase
>> +Corresponds to Unicode unconditional special upper-casing rules.  The value
>
> Likewise here: the "unconditional" part should be explained.
>
>> +is @code{"SS"}.  For unassigned codepoints, the value is @code{nil}
>> +which means @code{uppercase} property needs to be consulted instead.
>
> When you say "unassigned codepoints", do you mean codepoints that
> don't have characters defined for them in Unicode?  Because that's the
> usual meaning of this term in the context of Unicode.  If you mean
> something else, please use some other term.  (I think you mean
> something else, since properties of unassigned codepoints are not
> really interesting for Lisp programmers.)
>
>> +mapping for @code{U+0130} (@sc{latin capital letter i with dot above})
>> +the value is @code{"i\u0307"}.  For unassigned codepoints, the value is
>
> Instead of using "i\u0307", in the hope that the reader will
> understand it's a string made of 2 characters, I would say that
> explicitly.

@@ -621,26 +621,27 @@ Character Properties
 is @code{nil}, which means the character itself.
 
 @item special-uppercase
-Corresponds to Unicode unconditional special upper-casing rules.  The value
-of this property is a string (which may be empty).  For example
-mapping for @code{U+00DF} (@sc{latin smpall letter sharp s}) the value
-is @code{"SS"}.  For unassigned codepoints, the value is @code{nil}
+Corresponds to Unicode language- and context-independent special upper-casing
+rules.  The value of this property is a string (which may be empty).  For
+example, the mapping for @code{U+00DF} (@sc{latin small letter sharp s}) is
+@code{"SS"}.  For characters with no special mapping, the value is @code{nil}
 which means @code{uppercase} property needs to be consulted instead.
 
 @item special-lowercase
-Corresponds to Unicode unconditional special lower-casing rules.  The
-value of this property is a string (which may be empty).  For example
-mapping for @code{U+0130} (@sc{latin capital letter i with dot above})
-the value is @code{"i\u0307"}.  For unassigned codepoints, the value is
-@code{nil} which means @code{lowercase} property needs to be consulted
-instead.
+Corresponds to Unicode language- and context-independent special lower-casing
+rules.  The value of this property is a string (which may be empty).  For
+example, for @code{U+0130} (@sc{latin capital letter i with dot above}) the
+value is @code{"i\u0307"} (i.e. a 2-character string consisting of @sc{latin
+small letter i} followed by @sc{combining dot above}).  For characters with no
+special mapping, the value is @code{nil}, which means the @code{lowercase}
+property needs to be consulted instead.
 
 @item special-titlecase
-Corresponds to Unicode unconditional special title-casing rules.  The
-value of this property is a string (which may be empty).  For example
-mapping for @code{U+FB01} (@sc{latin small ligature fi}) the value is
-@code{"Fi"}.  For unassigned codepoints, the value is @code{nil} which
-means @code{titlecase} property needs to be consulted instead.
+Corresponds to Unicode language- and context-independent special title-casing
+rules.  The value of this property is a string (which may be empty).  For
+example, for @code{U+FB01} (@sc{latin small ligature fi}) the value is
+@code{"Fi"}.  For characters with no special mapping, the value is @code{nil},
+which means the @code{titlecase} property needs to be consulted instead.
 @end table
 
 @defun get-char-code-property char propname
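
As a cross-check of the three examples in this hunk, Python's built-in string methods also apply Unicode's full case mappings, so they can serve as an independent reference.  This is not Emacs code; it only shows what the same Unicode rules produce elsewhere, and the variable names are arbitrary.

```python
import unicodedata

sharp_s = "\u00df"   # LATIN SMALL LETTER SHARP S
dotted_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
fi = "\ufb01"        # LATIN SMALL LIGATURE FI

upper = sharp_s.upper()   # special upper-casing
lower = dotted_i.lower()  # special lower-casing
title = fi.title()        # special title-casing

print(upper)
# Name each code point of the lower-cased result to make its length explicit.
print([unicodedata.name(c) for c in lower])
print(title)
```

Listing the code point names avoids relying on how a terminal renders the combining dot.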

>>  DEFUN ("upcase", Fupcase, Supcase, 1, 1, 0,
>>         doc: /* Convert argument to upper case and return that.
>>  The argument may be a character or string.  The result has the same type.
>> -The argument object is not altered--the value is a copy.
>> +The argument object is not altered--the value is a copy.  If argument
>> +is a character, characters which map to multiple code points when
>> +cased, e.g. ﬁ, are returned unchanged.
>>  See also `capitalize', `downcase' and `upcase-initials'.  */)
>
> Using non-ASCII characters here requires adding a 'coding' cookie to
> the file's first line.  (C sources are not by default decoded as
> UTF-8, unlike Lisp files.)

@@ -1,3 +1,4 @@
+/* -*- coding: utf-8 -*- */
 /* GNU Emacs case conversion functions.
 
 Copyright (C) 1985, 1994, 1997-1999, 2001-2017 Free Software Foundation,

-- 
Best regards
ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»




