From: Van L
Subject: bug#36070: 27; feature request '(Describe Char Unidata List) to include 'kDefinition' value
Date: Mon, 10 Jun 2019 16:16:45 +1000

> On 4 Jun 2019, at 01:06, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> --8<---------------cut here---------------start------------->8---
>> Character code properties: customize what to show
>> name: CJK IDEOGRAPH-5165
>> general-category: Lo (Letter, Other)
>> decomposition: (20837) ('入')
>> --8<---------------cut here---------------end--------------->8---
>
> This comes from UnicodeData.txt, our source for the Unicode properties
> of all the characters. We parse it into uni-*.el files as part of the
> build.
>
>> The Readings table, in particular, is nice to have for the 'kDefinition'.
>>
>> --8<---------------cut here---------------start------------->8---
>> | Data type | Value |
>> |-------------+--------------------------|
>> | kDefinition | enter, come in(to), join |
>> | | |
>> --8<---------------cut here---------------end--------------->8---
>
> This comes from Unihan_Reading.txt, a different file that is part of
> the Unihan database.
>
> We don't currently have a property where to put this value, so we need
> first to extend the properties. And then we will need to parse the
> above file and populate the property. Patches welcome. Bonus points
> for reviewing other properties of the Unihan DB and adding whatever is
> useful. See UAX#38 (http://www.unicode.org/reports/tr38/), for the
> description of the properties.

Thanks for pointing this out. I would like to learn more about the Unihan
DB and to extend Emacs's handling of this information.
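
To get a feel for the parsing step described above, here is a minimal
Python sketch. It is not a patch against Emacs. It assumes the Unihan data
files consist of tab-separated lines (code point, property name, value),
with '#' starting a comment line, and that the readings file is distributed
as Unihan_Readings.txt. The function name parse_kdefinition and the sample
lines are only illustrative.

```python
import io


def parse_kdefinition(lines):
    """Map code point (int) -> kDefinition text.

    Assumes the Unihan layout 'U+XXXX<TAB>property<TAB>value';
    the function name is hypothetical, not part of any tool.
    """
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the first two tabs only, so the value stays intact.
        fields = line.split("\t", 2)
        if len(fields) != 3:
            continue
        code, prop, value = fields
        if prop == "kDefinition" and code.startswith("U+"):
            table[int(code[2:], 16)] = value
    return table


# Illustrative sample in the assumed layout, not the real file.
SAMPLE = (
    "# sample lines in the tab-separated Unihan layout\n"
    "U+5165\tkCantonese\tjap6\n"
    "U+5165\tkDefinition\tenter, come in(to), join\n"
)

definitions = parse_kdefinition(io.StringIO(SAMPLE))
print(definitions[0x5165])
```

Putting such a table into a character property would still need the new
property mentioned above; this only covers reading the file.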
-- Van