bug#24603: [PATCHv5 08/11] Implement rules for title-casing Dutch ij ‘le

From: Michal Nazarewicz
Subject: bug#24603: [PATCHv5 08/11] Implement rules for title-casing Dutch ij ‘letter’ (bug#24603)
Date: Thu, 16 Mar 2017 22:30:52 +0100

On Sat, Mar 11 2017, Eli Zaretskii wrote:
>> From: Michal Nazarewicz <address@hidden>
>> Date: Thu,  9 Mar 2017 22:51:47 +0100
>> +    /* In Dutch, ‘ij’ is a digraph and when capitalised the whole thing is upper
>> +       cased.  Unicode has ‘ĳ’ and ‘Ĳ’ (with proper casing mappings) but they
>> +       aren’t always used so we cannot/should not rely on them.
>> +
>> +       Note that rule for capitalising ‘ij’ as a single letter is not present in
>> +       Unicode 9.0’s SpecialCasing.txt.  On the flip side, Firefox implements
>> +       this as well so we’re not completely alone.
> If this is not mandated by Unicode 9.0 (and not by the latest draft of
> 10.0, AFAICS), shouldn't we have a user option for this, by default
> off?

I don’t really see why.

If the goal were to implement Unicode, then ‘ij’ handling should not be
implemented at all and Unicode-mandated behaviour should not be
configurable.  But implementing Unicode is a means, not a goal in itself.

Rather, the goal is to case strings properly, and while Unicode helps
with that, it’s not the whole story.

And if users are allowed to disable ‘ij’ handling, they should also be
allowed to disable Turkish ‘i’ handling.

>> +       There are words where ‘ij’ are two separate letters (such as bijectie or
>> +       bijoux) in which case the capitalisation rules do not apply.  I (mina86)
>> +       have googled this a little and couldn’t find a Dutch word which beings
>> +       with ‘ij’ that is not a digraph so we should be in the clear since we
>> +       only care about the initial. */
> I'm not sure I get this right: does this mean that writing in English
> (or any other non-Dutch language) in a Dutch locale will automatically
> capitalize "ij" to "IJ", just because the default value of
> buffer-language is "nl_NL" or somesuch, and no specific language was
> set for the buffer?  Wouldn't that surprise users?

Yes, it does.  And yes, it would.
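
As a rough illustration of that behaviour, here is a minimal C sketch of
the initial-‘ij’ rule for ASCII words.  It is not the code from the
patch: capitalize_word and its dutch flag are made-up names, and the
sketch ignores the precomposed ligature characters entirely.  The point
is only that the rule fires on any word starting with ‘ij’, whatever
language the surrounding text is actually written in.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the patch itself: title-case the ASCII word
   SRC into DST, which must have room for strlen (SRC) + 1 bytes.  When
   DUTCH is non-zero, a leading "ij" counts as one letter and becomes
   "IJ"; an "ij" later in the word is never touched.  */
void
capitalize_word (char *dst, const char *src, int dutch)
{
  assert (dst != NULL && src != NULL);
  size_t len = strlen (src);
  size_t i = 0;

  if (dutch && len >= 2
      && tolower ((unsigned char) src[0]) == 'i'
      && tolower ((unsigned char) src[1]) == 'j')
    {
      /* Dutch digraph: upper-case both halves of the initial "ij".  */
      dst[i++] = 'I';
      dst[i++] = 'J';
    }
  else if (len >= 1)
    {
      /* Language-neutral rule: upper-case the first character only.  */
      dst[i] = (char) toupper ((unsigned char) src[i]);
      i++;
    }

  /* The rest of the word is lower-cased as usual.  */
  for (; i < len; i++)
    dst[i] = (char) tolower ((unsigned char) src[i]);
  dst[len] = '\0';
}
```

With dutch set, "ijsland" comes out as "IJsland" and "bijectie" as
"Bijectie"; with dutch clear, "ijsland" comes out as "Ijsland".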

This is currently the biggest blocker/concern for all the patches past
07/11 and I’m still wondering what would be the best solution.

I thought about having a ‘language’ string property so that programming
major modes would mark everything outside of comments as a ‘nil’
language.  This would require support from multiple major modes and
likely complicate them.¹

Or perhaps have an off-by-default ‘special-casing-mode’ which enables
language-dependent casing rules.  A similar effect could be accomplished
by replacing ‘buffer-language’ with a nil-by-default ‘casing-locale’
variable applicable only to casing, but I would miss ‘buffer-language’
since I believe it might get used for other things.
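
The nil-by-default variant could be sketched in C roughly as follows.
This is only an assumption about how such a setting might be consulted:
the enum, rules_for_locale and locale_is are invented for illustration,
and a NULL pointer stands in for a nil ‘casing-locale’.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule sets a casing routine could choose between.  */
enum casing_rules { RULES_NONE, RULES_DUTCH, RULES_TURKISH };

/* Return non-zero if LOCALE names the language LANG, either exactly
   ("nl") or followed by a region separator ("nl_NL", "nl-BE").  */
static int
locale_is (const char *locale, const char *lang)
{
  assert (locale != NULL && lang != NULL);
  size_t n = strlen (lang);
  return strncmp (locale, lang, n) == 0
         && (locale[n] == '\0' || locale[n] == '_' || locale[n] == '-');
}

/* Map a casing locale to a rule set.  NULL (the stand-in for nil)
   and unknown languages select no language-specific rules, so the
   special cases are strictly opt-in.  */
enum casing_rules
rules_for_locale (const char *casing_locale)
{
  if (casing_locale == NULL)
    return RULES_NONE;
  if (locale_is (casing_locale, "nl"))
    return RULES_DUTCH;
  if (locale_is (casing_locale, "tr"))
    return RULES_TURKISH;
  return RULES_NONE;
}
```

Under this arrangement the Dutch ‘ij’ and Turkish ‘i’ rules are handled
the same way: neither applies unless the casing locale asks for it.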

¹ Having a string property could still be an option in the future, of
  course, and it might allow fancy things like: <p lang=en>Iceland’s name
  in Dutch is <span lang=nl>Ijsland</span></p>.

Best regards
ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»
