lilypond-user

Re: Multi-byte characters in Lyrics


From: Maurits Lamers
Subject: Re: Multi-byte characters in Lyrics
Date: Fri, 27 Oct 2017 09:53:31 +0200

Hi,

>> 
>> I cannot convert a multi-byte character to a symbol, unless I do some
>> very inelegant hacks.
> 
> Huh?  string->symbol works just fine.  So what do you mean when you say
> "symbol"?

This is partly because of a mistake on my end. I defined my braille dots lookup 
alist through symbols.

brailleSymbols = #`(
 (1 . 1)
 (2 . 12)
 (3 . 14)
 (4 . 145)
)

etc...
This required me to do a (char->symbol) in order for assoc-ref to return 
something. As you rightly pointed out in your last e-mail, this was a mistake.
I have now redefined it with string keys:

brailleSymbols = #`(
 ("1" . 1)
 ("2" . 12)
 ("3" . 14)
 ("4" . 145)
)
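
To illustrate why the string keys work, here is a minimal sketch in plain
Guile Scheme. The name braille-symbols is only a stand-in for the LilyPond
variable above, and I am assuming nothing beyond the four entries shown.

```scheme
;; Hypothetical plain-Scheme stand-in for the brailleSymbols alist above.
(define braille-symbols
  '(("1" . 1)
    ("2" . 12)
    ("3" . 14)
    ("4" . 145)))

;; assoc-ref compares keys with equal?, so string keys match by content.
(assoc-ref braille-symbols "3")   ; 14
(assoc-ref braille-symbols "x")   ; #f, because there is no such key
```

Note that assq-ref would not be suitable here, since it compares with eq?
and two strings with the same content are generally not eq?.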


> There is your problem.  string->list will deliver bytes.  Try something
> like
> 
> (define (b->c input)
>  (cdr
>    (string-fold-right
>      (lambda (new tail)
>        (cond ((char<? new #\200)
>               (cons* '() (string new) (cdr tail)))
>              ((char<? new #\300)
>               (cons (cons new (car tail)) (cdr tail)))
>              (else
>               (cons* '() (list->string (cons new (car tail))) (cdr tail)))))
>       '(())
>       input)))
> 
> which will deliver one-utf-8-character strings when applied to a string.

What a beautiful solution. Thank you so much!
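
For the record, this is roughly how I understand it to behave on the lyric
from my earlier mail. The function is repeated from the quote so the snippet
runs on its own; the comments are my reading of it, and the one-entry alist
at the end is purely hypothetical.

```scheme
;; b->c as quoted above, with comments added.
(define (b->c input)
  (cdr
   (string-fold-right
    (lambda (new tail)
      (cond ((char<? new #\200)          ; below 0x80: plain ASCII character
             (cons* '() (string new) (cdr tail)))
            ((char<? new #\300)          ; 0x80-0xBF: continuation, collect it
             (cons (cons new (car tail)) (cdr tail)))
            (else                        ; 0xC0 and up: lead, close the group
             (cons* '() (list->string (cons new (car tail))) (cdr tail)))))
    '(())
    input)))

(define pieces (b->c "’s He"))
;; pieces now holds five one-character strings: "’" "s" " " "H" "e"

;; Hypothetical lookup table with a placeholder value, for illustration only.
(assoc-ref '(("’" . right-quote)) (car pieces))   ; right-quote
```

The string is walked from right to left, so the continuation bytes are
already collected by the time their lead byte is reached.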

> 
>> This works for almost all situations, except this one. I get a lyric
>> which contains an inverted comma instead of an apostrophe and is
>> literally defined as:
>> 
>> "’s He"
>> 
>> This inverted comma is a multi-byte character, but I cannot read it as
>> a character, I can only read it as the separate bytes.
>> This is problematic, because as far as I know these characters could
>> have a different meaning by themselves, as they could each
>> represent a different character.
> 
> No.  utf-8 multibyte character constituents cannot be confused with
> single characters, but you still need to be able to distinguish
> different utf-8 characters.
> 
>>>> Because of other limitations, it has to be compatible with Lilypond
>>>> 2.14.
>>> 
>>> A really bad idea.
>> Couldn't agree more, but at the moment I don't have much choice, and
>> there doesn't seem much benefit in using 2.18 as it seems to suffer
>> from the same problem.
> 
> It suffers from a host of other problems less.  2.14 is not supported on
> any current platform.  The source will not compile given current
> compilers.  Nobody will be able to help you with it, and you won't be able
> to hot-patch any bugs critical to your project.  Your output will suffer
> numerous problems, the PDF metadata will likely break when using utf-8
> characters in it and multibyte output might not work properly in PDF
> since Ghostscript went through a number of changes.
> 
> You won't evade upgrading anyway eventually, so you have nothing to gain
> by postponing: this is a cost you'll have to pay anyway.

Perhaps I gave the wrong impression there, but I am not planning to stick with 
2.14. The only requirement for me at this moment is that my braille-generating 
code also works under 2.14, because my library of music written in LilyPond 
code uses 2.14. This library of music uses all kinds of include schemes, 
because it consists of over 1000 songs which are being published in 
single-voice, choral and accompaniment versions. Updating this library is 
something I simply haven't looked at seriously because of this include weaving.

It is not my intention to distribute anything other than the braille-generating 
code and the braille generated with it. 
I have run the same braille-generating code under 2.18 and haven't noticed any 
differences in its output.


> 
>> This was a very good lead. With great help from the scheme IRC
>> channel, I figured out that having strings as keys works great, and
>> because they suggested (and provided) a UTF-8 byte counter, I
>> was able to implement a simple function which takes as many characters
>> from the string as required to make a proper match to the assoc list.
>> 
>> So, problem solved :)
> 
> I should really read mails to the end before coming up with code.

I think I will use your solution instead of the other one, as it is much more 
elegant and easier to read and understand than the bit-shifting variation.
So, thank you for coming up with code and for the great support! Without it, it 
would have been simply impossible for me to get this far.
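
For comparison, a byte counter of the kind I mentioned could look roughly like
the sketch below. This is not the code from the IRC channel; the name
utf8-sequence-length is made up, and it uses bit masks on the lead byte's
integer value to decide how many bytes the character occupies.

```scheme
;; Hypothetical helper: number of bytes in the UTF-8 sequence that starts
;; with lead-byte (an integer 0-255), or #f if the byte cannot start one.
(define (utf8-sequence-length lead-byte)
  (cond ((< lead-byte #x80) 1)                      ; 0xxxxxxx: ASCII
        ((= (logand lead-byte #xE0) #xC0) 2)        ; 110xxxxx
        ((= (logand lead-byte #xF0) #xE0) 3)        ; 1110xxxx
        ((= (logand lead-byte #xF8) #xF0) 4)        ; 11110xxx
        (else #f)))                                 ; continuation or invalid

(utf8-sequence-length #xE2)   ; 3, the lead byte of the inverted comma
(utf8-sequence-length #x73)   ; 1, the letter "s"
(utf8-sequence-length #x80)   ; #f, a continuation byte
```

Where strings are delivered as bytes, the integer can be obtained with
(char->integer (string-ref str i)).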

cheers

Maurits



