Re: utf-8-strings
From: Thomas Morley
Subject: Re: utf-8-strings
Date: Sun, 8 Jul 2012 13:39:28 +0200
2012/7/8 David Kastrup <address@hidden>:
> Thomas Morley <address@hidden> writes:
>
>> Hi,
>>
>> together with Arnold I worked on a method to compress or stretch a
>> text by changing only the space between characters, i.e. the letters
>> themselves shouldn't be scaled.
>> (This came out of a discussion at the German LilyPond forum:
>> http://www.lilypondforum.de/index.php?topic=1152.0 )
>>
>> The difficulty is to write a function that turns a string into a
>> list of single-character strings and works with accented letters,
>> German umlauts, non-European scripts etc.
>> e.g.:
>> "áèçäöüテスト" → '("á" "è" "ç" "ä" "ö" "ü" "テ" "ス" "ト")
>>
>> We came up with the attached code.
>>
>> Problems:
>> Unicode keeps growing, so the code needs updating from time to time.
>> Once LilyPond uses guile 2.0 the situation may be completely
>> different. (I have no clue about guile 2.0.)
>>
>> What do you think?
>> Or, to put it differently: are there any objections to turning it
>> into a patch?
>
> Several observations:
>
> a) guilev2 is going to become a definite issue this year. We may either
> decide to support both guilev1 and guilev2, or ditch guilev1 support
> completely.
>
> So it does not make sense to design a solution that is not easy to
> support with guilev2.
>
> b) LilyPond's lexer goes to considerable lengths to not let any invalid
> utf-8 pass into strings. It would be reasonably straightforward, if
> required, to make sure that this also holds for embedded Scheme. In
> that case, the only way to arrive at invalid utf-8 would be
> synthesizing strings in Scheme from bytes. So I'd not bother about
> invalid utf-8. This means that, diacriticals apart, you can just
> split the string before any byte outside the range 0x80-0xbf.
>
> This can basically be done using charsets. I tried doing this with
> regexps, but curiously enough, in contrast to Guile proper, those appear
> to be already utf-8 aware, so
>
> #(use-modules (ice-9 regex))
>
> #(define (utf8-substrings str)
>    (define char-pat (make-regexp "."))
>    (map match:substring (list-matches char-pat str)))
>
> #(write (utf8-substrings "áèçäöüテスト"))
>
> works just fine (if you overlook the fact that write misbehaves, writing
> some byte codes quoted as \xhh inside of a string and others literally).
>
> --
> David Kastrup
Wow!
Following your suggestion I managed to drop about 300 lines, reducing
it to a quarter of the original.
You definitely should earn more money!!
Of course I had to redefine `string-list->string'. I used recursion,
which was the best I could think of.
(`string-list->string' isn't used here, but I need it elsewhere)
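For readers without the attachment, here is a minimal sketch of the byte-level rule from point b) above and of a recursive join. It is written in Python purely for illustration and is not the attached LilyPond code. The function names are hypothetical stand-ins for `string->string-list' and `string-list->string'. It assumes valid utf-8 input and, as noted above, leaves combining diacriticals as separate items.

```python
def utf8_substrings(data):
    """Split valid UTF-8 bytes before every byte outside 0x80-0xBF."""
    chunks = []
    for byte in data:
        if 0x80 <= byte <= 0xBF and chunks:
            # Continuation byte: it belongs to the character already started.
            chunks[-1].append(byte)
        else:
            # ASCII byte or lead byte: a new character starts here.
            chunks.append(bytearray([byte]))
    return [bytes(chunk) for chunk in chunks]


def string_to_string_list(text):
    """Hypothetical counterpart of `string->string-list'."""
    return [chunk.decode("utf-8") for chunk in utf8_substrings(text.encode("utf-8"))]


def string_list_to_string(parts):
    """Hypothetical counterpart of `string-list->string', joined recursively."""
    if not parts:
        return ""
    return parts[0] + string_list_to_string(parts[1:])


if __name__ == "__main__":
    pieces = string_to_string_list("áèçäöüテスト")
    print(pieces)
    print(string_list_to_string(pieces))
```

The split never looks at code-point tables, so it does not need updating when Unicode grows. The price is that a base letter followed by a combining accent comes out as two list items.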
Would you agree if I turn it into a patch?
I think `string->string-list' and `string-list->string' are very
useful tools and `char-space' might be of interest, too.
Thanks a lot,
Harm
Attachment: utf-8-strings-rev-02.ly
Description: Binary data