chicken-janitors

Re: [Chicken-janitors] #1182: utf8 egg silently accepts invalid byte sequences


From: Chicken Trac
Subject: Re: [Chicken-janitors] #1182: utf8 egg silently accepts invalid byte sequences
Date: Sun, 29 Mar 2015 22:23:58 -0000

#1182: utf8 egg silently accepts invalid byte sequences
-------------------------+--------------------------------------------------
  Reporter:  syn         |       Owner:  ashinn 
      Type:  defect      |      Status:  new    
  Priority:  major       |   Milestone:  someday
 Component:  extensions  |     Version:  4.9.x  
Resolution:              |    Keywords:  utf8   
-------------------------+--------------------------------------------------

Comment(by syn):

 Replying to [comment:4 ashinn]:
 > Maybe I misunderstand, but your code does not generate
 > an invalid byte sequence for me:
 >
 > $ csi -R utf8 -p "(list->string (map integer->char '(#b11000000 #b10100111)))"
 > À§
 > $ csi -R utf8 -p "(list->string (map integer->char '(#b11000000 #b10100111)))" | hexdump -C
 > 00000000  c3 80 c2 a7 0a                                    |.....|
 > 00000005

 That snippet only constructs the (invalid) byte sequence that is to be fed
 to the UTF-8 decoder. As I mentioned, `list->string` there is the core
 procedure, not the one from the `utf8` egg.


 > These are the characters 00C0;LATIN CAPITAL LETTER A WITH GRAVE
 > and 00A7;SECTION SIGN corresponding to #b11000000 and #b10100111.

 Right, those are the Unicode code points represented by these two numbers,
 which the `utf8` egg's `list->string` procedure properly encodes as 4
 bytes of UTF-8 (`c3 80 c2 a7`). However, as mentioned above, the issue is
 about the byte sequence `c0 a7` (in the form of a CHICKEN string) being
 passed to one of the `utf8` egg's decoding procedures.
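
 To spell out why `c0 a7` is not valid UTF-8: a lead byte `110xxxxx`
 followed by a continuation byte `10yyyyyy` carries 11 payload bits, and
 here those bits come out as 39 (`#x27`, the apostrophe), which already
 fits in a single byte. Such an "overlong" form is not allowed in UTF-8.
 A minimal sketch of that arithmetic follows. The names
 `naive-decode-2-byte` and `overlong-2-byte?` are hypothetical helpers made
 up for illustration, not procedures from the `utf8` egg, and the sketch
 assumes the lead byte really has the form `110xxxxx`.

```scheme
;; Hypothetical helpers for illustration; they are not part of the utf8 egg.
;; Only portable integer arithmetic is used, so no bitwise library is assumed.

;; Combine the payload bits of a two-byte sequence 110xxxxx 10yyyyyy
;; without any validity check.
(define (naive-decode-2-byte lead cont)
  (+ (* (modulo lead 32) 64)   ; low 5 bits of the lead byte
     (modulo cont 64)))        ; low 6 bits of the continuation byte

;; A two-byte sequence is overlong when its code point is below #x80,
;; i.e. when it would also fit in a single byte.
(define (overlong-2-byte? lead cont)
  (< (naive-decode-2-byte lead cont) 128))

(define lead-byte #b11000000)   ; #xC0
(define cont-byte #b10100111)   ; #xA7

(display (naive-decode-2-byte lead-byte cont-byte))  ; 39, the code point of #\'
(newline)
(display (overlong-2-byte? lead-byte cont-byte))     ; #t
(newline)
```

 For a lead byte in the `110xxxxx` range this check fires only for `c0`
 and `c1`, and a decoder that skips it would hand back the apostrophe for
 `c0 a7`.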


 > If you find what you think is a bug, please write a full program and attach it,
 > using "test" to show clearly what you expect and what is different.

 Here you go! Since there is no correct value to expect (because there is
 no way to UTF-8 decode this byte sequence), I am using an inverted
 `test-assert`:

 {{{
 (use test (prefix utf8 utf8-))
 ;; Core `list->string` builds the raw bytes c0 a7; decoding must not yield "'".
 (define bad (list->string (map integer->char '(#b11000000 #b10100111))))
 (test-assert (not (string=? "'" (utf8-list->string (utf8-string->list bad)))))
 }}}

-- 
Ticket URL: <http://bugs.call-cc.org/ticket/1182#comment:5>
CHICKEN Scheme <http://www.call-with-current-continuation.org/>
CHICKEN Scheme is a compiler for the Scheme programming language.
