Re: [Chicken-janitors] #1182: utf8 egg silently accepts invalid byte sequences


From: Chicken Trac
Subject: Re: [Chicken-janitors] #1182: utf8 egg silently accepts invalid byte sequences
Date: Sun, 29 Mar 2015 16:50:03 -0000

#1182: utf8 egg silently accepts invalid byte sequences
-------------------------+--------------------------------------------------
  Reporter:  syn         |       Owner:  ashinn 
      Type:  defect      |      Status:  new    
  Priority:  major       |   Milestone:  someday
 Component:  extensions  |     Version:  4.9.x  
Resolution:              |    Keywords:  utf8   
-------------------------+--------------------------------------------------

Comment(by syn):

 Hey Alex,

 thanks for your reply!

 > This is intentional - existing chicken code mixes binary strings
 > and text strings as strings, so we can't in general forbid such
 > invalid sequences.

 The utf8 egg's procedures surely could detect them; the question is
 whether that is the wisest way to go about it. But see below.


 > We can try to provide sane defaults, and indeed if you use that
 > definition of evil-quote with utf8 imported, you get a valid
 > sequence.

 No, it's an invalid sequence according to the UTF-8 definition in both
 the Unicode standard and the RFC (RFC 3629). See the Wikipedia article --
 it is certainly possible to interpret some of these sequences, but they
 are still outside of the spec and thus a potential source of exploits.
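
 To illustrate why "interpretable but invalid" matters, here is a small
 sketch in C. It assumes the sequence in question is an overlong form such
 as 0xC0 0xA2, which is only a guess at what evil-quote contains. The
 function name naive_decode2 is made up for this example and is not part
 of the utf8 egg. A decoder that only masks and shifts, without checking
 for overlong forms, turns those two bytes into U+0022, the ASCII double
 quote, so a filter that scans the raw bytes for 0x22 would miss it.

```c
#include <stdint.h>

/* Hypothetical helper: decode a two-byte UTF-8 sequence by masking and
 * shifting only. It deliberately performs NO validity checks, which is
 * exactly the flaw being illustrated. */
uint32_t naive_decode2(unsigned char b0, unsigned char b1)
{
    /* 110xxxxx 10yyyyyy -> xxxxxyyyyyy */
    return ((uint32_t)(b0 & 0x1F) << 6) | (uint32_t)(b1 & 0x3F);
}
```

 With this helper, naive_decode2(0xC0, 0xA2) yields 0x22, although RFC 3629
 requires every code point to use its shortest encoding and therefore
 forbids the lead bytes 0xC0 and 0xC1 entirely.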


 > We absolutely can't do anything about users who
 > aren't even using the utf8 egg.

 Sure, I'm only talking about the utf8 egg here -- the core string
 procedures are defined to operate on the byte level so that's what users
 get.


 > What we _can_ (and should) do is provide utilities to check if
 > a string is valid utf8, and/or strip invalid sequences.

 Yep, I think that'd be my preferred solution, too. I implemented UTF-8
 validation the other day, which I'd be willing to contribute to the utf8
 egg if you like. I have both a Scheme and a C implementation; the latter
 is an order of magnitude faster than the former. Would you care for a
 patch?
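
 For reference, a minimal sketch of what such a check could look like in C.
 This is not the implementation offered above, only an illustration written
 against the rules in RFC 3629; the name utf8_valid and its signature are
 assumptions, not an existing utf8 egg API. It rejects stray or missing
 continuation bytes, overlong forms, UTF-16 surrogates and code points
 above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical validator: returns true if s[0..len) is well-formed UTF-8
 * as defined by RFC 3629. */
bool utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = s[i];
        size_t n;          /* number of continuation bytes expected */
        unsigned long cp;  /* code point being assembled */
        unsigned long min; /* smallest code point allowed at this length */

        if (b < 0x80) {    /* plain ASCII */
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) {
            n = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            n = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            n = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return false;  /* stray continuation byte or 0xF8..0xFF */
        }

        if (len - i <= n)
            return false;  /* sequence is cut off by end of input */

        for (size_t k = 1; k <= n; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80)
                return false;  /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min)
            return false;  /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;  /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return false;  /* beyond the Unicode range */

        i += n + 1;
    }
    return true;
}
```

 A stripping utility could be built on the same loop by copying only the
 sequences that pass these checks instead of returning at the first error.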

-- 
Ticket URL: <http://bugs.call-cc.org/ticket/1182#comment:3>
CHICKEN Scheme <http://www.call-with-current-continuation.org/>
CHICKEN Scheme is a compiler for the Scheme programming language.
